How to Carry Out Keyword Clustering via Serpstat

What is keyword clustering?
Unfortunately, there are no open databases that contain this information, and even the Knowledge Graph API cannot cope with this task. Keyword clustering is therefore carried out by comparing the search results (SERPs) for different keywords.
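The comparison itself boils down to counting the URLs that two result pages have in common. Here is a minimal Python sketch of that idea; the function name, the keywords, and the URLs are made up for illustration, and real SERP data would come from whatever source you use.

```python
from itertools import combinations


def shared_url_counts(serps):
    """Count the URLs each pair of keywords has in common.

    `serps` maps a keyword to the list of URLs in its top results.
    """
    counts = {}
    for (kw_a, urls_a), (kw_b, urls_b) in combinations(serps.items(), 2):
        counts[(kw_a, kw_b)] = len(set(urls_a) & set(urls_b))
    return counts


# Hypothetical sample data, three results per keyword.
serps = {
    "buy laptop": ["shop.example/laptops", "store.example/notebooks", "deals.example/laptop"],
    "laptop price": ["shop.example/laptops", "deals.example/laptop", "compare.example/prices"],
    "laptop repair": ["fix.example/laptop", "repair.example/services", "shop.example/laptops"],
}

for pair, count in shared_url_counts(serps).items():
    print(pair, count)
```

Pairs with many shared URLs are candidates for the same cluster; pairs with few are not.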
Downsides of using common keyword clustering algorithms
- Soft
- Moderate
- Hard
The hard one is used the most, so we'll focus on it. Here is how it works:

This algorithm has a significant disadvantage: clusters are formed by the minimum number of matches. To prove it, here is an example where the algorithm works incorrectly. Let's take 3 keywords with a connection strength of 3, and here is what we get:
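The threshold problem can also be shown in a few lines of Python. This is only a sketch, assuming the usual reading of hard clustering: a keyword joins a cluster only if it shares at least `threshold` URLs with every keyword already in it. The function names and the sample URLs are hypothetical.

```python
def shared_urls(serp_a, serp_b):
    """Number of URLs two result lists have in common."""
    return len(set(serp_a) & set(serp_b))


def hard_cluster(serps, threshold=3):
    """Greedy hard clustering over a {keyword: [urls]} mapping."""
    clusters = []
    for keyword, urls in serps.items():
        for cluster in clusters:
            # The keyword must reach the threshold with every cluster member.
            if all(shared_urls(urls, serps[other]) >= threshold for other in cluster):
                cluster.append(keyword)
                break
        else:
            clusters.append([keyword])
    return clusters


common = [f"site.example/page{i}" for i in range(1, 11)]
sample = {
    "keyword a": common,                                             # u1..u10
    "keyword b": common[:9] + ["other.example/x"],                   # shares 9 with a
    "keyword c": common[:3] + [f"else.example/{i}" for i in range(7)],  # shares 3
}

print(hard_cluster(sample, threshold=3))
```

All three keywords land in one cluster, although the first two share 9 URLs and the third shares only 3 with each of them. The threshold hides that difference.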

That's why I use my own clustering algorithm, which is based on the connection strength between keywords and takes the specifics of the search results into account.
How does my algorithm work?



Cluster 2 has only the main part because there is a maximum connection strength for the keyword #4 while keyword #5 has better connection strength with keyword #4, which already forms the basis of cluster #2.
I'll try to explain it by showing the weight of every URL in brackets.


Using connection strength during clustering takes into account not only the number of mutual URLs but also the features of search engines. This produces higher-quality keyword clusters, which will be useful when you design a site's structure, write an article, or work on a PPC campaign.
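To illustrate the difference between counting mutual URLs and weighting them, here is a minimal Python sketch. The weighting scheme is an assumption made for the example (a URL's weight falls linearly with its position), not the exact formula behind the script, and all names and URLs are hypothetical.

```python
def position_weights(serp):
    """Hypothetical weights: first result gets len(serp), last result gets 1."""
    depth = len(serp)
    return {url: depth - index for index, url in enumerate(serp)}


def connection_strength(serp_a, serp_b):
    """Sum the weights of mutual URLs, using the lower weight of the two."""
    weights_a = position_weights(serp_a)
    weights_b = position_weights(serp_b)
    mutual = weights_a.keys() & weights_b.keys()
    return sum(min(weights_a[url], weights_b[url]) for url in mutual)


serp_x = ["a.example", "b.example", "c.example", "d.example"]
serp_y = ["a.example", "b.example", "e.example", "f.example"]  # shares the top two
serp_z = ["g.example", "h.example", "c.example", "d.example"]  # shares the bottom two

print(connection_strength(serp_x, serp_y))
print(connection_strength(serp_x, serp_z))
```

Both pairs share exactly two URLs, so a plain count treats them the same. With weights, the pair that shares the top results gets the higher connection strength.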
This algorithm can be improved to make clustering even more accurate:
1. Decreasing the weight of homepages
A homepage usually carries much more weight than other pages because of its structure and number of links. To see for yourself, take the top 1000 sites by Serpstat visibility and compare the number of keywords their homepages rank for with the number their other pages rank for.
2. Decreasing the connection strength when several pages of the same website are in the top 5
If niche leaders can get several different pages of their site into the top, the connection strength between these keywords is not that high.
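Both refinements can be sketched as small adjustments applied on top of a base weight or strength. The Python below is an illustration only: the two factors are arbitrary placeholder values, and the helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

HOMEPAGE_FACTOR = 0.5    # assumed value, tune for your data
SAME_SITE_FACTOR = 0.75  # assumed value, tune for your data


def is_homepage(url):
    """True when the URL has no path beyond '/' and no query string."""
    parsed = urlparse(url)
    return parsed.path in ("", "/") and not parsed.query


def adjusted_weight(url, base_weight):
    """Refinement 1: lower the weight of homepages."""
    return base_weight * HOMEPAGE_FACTOR if is_homepage(url) else base_weight


def has_repeated_site(serp, top_n=5):
    """True when one host holds more than one of the first `top_n` results."""
    hosts = Counter(urlparse(url).netloc for url in serp[:top_n])
    return any(count > 1 for count in hosts.values())


def adjusted_strength(strength, serp_a, serp_b):
    """Refinement 2: lower the strength if either top 5 repeats a site."""
    if has_repeated_site(serp_a) or has_repeated_site(serp_b):
        return strength * SAME_SITE_FACTOR
    return strength


serp_a = ["https://big.example/guide", "https://big.example/tips", "https://other.example/"]
serp_b = ["https://one.example/a", "https://two.example/b", "https://three.example/c"]

print(adjusted_weight("https://other.example/", 10))
print(adjusted_strength(8, serp_a, serp_b))
```

Keeping the adjustments as separate functions makes it easy to switch either one off and compare the resulting clusters.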
Script based on Serpstat's database
You've already seen this script in my last article "Expired domains' Search: how to find drops and identify potential drops". I just added the clustering feature.

- Input is a phrase, a domain, or a page for which the script will get phrases from the Serpstat database;
- Input Type is the kind of input the script will run with. It determines which Serpstat API function is used;
- Search region is the search engine for which the analysis will be carried out. For example, for Google US you need to set g_us. The entire list of available search engines can be found here;
- Search limits is the maximum number of phrases from the organic search results that will take part in the analysis;
- Pagination Size is the page size used when working with the Serpstat API, because the keywords, url_keywords, and domain_keywords functions return a maximum of 1000 phrases. If your key limit is less than 1000, it's better to set the page size equal to the search limit;
- Max volume is the maximum search volume of phrases from both databases that will take part in the analysis. If you want only low-volume (LV) keywords, you can set 20. For example, to search for blogs and satellites I set a maximum volume of 80;
- API token is your token for API access. It can be found on your profile page;
- Function selects which of the script's functions to run:
○ Get list of domains. You can simply copy this list and work with it as you like;
○ Find relevant forums. A slightly improved search for topical forums;
○ Clustering.
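The interplay between Search limits, Pagination Size, and Max volume can be sketched in Python. This is a planning helper only and makes no Serpstat API calls; the function names and the `volume` field are hypothetical, and the 1000-row cap is the per-request maximum mentioned above.

```python
def page_plan(search_limit, page_size, max_page_size=1000):
    """Return (page_number, rows_to_request) pairs covering `search_limit` rows."""
    if search_limit <= 0 or page_size <= 0:
        raise ValueError("search_limit and page_size must be positive")
    size = min(page_size, max_page_size, search_limit)
    plan = []
    remaining = search_limit
    page = 1
    while remaining > 0:
        rows = min(size, remaining)
        plan.append((page, rows))
        remaining -= rows
        page += 1
    return plan


def filter_by_volume(rows, max_volume):
    """Keep only phrases whose search volume does not exceed `max_volume`."""
    return [row for row in rows if row["volume"] <= max_volume]


print(page_plan(2500, 1000))
print(page_plan(300, 1000))

# Hypothetical rows; the "volume" key is an assumed field name.
rows = [
    {"keyword": "clash of clans tips", "volume": 70},
    {"keyword": "clash of clans", "volume": 5000},
]
print(filter_by_volume(rows, max_volume=80))
```

With a search limit below the page size, the plan collapses to a single request, which matches the advice to use the same page size as the search limit.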
The clustering process takes quite a long time. That is one of the reasons why the results are not displayed in Google Sheets.
After a while you'll see a spreadsheet in which the yellow rows mark the clusters' additional parts.
Here you can see the result for the keyword "Clash of Clans". If I were writing an article about Clash of Clans game strategies, I would certainly take into account that the keywords "strategy" and "tips" have a significant connection strength. Classical algorithms are unlikely to tell you this.

P. S.
I don't claim that my algorithm and clustering script are perfect. But if you often work with Serpstat's database, they will save you time on processing the data manually. I hope this algorithm will be useful to you. If you have any questions, feel free to ask them in the comments.