How to Carry Out Keyword Clustering via Serpstat

Dmitriy Mazuryan
SEO Specialist at Netpeak Agency
Keyword clustering is an essential part of building a site's semantic core. Doing this work manually in Microsoft Excel or Google Sheets takes a lot of time. In this article, I'll share my personal clustering algorithm, which helps speed up the process.

What is keyword clustering?

Keyword clustering is a practice SEO specialists use to segment target search terms into groups (clusters) relevant to each page of the website. The keywords should be grouped based on the properties of objects these keywords describe and the context of their use.

Unfortunately, there are no open databases that contain this information. Even the Knowledge Graph API cannot cope with this task. So keyword clustering is carried out based on the SERP, by comparing the search results for different keywords.

Downsides of using common keyword clustering algorithms

There are 3 main algorithms of keyword grouping:

  • Soft
  • Moderate
  • Hard

The hard one is used the most, so we'll focus on it. Here is how it works:

1. A minimum number of shared URLs is set; keywords that reach it can be combined into a group;
2. Keywords are sorted by frequency in descending order;
3. The keywords are compared starting with the most frequent one;
4. If the number of shared URLs in the search results is greater than or equal to the minimum, the phrases are paired.
[Figure: visual representation of how the algorithm works]
For more information about keyword clustering and standard algorithms visit Wikipedia.
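
The steps listed above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular tool: the function name, the input shapes, and the reading of "hard" as "a keyword must share enough URLs with every keyword already in the group" are my assumptions.

```python
def cluster_by_shared_urls(serps, volumes, min_shared=3):
    """Group keywords whose search results share at least `min_shared` URLs.

    serps:   dict mapping keyword -> list of URLs from its search results
    volumes: dict mapping keyword -> search frequency
    """
    # Sort keywords by frequency, most frequent first.
    ordered = sorted(serps, key=lambda kw: volumes.get(kw, 0), reverse=True)
    clusters = []
    assigned = set()
    for head in ordered:
        if head in assigned:
            continue
        cluster = [head]
        assigned.add(head)
        for candidate in ordered:
            if candidate in assigned:
                continue
            # Assumed "hard" rule: the candidate must reach the minimum
            # with every keyword that is already in the cluster.
            fits = all(
                len(set(serps[candidate]) & set(serps[member])) >= min_shared
                for member in cluster
            )
            if fits:
                cluster.append(candidate)
                assigned.add(candidate)
        clusters.append(cluster)
    return clusters


example_serps = {
    "a": ["u1", "u2", "u3", "u4"],
    "b": ["u1", "u2", "u3", "u5"],
    "c": ["u4", "u6", "u7", "u8"],
}
example_volumes = {"a": 100, "b": 50, "c": 10}
print(cluster_by_shared_urls(example_serps, example_volumes, min_shared=3))
```

In this toy input, "a" and "b" share three URLs and end up together, while "c" shares only one URL with "a" and forms its own group.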

This algorithm has a significant disadvantage: clusters are formed by the minimum number of matches. To show this, here is an example where the algorithm works incorrectly. Let's take 3 keywords with a connection strength of 3:
[Figure: three keywords and their search results, connection strength 3]
As you can see, keyword #1 and keyword #2 will be in the same cluster. Keyword #3 will either be grouped with keyword #1 while having no mutual URLs with it, or form a new cluster without keyword #2. Either way, the clustering won't be precise.

That's why I use my own clustering algorithm, based on the connection strength between keywords, which depends on the specifics of the search results.

How does my algorithm work?

Every URL has its own weight depending on its position in the SERP. The weights are identical to those Serpstat uses when calculating CTR based on positions.
[Table: URL weight by SERP position]
The connection strength between keywords is the sum of the weights of their mutual URLs. The weight of a mutual URL, in turn, is the sum of that URL's weights within the cluster.
Each cluster has two parts: the main one and the additional one. The main part is formed from the keywords with the maximum connection strength, provided it is more than 2.5. The additional part is formed from the keywords whose connection strength is not the maximum one but is also more than 2.5.
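
To make the definition more concrete, here is a small Python sketch of the connection strength between two keywords. The weight values below are placeholders I made up for illustration; they are not Serpstat's actual CTR weights. I also assume that the weight of a mutual URL is the sum of its position weights in the two compared SERPs.

```python
# Placeholder weights for positions 1-10 (NOT Serpstat's real CTR table).
POSITION_WEIGHTS = [1.0, 0.85, 0.7, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2, 0.15]


def url_weights(serp):
    """Map each URL of a SERP (ordered list of URLs) to its position weight."""
    weights = {}
    for index, url in enumerate(serp):
        if url in weights:
            continue  # keep the weight of the highest position only
        weights[url] = POSITION_WEIGHTS[index] if index < len(POSITION_WEIGHTS) else 0.0
    return weights


def connection_strength(serp_a, serp_b):
    """Sum the weights of the URLs that appear in both SERPs."""
    weights_a = url_weights(serp_a)
    weights_b = url_weights(serp_b)
    mutual_urls = weights_a.keys() & weights_b.keys()
    # Assumption: a mutual URL's weight is the sum of its weights in both SERPs.
    return sum(weights_a[url] + weights_b[url] for url in mutual_urls)


serp_1 = ["site-x.com/page", "site-y.com/page", "site-z.com/page"]
serp_2 = ["site-y.com/page", "site-x.com/page", "site-q.com/page"]
print(round(connection_strength(serp_1, serp_2), 2))
```

With these placeholder weights, two keywords that share their top two results exceed the 2.5 threshold, while a single mutual URL near the bottom of both SERPs contributes very little.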
This algorithm makes keyword clustering more accurate and, at the same time, shows the connection strength between each pair of keywords in the cluster. As a result, we get a connection strength matrix from which the keyword clusters are formed. Here is an example of what such a matrix looks like:
[Table: example of a connection strength matrix]
Based on this matrix we get two clusters where keyword #1 and keyword #3 form the basis:
[Figure: the two resulting clusters]
Keyword #1 and keyword #2 form the basis of the main part of cluster #1 because the connection strength between them is the highest. The additional part of this cluster includes keyword #4 because the connection strength between keyword #1 and keyword #4 isn't the maximum one for keyword #4, but is more than 2.5.

Cluster #2 has only the main part: keyword #4 has its maximum connection strength there, and keyword #5 has its best connection strength with keyword #4, which already forms the basis of cluster #2.
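
Below is a rough Python sketch of how clusters with main and additional parts could be built from such a matrix. It is my reading of the description above, not the exact code of the script. The assumptions are that keywords are processed in a fixed order (for example, by frequency), that a keyword joins the main part of the cluster containing its strongest partner, and that a keyword not yet placed in any main part joins the additional part when it is connected to the cluster above the threshold. The matrix values are invented.

```python
THRESHOLD = 2.5  # minimum connection strength from the article


def strength(matrix, a, b):
    """Symmetric lookup; pairs missing from the matrix count as 0."""
    return matrix.get(frozenset((a, b)), 0.0)


def best_partner(matrix, keyword, keywords):
    """Return the most strongly connected keyword above THRESHOLD, or None."""
    others = [k for k in keywords if k != keyword]
    if not others:
        return None
    top = max(others, key=lambda k: strength(matrix, keyword, k))
    return top if strength(matrix, keyword, top) > THRESHOLD else None


def build_clusters(matrix, keywords):
    """keywords: list in processing order (assumed: descending frequency)."""
    best = {k: best_partner(matrix, k, keywords) for k in keywords}
    in_main = set()
    clusters = []
    for seed in keywords:
        if seed in in_main:
            continue
        main = [seed]
        grew = True
        while grew:  # keep adding keywords whose strongest partner is in `main`
            grew = False
            for k in keywords:
                if k not in in_main and k not in main and best[k] in main:
                    main.append(k)
                    grew = True
        in_main.update(main)
        # Additional part: connected above the threshold, but the keyword's
        # strongest connection lies elsewhere (it is not in any main part yet).
        additional = [
            k for k in keywords
            if k not in in_main
            and any(strength(matrix, k, m) > THRESHOLD for m in main)
        ]
        clusters.append({"main": main, "additional": additional})
    return clusters


example_keywords = ["kw1", "kw2", "kw3", "kw4", "kw5"]
example_matrix = {
    frozenset(("kw1", "kw2")): 5.0,
    frozenset(("kw1", "kw4")): 3.0,
    frozenset(("kw3", "kw4")): 4.0,
    frozenset(("kw4", "kw5")): 3.5,
}
for cluster in build_clusters(example_matrix, example_keywords):
    print(cluster)
```

With these invented values, kw1 and kw2 form the main part of the first cluster with kw4 in its additional part, and kw3, kw4, and kw5 form a second cluster with no additional part.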

I'll try to explain it by showing the weight of every URL in brackets.
[Figure: keywords and their URLs, with each URL's weight in brackets]
In this case, the connection strength matrix is the following:
[Table: connection strength matrix for this example]
Keyword #2 and keyword #3 form the basis of the cluster, but keyword #3 still enters the additional part of the cluster with keyword #1.

Using connection strength during clustering takes into account not only the number of mutual URLs but also the specifics of the search engine. This produces higher-quality keyword clusters, which are useful when designing a site's structure, writing an article, or working on a PPC campaign.

This algorithm can be improved to make clustering even more accurate:

1. Decreasing the weight of main (home) pages

The weight of a main page is usually much higher than the weight of other pages because of its structure and number of links. To see for yourself, take the top 1000 sites with the highest Serpstat visibility and compare the number of keywords that their main pages and other pages rank for.

2. Decreasing the connection strength when several pages of the same website are in the top 5

If niche leaders can get different pages of their site into the top for these keywords, the connection strength between the keywords is not so high.
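
Both improvements could be applied as simple correction factors. The Python sketch below shows one possible way to do it; the factor values and function names are hypothetical and chosen only for illustration, and URLs are assumed to include the scheme (https://...).

```python
from urllib.parse import urlparse

HOME_PAGE_FACTOR = 0.5   # hypothetical penalty for main (home) pages
SAME_SITE_FACTOR = 0.8   # hypothetical penalty for repeated sites in the top 5


def is_home_page(url):
    """A URL with an empty path and no query string is treated as a main page."""
    parsed = urlparse(url)
    return parsed.path in ("", "/") and not parsed.query


def adjusted_weight(url, base_weight):
    """Improvement 1: lower the weight of main pages."""
    return base_weight * HOME_PAGE_FACTOR if is_home_page(url) else base_weight


def has_repeated_site_in_top5(serp):
    """True if two or more of the first five URLs belong to the same host."""
    hosts = [urlparse(url).netloc.lower() for url in serp[:5]]
    return len(hosts) != len(set(hosts))


def adjusted_strength(strength, serp_a, serp_b):
    """Improvement 2: lower the connection strength if either SERP has
    several pages of one website in its top 5."""
    if has_repeated_site_in_top5(serp_a) or has_repeated_site_in_top5(serp_b):
        return strength * SAME_SITE_FACTOR
    return strength


serp_with_repeat = [
    "https://leader.com/a",
    "https://leader.com/b",
    "https://other.com/page",
]
serp_plain = ["https://one.com/x", "https://two.com/y"]
print(adjusted_weight("https://example.com/", 1.0))
print(adjusted_strength(4.0, serp_with_repeat, serp_plain))
```

The right factor values would have to be tuned on real data; the sketch only shows where the adjustments would plug in.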

Script based on Serpstat's database

Serpstat's database contains tens of millions of Google top results. I created a small keyword clustering script based on this algorithm and the Serpstat API.

You've already seen this script in my last article "Expired domains' Search: how to find drops and identify potential drops". I just added the clustering feature.
[Screenshot: the script's input form]
  • Input: a phrase, a domain, or a page for which the script will get phrases from the Serpstat database;
  • Input Type: the type of input the script will run with. It determines which Serpstat API function is used;
  • Search region: the search engine database the analysis will be carried out for. For example, for Google US, set g_us. The entire list of available search engines can be found here;
  • Search limits: the maximum number of phrases from the organic search results that will take part in the analysis;
  • Pagination Size: the page size needed for pagination when working with the Serpstat API, because the keywords, url_keywords, and domain_keywords functions return a maximum of 1000 phrases. If your key limit is less than 1000, it's better to use the same page size as the search limit;
  • Max volume: the maximum frequency of phrases from both databases that will take part in the analysis. If you want only low-volume (LV) keywords, you can set 20. For example, to search for blogs and satellites, I set a maximum frequency of no more than 80;
  • API token: your token for API access. It can be found on your profile page;
  • Function: the script implements several functions:
    ○ Find drops via WHOIS: a table of unique domains based on WHOIS data;
    ○ Get list of domains: you can just copy this list and work with it as you want;
    ○ Find relevant forums: a slightly improved search for topical forums;
    ○ Clustering.
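
To show how Search limits, Pagination Size, and Max volume could interact, here is a minimal Python sketch. It does not call the real Serpstat API: `fetch_page` is a hypothetical callable standing in for a request to keywords, url_keywords, or domain_keywords, and the row shape (`keyword`, `volume`) is my assumption. I also assume the search limit is applied before the volume filter.

```python
MAX_PAGE_SIZE = 1000  # per the article, these functions return at most 1000 phrases


def collect_keywords(fetch_page, search_limit, page_size, max_volume):
    """Fetch phrases page by page, then keep only those within max_volume.

    fetch_page(page, page_size) is a hypothetical function that returns a
    list of dicts like {"keyword": "...", "volume": 30}.
    """
    page_size = min(page_size, MAX_PAGE_SIZE, search_limit)
    collected = []
    page = 1
    while len(collected) < search_limit:
        rows = fetch_page(page, page_size)
        if not rows:
            break  # nothing more to fetch
        collected.extend(rows)
        if len(rows) < page_size:
            break  # last (incomplete) page
        page += 1
    collected = collected[:search_limit]
    return [row for row in collected if row["volume"] <= max_volume]


# Fake data source used instead of a real API call.
FAKE_ROWS = [{"keyword": f"kw{i}", "volume": i * 10} for i in range(25)]
requested_pages = []


def fake_fetch_page(page, page_size):
    requested_pages.append(page)
    start = (page - 1) * page_size
    return FAKE_ROWS[start:start + page_size]


low_volume = collect_keywords(fake_fetch_page, search_limit=20, page_size=10, max_volume=80)
print(len(low_volume), requested_pages)
```

Passing the fetch function as an argument keeps the paging logic testable without an API token; in a real script it would wrap the actual API request.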

The clustering process takes quite a long time. That is one of the reasons why the results are not displayed in Google Sheets.

After a while, you'll see a spreadsheet where the yellow rows mark the additional parts of the clusters.

Here you see the result for "Clash of Clans" keyword. If I were writing an article about the Clash of Clans game strategies, I would surely take into account that the keywords "strategy" and "tips" have a significant connection strength. Classical algorithms are unlikely to let you know this.
[Screenshot: clustering results for the "Clash of Clans" keyword]
I prefer to run the keywords from Serpstat database through this script. If you have access to Serpstat's API and text analytics, you can do the same.

P. S.

As tradition requires, I'm sharing my scripts with you.
The online version is hosted on an ordinary low-powered server, and it won't cope with parsing large numbers (>10,000) of keywords. So I recommend downloading the source code and running it on your own server for more reliability.
Note: If the script doesn't return any data, you probably didn't fill in the form correctly or your API token is inactive.

I don't claim that my algorithm and clustering script are perfect. But if you work with Serpstat's database often, they will save you time on processing the data manually. I hope this algorithm will be useful for you. If you have any questions, feel free to ask them in the comments.
