Keyword clustering

Keyword clustering is the process of grouping a set of keywords in such a way that keywords in the same group (called a cluster) are more similar to each other than to those in other groups. How similar keywords must be to share a cluster depends on the settings you choose.

Why do you need keyword clustering?
- grouping keywords by meaning and assigning each group to a page, section or site directory;
- optimizing your site with automatic analysis of the semantic core;
- creating clusters of the keywords for specific directories and page sections;
- creating a site structure by distributing keywords into clusters;
- expanding topics for which there are few keywords.

The main drawback of existing keyword clustering tools is that they either produce clusters whose keywords lack strong semantic similarity, or split the analyzed data set into too many clusters that could have been merged into larger ones.
The first issue arises with the Weak cluster type, the second with the Strong type. Both clustering types also share a common drawback: clusters on a similar topic end up scattered.

Unlike many competitors’ solutions, the Serpstat clustering tool doesn’t set a cluster center, that is, the keyword with the highest search volume, which is then compared with the other keywords to count matching URLs in the SERP. Instead, Serpstat looks for connections among all the keywords being clustered.

Let’s look in detail at the main settings of the tool.
In fact, there are only two of them: Strength and Cluster type.

The first parameter is strength:
- weak: to be combined into a cluster, the keywords must have at least 3 common URLs in the top-30 search results;
- medium: the keywords must have at least 8 common URLs in the top-30 search results;
- strong: the keywords must have 12 common URLs in the top-30 search results to be merged into a single cluster.
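The strength thresholds above can be pictured with a short Python sketch. It is only an illustration: the names `STRENGTH_THRESHOLDS`, `common_url_count` and `can_link` are hypothetical, and each keyword's SERP is assumed to be a plain list of URLs ordered by position.

```python
# Minimum number of shared URLs per strength setting (values from the list above).
STRENGTH_THRESHOLDS = {"weak": 3, "medium": 8, "strong": 12}


def common_url_count(serp_a, serp_b, depth=30):
    """Count URLs that appear in the top `depth` results of both keywords."""
    return len(set(serp_a[:depth]) & set(serp_b[:depth]))


def can_link(serp_a, serp_b, strength):
    """Return True if two keywords share enough URLs for the chosen strength."""
    return common_url_count(serp_a, serp_b) >= STRENGTH_THRESHOLDS[strength]


# Two made-up SERPs that overlap in exactly five URLs.
serp_a = [f"https://example.com/{i}" for i in range(30)]
serp_b = [f"https://example.com/{i}" for i in range(25, 55)]
print(common_url_count(serp_a, serp_b))   # 5
print(can_link(serp_a, serp_b, "weak"))   # True
print(can_link(serp_a, serp_b, "medium")) # False
```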

The second parameter is cluster type:
- soft cluster type: a cluster can be created if at least one pair of keywords has 3, 8 or 12 common URLs in the top-30 search results (depending on the weak/medium/strong choice above).

Weak (soft) cluster type

- hard cluster type: requires all keywords in a cluster to have 3, 8 or 12 common URLs in the top-30 search results (the required number of common URLs is defined in the previous step, where you selected weak/medium/strong clustering). The resulting clusters contain synonymous keywords with a high semantic similarity. At the same time, this clustering method produces lots of clusters, as keywords can be merged into a cluster only if they are closely related.

Strong (hard) cluster type
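The difference between the soft and hard cluster types can be sketched in Python as follows. This is a simplified illustration, not Serpstat's actual algorithm: the function names, the greedy order of processing and the tiny threshold used in the example are all assumptions.

```python
def shared_urls(serps, a, b, depth=30):
    """Number of URLs that keywords a and b have in common in their top results."""
    return len(set(serps[a][:depth]) & set(serps[b][:depth]))


def cluster_keywords(serps, threshold, hard):
    """Greedy sketch: `serps` maps each keyword to its ordered list of URLs."""
    clusters = []
    for kw in serps:
        if hard:
            # Hard: kw must share enough URLs with every member of the cluster.
            home = next(
                (c for c in clusters
                 if all(shared_urls(serps, kw, m) >= threshold for m in c)),
                None,
            )
            if home is None:
                clusters.append([kw])
            else:
                home.append(kw)
        else:
            # Soft: one link is enough, so kw may bridge several clusters.
            linked = [c for c in clusters
                      if any(shared_urls(serps, kw, m) >= threshold for m in c)]
            merged = [m for c in linked for m in c] + [kw]
            clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


# Made-up data: A overlaps with B, B overlaps with C, but A and C share nothing.
serps = {
    "A": ["u1", "u2", "u3", "u4"],
    "B": ["u3", "u4", "u5", "u6"],
    "C": ["u5", "u6", "u7", "u8"],
}
print(cluster_keywords(serps, threshold=2, hard=False))  # [['A', 'B', 'C']]
print(cluster_keywords(serps, threshold=2, hard=True))   # [['A', 'B'], ['C']]
```

In this sketch the soft type chains A, B and C into one cluster through B, while the hard type keeps C apart because it shares no URLs with A.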

Connection strength shows how closely a keyword is semantically related to the cluster’s topic on a scale from 0 to 100.
Upon clustering completion, a portion of the initial set of keywords can be seen in the "Unsorted keywords" directory. These are keywords that haven’t been added to any cluster. One reason for this can be that the keywords have no semantic similarity to the topic of the analyzed keyword set and should be removed from the dataset. Alternatively, you can create separate pages for these keywords or move them to one of the created clusters if you believe they belong there.

Which clustering method is right for you?
The decision should be based on the semantic similarity of the objects from your dataset.
If the keywords are initially closely related, for example, sneakers of different brands, you may want to choose "Strong"+"Hard" or "Strong"+"Soft" so that only the closest synonyms are combined into a cluster. You’ll get lots of clusters to use for separate pages or specific categories.
For a varied set of products and services, for example a keyword collection for a multi-product store or a medical center with a full range of health-care services, it’s worth selecting "Weak"+"Soft". Choosing "Strong"+"Soft" instead will produce more clusters, which gives you a chance of getting more topic-specific ones.

Setting up a clustering project
Go to the "Clustering, Text analysis" block in the side menu:

Click the "Add a new project" button:

Creating a new clustering project

  1. Name your project and input a domain name (optional);
  2. Choose a search engine and country; region and town are optional (for Google);
  3. Choose the strength and cluster type (learn more about the clustering tool in this article);
  4. Enter a list of keywords or upload them as a file (csv or txt extension). The imported file must not exceed 30 MB. When adding keywords to the clustering project, try to avoid keywords with special characters or spaces, as the system may return an error for them;
  5. Click the "Save" button to create your project and start clustering.
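Before uploading, a keyword list can be checked locally against the limits in step 4. The Python sketch below is one possible way to do it and rests on assumptions that the article does not confirm: "special characters" is taken to mean anything other than letters, digits, underscores and spaces, "spaces" is read as stray leading, trailing or repeated spaces, and 30 MB is treated as 30 * 1024 * 1024 bytes. The function names are hypothetical.

```python
import re

MAX_UPLOAD_BYTES = 30 * 1024 * 1024  # assumed reading of the 30 MB limit


def prepare_keywords(lines):
    """Return (clean, rejected) lists of unique, whitespace-normalized keywords."""
    clean, rejected, seen = [], [], set()
    for line in lines:
        kw = " ".join(line.split())  # trim and collapse repeated spaces
        if not kw or kw in seen:
            continue
        seen.add(kw)
        # Assumption: only letters, digits, underscores and single spaces are safe.
        if re.fullmatch(r"[\w ]+", kw):
            clean.append(kw)
        else:
            rejected.append(kw)
    return clean, rejected


def fits_upload_limit(keywords):
    """Check the size of a one-keyword-per-line UTF-8 file against the limit."""
    return len("\n".join(keywords).encode("utf-8")) <= MAX_UPLOAD_BYTES


clean, rejected = prepare_keywords(
    ["  running  shoes ", "running shoes", "", "shoes @ home", "nike air"]
)
print(clean)     # ['running shoes', 'nike air']
print(rejected)  # ['shoes @ home']
print(fits_upload_limit(clean))  # True
```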

Serpstat will start clustering the keywords; the process may take some time. The result of clustering will look like this:

The project of clustering

You get a project page with a list of clusters (groups of semantically related keywords), where:
1. The number of clusters created in this project is displayed in brackets next to the name of the "Clusters" column.
2. Next, you will see the names of the clusters in the "Clusters" column. By pointing the cursor at the cluster and clicking on the pencil, you can change the name of the automatically created cluster.
3. The number in the "Keywords" column shows the number of keywords in each cluster.
4. The "Add cluster" button allows you to create a new cluster (enter a name and press Enter).
5. Unsorted keywords are pinned to the top of the list and are not affected by cluster sorting.

In the right window you will find a list of all keywords of the selected cluster and its internal parameters.

Indicators of the cluster

1. Volume — how often people search this keyword per month.
2. Connection strength shows how close the keyword is to the cluster's topic on a scale from 0 to 100%.
3. Homogeneity shows the semantic consistency of a cluster (from 0 to 100%).
4. If you specified a domain while creating a project, we'll look at your website's pages and display the page which is the closest to the cluster's topic in the URL field. If you didn't input a domain, you can add a URL manually when launching text analysis.
5. Metatop is a list of major competitors in SERP for keywords from a cluster. The higher a page’s rank in the meta-top, the more relevant it is to the cluster’s topic:

Metatop of the cluster
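The article does not say how pages are ranked in the metatop. One way to picture the idea, purely as an assumption, is to count for how many of a cluster's keywords each URL appears in the top results. The Python sketch below does that; `metatop_sketch` is a hypothetical name and is not how Serpstat necessarily computes the list.

```python
from collections import Counter


def metatop_sketch(cluster_serps, depth=30, size=10):
    """Rank URLs by the number of cluster keywords they appear for (assumed metric)."""
    counts = Counter()
    for urls in cluster_serps.values():
        counts.update(set(urls[:depth]))  # count each URL once per keyword
    return counts.most_common(size)


# Made-up cluster with three keywords.
cluster_serps = {
    "buy sneakers": ["site-x.com/shoes", "site-y.com/run"],
    "sneakers online": ["site-x.com/shoes", "site-z.com/shop"],
    "order sneakers": ["site-x.com/shoes", "site-y.com/run"],
}
print(metatop_sketch(cluster_serps))
# [('site-x.com/shoes', 3), ('site-y.com/run', 2), ('site-z.com/shop', 1)]
```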

Each cluster has a drop-down menu:

Menu of cluster operations

1. The "Add keywords" button opens a window where you can add some keywords to the existing cluster.
2. The "Delete keywords" button deletes the checked keywords from the cluster.
3. The "Move to unsorted" button moves the selected keywords to the "Unsorted keywords" directory.
4. To move the selected keywords to another cluster, click "Move keywords to..". You can also move them to a new cluster by creating it right there:

Adding a new cluster

5. The "Delete cluster" button deletes the cluster from your project.

In the right corner you can also see information on the credits left (1) for the Instruments (at the end of this guide you will find information about spending credits).

Available operations in the project

You can search for keywords (2): the results show the keywords whose text matches or contains the entered word. The search is case-insensitive, so capitalization does not matter.
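The matching behavior described here (match or contain, regardless of case) corresponds to a case-insensitive substring search. A minimal Python sketch, with the hypothetical helper `search_keywords`, could look like this:

```python
def search_keywords(keywords, query):
    """Return keywords that contain `query`, ignoring letter case."""
    needle = query.casefold()
    return [kw for kw in keywords if needle in kw.casefold()]


print(search_keywords(["Nike Air Max", "adidas sneakers", "air jordan"], "AIR"))
# ['Nike Air Max', 'air jordan']
```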

Next there are buttons with the following functions (3):
- text analysis can be launched for any of the clusters. To do this, press the "TA" button, which opens a pop-up for selecting clusters;
- the "Refresh clustering" button reruns clustering for the current project;
- open the "Project settings" to change the current settings; after that, you need to restart clustering for the project to get the updated data;
- export is available in several formats, either for the whole project or for a single cluster (open the cluster you need and select "Cluster" when exporting):

* CSV Open Office, Libre Office
* CSV Microsoft Excel (CP1251) — only for the whole cluster
* XLSX Microsoft Excel (up to 10K)
* XLS Microsoft Excel (up to 10K)

The report can be sorted in ascending or descending order:
1) at the project level:
- name of the cluster (in alphabetical order);
- number of keywords in the cluster;
2) in the cluster:
- keywords (by name);
- volume;
- connection strength.

Spending credits
How the credits for Clustering are spent:
1 keyword = 5 credits.
Number of keywords * 5 = number of spent credits.

For example: When you add 20 keywords to the project, you spend 100 credits:
20 * 5 = 100 credits.
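The same arithmetic as a small Python helper, in case you want to estimate the cost of a keyword list before creating a project. The function name `clustering_credits` is hypothetical; the rate of 5 credits per keyword comes from the rule above.

```python
CREDITS_PER_KEYWORD = 5  # rate stated in this section


def clustering_credits(keyword_count):
    """Credits spent on clustering a given number of keywords."""
    if keyword_count < 0:
        raise ValueError("keyword_count must not be negative")
    return keyword_count * CREDITS_PER_KEYWORD


print(clustering_credits(20))  # 100
```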

Please note: Clustering, Text analysis, Domains batch analysis and Keywords batch analysis share a common pool of credits.

If you have any questions about the report, there are 2 buttons in the upper right corner:

1. Send feedback: opens the support chat, where you can suggest improvements or report an issue.
2. Tutorial: takes you to the tutorial for the report.

We recommend that you familiarize yourself with the article about clustering on our blog and with the use cases.

If you still have any questions, you can go to our FAQ or contact the tech support chat.
If you'd like to get advice on Serpstat's features, order your free 30-minute demo.
