Keyword clustering and Text analysis

Check out a featured guide on the Keyword clustering and Text analysis tools. Keyword clustering and Text analysis consist of two tools:
1. Keyword clustering - a mass grouping of uploaded keywords based on their semantic similarity.
2. Text analysis - URL analysis and recommendations on the page SEO (under development currently).

Keyword clustering is the process of grouping a set of keywords in such a way that keywords in the same group (called a cluster) are more similar to each other than to those in other groups. The level of similarity/differentiation depends on the set parameters.  

Why do you need keyword clustering?
- Grouping of semantically related keywords;
- Reliable automatic analysis of a set of keywords;
- Collecting the right keywords for specific pages;
- Creating a site’s SEO architecture;
- Searching for keywords that are in no way related to the topics of clustered keywords.
The most fundamental drawback of existing keyword clustering tools is twofold: the resulting clusters may contain keywords without strong semantic similarity (typical of Soft clustering), or the data set is split into too many clusters that could have been merged into larger ones (typical of the Hard method). Both clustering types also share a common drawback: clusters on similar topics end up scattered.
Unlike many competitors’ solutions, Serpstat employs intelligent hierarchical clustering, in which clusters are combined into superclusters. No preliminary data collection, such as keyword search volumes, is required: you only need to upload a list of keywords and choose the region and clustering parameters. The Serpstat clustering tool doesn’t set a cluster center (a keyword with the highest search volume that is compared with other keywords to count matching URLs in the SERP). Instead, Serpstat looks for connections among all clustered keywords.

Let’s look in detail at the main settings of the tool.
In fact, there are only two of them: Weak/Strong and Soft/Hard.
The Weak parameter tells the system that, to be combined into a cluster, keywords must have at least 3 common URLs in the Top 30 search results, while Strong requires at least 7 common URLs for keywords to be merged into a single cluster.
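To make the threshold concrete, here is a minimal Python sketch of the Weak/Strong check. It is only an illustration, not Serpstat's implementation: the function names, the LINKAGE_THRESHOLDS dictionary, and the example URLs are hypothetical. The only values taken from this guide are the thresholds (3 and 7) and the Top 30 cutoff.

```python
# Thresholds from the guide: Weak = 3 shared URLs, Strong = 7 shared URLs.
LINKAGE_THRESHOLDS = {"weak": 3, "strong": 7}


def common_urls(serp_a, serp_b, top_n=30):
    """Count URLs that appear in the top_n results of both keywords."""
    return len(set(serp_a[:top_n]) & set(serp_b[:top_n]))


def are_linked(serp_a, serp_b, strength="weak"):
    """Return True if two keywords share enough URLs for the chosen strength."""
    return common_urls(serp_a, serp_b) >= LINKAGE_THRESHOLDS[strength]


# Hypothetical SERPs for two keywords; they overlap on page6..page9.
serp_running = [f"https://example.com/page{i}" for i in range(10)]
serp_jogging = [f"https://example.com/page{i}" for i in range(6, 16)]

print(common_urls(serp_running, serp_jogging))           # 4
print(are_linked(serp_running, serp_jogging, "weak"))    # True
print(are_linked(serp_running, serp_jogging, "strong"))  # False
```

With four shared URLs, the two keywords would qualify for the same cluster under Weak but not under Strong.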
The next clustering parameter choice is Soft/Hard.
Soft tells the system that a cluster can be created if at least one pair of keywords has 3 or 7 common URLs in Top 30 search results (depending on the previous Weak/Strong choice).

Hard requires all keywords in a cluster to have 3 or 7 common URLs in the Top 30 search results (the required number of common URLs is defined in the previous step, where you selected Weak or Strong clustering). The resulting clusters contain synonymous keywords with a high semantic similarity. At the same time, this clustering method produces lots of clusters, as keywords can be merged into a cluster only if they are closely related.
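The difference between Soft and Hard can be sketched in a few lines of Python. This is an illustrative approximation, not Serpstat's actual algorithm. Soft is modeled as "a keyword joins a cluster if it is linked to at least one member", and Hard is modeled greedily as "a keyword joins the first cluster in which it is linked to every member". All function names and the sample SERP data are hypothetical.

```python
def shared(serps, a, b):
    """Number of URLs the SERPs of keywords a and b have in common."""
    return len(set(serps[a]) & set(serps[b]))


def soft_clusters(serps, threshold):
    """Soft: one sufficiently strong link to any member is enough to join."""
    clusters = []
    for kw in serps:
        touching = [c for c in clusters
                    if any(shared(serps, kw, m) >= threshold for m in c)]
        rest = [c for c in clusters if c not in touching]
        # Merge the keyword with every cluster it touches.
        clusters = rest + [{kw}.union(*touching)]
    return clusters


def hard_clusters(serps, threshold):
    """Hard (greedy sketch): a keyword must be linked to all members."""
    clusters = []
    for kw in serps:
        for c in clusters:
            if all(shared(serps, kw, m) >= threshold for m in c):
                c.add(kw)
                break
        else:
            clusters.append({kw})
    return clusters


# Hypothetical data: the first and second keywords share 3 URLs, the second
# and third share 3 URLs, the first and third share none.
serps = {
    "sneakers":         ["a1", "a2", "a3", "x1", "x2", "x3"],
    "running sneakers": ["x1", "x2", "x3", "y1", "y2", "y3"],
    "running shoes":    ["y1", "y2", "y3", "c1", "c2", "c3"],
}

print([sorted(c) for c in soft_clusters(serps, 3)])
# [['running shoes', 'running sneakers', 'sneakers']]
print([sorted(c) for c in hard_clusters(serps, 3)])
# [['running sneakers', 'sneakers'], ['running shoes']]
```

In this toy data set, Soft chains all three keywords into one cluster through the middle keyword, while Hard splits them because the first and third keywords share no URLs. This mirrors the trade-off described above: Soft gives fewer, looser clusters, and Hard gives more, tighter ones.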

Strength shows how closely a keyword is semantically related to the cluster’s topic on a scale from 0 to 1.
Upon clustering completion, a portion of the initial set of keywords may appear in the Unsorted directory. These are keywords that did not fit into any cluster. One reason for this can be that the keywords have no semantic similarity to the topic of the analyzed keyword set and should be removed from the dataset. Alternatively, you can create separate pages for these keywords or move them to one of the created clusters if you believe they belong there.

Which clustering method is right for you?
The decision should be based on the semantic similarity of the objects from your dataset.
If the keywords are initially closely related, for example, sneakers of different brands, you may want to choose Strong+Hard or Strong+Soft so that only the closest synonyms are combined into a cluster. You’ll get lots of clusters to use for separate pages or specific categories.
In the case of various products and services, for example, a keyword collection for a multi-product store or a medical center with a full range of health-care services, it’s worth selecting Weak+Soft. Choosing Strong+Soft instead will produce more clusters, which are likely to be more topic-specific.

Meta-top is a list of major competitors in SERP for keywords from a cluster. The higher a page’s rank in the meta-top, the more relevant it is to the cluster’s topic.
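This guide does not say how the meta-top is ranked. One plausible reading, sketched below purely as an assumption, is to rank each URL by the number of cluster keywords whose search results contain it. The meta_top function and the sample data are hypothetical and are not part of any Serpstat API.

```python
from collections import Counter


def meta_top(serps, limit=10):
    """Rank URLs by how many of the cluster's keywords they appear for.

    serps maps each keyword to its list of result URLs.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for results in serps.values():
        counts.update(set(results))  # count each URL once per keyword
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical SERPs for a three-keyword cluster.
cluster_serps = {
    "buy laptop": ["shop-a.example/laptops", "shop-b.example/laptops",
                   "blog.example/review"],
    "best laptop price": ["shop-a.example/laptops", "shop-b.example/laptops"],
    "laptop deals": ["shop-a.example/laptops", "deals.example/laptops"],
}

for url, hits in meta_top(cluster_serps, limit=2):
    print(url, hits)
# shop-a.example/laptops 3
# shop-b.example/laptops 2
```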

Setting up a clustering project
Go to the Tools section and open Keyword clustering and Text analysis.

Click Create a project.

Name your project and input a domain name (optional).


Input a list of keywords or upload them from a file.

Choose a search engine and region.

Finally, choose Linkage strength, Type of grouping and click Finish.

The resulting clusters will look like this:

Here 3 is a cluster, 2 is a supercluster, and 1 is a protocluster.
A supercluster is a set of clusters. It combines keywords that have a high semantic similarity score but are slightly less similar to each other than keywords within a single cluster.
A protocluster is a set of superclusters. Generally, a protocluster is made up of superclusters related to a specific category of objects. For example, if you’re developing SEO architecture for a multi-product store, one protocluster may contain superclusters associated with different types of refrigerators, and another may contain superclusters for microwave ovens of different brands. Protoclusters are designed to streamline work with superclusters.
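The three-level hierarchy can be pictured as nested containers. The Python sketch below is only a mental model: the class names, fields, and sample keywords are hypothetical and do not describe Serpstat's export format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    keywords: list = field(default_factory=list)


@dataclass
class Supercluster:
    name: str
    clusters: list = field(default_factory=list)


@dataclass
class Protocluster:
    name: str
    superclusters: list = field(default_factory=list)

    def all_keywords(self):
        """Flatten the hierarchy into a single list of keywords."""
        return [kw
                for sc in self.superclusters
                for cl in sc.clusters
                for kw in cl.keywords]


# Hypothetical example following the refrigerator case from the text.
fridges = Protocluster("refrigerators", [
    Supercluster("two-door refrigerators", [
        Cluster("two-door fridge", ["buy two-door refrigerator",
                                    "two-door fridge price"]),
    ]),
    Supercluster("mini refrigerators", [
        Cluster("mini fridge", ["mini fridge", "small refrigerator"]),
    ]),
])

print(len(fridges.superclusters))  # 2
print(fridges.all_keywords())
```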

Here's the breakdown of the above figure:
1. Every keyword from a cluster has its connection strength. It provides a hint of how close that keyword is to the cluster's topic on a scale from 0 to 1. 
2. Homogeneity shows the semantic consistency of a cluster on a scale from 0 to 1.
3. If you specified a domain while creating a project, we'll look at your website's pages and display the page closest to the cluster's topic in the URL field. If you didn't input a domain, you can add a URL manually by clicking Add URL.
You can launch Text analysis for any keyword cluster.
Each cluster has a drop-down menu:

1. Add keywords - opens a window where you can add keywords to the existing cluster.
2. Toggle metatop - opens a list of direct competitors for keywords from the cluster. The higher a page is listed, the more relevant it is to the cluster's topic.
3. Search keywords - opens a search box where you can look for specific keywords in the cluster.
4. Delete keywords - deletes checked keywords from the cluster.
5. Delete group - deletes the cluster from your project.

Text analysis
Serpstat Text analysis (hereinafter TA) provides recommendations on how to improve your on-page SEO: what changes you need to make on your page to better optimize it for keywords from a cluster, or what keywords you should insert into the page content if you’re optimizing a page from scratch. It is available for the following languages: Russian, Ukrainian, English, German, and Bulgarian.
TA analyzes the text on the landing page (if a URL has been specified), the list of keywords from a cluster and a set of pages from the Top 15 search results for keywords from the list. We assume that the search engine considers the text on those pages relevant to the researched search queries if the pages are displayed in Top 15 search results.
If a target URL is specified, the TA tool analyzes the text content of your page and suggests lexical items to be added to the page. The suggestions are based on the text content of top pages for keywords from the cluster.
If you didn’t specify a URL, recommendations are based on the largest group of related competitors. In this case, Serpstat can’t know for sure that the proper group of competitor URLs has been selected. For example, we can’t reliably tell whether informational or commercial pages are your direct competitors, since the search results for keywords from the cluster may contain different types of pages, and the right group can only be selected by analyzing the text on your page. Also, the report won’t display your page’s relevance to keywords in comparison with competitors’ relevance.
Serpstat TA algorithms stand out for their ability to eliminate semantic noise and prevent irrelevant search queries or search results from distorting the text analytics. Serpstat splits the set of top pages for keywords from the cluster into groups based on their content (videos, informational or e-commerce pages, catalogs, etc.) and identifies which group your landing page belongs to. After that, TA analyzes the text content of the selected group of URLs and suggests which on-page text units can be added or modified to boost rankings.

This intelligent selection of analysis objects keeps irrelevant URLs out of the researched dataset. Filtering can even go as far as including or excluding pages that contain videos, depending on whether the main content of your page is a video (for which you’re selecting a title, description, page Title, etc.). In contrast, other text analytics tools analyze the whole set of URLs from the SERP without paying attention to the page topic, which negatively affects the validity of the recommendations. Imagine you’re researching the relevance of your e-commerce page to the keyword ‘buy best laptop’. The SERP for that keyword will most likely contain mixed results: e-commerce pages, informational posts, videos, etc. Serpstat TA will analyze your page text and omit irrelevant informational pages during text analysis. The resulting suggestions will be based only on e-commerce competitors.
Serpstat TA splits the text area of a page into three major parts: Title, H1, and Body. SEO recommendations are made separately for each of these areas. Page relevance to the cluster keywords is reported as a standalone metric for each of the researched keywords. In simple terms, the analysis proceeds as follows: first, Serpstat collects unique keywords from the pages’ respective areas (Title, H1, Body), then it creates groups of pages based on their topical similarity, and lastly, it provides recommendations for on-page SEO and page-to-keyword relevance scores.
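The flow described above can be approximated with a short Python sketch. It is a simplified illustration under several assumptions, not Serpstat's algorithm: page types are supplied as ready-made labels rather than detected from content, "keywords" are reduced to lowercase words split on whitespace, and the suggest_terms function and the sample pages are hypothetical.

```python
AREAS = ("title", "h1", "body")


def terms(text):
    """Very rough tokenizer: unique lowercase words split on whitespace."""
    return set(text.lower().split())


def suggest_terms(target, competitors):
    """Per area, list terms that same-type competitors use but the target lacks."""
    # Keep only competitors in the same group (page type) as the target page.
    peers = [page for page in competitors if page["type"] == target["type"]]
    suggestions = {}
    for area in AREAS:
        peer_terms = set().union(*(terms(page[area]) for page in peers))
        suggestions[area] = sorted(peer_terms - terms(target[area]))
    return suggestions


# Hypothetical pages; the "type" labels are assumed to be known in advance.
target_page = {"type": "e-commerce", "title": "Buy laptop", "h1": "Laptops",
               "body": "cheap laptop with free delivery"}
competitor_pages = [
    {"type": "e-commerce", "title": "Buy laptop online", "h1": "Laptops store",
     "body": "laptop with warranty and free delivery"},
    {"type": "informational", "title": "How to choose a laptop",
     "h1": "Laptop guide", "body": "what to look for in a laptop"},
]

print(suggest_terms(target_page, competitor_pages))
# {'title': ['online'], 'h1': ['store'], 'body': ['and', 'warranty']}
```

The informational page is ignored because its type differs from the target's, so only the e-commerce competitor contributes suggestions. A real implementation would also need stop-word filtering (note the stray "and") and proper lemmatization.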
In our TA, we abandoned the common practice of suggesting a number of keyword occurrences on a page in favor of a relative score, expressed as a percentage, that shows how important a keyword is to a particular topic.
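The exact formula behind this score is not given in this guide. As a rough illustration of a percentage-based score as opposed to a raw occurrence count, the Python sketch below assumes that a term's importance is the share of competitor pages that contain it. Both this definition and the importance_percent function are assumptions made for the example.

```python
def importance_percent(term, pages):
    """Share of pages (in percent) whose text contains the term at least once."""
    if not pages:
        return 0.0
    term = term.lower()
    hits = sum(term in page.lower().split() for page in pages)
    return round(100 * hits / len(pages), 1)


# Hypothetical body texts of four competitor pages.
competitor_texts = [
    "buy laptop online",
    "laptop with warranty",
    "cheap notebook deals",
    "laptop store",
]

print(importance_percent("laptop", competitor_texts))    # 75.0
print(importance_percent("warranty", competitor_texts))  # 25.0
```

A score like this tells you that a term matters to the topic without prescribing how many times to repeat it on the page.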
Choose a keyword cluster you’d like to analyze, input a URL and click Start analysis.

Upon completion, click See results.
