News 14 min read March 5, 2020

Most Wanted: We Launched Our Own Link Index With New Architecture

Stacy Mine
Alex Danilin
Lead Product Manager at Serpstat
Although we have released a batch of updates in Serpstat recently, we kept the most important and most ambitious one a secret: the release of our own link index with a new architecture.

Let's look at its advantages and go through some information retrieval theory to understand what we built.

Backlinks index and its benefits

The Backlinks Index is a vast, constantly changing database of backlinks. Building our own index took us about a year, and we have finally opened public access to the results. We now index links ourselves and update the data regularly, so the index always contains only the latest, garbage-free data. The Serpstat Page Rank and Serpstat Trust Rank metrics now show more accurate results.

7 advantages of our index

1. We managed to build a backlink index without separating historical and recent data. Indexes with a split architecture typically need lengthy updates to merge new and old data. Our index is updated continuously, so there is no need for global updates and the index is always fresh.

   You don't need to wait until changes become available in the interface, or switch between different indexes to see the full current picture for your sites. All data is stored in one index and is updated as our robots crawl the Internet.
2. The current architecture allows us to query the database much faster, so you will receive data in the interface more quickly than before. We also sped up the interface itself, which makes working with reports faster.
3. Because of limitations in the previous architecture, we could not implement many features needed for working with data, such as filters and sorting. We have now removed most of these restrictions and will gradually release this functionality.
4. We can identify links from malicious sites. For now, the interface shows only the number of such domains that link to the site. We will roll out more detailed information about them in future releases.
5. Links from Blogspot, WordPress, uCoz and other blog platforms get into our database within 24 hours of being added to the sites. We use a separate method for crawling sites on such platforms, which lets us allocate more resources to crawling other websites and show referring domains and subdomains separately.
6. By adapting the crawling depth to host performance and flexibly choosing the starting points of the scan, we crawl useful sites and don't get stuck in an endless number of doorway pages. We started building the link index from all the domains that, according to our data, are in the Google search index, and we continue to expand it.
7. Serpstat Domain Rank, based on data from the new index, will reflect domain quality more accurately.
As an example of data analysis in our index, we are sharing a selection of spam domains. We identified them by a large number of external links, spam content, links to malicious sites and spammy anchor lists. Feel free to upload this list to the Google Disavow links tool.

How to build a link index

Now let's turn to the technical side: how you can build your own link index. Suppose you have a website that can be represented as a diagram:
The arrows in the diagram show that one page links to another. For example, the Category 1 page links to Page 1, Page 2, and the Main page. The Main page also links to the Category 1 page, so the arrow between them is bidirectional.

Such a diagram is a good visual representation, but it makes no sense to store the links between pages as a picture. The same information can be stored in a more compact form, from which the diagram can be restored at any time.

Let's convert the diagram into a table. The rows and columns list all the pages of our site. At the intersection of a row and a column we put 1 if the page in the row links to the page in the column, and 0 if there is no such link.

It looks like this:
Let's check the table against the diagram using the same Category 1 page. According to the table, this page links to the Main page, Page 1 and Page 2. As a result, we have a binary contingency table (in graph terms, an adjacency matrix) that describes the structure of our site. Such a table is convenient, for example, for calculating PageRank. An example of the calculation for our diagram is here.
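The table and the PageRank idea mentioned above can be sketched in a few lines of Python. This is only an illustration, not Serpstat's code or the formula behind Serpstat Page Rank: the link structure is the five-page example used in this article, while the names `build_matrix` and `pagerank`, the damping factor of 0.85 and the number of iterations are assumptions chosen for the sketch.

```python
# Pages of the five-page example site, in a fixed order.
PAGES = ["Main page", "Category 1", "Category 2", "Page 1", "Page 2"]

# Outgoing links of each page, taken from the article's example.
LINKS = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def build_matrix(pages, links):
    """Return a binary table: cell [row][col] is 1 if row links to col."""
    position = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for source, targets in links.items():
        for target in targets:
            matrix[position[source]][position[target]] = 1
    return matrix


def pagerank(matrix, damping=0.85, iterations=50):
    """Plain power iteration (assumed parameters).

    Assumes every page has at least one outgoing link.
    """
    n = len(matrix)
    rank = [1.0 / n] * n
    out_degree = [sum(row) for row in matrix]
    for _ in range(iterations):
        new_rank = []
        for col in range(n):
            # Each linking page passes on an equal share of its rank.
            incoming = sum(
                rank[row] / out_degree[row]
                for row in range(n)
                if matrix[row][col]
            )
            new_rank.append((1 - damping) / n + damping * incoming)
        rank = new_rank
    return rank


MATRIX = build_matrix(PAGES, LINKS)
RANKS = pagerank(MATRIX)

for page, row in zip(PAGES, MATRIX):
    print(f"{page:<12}", row)
for page, score in zip(PAGES, RANKS):
    print(f"{page:<12} {score:.3f}")
```

The first loop prints the binary table row by row, and the second prints one score per page. In this small example the Main page, which receives links from every other page, ends up with the highest score.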

Suppose we want to store the structure of our site on a hard drive. The advantage of such a table over the visual representation is that it takes up much less storage space. Moreover, you don't have to store the zeros.

As a result, our table will look like this:
For one site, everything looks good so far. But let's imagine that there are 10 such sites with 5 pages each. Pages can link to other pages of the same site as well as to pages of other websites. As a result, our model of the Internet has 50 pages, which means 50 columns and 50 rows in the contingency table.

There are usually many more links between pages of one site than links from pages of one site to pages of another. Our contingency table will be densely filled with ones in some places, but most of it will be empty. We could fold the table row by row into lists of the pages that each page links to.
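To make the point about empty cells concrete, here is a small sketch that keeps only the coordinates of the ones. The table is the five-page example from this article; the function name `nonzero_cells` is made up for the illustration, and real indexes use far more compact encodings than a Python list of tuples.

```python
# Binary table of the five-page example. Rows and columns follow this order:
# Main page, Category 1, Category 2, Page 1, Page 2.
TABLE = [
    [0, 1, 1, 0, 1],  # Main page
    [1, 0, 0, 1, 1],  # Category 1
    [1, 0, 0, 0, 0],  # Category 2
    [1, 0, 0, 0, 1],  # Page 1
    [1, 0, 0, 1, 0],  # Page 2
]


def nonzero_cells(table):
    """Keep only the (row, column) coordinates of cells equal to 1."""
    return [
        (r, c)
        for r, row in enumerate(table)
        for c, value in enumerate(row)
        if value
    ]


CELLS = nonzero_cells(TABLE)
TOTAL = len(TABLE) * len(TABLE[0])
print(f"full table: {TOTAL} cells, links actually stored: {len(CELLS)}")
```

Even here fewer than half of the cells carry information. In the 50-page model the full table already has 2,500 cells, and the share of empty ones grows as more sites are added.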

For our example table, it would look like this:

Main page → Category 1, Category 2, Page 2
Category 1 → Main page, Page 1, Page 2
Category 2 → Main page
Page 1 → Main page, Page 2
Page 2 → Main page, Page 1

This gives us the so-called direct index (also known as a forward index). But let's look at these lists and try to answer a question that interests many SEO specialists: which pages link to Page 2? We would have to go through all the lists and check whether Page 2 is in each of them. This is easy to do when there are five lists. But there are billions of pages on the Internet, and checking that many lists becomes a very time-consuming task.
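The scan described above can be sketched as follows. The dictionary mirrors the lists from this article; the function name `pages_linking_to` is hypothetical and only serves the illustration.

```python
# Direct (forward) index: each page maps to the pages it links to.
FORWARD_INDEX = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def pages_linking_to(target, forward_index):
    """Scan every list of outgoing links and collect the pages that mention target.

    The work grows with the total number of pages and links in the index.
    """
    return [
        source
        for source, targets in forward_index.items()
        if target in targets
    ]


RESULT = pages_linking_to("Page 2", FORWARD_INDEX)
print(RESULT)  # ['Main page', 'Category 1', 'Page 1']
```

The function has to visit all five lists to answer a question about a single page, which is exactly the cost that becomes prohibitive at Internet scale.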

To answer this question, we can fold the contingency table by columns instead. As a result, for each page we get a list of the pages that link to it:

Main page ← Category 1, Category 2, Page 1, Page 2
Category 1 ← Main page
Category 2 ← Main page
Page 1 ← Category 1, Page 2
Page 2 ← Main page, Category 1, Page 1

Now, to find the answer, we only need to locate the one list that belongs to Page 2, without going through the contents of every other list. This is the backlink index. Serpstat stores link data for your site in this form. It is a very simplified model, but its basic principles are correct.
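Folding by columns amounts to inverting the direct index. A minimal sketch, assuming the index fits in memory as a Python dictionary (which is not how an index of billions of pages is stored); the function name `invert` is made up for this example.

```python
from collections import defaultdict

# Direct (forward) index from the example: page -> pages it links to.
FORWARD_INDEX = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def invert(forward_index):
    """Build the backlink index: page -> pages that link to it."""
    backlinks = defaultdict(list)
    for source, targets in forward_index.items():
        for target in targets:
            backlinks[target].append(source)
    return dict(backlinks)


BACKLINK_INDEX = invert(FORWARD_INDEX)

# One dictionary lookup replaces the scan over all lists.
print(BACKLINK_INDEX["Page 2"])  # ['Main page', 'Category 1', 'Page 1']
```

The inversion itself still touches every link once, but it is done when the index is built. After that, each "who links to this page?" question is a single lookup.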

What's new in Serpstat link index

Back to the billions again. If you want to add a new page to our diagram and link to it from existing pages, this is easy to do. You can create pages on your own sites and link them to other pages just as quickly. Now multiply this action by the number of websites on the Internet to see the global scale of the changes.
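One way to picture how an index absorbs such changes without a global rebuild is to update both directions of the link as soon as it is discovered. The class below is a toy sketch under that assumption; `LinkIndex`, its methods and the page "Page 3" are all hypothetical and are not a description of how Serpstat's index is implemented.

```python
from collections import defaultdict


class LinkIndex:
    """Toy in-memory index that is updated one link at a time."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # page -> pages it links to
        self.incoming = defaultdict(set)  # page -> pages that link to it

    def add_link(self, source, target):
        """Record a newly discovered link in both directions."""
        self.outgoing[source].add(target)
        self.incoming[target].add(source)

    def remove_link(self, source, target):
        """Forget a link that is no longer present on the source page."""
        self.outgoing[source].discard(target)
        self.incoming[target].discard(source)

    def backlinks(self, page):
        """Return the pages currently linking to page, sorted by name."""
        return sorted(self.incoming.get(page, ()))


INDEX = LinkIndex()
# A new page appears and two existing pages start linking to it.
INDEX.add_link("Main page", "Page 3")
INDEX.add_link("Category 1", "Page 3")
BEFORE = INDEX.backlinks("Page 3")

# Later one of the links disappears.
INDEX.remove_link("Category 1", "Page 3")
AFTER = INDEX.backlinks("Page 3")

print(BEFORE)  # ['Category 1', 'Main page']
print(AFTER)   # ['Main page']
```

Each change touches only the two entries involved, so the cost of an update does not depend on the size of the rest of the index.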

To keep the index up to date in such a dynamic environment, our bot, Serpstatbot/1.0 (advanced backlink tracking bot; abuse@serpstatbot.com), crawls the web continuously while following the rules in robots.txt and other basic crawling rules. More details here.
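For site owners, the effect of robots.txt rules on a crawler can be checked with Python's standard `urllib.robotparser` module. The sketch below is an assumption-laden example: the robots.txt content and the example.com URLs are made up, and the `serpstatbot` user-agent token is assumed from the bot name above, so check the bot's own documentation for the exact token it honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: serpstatbot
Disallow: /private/

User-agent: *
Disallow: /
"""


def make_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = make_parser(ROBOTS_TXT)

# The bot-specific group allows everything except /private/.
ALLOWED = PARSER.can_fetch("Serpstatbot/1.0", "https://example.com/blog/post")
BLOCKED = PARSER.can_fetch("Serpstatbot/1.0", "https://example.com/private/page")
# Any other crawler falls under the catch-all group, which blocks the site.
OTHER = PARSER.can_fetch("SomeOtherBot/2.0", "https://example.com/blog/post")

print(ALLOWED, BLOCKED, OTHER)  # True False False
```

A crawler that respects robots.txt performs a check of this kind before requesting each URL.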

Things to consider when analyzing link mass

Working with billions of pages is not easy. Unfortunately, we cannot combine all the data from the two indexes into one: this is a complex and resource-consuming task. So we want to warn you right away about some limitations of the new index:
1. We store historical data from about the beginning of 2019 and cannot show older data. We will continuously add new data and extend the history.
2. The data for specific domains may differ from the old index, both higher and lower. We now have 223 million hosts and 884 billion links in our database, and we are working to make these numbers even bigger :)
3. Some metrics will not be available in the new interfaces. We will release them gradually.
4. API methods for working with links will continue to query the old index for now. The same goes for the plugin. We will fix this in upcoming releases.

Conclusion

The release of the new index is a significant chapter in Serpstat's history. We are encouraged by the opportunities that the new architecture and its technical implementation offer us.

We have many plans for refining both the index itself and the interface. So we have one big request: give us feedback on the new index. If you have any comments, we are happy to talk to you personally. Your feedback can affect our development priorities and speed up the release of the functionality you need for link analysis.

