Serpstat updates 13 min read March 5, 2020

Most Wanted: We Launched Our Own Link Index With New Architecture

Stacy Mine
Alex Danilin
Lead Product Manager at Serpstat
Among the many updates released in Serpstat recently, we kept the most important and most ambitious one a secret: the release of our own link index with a new architecture.

Let's look at its advantages and walk through some information retrieval theory to understand what we built.

Backlinks index and its benefits

The Backlinks Index is a vast, constantly changing database of backlinks. Building our own index took us about a year, and we have finally opened public access to its results. We now index the links ourselves and update the data regularly, so that only the latest, garbage-free data is available in the index.

7 advantages of our index

1. We managed to build a backlink index without separating historical and recent data. An index with a split architecture typically needs lengthy global updates to merge new and old data. Our index is updated continuously, with no global updates needed, so it is always fresh.

You no longer need to wait for changes to appear in the interface or switch between different indexes to see the complete picture for your sites over all time. Data is updated as our robots crawl the Internet, and it is all stored in one index.

2. The current architecture lets us query the database much faster, so you receive data in the interface more quickly than before. We also sped up the interface itself, which makes working with reports faster.

3. Because of limitations in the previous architecture, we could not implement many features needed for working with the data, such as filters and sorting. We have now removed most of these restrictions and will release this functionality gradually.

4. We can identify links from malicious sites. For now, the interface shows only the number of such domains referring to the site. We will roll out more detailed information about them in future releases.

5. Links from Blogspot, WordPress, uCoz and other blog platforms get into our database within 24 hours of being added to the sites. We use a separate method for crawling sites on such platforms, which lets us allocate more resources to crawling other websites and show referring domains and subdomains separately.

6. By adapting the crawling depth to host performance and choosing the starting points of the crawl flexibly, we crawl useful sites and don't get trapped in an endless number of doorway pages. We started building the link index from all the domains that, according to our data, are in the Google search index, and we continue to expand from there.

7. Serpstat Domain Rank, based on data from the new index, will reflect domain quality more accurately.
As an example of data analysis in our index, we are sharing a selection of spam domains. We identified them by their large number of external links, spam content, links to malicious sites, and spammy anchor lists. Feel free to upload this list to the Google Disavow links tool.
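If you keep such a list as plain domain names, it has to be converted to the disavow file format first: a plain text file with one `domain:` entry per line, where lines starting with `#` are comments. Below is a minimal sketch of that conversion. The function name and the domains are hypothetical placeholders, not entries from the shared list.

```python
def to_disavow_lines(domains, comment=None):
    """Turn an iterable of domain names into lines of a disavow file."""
    lines = []
    if comment:
        # Lines starting with "#" are treated as comments in a disavow file.
        lines.append(f"# {comment}")
    # Normalize, drop empty entries and duplicates, and keep a stable order.
    cleaned = {d.strip().lower() for d in domains if d.strip()}
    for domain in sorted(cleaned):
        lines.append(f"domain:{domain}")
    return lines


# Hypothetical placeholder domains, for illustration only.
spam_domains = ["spam-one.example", "Spam-Two.example", "spam-one.example", " "]

disavow_lines = to_disavow_lines(spam_domains, comment="Spam domains to disavow")
print("\n".join(disavow_lines))
```

Review any such list before uploading it, since disavowing a domain tells Google to ignore all links from it.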

How to build a link index

Now let's talk about the technical side, that is, how you can build your own link index. Suppose you have a website that can be represented as a scheme:
[Figure: scheme of the site's pages and the links between them]
An arrow in the scheme means that one page links to another. For example, the Category 1 page links to Page 1, Page 2, and the Main page. The Main page also links to the Category 1 page, so the arrow between them is bidirectional.

Such a scheme is a visual representation, but it makes no sense to store link connections between pages in the form of a scheme. The same information can be presented in a more compact form, from which it will be possible to restore it at any time.

Let's convert the scheme into a table: in the rows and columns we indicate all the pages of our site; at the intersection of the row and column we will set 1 if there is a link to the page in the column from the page in the row, and 0 if there is no such link.
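A minimal sketch of this conversion in Python, using the links of the example site as they are listed in this article. The names `PAGES`, `LINKS` and `build_link_table` are made up for the illustration.

```python
# Pages of the example site, in a fixed order (rows and columns of the table).
PAGES = ["Main page", "Category 1", "Category 2", "Page 1", "Page 2"]

# Outgoing links of each page in the example site.
LINKS = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def build_link_table(pages, links):
    """Return a square 0/1 table: table[row][col] is 1 if row links to col."""
    position = {page: i for i, page in enumerate(pages)}
    table = [[0] * len(pages) for _ in pages]
    for source, targets in links.items():
        for target in targets:
            table[position[source]][position[target]] = 1
    return table


table = build_link_table(PAGES, LINKS)
for page, row in zip(PAGES, table):
    print(f"{page:<11}", row)
```

Each printed row belongs to the page that links out, and each position in the row belongs to the page being linked to.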
Let's check the table against the scheme using the same Category 1 page. According to the table, this page links to the Main page, Page 1 and Page 2. As a result, we get a binary contingency table (in graph theory, an adjacency matrix) that describes the structure of our site. Such a table makes it convenient to calculate PageRank, for example. An example of calculation for our scheme is here.
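To give a feel for such a calculation, here is a minimal sketch of the textbook iterative PageRank formula applied to the example site. It is not the calculation linked above and not the Serpstat Domain Rank formula. The damping factor of 0.85 and the number of iterations are assumptions chosen for the illustration.

```python
# Outgoing links of each page in the example site.
LINKS = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank; assumes every page has at least one outgoing link."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for source, targets in links.items():
            # A page splits its current rank equally among the pages it links to.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:<11} {value:.3f}")
```

The Main page receives links from every other page, so it ends up with the highest value in this model.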

Suppose we want to store the structure of our site on a hard drive. The advantage of such a table over the visual representation is that it takes up much less storage space. Moreover, you don't have to store the zeros.
For one site, everything looks good so far. But let's imagine that there are 10 sites in total, each with 5 pages. Pages can link to other pages of the same site as well as to pages of other websites. As a result, our model of the Internet has 50 pages, which means 50 columns and 50 rows in the contingency table.

There are usually many more links between pages of one site than links from pages of one site to pages of another. In some places our contingency table will be densely filled with ones, but most of it will be empty. We could fold the table row by row, turning each row into the list of pages that the row's page links to.

For our table, it would look like this:

Main page → Category 1, Category 2, Page 2
Category 1 → Main page, Page 1, Page 2
Category 2 → Main page
Page 1 → Main page, Page 2
Page 2 → Main page, Page 1
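The folding step can be sketched in a few lines of Python. The table literal below is the 0/1 table for the example site written out by hand, and `fold_rows` is a name made up for the illustration.

```python
PAGES = ["Main page", "Category 1", "Category 2", "Page 1", "Page 2"]

# TABLE[row][col] is 1 if the page in the row links to the page in the column.
TABLE = [
    [0, 1, 1, 0, 1],  # Main page
    [1, 0, 0, 1, 1],  # Category 1
    [1, 0, 0, 0, 0],  # Category 2
    [1, 0, 0, 0, 1],  # Page 1
    [1, 0, 0, 1, 0],  # Page 2
]


def fold_rows(pages, table):
    """Keep only the ones of each row: source page -> pages it links to."""
    return {
        source: [pages[col] for col, cell in enumerate(row) if cell]
        for source, row in zip(pages, table)
    }


direct_index = fold_rows(PAGES, TABLE)
for source, targets in direct_index.items():
    print(source, "→", ", ".join(targets))
```

Only the ones survive the folding, so the zeros no longer take up any space.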

We would get the so-called direct index (also known as a forward index). But let's look at these lists and try to answer a question that matters to many SEO specialists: which pages link to Page 2? We will have to go through all the lists and check whether Page 2 is in each of them. This is easy to do when there are five such lists. But there are billions of pages on the Internet, and checking that many lists becomes a very time-consuming task.
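A short sketch of that scan shows why it scales badly: to answer a question about a single page, every list has to be visited. The function name and the counter are made up for the illustration.

```python
# Direct index of the example site: page -> pages it links to.
DIRECT_INDEX = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def pages_linking_to(target, direct_index):
    """Scan every list of the direct index and collect pages that link to target."""
    found = []
    lists_scanned = 0
    for source, targets in direct_index.items():
        lists_scanned += 1  # every list is visited, whether it matches or not
        if target in targets:
            found.append(source)
    return found, lists_scanned


sources, lists_scanned = pages_linking_to("Page 2", DIRECT_INDEX)
print(sources)
print("lists scanned:", lists_scanned)
```

The number of lists scanned always equals the number of pages in the index, no matter how few of them actually link to the target.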

To answer this question quickly, we can fold the contingency table by columns instead. As a result, we get, for each page, the list of pages that link to it:

Main page ← Category 1, Category 2, Page 1, Page 2
Category 1 ← Main page
Category 2 ← Main page
Page 1 ← Category 1, Page 2
Page 2 ← Main page, Category 1, Page 1

Now, to find the answer, we only need to locate the one list for Page 2 among all the lists, and we don't need to go through the contents of every list. This gives us the backlink index. It is in this form that Serpstat stores the data on links to your site. This is a very simplified model, but its basic principles are correct.
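In the same simplified spirit, the backlink index can be built by inverting the direct index once, after which each question is a single lookup. This is only a sketch of the principle, not Serpstat's actual storage format. The function name is made up for the illustration.

```python
# Direct index of the example site: page -> pages it links to.
DIRECT_INDEX = {
    "Main page": ["Category 1", "Category 2", "Page 2"],
    "Category 1": ["Main page", "Page 1", "Page 2"],
    "Category 2": ["Main page"],
    "Page 1": ["Main page", "Page 2"],
    "Page 2": ["Main page", "Page 1"],
}


def build_backlink_index(direct_index):
    """Invert the direct index: page -> pages that link to it."""
    backlinks = {page: [] for page in direct_index}
    for source, targets in direct_index.items():
        for target in targets:
            # setdefault also covers targets that have no list of their own.
            backlinks.setdefault(target, []).append(source)
    return backlinks


backlink_index = build_backlink_index(DIRECT_INDEX)

# One dictionary lookup replaces the scan over every list.
print("Page 2 ←", ", ".join(backlink_index["Page 2"]))
```

The inversion costs one pass over all links, but it is done when the index is built rather than on every query.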

Serpstat link index

Back to the billions again. Adding a new page to our scheme and linking to it from existing pages is easy. You can create pages on your own sites and link them to other pages just as quickly. Now multiply this by the number of websites on the Internet to see the global scale of these changes.

To keep the index up to date in such a dynamic environment, our bot, Serpstatbot/1.0 (advanced backlink tracking bot; abuse@serpstatbot.com), crawls the web continuously. It follows the rules in robots.txt and other basic rules. More details here.
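If you want to see how a set of robots.txt rules would apply to a crawler with this user agent, Python's standard `urllib.robotparser` module can evaluate them locally. The sketch below is only an illustration, not Serpstat's crawler code. The robots.txt content, the URLs, and the assumption that the bot matches the `Serpstatbot` token are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a site that hides one directory from the bot.
ROBOTS_TXT = """\
User-agent: Serpstatbot
Disallow: /private/
"""

USER_AGENT = "Serpstatbot/1.0"

parser = RobotFileParser()
# parse() takes the lines of a robots.txt file, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

allowed_blog = parser.can_fetch(USER_AGENT, "https://example.com/blog/post")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/report")

print("blog page allowed:", allowed_blog)
print("private page allowed:", allowed_private)
```

With these rules, the blog URL is reported as allowed and the URL under /private/ as disallowed.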

Conclusion

The release of the new index is a significant chapter in Serpstat's history. We are encouraged by the opportunities that the new architecture and its technical implementation offer us.

We have many plans for improving both the index itself and the interface, so we have a big request: please give us feedback on the new index. If you have any comments, we are happy to discuss them with you personally. Your feedback can affect our development priorities and speed up the release of the functionality you need for analyzing links.
