How-to 9 min read September 24, 2019

What pages should be closed from indexing

Website content should be informative and useful, and relevant pages should be open to search crawlers. However, there are cases where indexing a page is undesirable and can even hurt your optimization efforts.

Reasons to close pages from indexing

A website owner wants potential customers to find the site in the search results, and the search engine, in turn, wants to provide users with valuable and relevant information. Only pages that make sense in the search results should be open for indexing.

Let's consider the reasons why you should close a website or individual pages from indexing:

1. The content carries no meaning for search engines and users, or misleads them

Such content includes technical and administrative pages of the website, as well as pages with personal data. In addition, some pages can create the appearance of duplicate content, which is a violation and may lead to penalties against the entire resource.

2. Irrational use of the crawl budget

The crawl budget is the number of pages on a website that a search engine can crawl. Server resources should be spent only on valuable, high-quality pages. To get important content indexed quickly and efficiently, close unnecessary content from crawling.
Crawl budget spending

What pages should be removed from indexing

Website pages under development

If the project is still under development, it's better to close the website from search engines. Only complete, optimized pages that you want to appear in the search results should be open to crawlers. When developing a website on a test server, restrict access to it with a robots.txt file, a noindex meta tag, or a password.
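For example, a site under development can be closed entirely with a two-line robots.txt file in the site root. This is a sketch; keep in mind that robots.txt is a directive that well-behaved crawlers honor, not an access control mechanism:

```txt
User-agent: *
Disallow: /
```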

Website copies

When you set up a copy of a site, it is important to specify the main mirror correctly using a 301 redirect or the rel="canonical" attribute, in order to preserve the ranking of the existing resource and tell the search engine which site is the source and which is the clone. Hiding the working resource from indexing is strongly discouraged: by doing that you risk resetting the site's age and its accumulated reputation.
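A minimal .htaccess sketch for redirecting a mirror to the main site, assuming the hypothetical domains mirror.example.com and example.com:

```apache
# 301-redirect every URL on the mirror to the same path on the main domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mirror\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```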

Print versions of pages

Print versions can be useful to visitors: an article, product information, or a map of the company's location can be printed in an adapted text format.

In fact, a print page is a copy of its main version. If it is open for indexing, the search robot may choose it as the priority version and consider it more relevant. To properly optimize a website with a large number of pages, close print versions from indexing.

To keep print versions out of the index, you can output their content via AJAX, add the meta tag <meta name="robots" content="noindex, follow" /> to them, or disallow all print URLs in robots.txt.
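Assuming the print versions live under a hypothetical /print/ path, the robots.txt rule could look like this:

```txt
User-agent: *
Disallow: /print/
```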

Unnecessary documents

Besides pages with the main content, a website may offer PDF, DOC, and XLS documents for reading and download. These files can appear in the search results alongside regular pages.

The content of these files may not meet the needs of the site's target audience, or the documents may rank above the site's HTML pages. In such cases, indexing the documents is undesirable, and it's better to block them from crawling in the robots.txt file.
Indexing a PDF file on a site
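A robots.txt sketch blocking document files by extension. The * and $ wildcards are supported by Google and Yandex, though not by every crawler:

```txt
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /*.xls$
```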

User forms and elements

This includes all pages that are useful to customers but carry no informational value for other users and, consequently, for search engines: registration and application forms, the shopping cart, the personal account area. Access to such pages should be restricted.
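Assuming hypothetical paths for these sections, the corresponding robots.txt rules might look like this:

```txt
User-agent: *
Disallow: /cart/
Disallow: /register/
Disallow: /account/
```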

Website technical data

Technical pages are intended only for administrative use, for example, the login form for the control panel.
Indexing the admin panel on the site

Personal customer information

This data may include not only the name and surname of a registered user but also contact and payment details saved after placing an order. This information must be protected from viewing.

Sorting pages

The structure of sorting pages makes them look alike. To reduce the risk of search engine sanctions for duplicate content, we recommend removing them from indexing.
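One common approach is to point sorted URLs back to the base category page with rel="canonical". The URLs here are hypothetical:

```html
<!-- In the <head> of /catalog/?sort=price_asc -->
<link rel="canonical" href="https://example.com/catalog/" />
```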

Pagination pages

Although these pages partially duplicate the content of the main page, it's not recommended to remove them from indexing. Instead, set the rel="canonical" attribute and the rel="prev" and rel="next" attributes, specify which parameters split content across pages in the "URL Parameters" section of Google Search Console, or deliberately optimize them.
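A sketch of the markup for page 2 of a hypothetical paginated category. Note that Google announced in 2019 that it no longer uses rel="prev"/"next" as an indexing signal, though other search engines may still read them:

```html
<!-- In the <head> of https://example.com/catalog/?page=2 -->
<link rel="canonical" href="https://example.com/catalog/?page=2" />
<link rel="prev" href="https://example.com/catalog/" />
<link rel="next" href="https://example.com/catalog/?page=3" />
```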

How to close pages from indexing

Robots meta tag with the noindex value in the HTML file

If the page's HTML code contains a robots meta tag with the noindex value, it signals the search engine not to show the page in the search results. To use it, add <meta name="robots" content="noindex, follow" /> to the <head> section of the corresponding HTML document.

Unlike robots.txt, this method keeps the page out of the index even when external links point to it.

To exclude a separate piece of text (rather than the entire page) from indexing, you can use the <noindex>text</noindex> tag. Note that this tag is non-standard and is recognized only by Yandex.

Robots.txt file

In this file you can block access to selected pages or tell search engines not to index the entire website.

You can restrict the indexing of pages via the robots.txt file in the following way:
User-agent: * # * applies to all crawlers; replace it with a specific bot name to target one engine
Disallow: /catalog/ # partial or full URL path to be closed
For this method to work efficiently, check whether there are external links to the section of the website you want to hide, and update any internal links pointing to it.
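You can check how crawlers will interpret such rules with Python's standard urllib.robotparser module. A quick sketch using the example rules above:

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above
rules = """User-agent: *
Disallow: /catalog/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any URL under /catalog/ is blocked for all user agents
print(parser.can_fetch("*", "https://example.com/catalog/item-1"))  # False
# Other sections remain open to crawling
print(parser.can_fetch("*", "https://example.com/about/"))  # True
```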

.htaccess configuration file

This file lets you restrict access to the website with a password. List the usernames of users who may access the protected pages and documents in the .htpasswd password file (it can be created with Apache's htpasswd utility). Then specify the path to this file in .htaccess:
AuthType Basic
AuthName "Password Protected Area"
AuthUserFile /path/to/.htpasswd
Require valid-user

Removing URLs through Webmaster Services

In Google Search Console, you can remove a page from the search results by submitting its URL in a special form and indicating the reason for removal. This option is available in the Google Index section. The request may take some time to process.
Removing URLs from an index in the Google Search Console

Conclusion

Index management is an important part of SEO. It means not only optimizing pages for traffic but also hiding content whose indexing brings no benefit.

Restricting access to certain pages and documents saves search engine resources and speeds up the indexing of the website as a whole.

This article is a part of Serpstat's Checklist tool. Checklist is a ready-made to-do list that helps you track work progress on a specific project. The tool contains templates with an extensive list of project development parameters, to which you can add your own items and plans.
