How-to 9 min read September 24, 2019

What pages should be closed from indexing

Website content should be informative and useful, and relevant pages should be open to search crawlers. However, there are cases where indexing a page is undesirable and can even hurt your optimization efforts.

Reasons to close pages from indexing

A website owner wants potential customers to find the site in the search results, and the search engine, in turn, wants to provide users with valuable and relevant information. Only pages that make sense in the search results should be open for indexing.

Let's consider the reasons why you should close a website or individual pages from indexing:

1. The content carries no meaning for search engines and users, or misleads them

Such content includes technical and administrative pages of the website, as well as pages with personal data. In addition, some pages can create the appearance of duplicate content, which is a violation and may lead to penalties against the entire resource.

2. Irrational use of the crawl budget

The crawl budget is the number of pages on a website that a search engine can crawl. Server resources should be spent only on valuable, high-quality pages. To get important content indexed quickly and efficiently, close unnecessary content from crawling.
Crawl budget spending

What pages should be removed from indexing

Website pages under development

If the project is still under development, it's better to close the website from search engines. Only complete, optimized pages that you want to appear in the search results should be open to crawlers. When developing a website on a test server, restrict access to it with a robots.txt file, a noindex meta tag, or a password.
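For example, a site under development can be closed entirely with a two-line robots.txt file in the site root. This is a sketch; keep in mind that robots.txt is a directive that well-behaved crawlers honor, not an access control mechanism:

```txt
User-agent: *
Disallow: /
```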

Website copies

When you set up a copy of a site, it is important to specify the main mirror correctly using a 301 redirect or the rel="canonical" attribute, in order to preserve the ranking of the existing resource and tell the search engine which site is the source and which is the clone. Hiding the working resource from indexing is strongly discouraged: by doing that you risk resetting the site's age and its accumulated reputation.
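A minimal .htaccess sketch for redirecting a mirror to the main site, assuming the hypothetical domains mirror.example.com and example.com:

```apache
# 301-redirect every URL on the mirror to the same path on the main domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mirror\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```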

Print versions of pages

Print versions can be useful to visitors: an article, product information, or a map of the company's location can be printed in an adapted text format.

In fact, a print page is a copy of its main version. If it is open for indexing, the search robot may choose it as the priority version and consider it more relevant. To properly optimize a website with a large number of pages, close print versions from indexing.

To keep print versions out of the index, you can output their content via AJAX, add the meta tag <meta name="robots" content="noindex, follow" /> to them, or disallow all print URLs in robots.txt.
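Assuming the print versions live under a hypothetical /print/ path, the robots.txt rule could look like this:

```txt
User-agent: *
Disallow: /print/
```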

Unnecessary documents

Besides pages with the main content, a website may offer PDF, DOC, and XLS documents for reading and download. These files can appear in the search results alongside regular pages.

The content of these files may not meet the needs of the site's target audience, or the documents may rank above the site's HTML pages. In such cases, indexing the documents is undesirable, and it's better to block them from crawling in the robots.txt file.
Indexing a PDF file on a site
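A robots.txt sketch blocking document files by extension. The * and $ wildcards are supported by Google and Yandex, though not by every crawler:

```txt
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /*.xls$
```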

User forms and elements

This includes all pages that are useful to customers but carry no informational value for other users and, consequently, for search engines: registration and application forms, the shopping cart, the personal account area. Access to such pages should be restricted.
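Assuming hypothetical paths for these sections, the corresponding robots.txt rules might look like this:

```txt
User-agent: *
Disallow: /cart/
Disallow: /register/
Disallow: /account/
```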

Website technical data

Technical pages are intended only for administrative use, for example, the login form for the control panel.
Indexing the admin panel on the site

Personal customer information

This data may include not only the name and surname of a registered user but also contact and payment details saved after placing an order. This information must be protected from viewing.

Sorting pages

The structure of sorting pages makes them look alike. To reduce the risk of search engine sanctions for duplicate content, we recommend removing them from indexing.
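One common approach is to point sorted URLs back to the base category page with rel="canonical". The URLs here are hypothetical:

```html
<!-- In the <head> of /catalog/?sort=price_asc -->
<link rel="canonical" href="https://example.com/catalog/" />
```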

Pagination pages

Although these pages partially duplicate the content of the main page, it's not recommended to remove them from indexing. Instead, set the rel="canonical" attribute and the rel="prev" and rel="next" attributes, specify which parameters split content across pages in the "URL Parameters" section of Google Search Console, or deliberately optimize them.
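A sketch of the markup for page 2 of a hypothetical paginated category. Note that Google announced in 2019 that it no longer uses rel="prev"/"next" as an indexing signal, though other search engines may still read them:

```html
<!-- In the <head> of https://example.com/catalog/?page=2 -->
<link rel="canonical" href="https://example.com/catalog/?page=2" />
<link rel="prev" href="https://example.com/catalog/" />
<link rel="next" href="https://example.com/catalog/?page=3" />
```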

How to close pages from indexing

Robots meta tag with the noindex value in the HTML file

If the page's HTML code contains a robots meta tag with the noindex value, it signals the search engine not to show the page in the search results. To use it, add <meta name="robots" content="noindex, follow" /> to the <head> section of the corresponding HTML document.

Unlike robots.txt, this method keeps the page out of the index even when external links point to it.

To exclude a separate piece of text (rather than the entire page) from indexing, you can use the <noindex>text</noindex> tag. Note that this tag is non-standard and is recognized only by Yandex.

Robots.txt file

In this file you can block access to selected pages or tell search engines not to index the entire website.

You can restrict the indexing of pages via the robots.txt file in the following way:
User-agent: * # * applies to all crawlers; replace it with a specific bot name to target one engine
Disallow: /catalog/ # partial or full URL path to be closed
For this method to work efficiently, check whether there are external links to the section of the website you want to hide, and update any internal links pointing to it.
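You can check how crawlers will interpret such rules with Python's standard urllib.robotparser module. A quick sketch using the example rules above:

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above
rules = """User-agent: *
Disallow: /catalog/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any URL under /catalog/ is blocked for all user agents
print(parser.can_fetch("*", "https://example.com/catalog/item-1"))  # False
# Other sections remain open to crawling
print(parser.can_fetch("*", "https://example.com/about/"))  # True
```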

.htaccess configuration file

This file lets you restrict access to the website with a password. List the usernames of users who may access the protected pages and documents in the .htpasswd password file (it can be created with Apache's htpasswd utility). Then specify the path to this file in .htaccess:
AuthType Basic
AuthName "Password Protected Area"
AuthUserFile /path/to/.htpasswd
Require valid-user

Removing URLs through Webmaster Services

In Google Search Console, you can remove a page from the search results by submitting its URL in a special form and indicating the reason for removal. This option is available in the Google Index section. The request may take some time to process.
Removing URLs from an index in the Google Search Console

Conclusion

Index management is an important part of SEO. It means not only optimizing pages for traffic but also hiding content whose indexing brings no benefit.

Restricting access to certain pages and documents saves search engine resources and speeds up the indexing of the website as a whole.

This article is a part of Serpstat's Checklist tool. Checklist is a ready-made to-do list that helps you track work progress on a specific project. The tool contains templates with an extensive list of project development parameters, to which you can add your own items and plans.
