How-to · 8 min read · August 23, 2019

How to specify robots.txt directives for Google robots

Robots.txt is a plain-text file (with the .txt extension) that contains recommendations for the robots of various search engines; it is placed in the root folder of your website.

Why a website needs robots.txt

Robots.txt commands are directives that allow or disallow the scanning of particular sections of a website. With this file, you can open or restrict crawling of the whole site or of specific pages by search engine robots. Here's an example of how directives work on a website:
[Image: why you need a robots.txt file]
The picture shows that scanning of certain folders, and sometimes of individual files, is blocked for search engine robots. Keep in mind that the directives in the file are advisory and can be ignored by a search robot, although robots normally take them into account. Google also warns webmasters that alternative methods are sometimes required to keep pages out of the index (see "The limitations of robots.txt").
What pages should be closed?

Normally, technical pages are closed from crawling. Cart pages, personal data, and customer profiles should also be protected, and this is done via the robots.txt file.
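As a rough illustration, here's a minimal robots.txt sketch that closes such sections from crawling. The paths /cart/, /account/ and /admin/ are hypothetical and should be replaced with the actual URLs your website uses:

User-agent: *
# hypothetical paths: replace with your site's real cart, profile and admin sections
Disallow: /cart/
Disallow: /account/
Disallow: /admin/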
How is the file created, and what directives are used?

The document is created in a text editor such as WordPad or Notepad++ and must have the ".txt" extension. Add the necessary directives, save the document, and upload it to the root of your website. Now let's look at the contents of the file in more detail.

There are two types of commands:

  • block scanning (Disallow);
  • allow scanning (Allow).

The following can additionally be specified (see the sketch after this list):

  • the crawl delay (Crawl-delay);
  • the main mirror of the website (Host);
  • the map of the website pages (Sitemap, pointing to sitemap.xml).
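Here's a sketch that combines these directives for a hypothetical site, example.com. Not every search engine honours every directive, so treat it as an illustration of the syntax rather than a universal template:

User-agent: *
# close a hypothetical technical folder, but keep one subfolder open
Disallow: /admin/
Allow: /admin/public/
# delay between crawler requests, in seconds (ignored by some crawlers)
Crawl-delay: 10
# main mirror and sitemap of the hypothetical site
Host: https://example.com
Sitemap: https://example.com/sitemap.xml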

Characters in robots.txt

The slash "/" is used to select the whole website.

The symbol "*" means any character sequence. Thus, you can specify that scanning is allowed up to a certain folder or file:
Disallow: */trackback
The symbol "$" means the end of the line.

You address a search engine bot via the User-agent directive plus the name of the bot to which the rules apply, for example:
User-agent: Googlebot
But the "User-agent:*" will mean addressing all bots, Google and other ones. Addressing the bot, you need to know its specifics as each algorithm aims to resolve certain tasks. The specifics of the most used search engines are described below.

Directives for Google robots

The names (user-agent tokens) used for the Google crawlers:

  • Googlebot crawls and indexes website pages;
  • Googlebot-Image scans pictures and images;
  • Googlebot-Video scans video content;
  • AdsBot-Google checks the quality of advertising on desktop pages;
  • AdsBot-Google-Mobile checks the quality of advertising on mobile website pages;
  • Googlebot-News evaluates pages before they appear in Google News;
  • AdsBot-Google-Mobile-Apps checks the quality of advertising in Android applications, similarly to AdsBot.
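As an illustration, the sketch below keeps a hypothetical /private-images/ folder away from Google's image crawler while leaving the rest of the site open to all bots; the folder name is made up for the example:

User-agent: Googlebot-Image
# hypothetical folder with pictures that should not appear in image search
Disallow: /private-images/

User-agent: *
Allow: /

Google's crawlers follow the most specific group that matches their name, so Googlebot-Image obeys only the first block here.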

Now that we know the names of the search robots and the available commands, let's compose an example document. First, let's address Google's search bot and completely disallow scanning of the website. The command looks like this:

User-agent: Googlebot
Disallow: /

Now, as another example, let's allow all bots to crawl the website:

User-agent: *
Allow: /

Next, let's add the link to the sitemap and the host of the website. As a result, we get a robots.txt for an HTTPS site:

User-agent: *
Allow: /
Host: https://example.com
Sitemap: https://example.com/sitemap.xml

This tells robots that our site can be scanned without restrictions, and it also specifies the host and the sitemap. If you need to limit scanning, use the Disallow directive. For example, to block access to the technical components of a WordPress website:

User-agent: *
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /feed/
Disallow: /cgi-bin
Disallow: /wp-admin
Host: https://example.com
Sitemap: https://example.com/sitemap.xml

If your website uses the HTTP protocol instead of HTTPS, don't forget to change the protocol in these lines accordingly.

Here's an example of a real file from a live website:
[Image: example of a robots.txt file]
With this method, we told all search engines that crawling of the specified folders is restricted. Remember that the document is case sensitive: folders with the same characters but different capitalization, such as "example", "Example", and "EXAMPLE", are treated as different paths. A common beginner mistake is to capitalize the file name, for example "Robots.txt" (which is wrong) instead of "robots.txt".
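A quick sketch of how case sensitivity plays out (the folder name is hypothetical):

User-agent: *
# blocks https://example.com/private/ but not https://example.com/Private/
Disallow: /private/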

Checking the accuracy of robots.txt

The document must be located in the root folder only. Placing it in "Admin", "Content" or similar subfolders is wrong: search engines will not take the file into account, and all the work will have been done in vain. Make sure the document is uploaded correctly by going to the main page of the site, adding "/robots.txt" to the website address, pressing Enter and checking that the page loads. The link will look like this: yoursiteaddress.com/robots.txt.
[Image: checking the robots.txt file]
If a 404 error page is returned, the file was not uploaded correctly. Google also provides built-in tools for verifying that the directives themselves work as intended; for instance, Search Console can check the file's accuracy.

Go to the panel and select the robots.txt Tester tool in the left-hand menu:
[Image: robots.txt Tester in Google Search Console]
In the window that opens, paste the text of the file and start the check. This is how you can inspect a document that has not yet been uploaded to the root folder of the website.
[Image: how to test a robots.txt file in Google Search Console]
To check an already published robots.txt document, specify the path to it as shown in the screenshot:


[Image: robots.txt correctness check in Google Search Console]

Conclusion

Robots.txt is needed to limit the scanning of pages that should not be included in the index, typically technical ones. To create such a document, you can use WordPad or Notepad++.

Specify which search robots you are addressing and give them the directives described above.

Next, verify the file's accuracy with Google's built-in tools. If no errors are found, save the file to the root folder and check its availability once again by opening yoursiteaddress.com/robots.txt. If the link works, everything was done correctly.

Remember that robots.txt directives are advisory by nature, and you need to use other methods (such as the noindex robots meta tag) to completely prevent a page from being indexed.

This article is part of Serpstat's Checklist tool
[Image: Checklist at Serpstat]
Checklist is a ready-made to-do list that helps you track work progress on a specific project. The tool contains templates with an extensive list of project development parameters, and you can also add your own items and plans.
Try Checklist now

