How-to 12 min read November 1, 2019

How to fix crawl errors in Google

Crawl errors appear when the search engine cannot access a page of the website. This happens because of errors in the server settings, CMS failures, changes in the URL structure, and other reasons. Such errors can affect the website's ranking in search results, as well as users' attitude to the resource.

Where to check the website crawl?

Google Search Console has everything you need to check your website. The console contains reports on crawl statistics, the number of impressions and clicks, and the average position in search.

Crawl reports list the errors found and give the webmaster a brief explanation of why they appeared and how to solve them.

Information in the Search Console is delayed, so it can still display errors that were fixed several weeks ago.

In addition to the console, check the website with other crawling and auditing tools.

Crawling errors in the Search Console

Google divides errors into two types:

  • site errors: appear if the bot cannot crawl the entire resource;
  • URL errors: indicate an issue on separate pages.

Site errors should be fixed first and as quickly as possible, because they affect the promotion process. Keep in mind that if the website is not otherwise optimized, fixing these errors alone will have little effect on its ranking in search results.

Website errors

This is what the webmaster panel looks like without errors:
Site Errors in Google Search Console
If there are problems, you will see blocks with details specified:
Coverage in Google Search Console
The screenshot below shows that the Sitemap.xml URL is blocked by a noindex tag. Remove this tag to allow crawling:
Details about crawl errors in Google Search Console
But errors appear for other reasons as well.

DNS errors

DNS (Domain Name System) is the system whose data robots use to locate a resource when visiting it. If DNS errors occur, the search engine cannot connect to the site, and users cannot find and open it.

Google doesn't report an error immediately: it may retry a couple of times, and only if all attempts fail does it display a message in the console.

Google developers say that most DNS errors don't affect promotion because they don't interfere with crawling. Still, they should be fixed as soon as possible; otherwise, users may leave the website because pages load slowly.
URL unreachable errors in Google Search Console
How to fix DNS errors?
1. Contact your DNS provider (through the hosting provider or domain registrar) and find out whether there are any problems on their side.
2. Use the "View as Googlebot" tool: the crawl function will show whether the website connection is normal. This function is available only in the old version of the console, which Google planned to disable at the end of March 2019; in the new version, the URL Inspection tool offers a similar live check.
3. Check whether the server returns HTTP 404 or 500 error codes, for example with web-sniffer. Based on the results, fix the errors with the help of a developer.
4. Make sure the website is accessible to visitors. The downforeveryoneorjustme service helps to check whether the website is available to everyone, not only from your device.
Down for Everyone or Just Me check service
5. Configure website availability monitoring through dedicated services, so that you are notified in whatever way is convenient whenever the website is unavailable.
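As a rough illustration of the kind of check a monitoring script might run, the sketch below looks up a hostname and reports whether DNS resolution succeeds. It is a minimal sketch, not a replacement for a monitoring service: the function names `resolve_host` and `dns_report` are made up for this example, and the resolver is passed in as a parameter only so the logic can be tried without network access.

```python
import socket
from typing import Callable, List


def resolve_host(hostname: str,
                 resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted unique IP addresses for hostname, or [] if the lookup fails."""
    try:
        records = resolver(hostname, 443)
    except socket.gaierror:
        # The name could not be resolved: this is the DNS error case.
        return []
    # Each record is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({record[4][0] for record in records})


def dns_report(hostname: str,
               resolver: Callable = socket.getaddrinfo) -> str:
    """Summarise the lookup in one line suitable for a monitoring log."""
    addresses = resolve_host(hostname, resolver)
    if not addresses:
        return f"{hostname}: DNS lookup failed"
    return f"{hostname}: resolves to {', '.join(addresses)}"
```

Calling `dns_report("example.com")` (with your own domain in place of the placeholder) on a schedule and alerting on the "DNS lookup failed" line would be one simple way to notice DNS problems before they show up in the console.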

Server errors

Server errors occur if the server takes too long to process a search robot's request for a page. The main reasons include:
1. An unsuitable server, for example one too weak for the site's load.
2. The server is down because of a DDoS attack.
3. The server fails to cope with the load when it is crawled by Google or by someone running an auditing tool (see point 1).
4. The hoster artificially limits the amount of traffic processed per month.
Choose a hosting provider responsibly: it must ensure uninterrupted operation in any situation.

Server errors also need to be fixed first. If the website is currently running (check with the Google crawler tool, which was scheduled to remain in the old console until March 2019) but an error message still appears in the console, the error may have been detected earlier.

The webmaster's task is to make sure that the situation doesn't happen again. If the console does not give you a suitable tool for this, use crawler software, for example, Netpeak Spider.

What can happen?

  • timeout: the server took too long to respond and the connection timed out, error code 408 (Request Timeout);

  • truncated headers: the robot connected but didn't receive the complete server response headers. This is a connection-level failure, so no HTTP status code is returned;

  • connection reset: the request was processed by the server, but the connection was dropped before the bot got the result. No HTTP status code is returned in this case either;

  • truncated response body: the body was not received completely because the connection closed prematurely. No HTTP status code describes this failure;

  • connection failure: the CDN (content delivery network) cannot connect to the origin web server, error code 522. This code is specific to Cloudflare and is not part of the HTTP standard;

  • no response: a server acting as a gateway or proxy didn't receive a timely response from the upstream server it needed to complete the request, error code 504 (Gateway Timeout);

  • bad gateway: a server acting as a gateway or proxy received an invalid response from the upstream server, error code 502 (Bad Gateway). This can happen, for example, when the upstream server is overloaded or has no free connections.

The difference between the gateway errors (502 and 504) and the first timeout is that the robot has connected to the host, but the desired result hasn't been received. This is not a connection issue: the cause can lie either in the request or in the host itself.
502 Bad Gateway error
When these errors are displayed, check with the Google crawler tool in the console whether the search engine can now access the resource. If you fixed everything but the error occurs again, contact the hoster: this usually points to incorrect settings or server overload.
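When reading server logs, it can help to translate a numeric status code into its standard meaning. The sketch below uses Python's `http.HTTPStatus` for the standard codes; the helper name `describe_status` and the small `NON_STANDARD` table are assumptions made for this example (522 is a Cloudflare-specific code, so the standard library does not know it).

```python
from http import HTTPStatus

# 522 is a Cloudflare-specific code, so it is not part of http.HTTPStatus.
NON_STANDARD = {522: "Connection Timed Out (Cloudflare)"}


def describe_status(code: int) -> str:
    """Return 'code phrase [category]' for an HTTP status code."""
    if code in NON_STANDARD:
        phrase = NON_STANDARD[code]
    else:
        try:
            phrase = HTTPStatus(code).phrase
        except ValueError:
            # The code is not registered in the standard library.
            phrase = "Unknown"
    if 500 <= code <= 599:
        category = "server error"
    elif 400 <= code <= 499:
        category = "client error"
    else:
        category = "not an error"
    return f"{code} {phrase} [{category}]"


if __name__ == "__main__":
    for status in (408, 502, 504, 522):
        print(describe_status(status))
```

Codes in the 5xx range point at the server or an intermediary such as a proxy or CDN, which is usually where the hoster needs to be involved.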

Access error to robots.txt

This error appears when the robots.txt file is not available to the search robot, for example when a firewall is blocking Google. If it is not fixed, crawling will be postponed.

The robots.txt file in the root of a web resource contains directives that keep search robots away from technical pages and other pages without useful, unique information. It can also adjust the crawling process and give the path to the sitemap for a better crawl.

This file reduces the number of requests to your server and, accordingly, the load on the website.
Google treats the rules in the file as crawling instructions, not as an indexing ban: a page blocked in robots.txt can still appear in the index if other pages link to it.
To fix the issue, check if robots.txt is configured correctly:

  • whether the sections and documents prohibited to process are specified correctly;
  • whether the file is available or gives a 404 server response.
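One way to check the first point is to parse the file and ask which URLs it blocks for Googlebot. The sketch below uses Python's standard `urllib.robotparser`; the sample robots.txt content, the example.com URLs, and the helper names `load_rules` and `blocked_urls` are all hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(parser: RobotFileParser, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the rules forbid the given user agent to crawl."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    pages = [
        "https://example.com/",
        "https://example.com/admin/login",
        "https://example.com/blog/post",
    ]
    print("Blocked:", blocked_urls(rules, pages))
    print("Sitemaps:", rules.site_maps())  # site_maps() needs Python 3.8+
```

If a page you want indexed shows up in the blocked list, the corresponding Disallow rule is the one to review.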

URL errors

If there are URL errors, Googlebot has trouble crawling specific pages, and such pages will not be displayed in search. To see them, read the "URL Errors" report in Google Search Console.
URL errors in Google Search Console
These errors appear when Googlebot is unable to process individual pages because of incorrect redirects (endless redirect chains, redirects to broken pages) or an outdated sitemap.xml. The report is available in the Search Console: go to the Coverage section from the main menu, as shown in the screenshot above.

Such issues are easier to fix: when analyzing them, you can see the specific pages affected.

The URL errors list is ordered from the most important issues to minor ones. They need to be fixed because the robot has a limited "crawl budget": if it spends all its time on non-existent pages, the website's useful pages won't be crawled (or will take a long time to be indexed).

"Soft 404" error. It appears when:

  • a deleted page doesn't return an HTTP 404 response code when requested by a user or bot;
  • a non-existent page redirects users to an irrelevant page, for example, from the bikes category to the motorcycles category;
  • the page is blank and has no content.

To fix these errors, do the following:
1. Remove broken links.
2. Configure redirects correctly.
3. Configure a 404 response code for non-existent pages.
4. Close from indexing or delete pages that don't have content.

Similar errors occur when a 301 redirect points to an irrelevant URL, because Google may misinterpret it. It is also undesirable to redirect many outdated pages to the main page of the web resource: it's better to point them to similar pages or similar content, so that users are more likely to get a relevant result for their request.
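To find candidates for soft 404s in your own crawl data, a simple heuristic can flag pages that answer with status 200 but look empty or like an error template. The sketch below is only a rough approximation of that idea, not how Google detects soft 404s: the hint phrases, the 50-character threshold, and the function names are assumptions to be tuned per site.

```python
import re

# Phrases that often appear on "not found" templates (an assumption; tune per site).
NOT_FOUND_HINTS = ("page not found", "no longer available")
# Hypothetical threshold below which a page is treated as blank.
MIN_TEXT_LENGTH = 50


def visible_text(html: str) -> str:
    """Strip tags and collapse whitespace (a rough approximation of visible text)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def looks_like_soft_404(status_code: int, html: str) -> bool:
    """Flag pages that return 200 but are blank or read like an error page."""
    if status_code != 200:
        # A real 404 or 410 response is not a *soft* 404.
        return False
    text = visible_text(html).lower()
    if len(text) < MIN_TEXT_LENGTH:
        return True
    return any(hint in text for hint in NOT_FOUND_HINTS)
```

Pages flagged this way still need a manual look: the fix is either to return a real 404 code or to give the page proper content.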

404 error.
It occurs when the robot crawls a non-existent page because it is referenced in other documents, including sitemap.xml. 404 errors can be internal or external:

  • if the link to the deleted page is inside the website, developers can remove it themselves;

  • if the link is external, developers, together with the SEO specialist or content manager, can configure a 301 redirect in the .htaccess file to transfer its link weight to a relevant page.
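When many externally linked pages have been deleted, the redirect rules can be generated from a mapping rather than typed by hand. The sketch below assumes an Apache server with mod_alias enabled, where a rule has the form `Redirect 301 /old-path target-URL`; the function name `htaccess_redirects` and the example paths are hypothetical.

```python
from typing import Dict


def htaccess_redirects(mapping: Dict[str, str]) -> str:
    """Build Apache mod_alias 'Redirect 301' lines from {old_path: new_url}."""
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical deleted pages and their closest relevant replacements.
    print(htaccess_redirects({
        "/old-bikes": "https://example.com/bikes/",
        "/2018/sale": "https://example.com/sale/",
    }))
```

Each target should be the most relevant live page, for the reasons given in the soft 404 section: a redirect to an unrelated page may itself be treated as an error.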

Access denied. This error appears when the robot doesn't have access to the URL. For example, directives in the robots.txt file ban crawling of the entire resource or of individual directories and sections, or the hoster has blocked access to the website.

To fix this issue, remove whatever prevents access:

  • allow the page to be opened without authorization;
  • generate the robots.txt file correctly and make it available to the search robot;
  • check with Google's crawler tool how the search engine sees the website.

Transition suspended errors (shown as "Not followed" in the English version of the console). They are usually caused by redirect errors or by JavaScript. How to fix them:

  • check redirect chains with online services, for example, redirectdetective.com. Keep the number of redirects minimal, ideally one;

  • work on the website structure: at least one static link should lead to each page. Check this manually, or use crawler tools if your site has more than 1000 pages;

  • replace the redirected URLs found by the service with their destination URLs in the Sitemap.
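The redirect-chain check can also be scripted. The sketch below follows a chain hop by hop and reports whether it ends normally, loops, or exceeds a hop limit. It is a minimal sketch under stated assumptions: `trace_redirects` is a made-up name, and `next_hop` stands in for whatever you use to fetch a URL and read its redirect target (here a plain dictionary lookup, so the logic can be tried offline).

```python
from typing import Callable, List, Optional, Tuple


def trace_redirects(start_url: str,
                    next_hop: Callable[[str], Optional[str]],
                    max_hops: int = 10) -> Tuple[List[str], str]:
    """Follow redirects starting from start_url.

    next_hop(url) returns the redirect target, or None when the URL
    answers without redirecting. Returns (chain, verdict), where verdict
    is 'ok', 'loop', or 'too long' (gave up after max_hops redirects).
    """
    chain = [start_url]
    seen = {start_url}
    while len(chain) - 1 < max_hops:
        target = next_hop(chain[-1])
        if target is None:
            return chain, "ok"
        chain.append(target)
        if target in seen:
            # The chain came back to a URL it already visited.
            return chain, "loop"
        seen.add(target)
    return chain, "too long"


if __name__ == "__main__":
    # Hypothetical redirect map: /a -> /b -> /c, where /c is the final page.
    redirects = {"/a": "/b", "/b": "/c"}
    print(trace_redirects("/a", redirects.get))
```

A chain longer than two entries means more than one redirect, so the first URL is a candidate for pointing straight at the final destination.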

Conclusion

Crawl errors can appear through the fault of the webmaster or for other reasons: hosting and domain problems, the CMS, content managers' mistakes, integrations with external APIs, services, and databases, and other issues.

To preserve your search engine ranking, check for errors regularly and fix them as soon as possible.

This article is a part of Serpstat's Checklist tool
Checklist at Serpstat
Checklist is a ready-to-do list that helps to keep reporting of the work progress on a specific project. The tool contains templates with an extensive list of project development parameters where you can also add your own items and plans.