How-to – 12 min read – November 1, 2019
How to fix crawl errors in Google
Where to check your website's crawl status?
You can find crawl errors in the Search Console's crawl reports, which also give the webmaster a brief explanation of why each error appeared and how to fix it.
Note that information in the Search Console is delayed, so it may still display errors that were fixed several weeks ago.
In addition to the console, it is worth checking the website with other crawling and auditing tools.
Crawling errors in the Search Console
- site errors: appear when the bot cannot crawl the entire resource;
- URL errors: indicate an issue on individual pages.
Site errors should be fixed as quickly as possible, because they affect the promotion process. Keep in mind, though, that if the website isn't otherwise optimized, fixing these errors alone will have little effect on its ranking in search results.
Google doesn't report an error right away: the bot retries a few times, and only if all attempts fail does a message appear in the console.
Google developers say that most DNS errors don't affect promotion because they don't interfere with crawling. Still, they should be fixed as soon as possible; otherwise, users may leave the website due to slow page loading.
Server errors also need to be fixed first. If the website is currently up (you can check through Google's crawler tool, which will be available until March 2019) but an error message appears in the console, the error may have been detected earlier.
The webmaster's task is to make sure the situation doesn't recur. If the new version of the console doesn't include a similar tool, use crawler software, for example, Netpeak Spider.
What can happen?
- timeout: the connection timed out (error code 408);
- truncated headers: the robot connected but didn't fully receive the server's response headers (error code 304);
- connection reset: the server processed the request, but the bot didn't manage to get the result (error code 205);
- truncated response body: the response body wasn't fully received due to a premature shutdown (error code 206);
- connection failure: the CDN (content delivery network) cannot connect to the web servers (error code 522); in other words, the computer cannot reach the server;
- no response: the server or proxy server didn't receive a response from the upstream server in time to complete its request (error code 504);
- connection timeout: the robot couldn't connect within the time period set by the system (error code 502), meaning the timeout expired before the operation completed: either the server didn't respond before the time elapsed, or all available connections were already in use.
The robots.txt file in the site root is created to set directives for search robots: it blocks them from technical pages and other pages that don't contain useful, unique information, adjusts the crawling process, and provides a path to the sitemap for better crawling.
This file also helps reduce the number of requests to your server and, accordingly, the server load.
When checking robots.txt, verify:
- whether the sections and documents that should not be processed are specified correctly;
- whether the file is available or returns a 404 server response.
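As a sketch of those checks, here is a minimal robots.txt (the paths and domain are hypothetical) together with Python's standard `urllib.robotparser`, which lets you verify which URLs the directives actually block before you deploy the file:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block technical pages, point to the sitemap.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /search/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Technical pages are blocked, content pages remain crawlable.
print(parser.can_fetch("*", "https://example.com/admin/login"))   # False
print(parser.can_fetch("*", "https://example.com/blog/article"))  # True
```

Running a quick check like this catches an overly broad `Disallow` rule before it hides useful pages from the search robot.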
URL errors are easier to fix: when analyzing them, you can see the specific pages that have issues.
The URL error report lists issues from most important to minor. They need to be fixed because the robot has a limited "crawl budget": if it spends all its time viewing non-existent pages, the website's useful pages won't be crawled (or it will take a long time to index them).
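The crawl-budget idea can be illustrated with a small simulation (the function, URLs, and numbers here are made up for illustration): a bot with a fixed number of requests crawls URLs in order, and every request spent on a dead page is a request not spent on useful content.

```python
def crawl_with_budget(urls, statuses, budget):
    """Simulate a crawler: fetch URLs in order until the budget runs out.

    urls     -- list of URLs in crawl order
    statuses -- dict mapping each URL to its HTTP status code
    budget   -- number of requests the bot will spend
    Returns the list of useful (status 200) pages actually crawled.
    """
    crawled = []
    for url in urls[:budget]:          # every URL costs one request
        if statuses[url] == 200:
            crawled.append(url)
    return crawled

urls = ["/a", "/old-1", "/old-2", "/b", "/c"]
statuses = {"/a": 200, "/old-1": 404, "/old-2": 404, "/b": 200, "/c": 200}

# With a budget of 3, two requests are wasted on dead pages:
print(crawl_with_budget(urls, statuses, 3))   # ['/a']

# Remove the dead links and the same budget covers three useful pages:
live = [u for u in urls if statuses[u] == 200]
print(crawl_with_budget(live, statuses, 3))   # ['/a', '/b', '/c']
```

The same budget yields three useful pages instead of one once the dead URLs are cleaned up, which is exactly why 404s in the crawl path are worth removing.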
"Soft 404" error. It appears when:
- the page that was deleted doesn't return an HTTP 404 response code at the request of the user or bot;
- non-existent page redirects users to an irrelevant page. For example, if you set a redirect from the bikes category to the motorcycles category;
- when the page is blank, there is no content on it.
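A soft 404 can be spotted by comparing the status code with what the page actually contains. The heuristic below is a simplified sketch (the function name and the phrase checks are assumptions for illustration, not Google's actual detection logic):

```python
def looks_like_soft_404(status_code, body):
    """Flag responses that say 200 OK but behave like a missing page."""
    if status_code != 200:
        return False  # a real 404 (or other code) is not a soft 404
    text = body.strip().lower()
    # Blank pages or "not found" text served with 200 are suspicious.
    return not text or "not found" in text or "page does not exist" in text

print(looks_like_soft_404(200, "<h1>Page not found</h1>"))     # True
print(looks_like_soft_404(200, ""))                            # True
print(looks_like_soft_404(404, "gone"))                        # False
print(looks_like_soft_404(200, "<h1>Mountain bikes</h1>"))     # False
```

The fix on the server side is the mirror image: make sure deleted pages actually return a 404 (or 410) status code instead of a 200 with an error message in the body.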
404 error. It occurs when the robot crawls a non-existent page that is referenced in other documents, including sitemap.xml. 404 errors can be internal or external:
- if the link to the removed page is inside the website, developers can remove it themselves;
- if the link is external, developers together with the SEO specialist or content manager can set up a 301 redirect in the .htaccess file to transfer its link weight to a relevant page.
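As a sketch, here is what such a 301 redirect could look like in .htaccess (assuming Apache with mod_alias and mod_rewrite enabled; the paths are hypothetical):

```apache
# Permanently redirect a removed page to a relevant live one (mod_alias)
Redirect 301 /old-category/removed-page.html /new-category/

# Or, with mod_rewrite, redirect a whole removed section at once
RewriteEngine On
RewriteRule ^old-category/(.*)$ /new-category/$1 [R=301,L]
```

A 301 tells search engines the move is permanent, so the external link's weight is passed to the destination page rather than lost on a 404.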
To fix access errors, remove whatever prevents the robot from reaching the page:
- allow the page to open without authorization;
- generate the robots.txt file correctly and make it available to the search robot;
- check through Google's crawler how the search engine sees the website from its side;
- check redirect chains with online services, for example, redirectdetective.com; keep the number of redirects minimal, ideally no more than one;
- work on the website structure: at least one static link should lead to each page. Check this manually, or use crawler tools if your site has more than 1,000 pages;
- replace redirected URLs found in the report with their destination URLs, including in the Sitemap.
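The advice on keeping redirect chains short can be sketched as follows. Instead of issuing live HTTP requests, this example walks a made-up map of redirects (a real checker would send requests and read each `Location` header, which is what services like redirectdetective.com do):

```python
def redirect_chain(redirects, url, max_hops=10):
    """Follow redirects in a url -> destination map and return the chain.

    redirects -- dict simulating 301/302 Location headers
    Returns the list of hops, e.g. [start, hop1, final].
    """
    chain = [url]
    while url in redirects and len(chain) <= max_hops:
        url = redirects[url]
        if url in chain:            # loop protection
            break
        chain.append(url)
    return chain

# Hypothetical chain: /promo -> /promo-2019 -> /sale
redirects = {"/promo": "/promo-2019", "/promo-2019": "/sale"}

chain = redirect_chain(redirects, "/promo")
print(chain)             # ['/promo', '/promo-2019', '/sale']
print(len(chain) - 1)    # 2 redirects: worth collapsing into one
```

Here the fix would be to point `/promo` directly at `/sale`, so every hop after the first is eliminated and the bot spends only one extra request per old URL.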
To preserve your search engine rankings, regularly check the website for errors and fix them as soon as possible.
This article is a part of Serpstat's Checklist tool
Try Checklist now