How to Resolve Website’s Crawl Errors Using Google Search Console


What is Google Search Console?

Google Search Console (GSC), formerly known as Google Webmaster Tools (GWT), is a free web service offered by Google that serves as an essential communication and diagnostic centre for your website. Indeed, “this tool is a boon to webmasters as it offers many features through which the website can be perfected for website visitors and search engines”. Webmasters can –

  • Submit sitemaps,
  • Review website links and keyword data,
  • Inspect for crawl errors (detected by the search engines),
  • Adjust how your website appears in the search results, and much more.

Whenever Google plans a significant change, or unusual activity is detected on your website, Google Search Console sends crucial notifications to the webmasters (or whoever manages and maintains the website).

I. What are Crawl Errors?
Crawl errors are divided into two categories – (a) Site errors and (b) URL errors.

a)   Site Errors:
Site errors affect your entire site. They include failures to fetch your robots.txt file, connectivity issues with your web server, and DNS resolution failures.
b)   URL Errors:
URL errors are encountered when Googlebot tries to crawl specific website pages through different devices such as desktop, smartphone, or an Android app. Google uses different crawling mechanisms to access your website pages on a variety of devices, and the errors reported are specific to those types of pages.
For example, when Googlebot tries to access a website through a desktop device, errors like “page not found” or “access denied” are found.

Webmasters should regularly monitor these website errors. Too many errors signal to Google that the website is badly maintained (that its quality is poor). By setting 301 permanent redirects in the <.htaccess> file, webmasters can fix the issue for affected pages that currently return a 404 page (it is recommended to have a customised 404 page with all the website’s navigation links intact). Once this is done, just check the box in front of the URL (in the GSC tool) and click “Mark as fixed” – this cleans those errors out of the report, but it does nothing beyond that.
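For illustration, a 301 redirect and a custom 404 page can be set up in an Apache <.htaccess> file roughly along these lines (a minimal sketch assuming an Apache server with mod_alias enabled; the paths, domain, and file name are placeholders, not taken from the article):

    # .htaccess – illustrative only; /old-page/, /new-page/ and the domain are placeholders
    # 301 permanent redirect for a URL whose structure has changed (requires mod_alias)
    Redirect 301 /old-page/ http://www.xyz.com/new-page/

    # Serve a customised 404 page for URLs that no longer exist
    ErrorDocument 404 /custom-404.html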

Smartphone visits
Google also tests for faulty redirects and blocked URLs for your smartphone visitors. Using the robots.txt file, URL(s) can be blocked from Googlebot-mobile for smartphone users. If they are blocked intentionally, just check the box in the GSC tool. If they aren’t, webmasters should rectify this in the robots.txt file and allow access. Ideally, the crawl errors section should then report that no errors were found.
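As a rough sketch, such a rule in robots.txt might look like the following (the path is a placeholder, and the user-agent token is the one named above, used purely for illustration):

    # robots.txt – illustrative only; /smartphone-test/ is a placeholder path
    # Block Google's mobile crawler from one directory
    User-agent: Googlebot-Mobile
    Disallow: /smartphone-test/

    # To restore access, remove the rule or leave Disallow empty:
    # User-agent: Googlebot-Mobile
    # Disallow: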

Crawl Stats
The Crawl Stats page gives details about Googlebot’s activity on your website over the last 90 days (all types of content that Google downloads are considered, such as JavaScript, Flash, PDF files, images, and CSS). It also takes into account AdSense fetches and fetches for some Google products such as Google Images, Google Scholar, and Google News.
It lets webmasters know how their websites are performing, which is really a good thing. Suppose modifications to the website’s structure have been made, or the webmaster has just added an XML sitemap for the first time, or alterations have been made to the robots.txt file – all of this will be reflected in the graphs accordingly.

If these stats show a significantly declining line (or even a flat line at zero), something is seriously wrong with the website or with the website’s hosting server. Check whether the robots.txt file might be blocking Googlebot, or whether the website’s hosting server is down.

II. Fetch as Google
If webmasters find any crawl errors, they should look into what happened and why. Webmasters can view their websites as Google sees them using one of the tools available in Google Search Console.

Webmasters can fetch a web page as Google in two ways: first, they can go to the “Fetch as Google” section in Google Search Console and enter a URL manually; second, they can click a URL listed under Crawl Errors and then click the Fetch as Google link in the pop-up.

Fetch as Google

In the above image, three different statuses can be seen (they apply to both the Fetch and the Fetch & Render commands):

a) Unreachable: Your website responded too slowly and Googlebot did not wait for it to fully load (you need to make your website load faster), or your web-hosting server responded that it could not fulfil the request for the URL(s) concerned.

b) Not Found: This means the web page was not found by Googlebot. It occurs when a proper redirect has not been set up after a URL or structure change, or when you have deleted the web page and the server returns a 404 error.

c) Partial: This means the specific web page could only be partially rendered, because some components were not loaded as intended or not at all – for example, you are blocking JS or CSS in your robots.txt file. When you click the line with the Partial status in the overview, you will be taken to a snapshot of how Google rendered your page. On that page, Search Console will also tell you which resources it could not fetch, so you can fix them.

There are other statuses as well, which are as follows:

  • Temporarily Unreachable: Too many successive requests were made to the server for different URLs, or the server took too long to respond.
  • DNS Not Found: Very likely you entered the wrong URL – the domain name may not be correct.
  • Not Authorised: Your web-hosting server tells Google that access to the URL is restricted or blocked from search engine crawlers (the server returns a 403 error).
  • Error: An error occurred while trying to complete the fetch (in this case, contact Search Console product support).
  • Blocked: Your website’s robots.txt file tells Googlebot to stay away (i.e. not to crawl the web page).
  • Redirected: Your website (HTML/JS) or the web-hosting server told Googlebot to visit another URL.
  • Complete: The status you really want to see – Google was able to successfully crawl the entire web page.
  • Unreachable robots.txt: Google was not able to reach your robots.txt file (the robots.txt file is explained below).

III. Why do you need a Robots.txt file?
A robots.txt file is used to prevent search engine bots from crawling specific website page(s) or URL(s). If you want all your website pages/URLs to be crawled and indexed by the search engines, you can simply leave it out. But if you want certain important URL(s) to be blocked from crawling and indexing by the search engines (for security reasons), it is recommended to create a robots.txt file and upload it to the root of your website’s domain.
For example, the robots.txt file for xyz.com should be saved at the root, at the URL address <http://www.xyz.com/robots.txt> – this way it will be easily found by the web crawlers.
Get more information on the robots.txt file.
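A minimal sketch of such a file, using the article’s example domain and hypothetical paths (the /admin/ and /private-file.html entries are placeholders):

    # robots.txt saved at http://www.xyz.com/robots.txt – illustrative only
    User-agent: *
    Disallow: /admin/
    Disallow: /private-file.html

    # Optionally point crawlers to the XML sitemap as well
    Sitemap: http://www.xyz.com/sitemap.xml

Everything not listed under a Disallow rule remains crawlable.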

IV. Why do you need a Sitemap?
A sitemap is a very important web page that you should have on your website. It contains all the URLs (web pages) of your website, and it helps not only search engines but also website visitors find all your web pages with ease. A sitemap also increases the chances of better crawling and indexing by all the major search engines (a minimal sitemap sketch is shown after the list below). It is particularly helpful in the following cases:

  • Massive website: If your website has numerous web pages, search engines might fail to notice your recently updated or newly created web pages.
  • Archived Web Pages: You may have archived several of your older content pages, which may remain isolated or may not be correctly interlinked with other important pages on your website. If your website does not have proper interlinking of web pages, you can place all of those isolated pages/URLs in your sitemap, which will help search engines discover, crawl, and index them.
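A minimal XML sitemap might look roughly like this (a sketch only; the URLs and dates are placeholders on the article’s example domain):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- sitemap.xml – illustrative only; the URLs and dates are placeholders -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.xyz.com/</loc>
        <lastmod>2017-01-15</lastmod>
      </url>
      <url>
        <loc>http://www.xyz.com/archived-article.html</loc>
        <lastmod>2016-11-02</lastmod>
      </url>
    </urlset>

Once uploaded to the root of the domain, the file can be submitted in the Sitemaps section of Google Search Console.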

V. URL Parameters
It is worth beginning with Google’s own important warning here:
Use this feature only if you are sure how parameters work.
Incorrectly excluding URLs could result in many pages disappearing from search.
In this section, you can tell Google how to handle parameters for your website. When you click “Add Parameter”, a pop-up will immediately appear with the following choices:
Add Parameter
In the above picture, two selections are presented, which are –
(1)    No: Doesn’t affect page content (example: tracks usage)
(2)    Yes: Changes, reorders, or narrows page content

These parameters are referred to as passive and active URL parameters by Google.
Passive parameters are generally used just for referral or tracking purposes – for instance, Google’s own utm_source, or Magento’s SID (session ID).
Active parameters can be used for pagination, sorting, categorisation, or translations.
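For illustration, the two kinds of parameter might look like this on a hypothetical URL (the domain, path, and parameter values are placeholders):

    # Passive parameter – tracks usage, does not change the page content
    http://www.xyz.com/shoes/?utm_source=newsletter

    # Active parameter – reorders the page content
    http://www.xyz.com/shoes/?sort=price-asc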
If you need to choose option 2 from the above alternatives – “Yes: Changes, reorders, or narrows page content” – selecting it in the select box will present you with four further choices.

Here, you can set the parameter to tell Google how to handle it for your website:
a)  Let Googlebot Decide: A general option to choose if you are unsure what to select here.
b)  Only URLs with specified value: When you select this option, you are telling Google that you only want URLs crawled that have a specific value for this parameter, and that the rest should be ignored. It is used to avoid duplicate content caused by sorting options.
c)  No URLs: When you select this option, you are telling Google not to crawl web pages with this parameter. This one is also used to avoid duplicate content.
d)  Every URL: Every URL using this parameter points to a totally new product or page.

Instead of using URL parameter settings for options (b) and (c) above, you can also set the right canonical tag on all of these pages.
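A canonical tag is simply a line in the page’s head section; a minimal sketch with a placeholder URL would be:

    <!-- Placed in the <head> of every parameterised variant of the page; the URL is a placeholder -->
    <link rel="canonical" href="http://www.xyz.com/shoes/" />

All sorted or filtered variants of the URL would then point back to the same canonical address.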

Concluding Words

These are the basics of resolving your website’s crawl errors using Google Search Console. Use it prudently to keep your website well maintained and to provide better usability to your end users as well as to search engines. It will improve your website’s overall performance and help maximise sales and revenue if you sell products or services.
