If you’ve read through part one of this blog, you should have an understanding of Google’s Crawl Budget and the best practices for maximising your crawling efficiency. So… let’s move on to some of the methods you can use to monitor your site’s crawl profile. This will help alert you to any potential crawl or indexing issues within your website, so that you can address them promptly.
5 Key steps to monitoring the crawl profile of your website
1. Use Google’s Crawl Stats Report to identify availability issues being encountered by Googlebot
Google’s Crawl Stats Report is the place to go when you’re looking for Googlebot’s crawling history for your website. If Google encounters availability issues when crawling your site, here is where you will find them. The Crawl Stats Report comes with its own documentation that can help treat any diagnosed issues that have arisen – but some other things that can help include: Blocking pages from crawling when they don’t need to be crawled, increasing page loading and rendering speed, and increasing server capacity.
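One simple way to stop Googlebot spending its budget on pages that don’t need crawling is a robots.txt rule. As a minimal sketch – the paths below are hypothetical placeholders, not recommendations for any particular site:

```
# robots.txt — hypothetical low-value paths, for illustration only
User-agent: *
Disallow: /internal-search/
Disallow: /print-versions/
```

Remember that robots.txt prevents crawling, not indexing – URLs blocked here can still appear in results if they’re linked elsewhere.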
2. Check site logs for any pages that should be crawled, but aren’t
Your site logs (located on your server) will tell you which pages have been crawled by Googlebot, and which haven’t. If you have pages within your site that are meant to be crawled but haven’t been included, you can try one of the following: Update your sitemaps to reflect new URLs, ensure your robots.txt file isn’t blocking page access, check your site settings within the URL Parameters tool (note that Google retired this tool in 2022), review your crawling priorities and/or check that you’re not running out of server capacity.
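Checking logs for un-crawled pages can be done with a short script. Here’s a minimal sketch that compares a list of sitemap URLs against Googlebot requests in combined-format access log lines – the log format, sample lines and paths are illustrative assumptions, so adapt the regex to your own server’s configuration:

```python
# Minimal sketch: find sitemap URLs that Googlebot has not requested,
# based on combined-format access log lines.
import re

def uncrawled_urls(log_lines, sitemap_paths):
    """Return sitemap paths with no Googlebot request in the log."""
    crawled = set()
    for line in log_lines:
        # Combined log format: '... "GET /path HTTP/1.1" ... "... Googlebot/2.1 ..."'
        if "Googlebot" not in line:
            continue
        match = re.search(r'"(?:GET|HEAD) (\S+)', line)
        if match:
            crawled.add(match.group(1))
    return [path for path in sitemap_paths if path not in crawled]

# Hypothetical sample lines for illustration:
logs = [
    '66.249.66.1 - - [01/Jan/2024] "GET /blog/post-1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.9 - - [01/Jan/2024] "GET /blog/post-2 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(uncrawled_urls(logs, ["/blog/post-1", "/blog/post-2"]))  # -> ['/blog/post-2']
```

In practice you’d also want to verify that the requests genuinely came from Google (via reverse DNS), since the user-agent string alone can be spoofed.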
3. Ensure that website updates are being crawled within a reasonable period of time
Site logs will also tell you when specific URLs within your site were crawled by Googlebot. The indexing date can be found using the URL Inspection tool – a simple comparison of the two dates will give you an approximate period that it takes for Google to crawl and index your website content. If the time taken is longer than expected, you can try a number of things, including: Ensuring you are using a simple URL structure and that your links are crawlable, utilising the <lastmod> tag in sitemaps to indicate when content has been updated, and using a news sitemap if relevant to your content – simply ping Google when your sitemap has been posted or changed.
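For reference, the <lastmod> tag sits inside each <url> entry of your XML sitemap. A minimal sketch (the URL is a placeholder):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/updated-post</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```

Only update <lastmod> when the page content genuinely changes – inaccurate dates teach Google to ignore the signal.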
4. Optimise site pages and resources for crawling in order to maximise your crawl efficiency
If you are experiencing slow crawling and the above monitoring methods are reflecting that, but you’re struggling to find a solution, you can always try simply increasing your page load speed. Use your robots.txt file to prevent Googlebot from loading large but unimportant resources, block other non-critical resources, and avoid long redirect chains. You should also hide any URLs that you don’t want to show within search results.
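To make the redirect-chain point concrete, here’s a small sketch that counts hops through a chain. The URL-to-target mapping stands in for your site’s redirect rules – in practice you would build it from your server configuration or by issuing requests:

```python
# Sketch: measure redirect chain length from a URL-to-target map.
# Each extra hop costs Googlebot a request before it reaches content.

def chain_length(url, redirects, limit=10):
    """Count hops until a URL stops redirecting (or `limit` is hit)."""
    hops = 0
    seen = set()
    while url in redirects and hops < limit:
        if url in seen:          # redirect loop detected
            return limit
        seen.add(url)
        url = redirects[url]
        hops += 1
    return hops

# Hypothetical chain: /old -> /interim -> /new
redirects = {"/old": "/interim", "/interim": "/new"}
print(chain_length("/old", redirects))  # -> 2
```

Ideally every redirecting URL should point straight at its final destination, keeping the chain length at one.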
5. Monitor your server for overcrawling of your site so that you can prevent Googlebot from becoming overwhelmed
This rarely happens as Google’s algorithms typically prevent it from overwhelming your site with crawl requests. However, if you are seeing excessive Googlebot requests to your site within your server, there are things you can try to help prevent this from happening going forward. You can temporarily return 503/429 HTTP result codes for Googlebot requests when your server is overloaded, before reducing the Googlebot crawl rate for your site. When the crawl rate subsequently drops, you can remove the 503/429 HTTP result codes. You can monitor and adjust the crawl rate and host capacity accordingly, over time. Note: returning 503 for more than 2 days will cause Google to drop the 503 URLs from the index.
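The overload response described above can be sketched as a simple request handler. The load threshold and Retry-After value below are illustrative assumptions, not Google-recommended figures – the real logic would live in your server or CDN configuration:

```python
# Sketch: return 503 with a Retry-After header to Googlebot while the
# server is under pressure, as described in step 5 above.

OVERLOAD_THRESHOLD = 0.9  # fraction of capacity; tune for your server

def crawl_response(user_agent, server_load):
    """Return (status_code, headers) for an incoming crawl request."""
    if "Googlebot" in user_agent and server_load > OVERLOAD_THRESHOLD:
        # Ask Googlebot to back off and retry later.
        return 503, {"Retry-After": "3600"}
    return 200, {}

print(crawl_response("Mozilla/5.0 (compatible; Googlebot/2.1)", 0.95))
# -> (503, {'Retry-After': '3600'})
```

As the article notes, treat this as a temporary pressure valve: remove the 503/429 responses once the crawl rate drops, since serving them for days will cause URLs to fall out of the index.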
Don’t believe everything you hear about crawling…
Whilst optimising the crawling and indexing of your website, it’s important to follow the aforementioned steps. It’s also just as important to avoid a number of common misconceptions about how Google crawls and indexes websites, as these can quickly have a negative impact on your site’s performance. Let’s take a look at some of the myths you may have encountered during your SEO research.
Compressing sitemap files into zipped folders will increase your crawl budget – FALSE. These compressed files still need to be fetched from your server and so you’re not really saving any time or effort in zipping them up.
Google has trouble crawling query parameters and so prefers clean URLs – FALSE. Google can easily recognise and crawl URLs that contain parameters. You can however also choose to block duplicate content with parameters if you’re trying to avoid duplicate content issues.
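If you do want to keep crawlers away from duplicate parameterised URLs, robots.txt wildcard rules are one option. A minimal sketch – the parameter names are hypothetical examples:

```
# robots.txt — block hypothetical duplicate-content parameters
User-agent: *
Disallow: /*?sort=
Disallow: /*?sessionid=
```

For duplicates you still want crawled occasionally, a rel="canonical" link pointing at the clean URL is often the safer choice, as it consolidates signals rather than hiding the pages entirely.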
Smaller, less popular sites just aren’t crawled as often as large, popular sites – FALSE. If a site contains important content that is changed regularly then it will be crawled often, regardless of the size of the website.
Google prefers fresh content, so if I keep tweaking my page, it’ll be ranked higher – FALSE. Whilst having a site with regularly updated content signals that it’s up to date and relevant, content is rated by quality, regardless of age. There is no additional value in making unnecessary changes just because you want the page to appear ‘fresher’.
So… that’s it from me, for now. Hopefully you have enough information to start monitoring and optimising your crawl budget in line with these best practices. If you’re still unsure of where to start, or if you’d like help with the crawling and indexing of your website, give the Technical SEO professionals at Varn a call or drop us an email today. We’d love to hear from you!