You might have seen some website pages get indexed by Google within minutes, while others take weeks. That raises a question: why do some pages get indexed within a day while others take weeks?
The short answer is ‘crawl budget,’ something that many SEOs tend to ignore, but optimizing it can improve the ROI of your SEO investments.
If your website’s new landing pages and articles are taking longer to get indexed, it might be time to optimize your crawl budget. Many aspects of the crawl budget are controlled by Google, but there are a few things you can do to make the most of it. We’ll get into that in a bit; let’s understand the crawl budget better first.
What is crawl budget?
In simple terms, SEO crawl budget is the number of pages Google will crawl on your site within a given timeframe. This number may vary slightly from day to day, but it remains relatively stable.
For example, depending on various factors, including the size of the website, Google may crawl just four pages on a site within a day, or it may crawl 5,000 or even 1,000,000 pages in the same time.
The major factors affecting your SEO crawl budget are the number of pages on your website, your site’s overall health, the strength of the internal link network, the number of errors Google encounters, and the number of links pointing to your website.
Why is crawl budget important for SEO?
Crawl budget is important for big websites because if your crawl budget isn’t high enough to get all your pages indexed quickly, you will miss out on potential ranking opportunities, and it will slow down the return on your SEO investments.
Google is really good at finding and crawling pages, so most websites with fewer than a thousand pages don’t need to be concerned about the crawl budget. If your website has an excellent internal link network and only a few pages, say, 20 or 30 pages, you don’t need to worry about your crawl budget at all.
However, if you have a big site with thousands of pages, Google can have trouble finding them all if your website architecture is not up to the mark. Also, when you add a new section to your website with several pages, you should have enough crawl budget to get them indexed quickly.
Six things to know about crawl budget
● Since Googlebot doesn’t have unlimited resources, it divides its attention across millions of websites. By assigning crawl budgets to websites, Googlebot prioritizes its crawling efforts and finds a balance between its resources and the pages on the internet it needs to crawl.
● Time is important for Googlebot since it needs to keep its index fresh and updated. When a website is fast, Googlebot can crawl more pages without spending too much time on the website.
● If you update your website pages frequently, Google will recrawl such pages more often to keep its index updated.
● Crawl budget is not just about web pages; it’s about any file or document search engines crawl, including JavaScript, CSS, and PDF files.
● Search engines are careful not to overload servers with crawl requests. So if you are on a shared server, your crawl budget can depend on the number of websites hosted on your server and the amount of traffic those websites are getting.
● If you have separate versions of your website for mobile and desktop running on the same host, they’ll have a shared crawl budget.
How to increase crawl budget and optimize it for SEO?
Although Google sets the crawl budget for your website based on different factors, we’ve already mentioned that you can optimize it to get your pages crawled efficiently. Let’s see how you can optimize your Google crawl budget.
a) Avoid broken and redirecting URLs.
Let’s say Google crawls 50 pages on your website at a time, and you have exactly 50 pages on your website. If, for some reason, Google encounters broken links during the site crawl, they will be dead ends for Google, and you’ll lose a part of your crawl budget to those broken links. Search engines crawl each resource on your website, including JS and CSS, so it’s important to ensure your website doesn’t return any errors when they crawl it.
If you are using WordPress, you can use the Rank Math SEO plugin’s 404 Monitor to get 404 error logs of all resources on your website, including JavaScript and CSS files.
When Google encounters a 301 redirect or a redirect chain during a site crawl, it adds the new URL to its to-do list. Google may access the URL immediately, but more often than not, it adds the URL to the to-do list and moves on.
Redirected URLs take more time to load than non-redirecting links, and Google just doesn’t want to exhaust its resources on them.
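If you want to spot-check this yourself, here is a minimal sketch, assuming the `requests` library is installed and using placeholder URLs, that flags broken links and redirect chains in a small list of URLs:

```python
# Hedged sketch: check a list of URLs for broken responses (4xx/5xx)
# and redirect chains. The URLs below are placeholders.
import requests

urls_to_check = [
    "https://example.com/",
    "https://example.com/old-page/",
    "https://example.com/missing-page/",
]

for url in urls_to_check:
    try:
        response = requests.get(url, allow_redirects=True, timeout=10)
    except requests.RequestException as error:
        print(f"{url} -> request failed: {error}")
        continue

    # response.history holds every intermediate redirect response.
    if response.history:
        chain = " -> ".join(r.url for r in response.history)
        print(f"{url} redirects via: {chain} -> {response.url}")

    if response.status_code >= 400:
        print(f"{url} -> broken ({response.status_code})")
```

Running a list of your key URLs through something like this, or through a full crawler, helps you catch dead ends and redirect chains before Googlebot spends crawl budget on them.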
b) Ensure a strong internal link network.
Googlebot prefers pages with several links pointing to them as it indicates that the page has quality content.
Also, the way Googlebot works is that it will crawl a page, and if there are links on the page, it will go on to crawl the pages that the links point to. So, if you have many backlinks and internal links, there’s a higher chance of getting crawled and indexed. Yes, getting backlinks for all the pages on your website is a time-consuming task, but it helps in getting your pages crawled effectively by Google.
However, it is not necessary to have backlinks to get your pages crawled—a strong internal link network also works fine. Since Googlebot also crawls the pages that are linked from a page, all that you need is a proper internal link network that doesn’t leave out any important pages on your website.
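To get a quick picture of how a single page contributes to that network, here is a hedged sketch that lists the internal links found on one page; it assumes `requests` and `beautifulsoup4` are installed, and the page URL is a placeholder:

```python
# Hedged sketch: list the internal links found on a single page.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

page_url = "https://example.com/blog/sample-post/"  # placeholder URL
site_host = urlparse(page_url).netloc

html = requests.get(page_url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

internal_links = set()
for anchor in soup.find_all("a", href=True):
    # Resolve relative links and keep only those pointing to the same host.
    absolute_url = urljoin(page_url, anchor["href"])
    if urlparse(absolute_url).netloc == site_host:
        internal_links.add(absolute_url)

print(f"{len(internal_links)} internal links found on {page_url}")
for link in sorted(internal_links):
    print(link)
```

Pages that show up rarely, or never, in results like this are the ones your internal link network is leaving behind.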
c) Avoid orphan pages.
Orphan pages are pages on your website that are not linked from any other part of your website. Since the website navigation doesn’t lead to such a page, users would have to type in its URL to access it. In other words, the page stays disconnected from the rest of the website in terms of links.
Googlebot needs at least one link, and preferably several links, pointing to a page to discover and crawl it. So orphan pages stay off Googlebot’s radar.
To learn more about orphan pages and how to fix them, you can read our detailed guide on orphan pages. Fixing orphan pages takes time, but you can find them with tools like ScreamingFrog SEO Spider along with Google Analytics, or, if you use WordPress, with the Ahrefs SEO plugin. You can also run a website crawl test using ScreamingFrog.
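One rough way to surface orphan candidates is to compare the URLs in your XML sitemap against the URLs a link-following crawl actually reached. The sketch below assumes you have exported such a crawl to a CSV with an `Address` column (for example, from ScreamingFrog); the file name and sitemap URL are placeholders:

```python
# Hedged sketch: flag potential orphan pages by diffing sitemap URLs
# against URLs discovered by following links.
import csv
import xml.etree.ElementTree as ET

import requests

sitemap_url = "https://example.com/sitemap.xml"  # placeholder
crawl_export_csv = "crawled_urls.csv"  # assumed export with an "Address" column

# URLs the sitemap says exist.
sitemap_xml = requests.get(sitemap_url, timeout=10).text
namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap_urls = {
    loc.text.strip()
    for loc in ET.fromstring(sitemap_xml).findall(".//sm:loc", namespace)
}

# URLs the crawler reached by following links.
with open(crawl_export_csv, newline="") as f:
    crawled_urls = {row["Address"] for row in csv.DictReader(f)}

potential_orphans = sitemap_urls - crawled_urls
print(f"{len(potential_orphans)} potential orphan pages:")
for url in sorted(potential_orphans):
    print(url)
```

Anything that appears in the sitemap but was never reached by following links is worth reviewing and linking from a relevant page.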
d) Use a flat website architecture.
A flat website architecture means all pages on your website can be reached in one, two, or three clicks from the website homepage or main page. It ensures that all the pages on your website have some link authority flowing to them.
When you use a complex site architecture, a long chain of links will lead to inner pages on your website. This results in a poor flow of link authority to those pages, negatively affecting the visibility of such pages to search engines.
You can group your website pages into categories and use tags properly to make sure link authority flows evenly to all your pages.
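To see how flat your architecture actually is, you can measure click depth with a simple breadth-first crawl from the homepage. This is only a sketch under stated assumptions: the start URL and page limit are placeholders, and it requires `requests` and `beautifulsoup4`:

```python
# Hedged sketch: measure click depth from the homepage via breadth-first crawl.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

home_url = "https://example.com/"  # placeholder
site_host = urlparse(home_url).netloc
max_pages = 200  # keep the sketch small

depth_by_url = {home_url: 0}
queue = deque([home_url])

while queue and len(depth_by_url) < max_pages:
    url = queue.popleft()
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue

    for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        link = urljoin(url, anchor["href"]).split("#")[0]
        if urlparse(link).netloc == site_host and link not in depth_by_url:
            depth_by_url[link] = depth_by_url[url] + 1
            queue.append(link)

# Pages deeper than three clicks are candidates for better internal linking.
for url, depth in sorted(depth_by_url.items(), key=lambda item: item[1]):
    if depth > 3:
        print(f"{depth} clicks deep: {url}")
```

Pages that turn up more than three clicks deep are the ones to pull closer to the homepage through categories, tags, or contextual links.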
e) Avoid on-site duplicate content.
Google doesn’t want to waste its resources by crawling and indexing multiple pages with very similar content. And when you have pages with duplicate content, you’ll be exhausting your crawl budget on pages that don’t add value, which will drain or delay crawl activity on pages that actually have value.
However, for some websites, like e-commerce websites, some pages have to stay similar for different reasons. For example, the products listed on a page stay largely the same when they are sorted in a different order, but the URL or its parameters will be different.
Earlier, Google Search Console had a feature that let web admins specify URL parameters so that Google would avoid crawling a website’s unnecessary URLs. But Google is much better now at understanding which URL parameters are relevant and which pages are duplicates, so it has removed the parameter specification feature from Google Search Console.
However, some other search engines are not as good as Google at understanding URL parameters. So if you make all such URLs accessible to search engines, your crawl budget can be severely affected.
There are two things you can do to avoid this crawl trap. You can use your robots.txt to instruct search engines not to crawl such pages. Also, you can use a canonical tag to tell search engines which page you want them to crawl and index.
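As a minimal sketch, assuming the sorted listings from the example above use a `sort` URL parameter (an assumed parameter name), the robots.txt rule could look like this:

```
User-agent: *
# "sort" is an assumed parameter name for this example
Disallow: /*?sort=
Disallow: /*&sort=
```

For the canonical approach, each sorted variant would carry a tag such as `<link rel="canonical" href="https://example.com/products/" />` in its `<head>` (a placeholder URL), pointing search engines to the preferred version of the page.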
You can read our expert guide on robots.txt to block search engines from crawling duplicate pages and URLs with parameters.
f) Improve your site speed.
Improving your site speed can increase your Google crawl rate. Googlebot doesn’t want to waste its resources on slow-loading pages; slow pages have a negative impact on user experience, so it makes sense not to spend time on them.
On the other hand, fast-loading pages offer a much better user experience and so offer more value to both Google and readers. Googlebot values its time, and it can crawl more fast pages than slow pages in the same timeframe.
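If you just want a rough sense of how quickly your server responds, here is a small sketch, assuming the `requests` library and placeholder URLs; dedicated page-speed tools go much deeper, but server response time is the part that most directly affects how many pages a crawler can fetch:

```python
# Hedged sketch: spot-check server response time for a few URLs.
import requests

urls = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/products/",
]

for url in urls:
    response = requests.get(url, timeout=10)
    # `elapsed` measures time from sending the request to receiving the headers.
    print(f"{url}: {response.elapsed.total_seconds():.2f}s "
          f"({len(response.content) / 1024:.0f} KB)")
```

Consistently slow responses in a check like this are a signal to look at caching, hosting, and page weight, all of which also help your crawl budget go further.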