Crawl Budget

Dennis Benjak

Definition

Basically, the crawl budget is the amount of resources a search engine spends on crawling a website. Google decides how many subpages of a website are crawled and how often.

The crawl budget varies from website to website and depends on the size of the site, the number of backlinks and the number of errors on the website.

How is the crawl budget affected?

Several factors affect the crawl budget that a crawler assigns to a website, and most of them affect it negatively. It is therefore important that each crawled page offers added value.

A problem can arise when a website contains many subpages. Because the crawler's budget is limited, not all pages of the website are crawled, and pages that are not crawled are not indexed. This can result in the website owner getting less traffic.

You should also know the term crawl rate in this context. Google defines it as follows:

The crawl rate refers to the number of requests per second that Googlebot makes to your website during crawling, e.g. five requests per second.

This depends on two factors:

  1. The number of concurrent connections that the spider uses to crawl a website.
  2. The time that elapses between requests to the website.

A disadvantage arises when a website responds very slowly: the crawler then reduces the crawl rate for that website.

In contrast, if a website responds very quickly, the crawler assumes that the servers are running properly.

How fast or slow a website responds is called the “crawl health”.
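To make the relationship between the two factors and the response time concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model in which every connection repeats a cycle of "wait, then request"; this is an illustration only and not Google's actual formula. The function name and its parameters are hypothetical.

```python
def estimated_crawl_rate(connections: int, delay_s: float, response_time_s: float) -> float:
    """Estimate requests per second under a simplified model.

    Assumption (not Google's formula): each connection needs
    delay_s + response_time_s seconds per request, and all
    connections work in parallel.
    """
    if connections < 1:
        raise ValueError("connections must be at least 1")
    cycle_s = delay_s + response_time_s
    if cycle_s <= 0:
        raise ValueError("delay plus response time must be positive")
    return connections / cycle_s


# Same number of connections and same delay, different response times.
fast = estimated_crawl_rate(connections=5, delay_s=0.5, response_time_s=0.5)
slow = estimated_crawl_rate(connections=5, delay_s=0.5, response_time_s=2.0)

print(f"fast server: {fast} requests per second")
print(f"slow server: {slow} requests per second")
```

In this model the fast server reaches the five requests per second from the example above, while the slow server drops to two, which mirrors how a poor crawl health lowers the crawl rate.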

Another important term is the index budget. It specifies how many of the crawled pages are actually indexed.

The difference can be seen in the following example:
If a crawler crawls an old website with many subpages that are no longer accessible (error code 404), every requested page still counts against the crawl budget.

However, since most pages return a 404 response and therefore cannot be indexed, the index budget is underutilized. If a page is accessible but does not provide the content you were hoping for, the server sends status 200 (OK, the request was processed successfully) instead of status 404.
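The example can be sketched in Python as a small tally over a crawl log. The log entries are invented for illustration, and treating every status 200 page as indexable is a simplifying assumption; in practice a search engine applies further quality checks.

```python
from collections import Counter

# Hypothetical crawl log: (URL path, HTTP status) pairs, invented for illustration.
crawl_log = [
    ("/", 200),
    ("/products", 200),
    ("/old-offer-1", 404),
    ("/old-offer-2", 404),
    ("/old-offer-3", 404),
]


def summarize(log):
    """Compare how many requests were spent with how many pages could be indexed."""
    counts = Counter(status for _, status in log)
    crawled = len(log)         # every request uses crawl budget
    indexable = counts[200]    # assumption: only status 200 pages can be indexed
    return {
        "crawled": crawled,
        "indexable": indexable,
        "wasted": crawled - indexable,
    }


summary = summarize(crawl_log)
print(summary)
```

Here five requests are charged to the crawl budget, but only two pages are candidates for the index, so three requests are spent without any benefit.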

Relevance for SEO

For search engine optimization, the crawl budget is of great importance. As explained in the example above, the crawl budget may not be sufficient to reach all pages of a website. Thus, the crawl budget is not ideally utilized.

However, a few optimizations can remedy the situation. It is important to identify pages that are inaccessible, of low quality or thin on content. These pages should be blocked for the search bot.
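One common way to block pages for a crawler is a robots.txt file. The following sketch uses Python's standard `urllib.robotparser` module to check which paths a given set of rules would block. The rules and the paths are hypothetical examples, not recommendations for a specific site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration.
robots_txt = """\
User-agent: *
Disallow: /login
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch path."""
    return parser.can_fetch(user_agent, path)


for path in ["/", "/products", "/login", "/search?q=shoes"]:
    state = "allowed" if is_crawlable(path) else "blocked"
    print(f"{path}: {state}")
```

Checking the rules locally before publishing them helps to avoid blocking pages that should stay crawlable.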

Tip: Google Search Console has offered a setting to limit Googlebot's crawl rate manually. It can only lower the crawl rate; it does not increase the crawl budget.

How do I make the best use of the crawl budget?

For every problem there is a solution. You just have to know what your options are. Here are a few tips on how to make the most of the potential of Googlebot or other spiders.

  1. First, exclude unimportant pages such as login pages from crawling. This is possible via robots.txt or via the meta tag directives nofollow and noindex. The crawler reads robots.txt first. If a page is blocked there, the crawler does not fetch it and therefore never sees the page's meta tags. A drawback of robots.txt: excluded pages may still appear in search results despite the entry, e.g. if other pages link to them and Google considers them relevant.
  2. The website should have a flat page structure, and subpages should be reachable within a few clicks.
  3. More important pages should be well linked internally, especially those that receive backlinks.
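Whether a site has a flat page structure can be checked by measuring the click depth of each page, i.e. the minimum number of clicks needed to reach it from the home page. The sketch below does this with a breadth-first search in Python. The link graph is hypothetical, and the threshold of two clicks is an assumption chosen for illustration; the text above only says "a few clicks".

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/shoes"],
    "/blog": ["/blog/post-1"],
    "/products/shoes": ["/products/shoes/archive"],
    "/blog/post-1": [],
    "/products/shoes/archive": [],
}


def click_depths(graph, start="/"):
    """Return the minimum number of clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


MAX_CLICKS = 2  # assumed threshold for "a few clicks"
depths = click_depths(links)
deep_pages = sorted(page for page, depth in depths.items() if depth > MAX_CLICKS)
print(deep_pages)
```

Pages reported as too deep are candidates for additional internal links from pages closer to the home page.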

Conclusion

The crawl budget is of great importance for search engine optimization. If you waste the crawl budget of a search engine spider unnecessarily, you may be giving away valuable traffic. If you own or manage a small or medium-sized website, you don’t need to worry about whether the crawl budget is sufficient.
