robots.txt

Manu Magno

The robots.txt file is a text file in which you define which parts of a domain a web crawler may crawl and which it may not.

Definition

With the robots.txt text file you can exclude individual files in a directory, entire directories including their subdirectories, or a complete domain from crawling. The file is stored in the root directory of the domain.
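As a rough illustration of these exclusion levels, the following sketch parses two sets of rules with Python's standard `urllib.robotparser` module and asks whether a given URL may be crawled. The rules, the paths and the domain `example.com` are invented for this example and do not come from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one file and one directory (with everything below it)
# are excluded for all crawlers.
PARTIAL_RULES = """\
User-agent: *
Disallow: /private/report.html
Disallow: /internal/
"""

# Hypothetical rules: the complete domain is excluded for all crawlers.
BLOCK_ALL_RULES = """\
User-agent: *
Disallow: /
"""


def load_rules(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


partial = load_rules(PARTIAL_RULES)
block_all = load_rules(BLOCK_ALL_RULES)
base = "https://example.com"  # placeholder domain

print(partial.can_fetch("*", base + "/private/report.html"))   # False: single file
print(partial.can_fetch("*", base + "/private/other.html"))    # True: not listed
print(partial.can_fetch("*", base + "/internal/docs/a.html"))  # False: subdirectory
print(block_all.can_fetch("*", base + "/blog/"))               # False: whole domain
```

Rules are matched as path prefixes here, which is why a `Disallow` entry for a directory also covers its subdirectories.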

Many websites have a robots.txt file, but not all website operators are familiar with its function or even know that it exists.

How does robots.txt work?

When a search engine's crawler arrives at a website, it first looks for a robots.txt file. If it finds one, it reads that file before crawling anything else.

The file contains instructions on how the search engine should crawl the site, and these instructions govern the crawler's further actions on that website. If there is no robots.txt file (or if it contains no instructions prohibiting the activity of a user agent), the search engine will crawl all content it can reach through links in the source code. However, search engines still decide for themselves whether to follow the instructions in robots.txt or to ignore them partially or completely.
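The decision a well-behaved crawler makes can be sketched roughly as follows. This is a simplified model, not how any particular search engine is implemented: the function `may_crawl`, the user agent names `ExampleBot` and `OtherBot`, and the rules are assumptions made for this example. A missing robots.txt is represented by `None`.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "ExampleBot" is locked out, everyone else may crawl.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def may_crawl(robots_body: Optional[str], user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url`.

    `robots_body` is the text of robots.txt, or None if the site has none.
    """
    if robots_body is None:
        # No robots.txt: nothing is prohibited.
        return True
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/products/"  # placeholder URL
print(may_crawl(None, "ExampleBot", page))   # True: no robots.txt at all
print(may_crawl(RULES, "ExampleBot", page))  # False: explicitly disallowed
print(may_crawl(RULES, "OtherBot", page))    # True: falls back to "*"
```

The check is purely advisory: a crawler that chooses to ignore robots.txt simply never performs it.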

How to create and edit a robots.txt file

If you don’t have a robots.txt yet, you can easily create one:

On WordPress, you can create a sample robots.txt with one click via the Yoast plugin under “Tools”. You can also edit the robots.txt file there.

Alternatively, you can create a plain text file named “robots.txt” with an editor such as Notepad and upload it to the root directory of your website using an FTP client such as FileZilla. You can then edit this file on the server. To be on the safe side, always make a backup of your old robots.txt file before making any changes.

Google provides webmasters with instructions for creating a robots.txt file.
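If you manage the file with a script rather than by hand, the advice to back up first can be built in. The sketch below is one possible approach, working on a local copy of the site's root directory; the function name `write_robots_txt`, the backup name `robots.txt.bak` and the rules are assumptions for illustration, and uploading the result to the server is left out.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def write_robots_txt(site_root: Path, content: str) -> Optional[Path]:
    """Write robots.txt into `site_root`, backing up an existing file first.

    Returns the path of the backup, or None if there was nothing to back up.
    """
    target = site_root / "robots.txt"
    backup = None
    if target.exists():
        backup = site_root / "robots.txt.bak"  # hypothetical backup name
        shutil.copy2(target, backup)           # keep the old rules
    target.write_text(content, encoding="utf-8")
    return backup


if __name__ == "__main__":
    # Demonstration in a throwaway directory standing in for the site root.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_robots_txt(root, "User-agent: *\nDisallow:\n")
        saved = write_robots_txt(root, "User-agent: *\nDisallow: /internal/\n")
        print(saved.name)  # robots.txt.bak
```

Only a single backup is kept in this version; a timestamp in the backup name would preserve older revisions as well.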

Why do you need a robots.txt?

In search engine optimization (SEO), the robots.txt file can play a major role, depending on the website.

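With WordPress, for example, you can use robots.txt to keep crawlers out of the admin area (wp-admin). Note that this only asks compliant crawlers to stay away: robots.txt is not an access control mechanism and does not by itself protect sensitive data.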

Other types of websites, such as online stores, use robots.txt to block certain URL parameters or IDs in order to prevent duplicate content. It also helps to limit the number of irrelevant pages presented to search engines and to direct their focus to relevant content.
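Blocking parameters usually relies on the wildcard syntax that major search engines support, where `*` stands for any sequence of characters and a trailing `$` marks the end of the URL. The sketch below shows how such patterns could be evaluated. It is a simplified matcher written for this example, not a search engine's implementation: it ignores `Allow` rules and rule precedence, and the patterns and parameter names (`sessionid`, `sort`) are hypothetical.

```python
import re
from typing import Sequence

# Hypothetical Disallow patterns for a shop with session and sorting parameters.
SHOP_DISALLOW = ["/*?sessionid=", "/*?sort=", "/*.pdf$"]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with ".*" for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path_and_query: str, disallow_patterns: Sequence[str]) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(rule_to_regex(p).match(path_and_query) for p in disallow_patterns)


print(is_blocked("/shoes?sort=price", SHOP_DISALLOW))   # True
print(is_blocked("/shoes", SHOP_DISALLOW))              # False
print(is_blocked("/docs/manual.pdf", SHOP_DISALLOW))    # True
print(is_blocked("/docs/manual.pdf?v=2", SHOP_DISALLOW))  # False: '$' requires the end
```

A URL such as `/shoes?color=red&sort=price` is not caught by `/*?sort=`, because the parameter follows `&` rather than `?`; patterns of this kind need to be checked against the URL variants the shop actually generates.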

The file should always be used with caution. On the one hand, search engines decide for themselves whether to follow the instructions in the robots.txt file; on the other hand, incorrect entries can make important content inaccessible to search engines.

Conclusion

The robots.txt file governs crawl behavior for your website, while the meta robots tag determines indexing behavior at the level of the individual page (or page element). Using robots.txt is not necessarily easy, however: on smaller websites it does not need to contain many instructions, whereas on larger sites and online shops its correct use can play an important role in crawlability and clean indexing.
