robots.txt - what it is for
If you run a website, you have probably already come across a file called robots.txt in your FTP program. In this practical tip, you can find out what is behind this text file and why it is important.
robots.txt - instructions for search engines
Every domain should have a robots.txt file. It is an important part of SEO.
- Search engines work with crawlers. These are small programs that work independently, searching the internet for content, reading websites, and indexing them.
- Because crawlers work independently, they are also called search engine bots or robots.
- Your website's robots.txt tells these crawlers which directories can and cannot be read.
- To get this information, crawlers first look for a domain's robots.txt. For this reason, the robots.txt must sit at the top level of the directory structure, i.e. in the root directory. If it is moved into a subdirectory, the bots will not find the file.
- Put simply, robots.txt gives search engine crawlers two pieces of information. The entry "User-agent:" specifies which robot - addressed in robots.txt as a user agent - the following instructions apply to.
- This is followed by the entry "allow:" or "disallow:", which lists the directories and subdirectories the bot may crawl and those it should skip when indexing.
- The entry "allow:" is less important. Anything that is not expressly excluded is indexed by the robot anyway.
- Some CMSs, such as Drupal, create a robots.txt automatically during installation. In WordPress, you can create the robots.txt with a plugin.
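The entries described above can be combined into a complete file. Here is a minimal sketch of what a robots.txt might look like; the directory names are made-up examples, not rules you should copy as-is:

```text
# Applies to all crawlers
User-agent: *
# Do not crawl these (hypothetical) directories
Disallow: /admin/
Disallow: /tmp/
# Everything else may be indexed (optional, since it is the default)
Allow: /
```

Remember that this file must be reachable at the root of the domain, e.g. example.com/robots.txt.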
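If you want to check how crawlers will interpret your rules before uploading the file, you can test them locally. A minimal sketch using Python's standard-library parser; the rules and URLs below are hypothetical examples:

```python
# Check robots.txt rules locally with Python's standard-library parser.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration
rules = """
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "*" asks on behalf of any crawler
print(parser.can_fetch("*", "https://example.com/admin/secret.html"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))         # True
```

This only simulates well-behaved crawlers; robots.txt is a convention, not an access control, so bots are free to ignore it.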
If you receive the Google message "Unusually many requests", our next practical tip explains what you can do.