A complete guide on Robots.txt with Best Practices
Robots.txt is a file that most search engines, including Google, Bing, and Yahoo, obey when deciding which pages on a website to crawl or not crawl. A basic file looks like this:
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Importance of Robots.txt File
How does the robots.txt file work?
When a search engine visits a site, its crawler first fetches the robots.txt file and parses the directives in it. From robots.txt the crawler learns which URLs it is allowed to crawl and which it must skip before it goes on to crawl and index the rest of the site.
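As a rough sketch of that matching behaviour (the paths below are placeholders, not taken from this article): a crawler looks for the group whose User-agent line matches its own name most specifically and obeys only that group's rules.
User-agent: Googlebot
# Googlebot matches this group by name, so it skips /private/ and ignores the group below.
Disallow: /private/

User-agent: *
# Every other crawler falls back to this wildcard group and skips /tmp/ instead.
Disallow: /tmp/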
Where should you put the robots.txt file?
The robots.txt file should be placed in the root of your domain, for example https://example.com/robots.txt. Also make sure you name it exactly “robots.txt”; the file name is case sensitive, so any other spelling will not work.
Best Practices for Creating Robots.txt
User-agent: Googlebot
Disallow: /images
The above directive tells the spider named Googlebot to crawl everything except the /images folder. Make sure you enter the Disallow path with the right case; paths are case sensitive, so Images would not match images. To apply the rule to all bots instead, use * and the syntax looks like this:
User-agent: *
Disallow: /images
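To make the case sensitivity and prefix matching concrete, here is a hedged illustration with invented URLs; a Disallow value matches any path that starts with the given string.
User-agent: *
Disallow: /images
# Blocked:     /images/logo.png   (starts with /images)
# Blocked:     /images-archive/   (also starts with /images)
# Not blocked: /Images/logo.png   (different case, so the rule does not match)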
2. Common user agents – Each of the most used search engines has its own crawler name (user agent) that you can target with a separate group of rules.
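As an illustrative sketch, a file that addresses several major crawlers individually could look like the following; the user-agent tokens are the well-known crawler names for Google, Bing, Yahoo, and DuckDuckGo, while the Disallow path is just a placeholder.
User-agent: Googlebot
# Google's web crawler
Disallow: /private/

User-agent: Bingbot
# Bing's web crawler
Disallow: /private/

User-agent: Slurp
# Yahoo's web crawler
Disallow: /private/

User-agent: DuckDuckBot
# DuckDuckGo's web crawler
Disallow: /private/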
Robots.txt also supports pattern matching: * is a wildcard that matches any sequence of characters in a URL, and $ anchors a rule to the very end of the URL. For example:
Disallow: /*.php$
In the above example /index.php will be blocked, but /index.php?p=1 will not, because it does not end in .php. Hence it is important to use these expressions very diligently, otherwise many pages on the site will get blocked.
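As a hedged illustration of the difference the $ makes (the URLs are invented for this example):
# With the end-of-URL anchor, only URLs that end in .php are matched:
Disallow: /*.php$
# Blocked:     /index.php
# Not blocked: /index.php?p=1

# Without the anchor, the rule is a prefix match, so anything may follow .php:
Disallow: /*.php
# Blocked: /index.php and /index.php?p=1 alike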
For example, on WordPress sites a common pattern is to block the admin area while still allowing the AJAX endpoint that front-end features rely on:
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
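A short sketch of how these two rules interact, assuming the usual precedence rule that the most specific (longest) matching path wins; the URLs are illustrative.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# Blocked: /wp-admin/options.php      (only the Disallow rule matches)
# Allowed: /wp-admin/admin-ajax.php   (the longer Allow rule outranks the Disallow)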
6. Using a sitemap is crucial for getting all of your web pages indexed. Referencing it in robots.txt helps crawlers discover it, although it is still recommended to submit the sitemap in Search Console as well.
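For reference, the Sitemap directive takes an absolute URL, and a robots.txt file may list more than one sitemap; the URLs here are placeholders.
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml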
Source: https://eliteseozone.com/robots-txt-guide/