Robots.txt - A Quick Guide for Beginners
What Is a Robots.txt File?
Robots.txt is a text file that tells search engine crawlers which parts of your website should be crawled and which should not.
In this file, you can list the pages or content that you want to keep away from search engines like Google.
Why Is the Robots.txt File Important?
It is important for the following three reasons (see the example after this list):
- Blocking Non-Public Pages
- Maximizing Crawl Budget
- Preventing Indexing of Resources
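For example, a robots.txt along the lines of the sketch below blocks a non-public admin area, saves crawl budget by keeping bots out of internal search result pages, and keeps resource files such as PDFs out of search results. The paths shown (/wp-admin/, /search/, /downloads/) are hypothetical placeholders, so replace them with your own; the * wildcard in paths is supported by major crawlers such as Googlebot.

User-agent: *
Disallow: /wp-admin/          # block a non-public admin area
Disallow: /search/            # save crawl budget on internal search result pages
Disallow: /downloads/*.pdf    # keep PDF resources out of search results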
What Does This File Look Like?
Sitemap: [URL location of sitemap]
User-agent: [bot identifier]
[directive 1]
[directive 2]
[directive ...]
User-agent: [another bot identifier]
[directive 1]
[directive 2]
[directive ...]
Here, user-agent and directives mean:
User-agent - The specific bot (crawler) that the rules apply to.
Directives - The rules that you want the bots to follow.
Here is a simple robots.txt file with two rules, explained below:

# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

These rules mean that the crawler named Googlebot may not crawl any URL starting with https://www.example.com/nogooglebot/, all other user agents may crawl the entire site, and the site's sitemap file is located at https://www.example.com/sitemap.xml.
Learn more about the rules here: https://support.google.com/webmasters/answer/6062596?hl=en
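If you want to double-check how a crawler would interpret rules like these, one option is a small script. The sketch below uses Python's standard urllib.robotparser module and assumes the example rules above have been saved locally as a file named robots.txt; the bot names and URLs are just placeholders.

from urllib.robotparser import RobotFileParser

# Load the example rules from a local copy of the file.
parser = RobotFileParser()
with open("robots.txt") as f:
    parser.parse(f.read().splitlines())

# Googlebot is blocked from /nogooglebot/, every other bot may crawl the whole site.
print(parser.can_fetch("Googlebot", "https://www.example.com/nogooglebot/page.html"))  # False
print(parser.can_fetch("Bingbot", "https://www.example.com/nogooglebot/page.html"))    # True
print(parser.can_fetch("Googlebot", "https://www.example.com/index.html"))             # True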
How Do You Track Issues with Your Robots.txt File?
Check the “Coverage” report in Google Search Console.
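As a quick supplementary check, you can also confirm that your live robots.txt is reachable and serves the content you expect. This is a minimal sketch using Python's standard urllib.request; https://www.example.com is a placeholder for your own domain.

import urllib.request

# Fetch the live robots.txt and confirm it responds with HTTP 200.
url = "https://www.example.com/robots.txt"
with urllib.request.urlopen(url) as response:
    print("Status:", response.status)  # expect 200
    print(response.read().decode("utf-8"))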