Robots.txt: A Quick Guide for Beginners

What Is a Robots.txt File?

Robots.txt is a text file that tells search engines which parts of your website should be crawled and which should not.

In this file, you can list the pages or content that you want to keep away from search engines like Google.

Why Is the Robots.txt File Important?

It is important for the following three reasons; a sample file illustrating them follows the list.

  1. Blocking Non-Public Pages
  2. Maximizing Crawl Budget
  3. Preventing Indexing of Resources

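For example, a hypothetical robots.txt file that touches on all three reasons might look like this (the paths are placeholders for illustration, not recommendations for your site):

User-agent: *
# 1. Block a non-public page such as an admin or login area
Disallow: /admin/
# 2. Save crawl budget by blocking endless internal search-result URLs
Disallow: /search/
# 3. Keep resource files such as PDFs out of the search results
Disallow: /downloads/pdfs/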

What Does This File Look Like?

Sitemap: [URL location of sitemap]

User-agent: [bot identifier]
[directive 1]
[directive 2]
[directive ...]

User-agent: [another bot identifier]
[directive 1]
[directive 2]
[directive ...]

Here, user-agent and directives mean:

User-agent - the specific bot (crawler) that the rules apply to.

Directives - the rules that you want the bot to follow.

Here is a simple robots.txt file with two rules, explained below:

# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

In this example, the bot named Googlebot is not allowed to crawl any URL that starts with /nogooglebot/, while all other bots are allowed to crawl the entire site; the sitemap is located at https://www.example.com/sitemap.xml.

Learn the full set of rules here: https://support.google.com/webmasters/answer/6062596?hl=en

How to Track Issues with Your Robots.txt File?

Check the “Coverage” report in Google Search Console.
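You can also test rules programmatically. The sketch below is a minimal example (not an official Google tool) that uses Python's built-in urllib.robotparser to check which URLs the example file from above would block; the domain and page URLs are placeholders.

from urllib.robotparser import RobotFileParser

# The rules from the example file above, fed to Python's standard parser.
# For a live site, use rp.set_url("https://yourdomain.com/robots.txt") and rp.read() instead of parse().
rules = """
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from /nogooglebot/ but may crawl everything else.
print(rp.can_fetch("Googlebot", "https://www.example.com/nogooglebot/page.html"))  # False
print(rp.can_fetch("Googlebot", "https://www.example.com/blog/"))  # True

# Other bots may crawl the whole site.
print(rp.can_fetch("Bingbot", "https://www.example.com/nogooglebot/page.html"))  # True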

