Let's learn about robots.txt

robots.txt is a text file placed in the root directory of a website. It tells web robots and search engine crawlers which pages or sections of the site may be crawled and indexed, and which should be skipped.
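Here's a quick way to see robots.txt from a crawler's point of view. The sketch below uses Python's standard urllib.robotparser module; the domain and bot name are placeholders for illustration, not real sites or crawlers:

from urllib import robotparser

# Point the parser at a site's robots.txt (example.com is a placeholder).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches the file over HTTP and parses it

# Ask whether a given crawler may fetch a given URL.
print(rp.can_fetch("magicsearchbot", "https://example.com/downloads/report.pdf"))
print(rp.can_fetch("*", "https://example.com/"))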

Control your crawl budget with robots.txt

By using the robots.txt file to exclude specific pages or directories from crawling, you can reduce the time and resources that search engines spend on your website. This focuses their attention on your most important and relevant pages, improving the site's overall crawl efficiency and indexing.
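As a rough illustration of the effect, here is a minimal Python sketch (the rules and URLs are made up for this example) showing which URLs a crawler would skip, leaving more budget for the rest:

from urllib import robotparser

# A hypothetical robots.txt that keeps crawlers out of low-value directories.
rules = """\
user-agent: *
disallow: /downloads/
disallow: /tmp/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

candidate_urls = [
    "https://example.com/products/widget",
    "https://example.com/downloads/big-archive.zip",
    "https://example.com/tmp/cache-page",
]

# Only the allowed URLs consume crawl budget.
for url in candidate_urls:
    print("crawl" if rp.can_fetch("*", url) else "skip", url)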

Don't

user-agent:
disallow: /downloads/

No user agent is defined, so crawlers can't tell that these rules apply to them.

Do

user-agent: *
disallow: /downloads/

user-agent: magicsearchbot
disallow: /uploads/
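You can sanity-check how these groups are matched with a short Python sketch; the bot names mirror the example above, and "otherbot" is a made-up stand-in for any other crawler:

from urllib import robotparser

rules = """\
user-agent: *
disallow: /downloads/

user-agent: magicsearchbot
disallow: /uploads/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# magicsearchbot matches its own group, so only /uploads/ is off limits to it.
print(rp.can_fetch("magicsearchbot", "/uploads/file"))    # False
print(rp.can_fetch("magicsearchbot", "/downloads/file"))  # True

# Any other bot falls back to the * group.
print(rp.can_fetch("otherbot", "/downloads/file"))        # False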

Don't

# start of file
disallow: /downloads/

user-agent: magicsearchbot
allow: /

No search engine crawler will read the disallow: /downloads/ directive, because it appears before any user-agent line.

Do

# start of file
user-agent: *
disallow: /downloads/
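Python's standard urllib.robotparser also ignores rules that appear before a user-agent line, so you can see the difference between the two files with a sketch like this:

from urllib import robotparser

broken = """\
# start of file
disallow: /downloads/

user-agent: magicsearchbot
allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(broken)

# The disallow line was ignored because no user-agent preceded it,
# so /downloads/ is still crawlable.
print(rp.can_fetch("magicsearchbot", "/downloads/file"))  # True

fixed = """\
# start of file
user-agent: *
disallow: /downloads/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(fixed)
print(rp.can_fetch("magicsearchbot", "/downloads/file"))  # False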

Don't

user-agent: *
allow: https://example.com
sitemap: /sitemap-file.xml

Do

user-agent: *
allow: /
sitemap: https://example.com/sitemap-file.xml
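Python's urllib.robotparser (3.8+) can also report the sitemap entries it found; here's a minimal sketch using the corrected file above:

from urllib import robotparser

rules = """\
user-agent: *
allow: /
sitemap: https://example.com/sitemap-file.xml
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# site_maps() returns the sitemap URLs listed in the file, or None if absent.
print(rp.site_maps())  # ['https://example.com/sitemap-file.xml']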

Note: Search engines may stop processing robots.txt midway through if the file is larger than 500 KiB. This can confuse the search engine, leading to incorrect crawling of your site.
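A minimal sketch for checking whether your file is near that limit (the domain is a placeholder; 500 KiB = 500 × 1024 bytes):

import urllib.request

LIMIT = 500 * 1024  # 500 KiB

# example.com is a placeholder for your own site.
with urllib.request.urlopen("https://example.com/robots.txt") as resp:
    size = len(resp.read())

print(f"robots.txt is {size} bytes")
if size > LIMIT:
    print("Warning: larger than 500 KiB; crawlers may truncate it.")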
