How to Fix Blocked by robots.txt Errors?

A robots.txt file tells search engines which pages on your website they can and cannot crawl. It is a plain text file located at the root of your website. For example, the robots.txt file for the LinkedIn website is located at https://www.dhirubhai.net/robots.txt.

If your robots.txt file contains incorrect rules, it can block Googlebot from crawling pages you want to appear in search results. This shows up as "Blocked by robots.txt" errors in Google Search Console.
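
For example, a robots.txt file containing the two lines below (a hypothetical snippet; the /blog/ path is just an illustration) blocks every crawler, including Googlebot, from everything under /blog/. If those pages are supposed to appear in search results, this is exactly the kind of rule you need to find and remove:

  User-agent: *
  Disallow: /blog/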

Here are some tips on how to fix blocked by robots.txt errors:

  1. Use Google Search Console to find the blocking rules. The Page indexing report, the robots.txt report, and the URL Inspection tool show which pages are blocked by robots.txt and help you identify the rules responsible.
  2. Delete those rules from the actual robots.txt file on the web server to allow access. You can do this using an FTP client or file manager.
  3. For auto-generated robots.txt files, change the rules through your CMS or SEO plugin and submit a sitemap with the URLs you want indexed. A sitemap helps Google discover those URLs, but it does not override robots.txt.
  4. Don't try to control indexing with robots.txt. Robots.txt is used to control crawling, not indexing. To prevent pages from being indexed, use the meta robots noindex tag.
  5. Use the meta robots noindex tag to keep pages out of Google's index. This is the most reliable way to prevent a page from being indexed: Googlebot crawls the page, sees the tag, and keeps it out of the index.

Here is a more detailed explanation of each step:

Step 1: Use Google Search Console to find the blocking rules.

Go to Google Search Console and select the property for your website. The Page indexing report lists the URLs affected by "Blocked by robots.txt", and the robots.txt report (under Settings) shows the robots.txt files Google has fetched for your site, along with any errors or warnings.

To check a single page, paste its URL into the URL Inspection tool at the top of Search Console. If the result shows that the page is blocked by robots.txt, compare the URL's path against the Disallow rules in your file to find the rule that matches. (Older versions of Search Console included a dedicated robots.txt Tester that highlighted the blocking line; it has since been replaced by the robots.txt report.)
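
If you want to double-check a rule outside of Search Console, Python's standard library includes a basic robots.txt parser. The sketch below is only an illustration: the domain and page URL are placeholders, and this parser follows the original robots.txt specification, so it may not match Google's handling of wildcard rules exactly.

  # Minimal sketch: check whether a URL is blocked for Googlebot.
  # example.com and the page path below are placeholders.
  from urllib.robotparser import RobotFileParser

  robots_url = "https://www.example.com/robots.txt"
  page_url = "https://www.example.com/blog/post-1"

  parser = RobotFileParser()
  parser.set_url(robots_url)
  parser.read()  # fetches and parses the live robots.txt file

  if parser.can_fetch("Googlebot", page_url):
      print("Allowed: no robots.txt rule blocks this URL for Googlebot")
  else:
      print("Blocked: a Disallow rule in robots.txt matches this URL")

This does not tell you which line matched, but it is a quick sanity check before and after you edit the file.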

Step 2: Delete those rules from the actual robots.txt file on the web server to allow access.

Once you know which rules are blocking access, you can delete them from the actual robots.txt file on the web server. You can do this using an FTP client or file manager.
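
For example, if the file contained the hypothetical rules below and the /blog/ section should be crawlable, you would delete the Disallow: /blog/ line and leave the rest in place. After you upload the edited file, the change takes effect the next time Googlebot fetches robots.txt, which it typically caches for up to 24 hours.

  # Before
  User-agent: *
  Disallow: /wp-admin/
  Disallow: /blog/

  # After
  User-agent: *
  Disallow: /wp-admin/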

Step 3: For auto-generated robots.txt files, change the rules through your CMS and submit a sitemap with the URLs you want indexed.

If your robots.txt file is auto-generated by a CMS like WordPress, there may be no physical file to edit. In that case, change the rules through the CMS setting or SEO plugin that generates the file, and submit a sitemap in Google Search Console containing the URLs you want indexed so Google can discover them quickly. Keep in mind that a sitemap does not override robots.txt: a URL that is disallowed will still not be crawled until the blocking rule is removed.
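
A minimal sitemap file looks like the sketch below; the two URLs are placeholders for the pages you actually want indexed. Save it as an XML file (for example, sitemap.xml) at the root of your site and submit its URL under Sitemaps in Google Search Console.

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://www.example.com/blog/post-1</loc>
    </url>
    <url>
      <loc>https://www.example.com/blog/post-2</loc>
    </url>
  </urlset>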

Step 4: Don't try to control indexing with robots.txt.

Robots.txt controls crawling, not indexing: a URL that is disallowed in robots.txt can still appear in Google's index (usually without a description) if other pages link to it. To prevent pages from being indexed, use the meta robots noindex tag instead.

Step 5: Use the meta robots noindex tag to keep pages out of Google's index.

To use the meta robots noindex tag, add the following code to the <head> section of the page you want to prevent from being indexed:

<meta name="robots" content="noindex">        

This tells Googlebot not to index the page. Note that Googlebot has to crawl the page to see the tag, so make sure the page is not also blocked by robots.txt; otherwise the noindex directive will never be read.
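
In context, a page carrying the tag might look like the minimal sketch below; the title and body text are placeholders:

  <!DOCTYPE html>
  <html>
    <head>
      <meta charset="utf-8">
      <!-- Tells crawlers that honor the directive not to index this page -->
      <meta name="robots" content="noindex">
      <title>Example page kept out of search results</title>
    </head>
    <body>
      <p>Content that should not appear in Google's index.</p>
    </body>
  </html>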

Summary

Fixing incorrect robots.txt rules allows Googlebot to crawl and index your pages again, resolving the "Blocked by robots.txt" errors in Google Search Console.

Keep in mind that it may take some time for Google to recrawl and index your pages after you fix the robots.txt rules. You can check the status of an individual page with the URL Inspection tool in Google Search Console and request indexing once it is no longer blocked.
