How many web sites are blocked from Ahrefs, Semrush, Majestic

Some time ago, the JetOctopus team became interested in how accurately popular marketing tools reflect the real picture of links and semantics. It’s no secret that in some niches sites hide their backlinks from Ahrefs by default, and that one of an SEO specialist’s goals is to mislead competitors. The bigger problem is that different services can show completely different numbers, which is basically normal: each one has its own metrics.

We took a fresh list of all domains, about 213 million items, of which 122 million respond with a 200 status code. We crawled the whole list and analyzed the content of each robots.txt.

46.5 million domains have a robots.txt (38% of the active ones).
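The shares quoted so far can be re-derived from the raw counts as a quick sanity check. This is a minimal sketch: the variable and function names are ours, and the figures are the ones given above.

```python
# Counts quoted in the article, in millions of domains.
total_domains = 213.0
active_domains = 122.0   # respond with a 200 status code
with_robots_txt = 46.5

def share(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(part / whole * 100, 1)

active_share = share(active_domains, total_domains)
robots_share = share(with_robots_txt, active_domains)
print(f"active: {active_share}%, with robots.txt: {robots_share}%")
```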

6.28 million (5.2% of active domains) block all bots entirely, which means they are closed to indexing.
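The article does not describe how "entirely blocked" was detected. One possible check, sketched here with Python's standard `urllib.robotparser`, is to ask whether a generic bot may fetch the root path. The function name, the made-up agent name `ExampleGenericBot`, and the choice of testing only `/` are our assumptions, not the JetOctopus method.

```python
from urllib.robotparser import RobotFileParser

def blocks_all_bots(robots_txt: str) -> bool:
    """True if the wildcard group forbids fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "ExampleGenericBot" is a hypothetical name that matches no specific
    # group, so the answer comes from the "User-agent: *" rules.
    return not parser.can_fetch("ExampleGenericBot", "/")

sample_blocked = "User-agent: *\nDisallow: /\n"
sample_open = "User-agent: *\nDisallow:\n"

print(blocks_all_bots(sample_blocked))
print(blocks_all_bots(sample_open))
```

Testing only the root path is a simplification: a site that disallows `/` but allows a few subpaths would still be counted as fully blocked.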

Distribution by domain zone (in parentheses, the share of that zone’s domains):

com - 3.367 million (4.36%)

ru - 595 thousand (15%)

net - 353.6 thousand (4.6%)

org - 257.8 thousand (4.3%)

xyz - 180.7 thousand (16.5%)

xn--p1ai - 140.1 thousand (26.8%)

top - 123.4 thousand (4.6%)

xxx - 81.5 thousand (83.8%)

biz - 62.9 thousand (3%)

info - 58.7 thousand (3.3%)

Meanwhile, 3.1 million (2.5%) block all bots but allow Google to index them. Consequently, all the tools may have incorrect information about them.
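A robots.txt of this kind typically pairs a wildcard block with an explicit allowance for Googlebot. The sketch below shows one way to detect that pattern with `urllib.robotparser`. The function name and the `ExampleGenericBot` agent are hypothetical, and checking only the root path is our simplification.

```python
from urllib.robotparser import RobotFileParser

def open_to_google_only(robots_txt: str) -> bool:
    """True if generic bots are blocked at the root but Googlebot is not."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "ExampleGenericBot" is a made-up name that only the "*" group matches.
    generic_allowed = parser.can_fetch("ExampleGenericBot", "/")
    google_allowed = parser.can_fetch("Googlebot", "/")
    return google_allowed and not generic_allowed

sample = (
    "User-agent: *\n"
    "Disallow: /\n"
    "\n"
    "User-agent: Googlebot\n"
    "Allow: /\n"
)
print(open_to_google_only(sample))
```

Backlink tools that obey robots.txt would skip such a site, while Google would still crawl it, which is why their data about it can diverge.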

There is also a distinct category of websites that block only specific bots: 661 thousand block Ahrefs, 256 thousand block Semrush, and 2.7 million block Majestic.
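Selective blocking can be sketched the same way: the site stays open to a generic bot, but one or more tool crawlers are disallowed. The example assumes the crawler tokens `AhrefsBot`, `SemrushBot`, and `MJ12bot` (Majestic). The function name and the generic agent name are hypothetical, and only the root path is tested.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens assumed for each tool; MJ12bot is Majestic's crawler.
TOOL_BOTS = {"ahrefs": "AhrefsBot", "semrush": "SemrushBot", "majestic": "MJ12bot"}

def selectively_blocked_tools(robots_txt: str) -> list[str]:
    """List the tools blocked at the root while generic bots stay allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A site that blocks everyone is not blocking *selectively*.
    if not parser.can_fetch("ExampleGenericBot", "/"):
        return []
    return [tool for tool, bot in TOOL_BOTS.items()
            if not parser.can_fetch(bot, "/")]

sample = (
    "User-agent: AhrefsBot\n"
    "Disallow: /\n"
    "\n"
    "User-agent: MJ12bot\n"
    "Disallow: /\n"
)
print(selectively_blocked_tools(sample))
```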

Let’s look at the blocking of each service by domain zone:

Ahrefs:

com - 299 thousand (0.39%)

vip - 56.8 thousand (19.3%)

ru - 51.5 thousand (1.35%)

net - 48.8 thousand (0.62%)

info - 29.4 thousand (1.68%)

org - 21.6 thousand (0.37%)

accountant - 19.2 thousand (25.2%)

Semrush:

com - 168.5 thousand (0.22%)

net - 32 thousand (0.40%)

org - 5.2 thousand (0.09%)

de - 4.6 thousand (0.21%)

ru - 4.4 thousand (0.12%)

info - 3.9 thousand (0.22%)

pl - 3.3 thousand (0.57%)

Majestic:

com - 2 million (2.58%)

net - 143.8 thousand (1.81%)

org - 79 thousand (1.34%)

vip - 65.3 thousand (22.18%)

realty - 53.7 thousand (98.51%)

info - 40 thousand (2.28%)

xyz - 29.8 thousand (2.73%)

We were really surprised to see the “ru” domain zone in third place on the Ahrefs blocking list, and we wondered why webmasters there dislike this tool so much.

Digging a little deeper, we found a “small” satellite network of 34 thousand really well-made domains on all kinds of subjects. I would like to shake its creators’ hands. They did a huge amount of work: different templates, IPs, scripts. But at such a commercial scale it is hard to make everything unique, and the similarities are not very difficult to spot for either a human or a search bot.

As expected, Majestic, being the oldest service, is the most blocked one.

Conclusions:

These data show that the average website doesn’t care much about blocking Ahrefs, Semrush, and Majestic. While preparing this article we analyzed hundreds of websites, and we can say that if a site does try to block these bots, there is a high probability that it does so for a concrete reason. By building patterns from robots.txt files and IP subnets, we found networks on all kinds of entertaining subjects.

This still leaves open the question of how many websites block bots by User-Agent rather than in robots.txt.
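One rough way to probe for User-Agent blocking is to request the same page twice, once with a browser-like header and once naming the bot, and compare the status codes. The sketch below is only a heuristic under our own assumptions: the header strings are illustrative rather than what any real client sends, the function names are hypothetical, and sites that block by IP or serve altered content with a 200 status would be missed. The demo uses an offline stand-in instead of a real request.

```python
import urllib.error
import urllib.request
from typing import Callable

# Illustrative header values, not the exact strings any real client sends.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
BOT_UA = "AhrefsBot"

def fetch_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code received for the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; keep their status code.
        return error.code

def blocks_by_user_agent(url: str, bot_ua: str,
                         fetch: Callable[[str, str], int] = fetch_status) -> bool:
    """Heuristic: the page is served to a browser but refused to the bot."""
    return fetch(url, BROWSER_UA) == 200 and fetch(url, bot_ua) != 200

# Offline stand-in for a server that returns 403 to anything naming AhrefsBot.
def fake_fetch(url: str, user_agent: str) -> int:
    return 403 if "AhrefsBot" in user_agent else 200

print(blocks_by_user_agent("https://example.com/", BOT_UA, fetch=fake_fetch))
```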

We love working with big data and running interesting investigations. Stay tuned with jetoctopus.com
