Turns out you can understand the Internet without reading the words.

Creating a good search engine is never easy. Just ask Bing (previously Live.com, previously MSN, previously Microsoft Search, previously Looksmart...). But it turns out that knowing what is ON the page is often less important than knowing how the pages are grouped together. There are some very powerful research papers that support this theory (https://maj.to/1yZOdW5 is about the TrustRank paper and https://maj.to/1Bb1lHu is a Stanford paper), but a few good analogies suffice. Think about going into a good old-fashioned library to find the mass of the Moon. All those pages of data... where do you start? The natural sciences section, of course! Opening random pages would be absurd, but once you have found the Astronomy subsection of the natural sciences, the chances are you'll find the answer in every third book on that shelf.

Majestic used this logic, combined with its crawl of three BILLION web pages a day, to categorize the entire internet and build a marketing search engine, without saving a word of online content. Three billion is a lot, by the way... for a mental comparison, Twitter only claims 500 million unique tweets a day.
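To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above; the variable names are mine, and it assumes the crawl is spread evenly across the day, which is a simplification.

```python
# Figures quoted in the article
PAGES_PER_DAY = 3_000_000_000    # Majestic's daily crawl
TWEETS_PER_DAY = 500_000_000     # unique tweets Twitter claims per day
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the crawl is spread evenly over 24 hours
pages_per_second = PAGES_PER_DAY / SECONDS_PER_DAY
ratio = PAGES_PER_DAY / TWEETS_PER_DAY

print(f"{pages_per_second:,.0f} pages crawled per second")
print(f"{ratio:.0f} times the daily tweet volume")
```

That works out to roughly 35,000 pages every second, or six pages crawled for every tweet sent.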

It turns out that web pages link just like people do. If you are a mum with a child at school, there is a very strong chance that you will - within two degrees of separation - know every other mum through MULTIPLE paths. By following links on web pages at scale, Majestic has been able to work out not only what every web page is about, but also how influential it is in that category.

How Did They Do That?

A traditional search engine has these four distinct steps:

1. Data Collection
2. Data Grouping
3. Data Indexing
4. Data Matching

But the traditional engines have spent billions to achieve results. Majestic may have spent millions, but how did it get so far on such limited resources?

1. Data Collection

Majestic has become one of the top 10 largest Internet crawlers on the planet (beating Yandex and Baidu outside their home countries) by crawling differently. They crowdsourced the crawl!

2. Data Grouping

This is the magic. Whilst many engines tried to group data based entirely or in part on the on-page content, Majestic looked at the links between content. As the image at the top of the page tries to demonstrate, if four pages are all authorities on the subject of (say) blue widgets, then the shaded page close to the authority pages is much more likely to be about blue widgets than the one spatially far away.
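For the curious, the "close to the authorities" idea can be sketched in a few lines of Python. This is NOT Majestic's actual algorithm, which is not described here. It is a minimal seed-based score propagation, loosely in the spirit of the TrustRank-style papers mentioned earlier. The function name, the damping value and the tiny link graph are all made up for illustration.

```python
def propagate_topic(links, seeds, damping=0.85, iterations=50):
    """Spread a topic score from seed (authority) pages along their links.

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages assumed to be authorities on the topic.
    """
    pages = set(links) | {t for targets in links.values() for t in targets} | set(seeds)
    seed_weight = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(seed_weight)

    for _ in range(iterations):
        # A fixed slice of the score always flows back to the seed pages
        nxt = {p: (1 - damping) * seed_weight[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score equally between the pages it links to
                share = damping * score[page] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # A page with no outgoing links hands its score back to the seeds
                for seed in seeds:
                    nxt[seed] += damping * score[page] / len(seeds)
        score = nxt
    return score


# Hypothetical graph: four blue-widget authorities all link to "near",
# while "far" is only reachable through two further hops.
links = {
    "authority1": ["near"],
    "authority2": ["near"],
    "authority3": ["near"],
    "authority4": ["near"],
    "near": ["hop"],
    "hop": ["far"],
}
seeds = {"authority1", "authority2", "authority3", "authority4"}

scores = propagate_topic(links, seeds)
print(scores["near"] > scores["far"])  # the nearby page scores higher
```

The page linked directly from all four authorities ends up with a higher "blue widgets" score than the page several hops away, and not a single word of page content was read to get there.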

3. Data Indexing

Now the cost savings get extreme, because Majestic does not need to index all the content. It already knows what a page is about and how influential it is. Majestic just saved billions...

4. Data Matching

By using the same principle of looking at links, Majestic has also been able to categorize keywords in a similar way. So when a user types in a search phrase, Majestic can match the keyword to the pages.
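Here is one way that matching step could look in miniature. Again, this is a sketch under my own assumptions rather than how Majestic does it: the topic names, scores and example URLs are invented, and relevance is simply the keyword's topic weights multiplied by each page's topic scores.

```python
def rank_pages(keyword, keyword_topics, page_topics, top_n=3):
    """Rank pages by how well their topic scores line up with a keyword's topics."""
    topics = keyword_topics.get(keyword.lower(), {})
    ranked = []
    for url, scores in page_topics.items():
        # Weighted overlap between the keyword's topics and the page's topics
        relevance = sum(weight * scores.get(topic, 0.0)
                        for topic, weight in topics.items())
        if relevance > 0:
            ranked.append((relevance, url))
    ranked.sort(reverse=True)
    return [url for _, url in ranked[:top_n]]


# Hypothetical topic data, standing in for what link analysis would produce
keyword_topics = {"credit cards": {"finance": 0.9, "shopping": 0.1}}
page_topics = {
    "cards.example": {"finance": 0.8},
    "shop.example": {"shopping": 0.9, "finance": 0.1},
    "moon.example": {"astronomy": 0.95},
}

results = rank_pages("Credit Cards", keyword_topics, page_topics)
print(results)
```

The finance page comes first, the shopping page second, and the astronomy page never appears, all without storing any page text.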

Think it doesn't work? Well, it has SOME legs... Its search engine is only in alpha (as Majestic's core business is the link intelligence database itself), but here are the results for the phrase "Credit Cards"... the results are not bad!

If you would like to be contacted when more research comes out, or to get into the beta program, get yourself a free account now at Majestic and register for the beta testing program here.

ALE AGOSTINI

Where SEARCH Marketing MEETS Artificial Intelligence & Digital Sustainability ★★★★★

9 years ago

I guess the "new Reading" on the web is Scrolling :-)

Dixon Jones

CEO. Board member. NED advisor. Startup veteran in the digital SAAS space. BA(Hons.). MBA. FRSA.

9 years ago

Depends where you start from Chris - but there are 10 types of people... Those that understand binary and those that don't. I don't... So I start at words.

Chris Emmett

Murex Developer at ICBC Standard Bank Plc

9 years ago

There are words on t'internet?
