HUGE Google Search document leak reveals inner workings of ranking algorithm
Raj Nandan (Ranky SEO)
SEO Analyst | Helping Businesses Boost Rankings & Traffic with Proven SEO Strategies | SEO Specialist with Expertise in On-Page, Off-Page & Technical SEO | SEO Solutions for Business Success
The documents reveal how Google Search is using, or has used, clicks, links, content, entities, Chrome data and more for ranking.
A trove of leaked Google documents has given us an unprecedented look inside Google Search and revealed some of the most important elements Google uses to rank content.
What happened. Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on GitHub by an automated bot called yoshi-code-bot. These documents were shared with Rand Fishkin, SparkToro co-founder, earlier this month.
Why we care. We have been given a glimpse into how Google’s ranking algorithm may work, which is invaluable for SEOs who can understand what it all means. In 2023, we got an unprecedented look at Yandex Search ranking factors via a leak, which was one of the biggest stories of that year.
This Google document leak? It will likely be one of the biggest stories in the history of SEO and Google Search.
What’s inside. Here’s what we know about the internal documents, thanks to Fishkin and Michael King, CEO of iPullRank:
Links matter. Shocking, I know. Link diversity and relevance remain key, the documents show. And PageRank is still very much alive within Google’s ranking features. PageRank for a website’s homepage is considered for every document.
Successful clicks matter. This should not be a shocker, but based on the documents, if you want to rank well, you need to keep creating great content and user experiences. Google uses a variety of click measurements, including badClicks, goodClicks, lastLongestClicks and unsquashedClicks.
Also, longer documents may get truncated, while shorter content gets a score (from 0-512) based on originality. Scores are also given to Your Money Your Life content, like health and news.
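A sketch of the click measurements. As a rough illustration only, here is how the four click attribute names from the documents might sit together in a single record. The field types, the example numbers and the good_click_share ratio are hypothetical and do not come from the leak. Nothing here reflects how Google actually defines, weights or combines these values.

```python
from dataclasses import dataclass


@dataclass
class ClickSignals:
    """Hypothetical container for the click attributes named in the leak.

    Only the attribute names come from the documents. Their types and
    any relationship between them are assumptions for illustration.
    """
    bad_clicks: float = 0.0           # badClicks
    good_clicks: float = 0.0          # goodClicks
    last_longest_clicks: float = 0.0  # lastLongestClicks
    unsquashed_clicks: float = 0.0    # unsquashedClicks


def good_click_share(signals: ClickSignals) -> float:
    """Toy ratio (not from the leak): good clicks / (good + bad clicks)."""
    total = signals.good_clicks + signals.bad_clicks
    return signals.good_clicks / total if total else 0.0


# Made-up numbers for a single page.
page = ClickSignals(bad_clicks=20, good_clicks=80, last_longest_clicks=35)
print(good_click_share(page))  # prints 0.8
```

The point of the sketch is only that these are separate per-document counters. The documents name them but do not say how they are calculated or how much each one counts.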
What does it all mean? According to King:
Documents and testimony from the U.S. vs. Google antitrust trial confirmed that Google uses clicks in ranking, especially with its Navboost system, “one of the important signals” Google uses for ranking.
Brand matters. Fishkin’s big takeaway? Brand matters more than anything else:
Entities matter. Authorship lives. Google stores author information associated with content and tries to determine whether an entity is the author of the document.
Site authority. Google uses something called “siteAuthority.”
Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for ranking.
Whitelists. A couple of modules indicate that Google whitelists certain domains related to elections and COVID: isElectionAuthority and isCovidLocalAuthority. That said, we’ve long known that Google (and Bing) have “exception lists” for when “specific algorithms inadvertently impact websites.”
Small sites. Another feature is smallPersonalSite – for a small personal site or blog. King speculated that Google could boost or demote such sites via a Twiddler. However, that remains an open question. Again, we don’t know for certain how much these features are weighted.
Other interesting findings. According to Google’s internal documents:
The articles.
Update, May 29. Google provided a statement to Search Engine Land. Read our follow-up: Google responds to leak: Documentation lacks context.
Update, May 30. King has written a follow-up article for Search Engine Land:
Quick clarification. There is some dispute as to whether these documents were “leaked” or “discovered.” I’ve been told it’s likely the internal documents were accidentally included in a code review and pushed live from Google’s internal code base, where they were then discovered.
The source. Erfan Azimi, CEO and director of SEO for digital marketing agency EA Eagle Digital, posted a video, claiming responsibility for sharing the documents with Fishkin. Azimi is not employed by Google.