Google PageRank Algorithm
Warning: This is an advanced-level topic. Some mathematics and probability are required to understand the Google PageRank algorithm.
Google ranks each page by the probability that a random surfer is on that page at any given time.
The concept is quite simple. Suppose we have this network of pages:
Network:
The image above represents the internal links between pages
On the real web, pages are linked in the same way, but with:
Something more like this:
For practical purposes, we will focus on the Network image above, which has only 3 pages.
The Google PageRank algorithm would use a random surfer to explore all the pages and rank them as follows:
A is the page to rank
G is each page that links to A
There are 2 possible ways for a random surfer to be on a particular page:
Given these definitions, the PageRank is defined as:
PR(A) = (1 - d) / N + d * Σ (PR(G) / L(G))
Where:
Important note: L(G) is the number of exit links on page G, that is, the links that take the surfer from G to other pages within the network, including page A. From the point of view of the page that receives it, each of these links is commonly called a backlink.
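As a rough sketch, the formula can be written as a small Python function. The name page_rank_of and its parameters are hypothetical, chosen only for illustration. It assumes d is the damping factor (0.85 is the value used in the calculations later in this article) and that incoming holds one (PR(G), L(G)) pair for every page G that links to A.

```python
def page_rank_of(incoming, n_pages, d=0.85):
    """Apply PR(A) = (1 - d) / N + d * sum(PR(G) / L(G)) for a single page A.

    incoming: list of (rank_of_G, exit_links_on_G) pairs, one per page G
              that links to A (a hypothetical input shape for this sketch).
    n_pages:  N, the total number of pages in the network.
    d:        damping factor, assumed to be 0.85 here.
    """
    # Each linking page G passes on its own rank split evenly over its exit links.
    link_share = sum(rank / exit_links for rank, exit_links in incoming)
    return (1 - d) / n_pages + d * link_share


# One linking page with rank 1/3 and 2 exit links, in a 3-page network.
print(round(page_rank_of([(1 / 3, 2)], n_pages=3), 4))
```

A page with no incoming links gets an empty list, so its rank falls back to (1 - d) / N.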
To start, we need a list of all pages and, for each page, a list of the links on that page.
The Network image above gives us exactly that. As long as we can see these interconnections, we can start calculating the PR of A.
Notice that PR(G) must be a known value. To get started, every page is assigned an initial rank of 1 / N. In Network, this means one world where the event occurs (the surfer is on that page) out of the total number of possible worlds (all the pages in the Network).
So this represents an equal probability of landing on any page.
Therefore, the initial conditions are:
PR(page 1) = 1 / 3 ≈ 0.3333
PR(page 2) = 1 / 3 ≈ 0.3333
PR(page 3) = 1 / 3 ≈ 0.3333
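The starting point can be sketched in Python as follows. The links dictionary is an assumption: it is the link structure implied by the terms of the worked calculation in this article, not something read directly from the Network image, and the variable names are illustrative only.

```python
# Assumed link structure: each key is a page, each value is the list of
# pages it links to (its exit links).
links = {
    "page 1": ["page 2"],
    "page 2": ["page 3"],
    "page 3": ["page 1", "page 2"],
}

n_pages = len(links)

# Every page starts with the same probability, 1 / N.
ranks = {page: 1 / n_pages for page in links}

for page, rank in ranks.items():
    print(f"PR({page}) = {rank:.4f}")
```

Keeping the links as a page-to-list mapping gives both things the algorithm needs: the list of all pages (the keys) and L(G) for any page G (the length of its list).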
Given these conditions, and taking d = 0.85, we can now calculate the PR for each page:
PR(A) = (1 - d) / N + d * Σ (PR(G) / L(G))
PR(page 1) = (1 - 0.85) / 3 + 0.85 * Σ [ (0.3333 / 2) ] = 0.1917
0.1917 is the new value that will be used for PR(page 1) on the next iteration; more on that later.
PR(page 2) = (1 - 0.85) / 3 + 0.85 * Σ [ (0.3333 / 1) + (0.3333 / 2) ] = 0.475
0.475 will be used on the next iteration for PR(page 2).
The last term of the summation, (0.3333 / 2), is the initial rank of page 3 divided by the total number of exit links on page 3.
PR(page 3) = (1 - 0.85) / 3 + 0.85 * Σ [ (0.3333 / 1) ] = 0.3333
Notice that at this point all page ranks have been updated from the previous page ranks:
PR(page 1) = 0.1917
PR(page 2) = 0.475
PR(page 3) = 0.3333
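One full update can be reproduced with a short Python sketch. The link structure below is an assumption inferred from the terms inside each summation (which pages contribute, and with how many exit links), and update_once is a hypothetical helper name. The important detail is that every new rank is computed from the previous ranks only, never from values updated earlier in the same pass.

```python
D = 0.85  # damping factor used in the calculations above

# Assumed link structure (page -> pages it links to).
links = {
    "page 1": ["page 2"],
    "page 2": ["page 3"],
    "page 3": ["page 1", "page 2"],
}


def update_once(ranks, links, d=D):
    """Return a new set of ranks computed from the previous ranks only."""
    n = len(links)
    new_ranks = {}
    for page in links:
        # Sum PR(G) / L(G) over every page G that links to this page.
        share = sum(
            ranks[g] / len(exits) for g, exits in links.items() if page in exits
        )
        new_ranks[page] = (1 - d) / n + d * share
    return new_ranks


start = {page: 1 / len(links) for page in links}
first = update_once(start, links)
for page, rank in first.items():
    print(f"PR({page}) = {rank:.4f}")
```

A useful sanity check in this network is that the updated ranks still add up to 1, as a set of probabilities should.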
This process can now be repeated. In Computer Science, this is what we call an iterative algorithm.
Tolerance:
If this process is repeated n times, the change between the previous set of PRs and the new set of PRs eventually becomes very small. When this happens, the values have converged. This gives us a smart way to define the tolerance of the algorithm: we can program it to stop if and only if the change between the new PageRank values and the previous PageRank values is less than a certain value. A sensible value would be something like 0.001.
At that point, the tolerance has been met and the algorithm can stop iterating. All pages have been ranked by their probabilities.
Just to be clear, the change can be calculated for each page as:
Change = | New PageRank - Previous PageRank |
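Putting the pieces together, the whole iterative procedure with a tolerance check might look like the sketch below. Several things here are assumptions rather than part of the article: the function name pagerank, the max_iterations safety limit, the choice to measure the change as the largest absolute difference for any single page, and the link structure itself. The sketch also assumes every page has at least one exit link.

```python
def pagerank(links, d=0.85, tolerance=0.001, max_iterations=100):
    """Iterate the PageRank formula until the ranks change by less than tolerance.

    links: dict mapping each page to the list of pages it links to.
    Returns the final ranks and the number of iterations performed.
    """
    n = len(links)
    ranks = {page: 1 / n for page in links}  # initial condition: 1 / N each

    for iteration in range(1, max_iterations + 1):
        new_ranks = {}
        for page in links:
            share = sum(
                ranks[g] / len(exits) for g, exits in links.items() if page in exits
            )
            new_ranks[page] = (1 - d) / n + d * share

        # Largest absolute change of any single page in this iteration.
        change = max(abs(new_ranks[p] - ranks[p]) for p in links)
        ranks = new_ranks
        if change < tolerance:
            break

    return ranks, iteration


# Assumed 3-page network (page -> pages it links to).
links = {
    "page 1": ["page 2"],
    "page 2": ["page 3"],
    "page 3": ["page 1", "page 2"],
}

ranks, iterations = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"PR({page}) = {ranks[page]:.4f}")
print("iterations:", iterations)
```

A smaller tolerance gives more precise ranks at the cost of more iterations; the max_iterations limit only guards against a loop that never meets the tolerance.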
In conclusion:
This is the core idea behind how Google ranks pages. The takeaway from this article is the importance of understanding backlinks, especially well-made backlinks.