Site Quality Score - Google Search Patent US9760641B1
Kyumars Dadelahi "Thinker And A Doer", INTJ-A
SEM Practice Lead @ Glassroom | SEM Supervisor, Senior Manager
Patent ID: US9760641B1
Inventors: Navneet Panda, April R. Lehman
Current Assignee: Google LLC
Application Granted: September 12, 2017
Expiration Adjusted: September 08, 2032
Introduction:
This patent describes several implementations of a site quality score metric.
The site quality score is the ratio of unique reference queries for a site (numerator) to unique queries for which a listing of that site on the Google SERP was selected, i.e. clicked (denominator).
In search marketing lingo, a reference query is a “branded” search.
Specifically, we’re looking at two sets of queries S and U to form the site quality score ratio S/U.
S is defined as follows:
The count of unique queries received by a search engine that are categorized as referring to a particular website. We refer to these queries as “site queries” implying that the association between the query and a particular website is contained in the query.
Just pointing out the obvious here:
It’s a unique count of queries (how many different branded queries exist for the site?)
No user-SERP interaction is measured; only the number of unique reference queries submitted to a search engine is counted.
U is defined as follows:
The count of unique queries, where the query is associated with a particular website if
the user selected a listing of a particular website on the SERP (click) AND
i) The selected listing was in direct response to the query
ii) The click identifies a resource (page) in that specific website
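To make the definition of U concrete, here is a minimal sketch of the count. It assumes a hypothetical click log of (query, clicked URL) pairs and identifies the site by hostname; neither the log format nor the function name comes from the patent.

```python
from urllib.parse import urlparse

def count_u(click_log, site_host):
    """Count unique queries that led to a click on a page of site_host.

    click_log is an iterable of (query, clicked_url) pairs. This record
    format is an assumption made for illustration only.
    """
    queries = set()
    for query, clicked_url in click_log:
        # The click must identify a resource on the site in question.
        if urlparse(clicked_url).hostname == site_host:
            queries.add(query)
    return len(queries)

log = [
    ("domain sf news", "https://sf.domain.com/news"),
    ("domain sf news", "https://sf.domain.com/news/today"),   # same query, counted once
    ("restaurant reviews", "https://sf.domain.com/food"),
    ("restaurant reviews", "https://other.example/reviews"),  # click on a different site
]
print(count_u(log, "sf.domain.com"))
```

Repeated clicks for the same query do not raise the count, because U is a count of unique queries rather than of clicks.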
How are queries categorized as reference queries?
A query is categorized as a reference query if the query includes
a site label
[Example from the patent: “san francisco site:www.domain.com”]
a term that has been determined to be a reference to a particular website
[Excerpt from the patent: “if the search system has data indicating that the terms “domain sf” and “dsf” are commonly used by users to refer to a site “sf.domain.com,” queries that contain the terms “domain sf” or “dsf”, e.g., the queries “domain sf news” and “dsf restaurant reviews,” can be counted as queries that refer to the site “sf.domain.com.”]
a term that has been determined to be a navigational query to the particular website
[Excerpt from the patent: “a search system can determine that a query is a navigational query to a particular site when a search result linked to the particular site has received at least a threshold percentage of the user selections (clicks) that were received for all search results that are responsive to the query.”]
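The three rules above can be sketched as a single check. This is only an illustration: the function name, the way reference terms are supplied, and the 0.8 navigational threshold are all assumptions, since the patent speaks only of “a threshold percentage”.

```python
import re

SITE_LABEL = re.compile(r"\bsite:(\S+)", re.IGNORECASE)

def is_reference_query(query, site, reference_terms, click_share=None, nav_threshold=0.8):
    """Return True if the query refers to `site` under any of the three rules.

    click_share is the fraction of all clicks for this query that went to
    `site`; nav_threshold (0.8) is an arbitrary illustrative value.
    """
    q = query.lower()
    # Rule 1: the query carries an explicit site label for this site.
    match = SITE_LABEL.search(q)
    if match and match.group(1) == site.lower():
        return True
    # Rule 2: the query contains a term known to refer to the site.
    for term in reference_terms:
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", q):
            return True
    # Rule 3: the query is navigational, i.e. the site gets most of the clicks.
    if click_share is not None and click_share >= nav_threshold:
        return True
    return False

terms = ["domain sf", "dsf"]
print(is_reference_query("san francisco site:www.domain.com", "www.domain.com", []))
print(is_reference_query("dsf restaurant reviews", "sf.domain.com", terms))
print(is_reference_query("restaurant reviews", "sf.domain.com", terms))
```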
How are queries counted?
The patent clearly indicates that the query counts are unique. But there’s more than one way to define a unique query count.
Unique query count ignoring term order
If two or more queries consist of exactly the same terms and differ only in the order of those terms, they are considered equivalent. One representative query is chosen to stand for the whole group, which counts as 1 unique query.
Unique query count ignoring term order and site: label
Same as the first scenario, but queries that additionally contain a “site:” label are also folded into the same query class.
Unique query count respecting term order
In another implementation, term order in the queries matters. This implies that for a given set of queries that are all made up of the same terms but in different order, each query would fall into a class on its own. This would produce as many query classes as there are different queries.
Unique query count respecting term order but disregarding site: label
Same as above, but now queries containing the term “site:” are mapped to the query that remains once the “site:” term is removed.
In short, these are four different ways in which queries may be counted “uniquely”.
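The four counting modes differ only in how a query is normalized before duplicates are removed. The sketch below models that with two switches; the function names are hypothetical, and it assumes the site label is a single whitespace-delimited token such as “site:sf.domain.com”.

```python
def query_class(query, ignore_order, strip_site_label):
    """Map a query to the key of the class it belongs to."""
    terms = query.lower().split()
    if strip_site_label:
        # Drop the site label so labelled and unlabelled queries match.
        terms = [t for t in terms if not t.startswith("site:")]
    if ignore_order:
        # Sorting makes any ordering of the same terms identical.
        terms = sorted(terms)
    return tuple(terms)

def unique_count(queries, ignore_order, strip_site_label):
    return len({query_class(q, ignore_order, strip_site_label) for q in queries})

queries = [
    "domain sf news",
    "news domain sf",
    "domain sf news site:sf.domain.com",
    "site:sf.domain.com news domain sf",
]
for ignore_order in (True, False):
    for strip_site_label in (False, True):
        count = unique_count(queries, ignore_order, strip_site_label)
        print(f"ignore_order={ignore_order}, strip_site_label={strip_site_label}: {count}")
```

For these four sample queries the count ranges from 1 (order and site label both ignored) to 4 (both respected), which shows how much the choice of counting mode can move S and U.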
How is site quality score calculated?
There are 6 implementations mentioned in the patent.
Here S and U both represent a unique count of queries as defined previously.
T is a threshold value that could be set to an arbitrary natural number (say 30).
L is a lower bound of unique queries (say 0)
n is an exponent in the range (0,1). Its purpose would be to dampen the effect of U on the site quality score.
B is a base value greater than zero, to ensure that the denominator remains positive in all cases. It is very similar in function to T.
For the sake of examples that will follow, I’ve chosen the following values for the variables:
S = 1,031
U = 56,802
T = 30
L = 0 (this value makes the most sense)
n = 0.5
B = 1,500
Here are the 6 formulae mentioned in the Patent.
Implementation 1:
S/U
This is the simplest form of the site quality score.
S / U = 1,031 / 56,802 ~ 1.82%
Implementation 2:
(S - T)/U
In this version of the site quality score we add only one variable: T, a threshold value. The purpose of introducing T is to ensure that a new site with only a few branded queries does not receive a positive numerator until its unique branded query count grows beyond the threshold value.
As for our example: (S - T)/U = (1,031 - 30)/56,802 = 1,001/56,802 ~ 1.76%
As expected there’s no significant change in site quality score with a small threshold value of 30.
However, the larger the threshold value gets, the more likely it is that a site will receive a negative numerator (S - T). Therefore, we have:
Implementation 3:
Max(L; S - T)/U
Which takes care of this dilemma by imposing a lower bound L on (S - T). In this example we set the lower bound to 0.
Max(L; S - T)/U = Max(0; 1,001)/56,802 = 1,001/56,802 ~1.76%
Same number as in the previous implementation. Had T been 2,048, however, the result would have looked different, because S - T = 1,031 - 2,048 = -1,017. That wouldn’t be ideal, since we generally want the numerator of the site quality score to be non-negative.
Since the Max function simply returns the largest of its numeric inputs, it would return 0 here, ensuring the numerator never drops below the lower bound.
Max(L; S - T)/U = Max(0; -1,017)/56,802 = 0/56,802 = 0
That’s clearly not a great score, but it keeps the framework simple and functional.
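A short sketch of implementations 2 and 3 side by side shows what the lower bound does as a site’s branded query count S grows. U = 56,802 and T = 2,048 are taken from the examples in this article; the function names and the sample values of S are assumptions chosen for illustration.

```python
def score_threshold(s, u, t):
    """Implementation 2: (S - T) / U. Goes negative while S < T."""
    return (s - t) / u

def score_bounded(s, u, t, lower=0):
    """Implementation 3: Max(L; S - T) / U. Never drops below L / U."""
    return max(lower, s - t) / u

U = 56802
T = 2048
for s in (100, 1031, 2048, 5000):
    print(s, round(score_threshold(s, U, T), 4), round(score_bounded(s, U, T), 4))
```

Once S exceeds T, the two versions agree; below the threshold, only the bounded version stays at zero.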
For implementations 4, 5, and 6, note that the challenges to overcome in the numerator are the same as for implementations 1, 2, and 3. Hence, I will focus mainly on the denominator of the site quality score. Ultimately, I believe that implementation 6 is the most likely implementation because it addresses all possible scenarios.
Implementation 4:
S/U^n
Comparing implementations 1 and 4, you immediately recognize that U in the denominator now carries an exponent n (I was not able to typeset that properly due to the lack of mathematical symbols), which generally takes a value between 0 and 1.
For the next example I have chosen n = 0.5, which is the same thing as taking the square root of U. Because U is large, raising it to the exponent n = 0.5 will reduce its value and hence produce the desired dampening effect.
U^n = 56,802^0.5 ~ 238.33
Note that in this case, the site quality score S/U^n ~ 1,031/238.33 ~ 432.59%
If we want to keep the site quality score below 100% we would have to choose a different value for n.
For n = 0.75, U^n = 56,802^0.75 ~ 3,679.36
In this scenario the site quality score would be S/U^n = 1,031/56,802^0.75
which would yield ~ 1031/3,679.36 ~ 28.02%
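Rather than trying values of n by hand, the smallest exponent that keeps the score below 100% can be derived directly: S/U^n < 1 holds exactly when U^n > S, that is when n > ln(S)/ln(U) for U > 1. The sketch below is my own derivation, not something stated in the patent, and uses the example values of S and U from this article.

```python
import math

S, U = 1031, 56802

# S / U**n equals exactly 1 at this exponent; any larger n gives a score below 100%.
n_min = math.log(S) / math.log(U)
print("break-even exponent:", round(n_min, 4))

for n in (0.5, 0.75):
    score = S / U ** n
    print(f"n={n}: score={score:.2%}, below 100%: {score < 1}")
```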
Implementation 5:
S/(B + U)^n
Almost done… we’re only introducing B as a new variable, a base value added to U in case U is rather small or zero.
S/(B + U)^n = 1,031/(1,500 + 56,802)^0.5 = 1,031/58,302^0.5 ~ 1,031/241.4581 ~ 426.99%, suggesting again that we may want to make B significantly larger if the score is to stay below 100%.
Let B = 1,048,576, then S/(B + U)^n = 1,031/(1,048,576 + 56,802)^0.5 = 1,031/1,105,378^0.5
Which yields approximately 1,031/1,051.3696 ~ 98.06%.
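The same kind of rearrangement gives the smallest base value for a fixed exponent: S/(B + U)^n < 1 holds exactly when B > S^(1/n) - U. This is my own rearrangement of the formula, not a rule from the patent, and the variable names are illustrative.

```python
S, U, n = 1031, 56802, 0.5

# Break-even base value: with n = 0.5 this is S squared minus U.
b_min = S ** (1 / n) - U
print("break-even B:", b_min)

for b in (1500, 1048576):
    score = S / (b + U) ** n
    print(f"B={b}: score={score:.2%}, below 100%: {score < 1}")
```

With n = 0.5 the break-even value is a little over one million, which is why B = 1,500 leaves the score far above 100% while B = 1,048,576 brings it just under.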
Implementation 6:
Max(L; S - T)/(B + U)^n
Comparing implementation 6 to implementation 3 explains what happens in the numerator, and the denominator is the same as in the previous implementation (5). Therefore, we can go straight to an example.
Max(L; S - T)/(B + U)^n
= Max(0; 1,031 - 30)/(1,500 + 56,802)^0.5
= 1,001/58,302^0.5
~ 414.56%
Suppose we do want to keep the site quality score below 100% so that it’s properly normalized. Then we need to modify our parameters, and by now we know some good candidate values.
Let’s modify the parameter n to 0.75…
Max(L; S - T)/(B +U)^n
= Max(0; 1,031 - 30)/(1,500 + 56,802)^0.75
= 1,001/58,302^0.75
~ 26.68%
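All six formulae can be read as special cases of implementation 6 with some parameters switched off. The sketch below expresses that reading as a single function; the function name and its default values are my own choices for illustration, not the patent’s.

```python
def site_quality_score(s, u, t=0, lower=None, b=0, n=1.0):
    """Max(L; S - T) / (B + U)^n, with optional parts disabled by default.

    lower=None means no lower bound is applied to the numerator.
    """
    numerator = s - t
    if lower is not None:
        numerator = max(lower, numerator)
    return numerator / (b + u) ** n

S, U, T, L, B = 1031, 56802, 30, 0, 1500

examples = {
    "1: S/U": site_quality_score(S, U),
    "2: (S - T)/U": site_quality_score(S, U, t=T),
    "3: Max(L; S - T)/U": site_quality_score(S, U, t=T, lower=L),
    "4: S/U^n": site_quality_score(S, U, n=0.5),
    "5: S/(B + U)^n": site_quality_score(S, U, b=B, n=0.5),
    "6: Max(L; S - T)/(B + U)^n": site_quality_score(S, U, t=T, lower=L, b=B, n=0.5),
}
for name, value in examples.items():
    print(f"{name} = {value:.2%}")
```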
How is site quality score used?
One intuitive way to use a site quality score would be to use it as a term in a page level quality score (resource level).
Here’s the description mentioned in the patent: “For example, a site quality score for the site “https://www.domain.com” can be used as a term in the computation of a score for a resource “https://www.domain.com/resource.html” that is in the site.”
Moreover, the site quality score system can be configured to treat various collections of pages as a “website”.
For example, this could happen on the server level, domain level, subdomain or subdirectory level.
Also, the patent makes it clear that these choices do not have to be mutually exclusive.
This suggests that a site quality score could really be calculated for any sub-collection of resources in a website, even individual pages. It seems that all you need is a unique count of branded queries for the resource(s) and a unique count of all queries associated with the particular website/resource(s).
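To illustrate how the same URL can belong to different “websites” depending on the chosen granularity, here is a minimal sketch. The level names and the grouping logic are assumptions of mine; in particular, the “domain” level naively keeps the last two host labels, which is wrong for suffixes such as .co.uk and would need a public suffix list in practice.

```python
from urllib.parse import urlparse

def site_key(url, level):
    """Return the key of the 'website' a URL belongs to at a given granularity."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if level == "host":
        # Subdomain level: every hostname is its own site.
        return host
    if level == "domain":
        # Naive domain level: keep only the last two labels of the hostname.
        return ".".join(host.split(".")[-2:])
    if level == "directory":
        # Subdirectory level: hostname plus the first path segment, if any.
        first_segment = parts.path.strip("/").split("/")[0]
        return f"{host}/{first_segment}" if first_segment else host
    raise ValueError(f"unknown level: {level}")

url = "https://sf.domain.com/news/today.html"
for level in ("domain", "host", "directory"):
    print(level, "->", site_key(url, level))
```

Because the groupings are not mutually exclusive, the same click could feed U for several keys at once.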
As previously mentioned, the query and website/resource are associated via a click and the identification of a resource on that website. However, there are a few more nuances mentioned about that click that are worthwhile keeping in mind.
According to the patent, the system can be configured such that the following data can be collected with regards to clicks:
All or just one of the above could be used by the system as the definition of a user selection. This would primarily affect the denominator of the site quality score.
Closing Remarks
Site quality score is interesting because, as search marketers, we intuitively know that branded query growth is what builds a brand.
It should be fairly straightforward to approximate the site quality score with your own search engine data, giving you a measure of a website’s brand growth that can also directionally inform you about the quality of your content.
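As a rough sketch of such an approximation, suppose you have a query report with one row per query and its click count. The row format, the brand term list, and the substring matching are all assumptions, and the result only approximates S/U because your own data covers just the queries your site appeared for.

```python
def approximate_score(rows, brand_terms):
    """Approximate S/U from (query, clicks) rows of a hypothetical query report.

    S: unique queries containing a brand term (clicks are not required).
    U: unique queries with at least one click on the site.
    Returns None when there are no clicked queries.
    """
    branded, clicked = set(), set()
    for query, clicks in rows:
        q = " ".join(query.lower().split())  # normalize case and whitespace
        # Naive substring match; a real brand list would need more care.
        if any(term in q for term in brand_terms):
            branded.add(q)
        if clicks > 0:
            clicked.add(q)
    if not clicked:
        return None
    return len(branded) / len(clicked)

rows = [
    ("Acme shoes", 12),
    ("acme  shoes", 3),   # same query after normalization
    ("acme login", 0),    # branded, but never clicked
    ("running shoes", 5),
    ("best sneakers", 7),
    ("shoe sizes", 1),
    ("trail shoes", 0),
]
print(approximate_score(rows, ["acme"]))
```

Tracking this ratio over time, rather than reading a single value, is probably the more useful way to watch brand growth.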
It is also not clear (at least not from the patent) if site quality score is something that would apply only to organic listings or possibly Ads as well. Since it is a site level quality measure it would intuitively make sense to consolidate all user behaviour data to be quantified holistically.
The patent was filed in 2012, was granted in 2017 (see the grant date listed at the top), and has an (adjusted) expiration date of 2032. There’s no explicit evidence that site quality score or a variation of it is actually being used.