How does Google Search work?

Whenever you search on Google, the search engine combs through more than 30 trillion webpages on the internet and finds the top 10 results for your question. You’ll click on a result on the first page (that is, among the top 10 results) 92% of the time. Finding the top 10 things out of 30 trillion is really hard: it’s about as hard as trying to find a penny randomly dropped somewhere in New York City. Yet Google does this expertly and in just half a second, on average. But how?

Google doesn’t actually visit every page on the internet every time you ask it something. Instead, it stores information about webpages in databases (tables of information, like in Excel) and uses algorithms that read those databases to decide what to show you. Algorithms are just series of instructions: humans might have an “algorithm” to make a PB&J sandwich, while Google’s computers have algorithms to find webpages based on what you typed into the search bar.


Google’s algorithm starts by building a database of every webpage on the internet. Google uses programs called spiders to “crawl” over webpages until they’ve found all of them (or at least, what Google thinks is all of them). The spiders start on a few webpages and add those to Google’s list of pages, called the “index.” Then, the spiders follow all the outgoing links on those pages and find a new set of pages, which they add to the index. Next, they follow all the links on those pages, and so on, until Google can’t find anything else.
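The crawling loop described above can be sketched as a breadth-first search over links. This is only a toy illustration, not Google's actual crawler: the `TOY_WEB` dictionary, the `.example` page names, and the `crawl` function are all made up for this sketch, and a real spider would download pages over the network instead of reading a dictionary.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
    "island.example": ["a.example"],  # no page links here
}


def crawl(seed_pages, get_links):
    """Breadth-first crawl: return pages in the order they were discovered."""
    index = []                 # the list of pages found so far
    seen = set(seed_pages)     # pages already queued, so none is visited twice
    queue = deque(seed_pages)  # pages waiting to be visited
    while queue:
        page = queue.popleft()
        index.append(page)
        # Follow every outgoing link and queue any page not seen before.
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl(["a.example"], lambda page: TOY_WEB.get(page, []))
print(index)  # ['a.example', 'b.example', 'c.example', 'd.example']
```

Notice that `island.example` never makes it into the index: no page the spider visits links to it, which is one reason a crawler only finds what it "thinks" is all of the web.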

This process is always ongoing; Google is always adding new pages to its index or updating pages when they change. The index is huge, weighing in at over 100 million gigabytes. If you tried to fit that on one-terabyte external hard drives, you’d need 100,000 of them, which, if you stacked them up, would be around a mile high.
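Here is a quick check of that arithmetic. The index size and drive capacity come from the text; the drive thickness of 0.6 inches is an assumption made for this sketch (roughly a slim portable drive), so the height is only a ballpark figure.

```python
INDEX_SIZE_GB = 100_000_000   # "over 100 million gigabytes", from the text
DRIVE_SIZE_GB = 1_000         # one terabyte, in decimal units (1 TB = 1,000 GB)
DRIVE_THICKNESS_INCHES = 0.6  # assumption: not a figure from the text
INCHES_PER_MILE = 12 * 5280

drives_needed = INDEX_SIZE_GB // DRIVE_SIZE_GB
stack_height_miles = drives_needed * DRIVE_THICKNESS_INCHES / INCHES_PER_MILE

print(drives_needed)                 # 100000
print(round(stack_height_miles, 2))  # 0.95
```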

Word search

When you search Google, it grabs your query (the text you typed into the search bar) and looks through its index to find the webpages that are most relevant.

How might Google do this? The simplest way would be to just look for occurrences of a particular keyword, kind of like hitting Ctrl+F or Cmd+F to search a giant Word document. Indeed, this is how search engines in the 1990s used to work: they’d search for your query in their index and show the pages that had the most matches, an attribute called keyword density.

This turned out to be pretty easy to game. If you searched for the candy bar Snickers, you’d expect the official Snickers page to be the first hit. But if a search engine just counted the number of times the word “snickers” appeared on a webpage, anyone could make a random page that just said “snickers snickers snickers snickers” (and so on) and jump to the top of the search results. Clearly, that’s not very useful.
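To see how easy this is to game, here is a minimal sketch of ranking by keyword matches. The two page texts and the function names (`keyword_count`, `naive_rank`) are invented for illustration; this is the naive approach described above, not how any particular search engine was actually implemented.

```python
def keyword_count(page_text, keyword):
    """Count how many times the keyword appears as a whole word."""
    words = page_text.lower().split()
    return words.count(keyword.lower())


def naive_rank(pages, keyword):
    """Order page names by number of keyword matches, most matches first."""
    return sorted(
        pages,
        key=lambda name: keyword_count(pages[name], keyword),
        reverse=True,
    )


# Hypothetical pages: a genuine description and a keyword-stuffed spam page.
pages = {
    "official": "snickers is a chocolate bar with peanuts caramel and nougat",
    "spam": "snickers snickers snickers snickers",
}

ranking = naive_rank(pages, "snickers")
print(ranking)  # ['spam', 'official']
```

The spam page wins with four matches against one, even though it says nothing useful.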


Instead of keyword density, Google’s core innovation is an algorithm called PageRank, which its founders Larry Page and Sergey Brin developed as PhD students at Stanford and published in 1998. Page and Brin noticed that you can estimate a webpage’s importance by looking at which other important pages link to it. It’s like how, at a party, you know someone is popular when they’re surrounded by other popular people. PageRank gives each webpage a score that’s based on the PageRank scores of every other page that links to that page. (The scores of those pages depend on the pages that link to them, and so on; this gets calculated with linear algebra.)

For instance, if we made a brand-new webpage about Abraham Lincoln, it would initially have very low PageRank. If some obscure blog added a link to our page, our page would get a small boost to its PageRank. PageRank cares more about the quality of incoming links than the quantity, so even if dozens of obscure blogs linked to our page, we wouldn’t gain much. But if a New York Times article (which probably has a high PageRank) linked to our page, our page would get a huge PageRank boost.
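The idea can be sketched with the textbook iterative version of PageRank, where every page repeatedly passes a share of its score along its outgoing links. Everything here is an assumption made for illustration: the `.example` sites, the damping factor of 0.85 (the value commonly quoted for PageRank), the number of iterations, and the toy graph in which 100 pages link to a big news site while only 12 obscure blogs link to a rival page. Google's production system is certainly far more elaborate.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict of page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Pages with no outgoing links spread their score over every page.
        dangling = sum(scores[p] for p in pages if not links.get(p))
        new_scores = {
            p: (1 - damping) / n + damping * dangling / n for p in pages
        }
        # Every other page splits its score evenly among the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_scores[target] += damping * scores[page] / len(targets)
        scores = new_scores
    return scores


# Hypothetical web: a heavily linked news site links to our Lincoln page,
# while a rival page is linked only by a dozen obscure blogs.
links = {"bignews.example": ["lincoln.example"]}
for i in range(100):
    links[f"reader{i}.example"] = ["bignews.example"]
for i in range(12):
    links[f"blog{i}.example"] = ["rival.example"]

scores = pagerank(links)
print(scores["lincoln.example"] > scores["rival.example"])  # True
```

In this toy graph, a single link from the well-connected news site is worth more than a dozen links from blogs that nobody links to.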

Once Google finds all the pages in its index that mention your search query, it ranks them using several criteria, including PageRank. Google has many other criteria as well: it considers how recently a webpage was updated, ignores websites that look spammy (like the “snickers snickers snickers snickers” site we mentioned earlier), considers your location (it could return the NFL if you search for “football” in the US, but the English Premier League if you search for “football” in England), and more.
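One way to picture how several criteria might be combined is a single score per page. The sketch below is purely hypothetical: the `Page` fields, the freshness formula, the 1.5 location boost, and the `.example` sites are all invented for illustration, and Google does not publish its real formula.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    pagerank: float         # link-based importance, between 0 and 1
    days_since_update: int  # how stale the page is
    looks_spammy: bool      # flagged by some spam check (not shown here)
    region: str             # region the page is mainly about


def score(page, user_region):
    """Combine the signals into one number (made-up weights)."""
    freshness = 1.0 / (1.0 + page.days_since_update / 30.0)
    local_boost = 1.5 if page.region == user_region else 1.0
    return page.pagerank * (0.5 + 0.5 * freshness) * local_boost


def rank(pages, user_region):
    """Drop spammy pages, then sort the rest by score, best first."""
    candidates = [p for p in pages if not p.looks_spammy]
    candidates.sort(key=lambda p: score(p, user_region), reverse=True)
    return [p.url for p in candidates]


# Hypothetical results for the query "football".
pages = [
    Page("nfl.example", 0.8, 1, False, "US"),
    Page("premier-league.example", 0.8, 1, False, "England"),
    Page("football-football-football.example", 0.9, 1, True, "US"),
]

print(rank(pages, "US"))       # ['nfl.example', 'premier-league.example']
print(rank(pages, "England"))  # ['premier-league.example', 'nfl.example']
```

The two legitimate pages are otherwise identical, so the searcher's location decides the order, and the spammy page is dropped no matter how high its link score is.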

Gaming Google?

There are pitfalls to PageRank, however. Much like spammers abused keyword density (as with “snickers snickers snickers snickers”), spammers have now started making “link farms,” or webpages that contain tons of unrelated links. Website owners can pay link farms to include a link to their webpages, which would artificially boost their PageRank. However, Google has gotten pretty good at catching and ignoring link farms.

There are some more mainstream ways to game Google, though. An entire industry, called search engine optimization (SEO), has sprung up to help website owners crack Google’s search algorithm and make sure their webpages appear at the top of Google searches. The most basic form of SEO is getting more pages to link to your page. SEO includes plenty more techniques, such as putting the right keywords in your page’s title and headings or making all of your site’s pages link to each other.

Google’s search algorithm is always changing, though; Google rolls out minor updates over 500 times a year. There are occasionally more major updates, and after each one, SEO experts try to find ways to use the changes to get ahead. For instance, Google changed its algorithm in 2018 to favor websites that loaded faster on mobile devices, leading experts to suggest that website owners make stripped-down articles with a Google tool called Accelerated Mobile Pages, or AMP.



