Brother, Can You Infer a Link?
By Sam Sullivan
No one will be shocked to discover the internet has changed since 2010.
Of course, it’s often far less fun to acknowledge this fact and adapt to it. But one of the simple truths embedded in this proposition is this: how we search for information has changed. How our queries are answered has changed. We still look for the same things we always did. We’re still human; we still want the recipe, the shoes, the concert tickets. But how we go about it, and the technology behind it, is evolving.
Quickly.
The vast majority of searches today run through Google. More often than not, Search Engine Optimization (SEO) is really a process of Google Optimization. We hang on the dictums of John Mueller; we eagerly await edicts handed down from The Keyword to guide us where we want to go. Google has led the search industry for over 20 years. The search engines that followed, from Bing to DuckDuckGo, have optimized and grown in response to Google and to the public’s demand for an ever-better search engine.
But Google shapes the conversation around search just the same. Its goal has always been to facilitate search the way a human would: to answer queries in real time and provide a string of answers efficiently and reliably, the way a person naturally anticipates needs in a logical order and volunteers information. “You’re taking a trip? Where to?” You say Hawaii. Your friend, who’s been there, then tells you everything you need to know. You don’t structure it as queries, refined after each new piece of information. The information is volunteered, logically connected to the conversation and to the stated and inferred needs of the asker.
Enter MUM.
MUM stands for Multitask Unified Model. Recently announced by Google, it aims to handle complex needs so that answers arrive in fewer searches. Google wants you to find information faster and more completely. Like BERT, MUM is built on a Transformer architecture, but it is far more powerful. MUM both understands and generates language. It is trained across 75 different languages and can handle many tasks at once. MUM is also multimodal, meaning it can understand information across mediums as varied as text and images, and eventually audio and video.
MUM understands the world in a complex, connected way, the way a human might. If you Google Hawaii, for instance, MUM will eventually understand contextually what else you’ll need if you’re going. It can help you with plane tickets, reservations, and hotels, but also with more loosely connected information: when the tide is high, the elevations of mountains you might want to hike, the boots that will help with your hike, or the surfboards that will help you conquer the best waves when the tides agree.
MUM can also translate data automatically. If a helpful source is only available in Italian, it will take data from that source and translate it into your preferred language. Language barriers are erased in favor of more relevant sources and maximum efficiency; the most helpful information may well exist in a language you don’t speak.
But MUM is hardly alone. In Search Engine Optimization, links and backlinks form an essential backbone. The architecture of the web is built on, well, a web: nodes and edges, or pages and hyperlinks, form the structure we’ve come to know, love, and sometimes hate deeply. When you click a link on a page, it’s a portal to another page; when you see a link in a blog attached to anchor text, you follow it if you want to go where it leads. It’s a system we’ve learned to take for granted. It’s been this way since Tim Berners-Lee first labored in the lab in Geneva.
A backlink to a site connotes relevancy. Enough backlinks to a site connote utility. Heavily trafficked pages confer value on other pages by linking to them. PageRank, the original Google algorithm, is built on this very idea. Why look only at pages when you can look at the pages those pages link to? Or the ones linking to them? Google learned to follow the links. The mathematical principle underpinning this, for those so inclined, is eigenvector centrality: a measure of the influence a node has on a network. More important sites confer more value on the sites they link to than less important sites do. That’s why the link-buying schemes of the 2000s were doomed to fail and are now mostly just an embarrassment: the links came from garbage sources. WebMD backlinks weren’t exactly for sale in those packages.
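For the curious, a toy power iteration makes the idea concrete. Everything below, from the domains to the damping factor, is invented for illustration; it’s a sketch of the PageRank principle, not Google’s production algorithm.

```python
# A toy PageRank-style power iteration. The graph, damping factor, and
# domain names are made up purely to illustrate eigenvector centrality.

links = {
    "webmd.com":    ["cdc.gov"],
    "cdc.gov":      ["webmd.com"],
    "spamfarm.biz": ["myshop.com", "webmd.com"],
    "myshop.com":   ["webmd.com"],
}

DAMPING = 0.85
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}  # start uniform

for _ in range(50):  # power iteration converges quickly on small graphs
    new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = DAMPING * rank[page] / len(outlinks)  # rank splits across outlinks
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page:14s} {score:.3f}")
```

Run it and webmd.com comes out on top: it collects links from the other nodes, including the better-linked ones, which is exactly the eigenvector effect in miniature.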
So despite the importance of links and the link graph in SEO, there might be a coming paradigm shift: inferred links.
This is far from guaranteed, and I doubt (heavily doubt) the backlink and the hyperlink with anchor text will ever lose their value or their status. But with Google leaning more on deep learning and on complex connections between texts via Natural Language Processing, it’s a fair assumption that something like this is not only possible but will eventually be a factor.
An inferred link is precisely that: a place where you infer a link would be. Rather than relying on anchor text or a hyperlink, the search engine would confer the equivalent of backlink value based on the relevant words in the text itself and their relation to other words involved in relevant queries. Rather than blue text or links shoved into blogs like ads into television shows, the inferred link is natural and less invasive. You don’t even notice it; you process it the way you process other information, connecting it to relevant ideas, concepts, and needs without being prompted. It would certainly make the online reading experience easier and a more natural fit. Search Engine Optimization grew out of information retrieval, a discipline with roots in library science. How appropriate to return to those roots, like a book: noninvasive narrative.
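To make that less abstract, here is one way an engine might score an inferred link: compare the text around a mention to a candidate page and measure their similarity. This sketch uses raw term-frequency vectors and cosine similarity; a real system would use learned embeddings (BERT- or MUM-style), and every page name and passage here is made up.

```python
# A simplified sketch of scoring an "inferred link" by measuring how
# close a passage sits to a candidate page in vector space.

import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

passage = "we packed our surf boards and checked the hawaii tide charts before the trip"
candidate_pages = {
    "surf-shop":  "surfboards wetsuits and gear for hawaii surf trips",
    "tax-advice": "quarterly filing deadlines for small business taxes",
}

for name, text in candidate_pages.items():
    score = cosine(vectorize(passage), vectorize(text))
    print(f"{name}: {score:.2f}")  # higher score, stronger inferred link
```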
While it’s likely Google will factor such things into its crawling and indexing, or already does, the proposition of making inference the dominant or only form of backlink recognition has some real challenges. For one, the web is built on an absolute mountain of collected data full of related terms and concepts. Connecting all of it (somewhat like the semantic web concept) based only on inferred connections between words leads to flat-out entropy: a web so thoroughly organized it ceases to be discerning, and thus useful.
Theoretically, every word is connected to every other word in the same language. Links can take us anywhere, but we choose the backlinks we put into text because of their relevance to the subject matter. A powerful engine left to place links wherever it feels they should be would create a minefield of random connections across sites. If everything is relevant, then nothing is. The engine would also be left to its own devices to determine the nature and context of a link, and what would be most helpful to a potential searcher.
Most sites and links are built on the idea of exclusion, or scarcity. If a large node site links to every site, what good is its word? Why would Google bother following its links anymore? Even in our information age, scarcity has value. Links from large, reputable sites are prized because they are specific and exclude 99.9% of the possible links on the internet. Who we choose to link to, and the text we use to link to them, has direct value.
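The scarcity intuition falls straight out of the PageRank arithmetic: a page’s authority is split across its outlinks, so every additional link dilutes the rest. A few lines with made-up numbers show how fast the word of an indiscriminate linker cheapens.

```python
# Illustrative only: how much rank one link passes as outlink count grows.
page_rank = 0.9   # hypothetical authority of the linking page
damping = 0.85

for outlinks in (1, 10, 1000):
    per_link = damping * page_rank / outlinks
    print(f"{outlinks:5d} outlinks -> {per_link:.5f} passed per link")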
There is also a distinct lack of editorial oversight involved in inferred links. While an inferred link can, at least in theory, provide a more naturalistic endorsement of a product or a service, backlink authority is ultimately left to Google to literally infer. Publishers, from large websites to individual bloggers, can today add a backlink to a source while tagging it as nofollow, user-generated content (ugc), or sponsored. Google may well be smart enough to tell endorsements from non-endorsements on its own, but for now, at least for the user, anchor text and rel attributes help determine what is endorsed and what is not.
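Those annotations are machine-readable today. Here is a quick sketch of how a crawler might separate endorsed links from annotated ones by reading the rel attribute; the HTML sample is invented, but nofollow, ugc, and sponsored are the real values publishers use.

```python
# Sort anchors into "endorsed" vs. "annotated" based on the rel attribute.
from html.parser import HTMLParser

HTML = """
<a href="https://example.com/review">great review</a>
<a href="https://example.com/ad" rel="sponsored">partner offer</a>
<a href="https://example.com/comment" rel="ugc nofollow">user comment</a>
"""

class LinkAuditor(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = set((attrs.get("rel") or "").split())
        annotated = rel & {"nofollow", "ugc", "sponsored"}
        label = "annotated: " + ", ".join(sorted(annotated)) if annotated else "endorsed"
        print(f"{attrs.get('href')}: {label}")

LinkAuditor().feed(HTML)
```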
It’s certainly true that conventional hyperlinked text can be obtrusive: random blue, underlined words sticking out of the text, painfully shoehorned in to carry a link. We’ve all seen the classic CLICK HERE, and the many admonitions against it because those words are too general and convey nothing beyond the action request. Descriptive anchor text is the natural solution. A good approach is to write optimized text with structured keywords at a low keyword density (KD), fulfilling the job without stuffing. Once the text is written, searching for natural anchor points for a backlink makes the experience smoother and less invasive.
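Keyword density itself is simple arithmetic: occurrences of the keyword over total words. A hypothetical helper, for illustration only:

```python
# Keyword density = keyword occurrences / total words.
def keyword_density(text: str, keyword: str) -> float:
    words = text.lower().split()
    hits = sum(1 for w in words if w == keyword.lower())
    return hits / len(words) if words else 0.0

sample = ("Our Hawaii surf guide covers boards, tides, and the best "
          "beginner breaks on the islands, with a local surf instructor.")
print(f"'surf' density: {keyword_density(sample, 'surf'):.1%}")
# Reads high (~10%) only because the sample is tiny; across a full post
# you'd typically aim far lower to avoid stuffing.
```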
That’s the real benefit of the inferred link: it removes the latter part of that process. Keywords are still used, and keyword clusters are still employed to target larger groups of queries and make the text relevant to more of them at once. The text simply lacks the flurry of blue or red and reads more like a conventional book or paper. It’s a smoother medium, and more artistically satisfying for many sites. It mimics, like MUM, the way a human talks and shares information. It’s almost conversational. It also isn’t at all what we’re used to on the web.
It could be argued that inferred links hand more advantage to large sites, or to organizations with more natural search traffic as a result of their size. More words, more keywords, more clusters, more inferred links. Again, eigenvector centrality rears its head and gives the weight to the node. Link building might also become harder for smaller sites and organizations: a backlink can be anchored in any text, whereas an inferred link would require the brand name, a mention, or a direct association to appear in the text itself. That would kill our aforementioned low-key, naturalistic advantage.
That being said, you should strive for brand mentions regardless. There is no greater endorsement than narrative or editorial content on a site that mentions your brand and discusses it in a meaningful way. A blog about a business and its benefits is likely still better than middling anchor text and a link, if your goal is traffic and user value. A paragraph of black will beat a line of blue if the association is positive enough, the surrounding information relevant enough, and the site’s node value to the rest of the network high enough.
Ultimately, the idea forming the internet holds true: it’s all connected. Inferred links are inferred both in the minds of readers and, potentially, in the indexes of search engines. You can’t go wrong writing optimized, useful, relevant, quality content that satisfies a searcher’s needs. Backlinks remain the norm, and I suspect they will for quite some time. It’s a convention we’re used to. We understand the etiquette and the methodology. We’re aware that a link isn’t always an endorsement, but comes with caveats usually discernible from the anchor text.
Narrative blogs or text filled with inferred links have visible value, and writing with the assumption that inferred links exist is a good call. An inferred link, importantly, won’t negate the value of an actual blue hyperlink. It will simply provide more contextual data for deep learning that could help searchers find what they want. As with MUM, inferred links are all around us; the connections between words and phrases are readily apparent to anyone who has ever opened a thesaurus or tried to create a Venn diagram. Words and concepts are frequent associates. They run together in packs, after all.
Web and word topology is ever expanding. Our map doesn’t have the same borders it used to. We’re used to the web changing, from version 1.0 all the way to the 3.0 where we sit today. Deep learning is changing the game, and Google is evolving as fast as possible. Ironically, despite the influence of machine learning, the behavior being mimicked is ever more human. We want our machines to function like humans, but in perfected form. We want our queries answered as if we were talking to a person, not a machine, even though we know intuitively that no human could ever provide the wealth of information a machine does.
Between inferred linking and MUM, we have our work cut out for us. The challenge is in understanding, deeply, how words and their concepts are related, and how searchers process and store information. I suspect advanced categorization will continue to lead us to a better understanding of searching and, in turn, of searchers. Google does this daily; we’d do well to keep up.