Search Reimagined: AI, Information Retrieval, Behavioral Changes & Future Cyber Threats
An "AI-based search engine" would be a contradiction in terms. But are there significant upcoming developments in the Search industry?
Will SearchGPT indirectly challenge Google's dominance in the search engine market? If so, how could it be the beginning of a new phase that will reshape how humans interact with information online?
Google's Moments of Fear
Since OpenAI launched ChatGPT on Nov 30, 2022, Google has adapted very clumsily. After an intelligent initial move, Alphabet, Google's parent company, made one mistake after another: strategic errors that ended up destroying entire companies, unfairly targeted by Core Algorithm Updates for no reason whatsoever. Here's one of the many stories. Julien is a CEO who lost his company hours ago. This is the story of employees losing their jobs because of Google's carelessness. What follows are the words of someone who built a profitable business that died without Google's organic traffic.
To my knowledge, the bankruptcy papers were signed a few hours ago but here's what the CEO posted 2 days ago, along with their Google Search Console's data.
In red, my comment highlights what happens once Google updates its "Core Update".
Some criticized this company, insisting it never found a profitable product or service. Others blamed their blog and posted screenshots of the credit card section of their blog. I just visited the website. I looked at the blog, and I used the credit card filter.
Yes, some articles in this competitive financial niche section look like they were created to capture organic traffic. But many legitimate companies did this. They didn't cheat, they just tried to capture more traffic without deeply understanding the user's search intent.
Here's what their blog really looks like without the filters.
That's a regular company blog with tons of articles, some high quality, some lower quality. But content creation is challenging and costly. Let's not be too harsh. Instead of making fun of an online company, consider this: if they had done anything worth the penalties they suffered, Google would not have had to reverse course and update their Core Algorithm Update. And the organic traffic would not be back to normal after 11 months!
Here is what the CEO posted a few days ago:
I did everything I could to reduce expenses starting in September 2023, when a Google update caused Hardbacon’s traffic to plummet. I let go of our employees one by one until I had to let go of the last two employees at the beginning of the month.
Despite extensive SEO and content optimization work, our traffic continued to decline, and each Google update since September has accelerated our traffic loss, without any clear explanation.
In total, we lost 97% of our traffic from Google. This means our traffic went from 350,000 per month in September to around 50,000 per month at the time of writing.
Elie's remark: 97% of 350,000 is 339,500, which means you'd have about 10,500 per month, not 50,000. But let's continue reading because this does not really matter.
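The arithmetic behind this remark can be checked in a few lines. This is a minimal sketch that uses only the two traffic figures quoted in the CEO's post; the function names are mine and purely illustrative.

```python
def remaining_after_loss(before: int, loss_pct: int) -> int:
    """Visits left after losing loss_pct percent of the starting traffic."""
    return before * (100 - loss_pct) // 100


def loss_percentage(before: int, after: int) -> float:
    """Percentage of traffic lost when going from `before` to `after`."""
    return (before - after) / before * 100


# Figures quoted in the CEO's post
before, after = 350_000, 50_000

# Visits that a true 97% loss would leave
print(remaining_after_loss(before, 97))

# Loss actually implied by going from 350,000 to 50,000
print(round(loss_percentage(before, after), 1))
```

In other words, a 97% loss would leave 10,500 monthly visits, while the two quoted figures imply a loss of roughly 86%. Either way, the business impact is the same.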
Thousands of websites were impacted, but sites with affiliate links like Hardbacon were particularly hit, obviously because we are direct competitors to Google Ads. Indeed, publishers like us allow their clients to run online advertising campaigns knowing exactly what their return on investment is.
Of course, the fact that Google lost its monopoly abuse case in the United States and that many similar cases are pending suggests that Google might lose ground due to its abusive behavior.
These were the CEO's own words. They shed light on how Google might have intentionally penalized perfectly legitimate websites that could have been direct or indirect competitors to Google Ads by ranking affiliate content organically in SERPs.
For over a decade, we've known that users trust organic results more than they trust Ads. And it makes sense when you think about it: you can pay to become instantly visible via Ads but you cannot pay any search engine to instantly get great SEO. Even technically unsophisticated users know that you have to deserve organic rankings!
Now, if you think I'm being too harsh with Google, here are two Barrys posting about this story as well. Yes, Google is punishing "perfectly legitimate businesses."
SearchGPT will soon be released to the general public and I absolutely expect Google to make new mistakes in the near future.
But Google's first move after the release of ChatGPT made a lot of sense: they updated their search engine content guidelines and immediately went from E-A-T to E-E-A-T:
Experience,
Expertise,
Authoritativeness,
Trustworthiness.
Google has a team of people called Quality Raters who are paid to check whether search results are helpful and "trustworthy". They also look at how well the information answers questions and whether the source is reliable and EXPERIENCED.
How qualified are they to fact-check anybody else? Nobody really knows...
That said, immediately after ChatGPT was released, Google asked content creators to demonstrate Experience whenever possible, because that's something AI is incapable of.
Hence the additional "E" in the E-E-A-T acronym now known by the SEO community. This first move was followed by a disaster in terms of SERP quality. And as you just saw, rankings are much more than vanity metrics: without decent organic traffic, many companies will fail.
Google panicked and rushed SGE (Search Generative Experience) in 2023 after witnessing the multibillion dollar deal between OpenAI and Microsoft, which operates Bing.
The Bing search engine owned by Microsoft is not popular compared to Google but it does have interesting features and a lot of talented people working behind the scenes.
For Google, SGE became "AIO" for AI Overviews in 2024 and many Google users immediately hated them. And there are good reasons as you'll soon see.
The Search Generative Experience became a reality because AI Overviews are now an integral part of Google Search. This means the only way to block your content from being used in AI Overviews is to block the Googlebot family of search engine crawlers, as I explained in this technical SEO post.
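For readers who want to see what blocking at the crawler level looks like, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules and the example.com URL are assumptions made up for illustration. The sketch only shows how a well-behaved crawler would interpret such rules, not how Google decides what feeds AI Overviews.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/blog/post"
for agent in ("Googlebot", "Googlebot-Image", "Bingbot"):
    print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

With these rules, both Googlebot user agents are refused while Bingbot is allowed. The obvious trade-off is that the same rule also removes the site from regular Google Search, which is exactly why this is such a costly option for publishers.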
Forcing users and artificially pushing them in a given direction against their will has never worked. Google should have learned that after their failed Google+ experiment.
Google+ was supposed to become Google's answer to rival social networks. The result? You never think about Google+. As if it never existed! Why? Because users don’t like to be forced!
Another example: Meta's Threads, which was launched on July 5, 2023, after it was built by the Instagram team. The novelty phase wore off extremely quickly and despite aggressive tactics to coerce users, we can see in Google Trends how people refused to adopt it.
At times, I wonder how well global companies understand basic human psychology. Why would they attempt to force people knowing most users will say "Thanks but NOPE"?
With regard to accuracy and usefulness, Google's AI Overviews have been deeply disappointing: they not only offer little to no added value, they are often misleading. See what Google's Artificial Stupidity recommends in cooking recipes... in 2024!
Isn't it interesting how in a world governed by so-called "fact-checking", one of the most influential companies can ship an inaccurate, unfinished and misleading product?
Objectively, in the last 24 months, the quality of Search Engine Result Pages (SERPs) decreased beyond anything I could have predicted. That was in part caused by abusive low-value programmatic SEO tactics that were suddenly accessible to anyone thanks to AI-generated content programmatically injected into websites via APIs.
The SEO industry is full of people who love to experiment but it also has "cheaters" who use black-hat SEO techniques that can fool Google for a time. This wasn't even black-hat and to me it was obvious Google would have to take action. This is something I predicted in early 2023 in the following LinkedIn video that almost nobody watched (PS: I had no voice left).
Nearly 1 year after my video, Google finally addressed the problem with their Core Algorithm Updates in March 2024. Problem solved? Not really, as you'll soon understand!
Officially, Google decided to actively fight low-value AI-generated content that has been polluting SERPs since December 2022. But at the same time, Google decided to artificially boost the rankings of some popular web forums that are not only full of cognitive, intellectual and political biases but also populated with AI-generated content that Google was officially trying to fight. Interesting and quite paradoxical, isn't it?
Guess what some did to abuse the system? They moved to those "forums" and managed to artificially rank whatever low-value content they wanted. Who could have imagined?
Google's biases
Let's be honest: Google has its own biases and it arbitrarily favors some "trusted" entities in its organic search results. Google processes about 8.5 billion searches daily. I'll let you imagine how minor tweaks in the SERPs can impact how people think and behave...
That's why an alternative and less biased search or "research engine" has been a topic of interest in recent years. Many entrepreneurs, including Elon Musk, have been playing with this idea. Let's explore how traditional search engines work.
Search engines like Google rely heavily on document retrieval techniques.
When you search for something, you input a query. The search engine then retrieves a list of documents or web pages that are relevant to the query based on keywords, metadata and other signals. This content was previously discovered, crawled and indexed by Googlebot.
Google also analyzes users' behavioral patterns. Google looks at how users search and experience things online, in part thanks to Chrome, the world's most popular web browser.
But to summarize, you input a search query using keywords and the output is a list of previously indexed pages, ranked in a specific arbitrary order determined by Google.
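To make the "query in, ranked list of indexed pages out" idea concrete, here is a toy sketch of keyword-based document retrieval. It builds an inverted index and ranks pages by how often the query terms appear. The page names and texts are invented, and this is in no way Google's ranking algorithm, which relies on many more signals.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Inverted index: term -> {page_id: number of occurrences}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term][page_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[str]:
    """Rank previously indexed pages by summed term frequency."""
    scores: Counter = Counter()
    for term in tokenize(query):
        for page_id, frequency in index.get(term, {}).items():
            scores[page_id] += frequency
    # Highest score first; ties broken alphabetically for stable output
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical pages that a crawler has already discovered and indexed
pages = {
    "page-a": "Credit card rewards and credit scores explained",
    "page-b": "Best hiking trails near Montreal",
    "page-c": "Credit union basics",
}
index = build_index(pages)
print(search(index, "credit card"))
```

Here "page-a" ranks first because it matches both terms, "page-c" follows with a single match, and "page-b" is never returned. The order is entirely decided by the scoring rule, which is the point: whoever controls the rule controls what people see first.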
Thanks to their Chrome browser, Google also collects high-quality data that ends up in their Chrome User Experience Report (CrUX). Many simply equate CrUX with Core Web Vitals.
And yes, the CrUX dataset is full of anonymized aggregated user experience data. But outside CrUX, Google collects much more than what is shared publicly. And unlike public datasets, everything is clearly visible on Google's end. Imagine IPs, Operating Systems, devices, search queries, websites visited, when people connect, how long they stay on this website, how quickly they bounce off this other website, what their daily habits are, etc...
Most people outside the cybersecurity industry do not understand how fingerprinting can be used to identify specific users or behaviors. Here's my quick breakdown of how this is used to track individual users or devices based on unique characteristics or patterns. Some will disagree and that's okay: feel free to add your thoughts if you believe I'm wrong.
Network Fingerprinting
Behavioral Fingerprinting
Now, imagine the power of combining different types of fingerprinting when you are the leading search engine, Google, which is essentially the gateway to the internet. And when you own the world's most popular web browser: Chrome. Oh, and when you also own Google Analytics, the most popular solution to track and analyze web traffic!
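As a rough illustration of what "combining" fingerprints means, the sketch below merges a few network-level and behavioral attributes into a single stable identifier. Every attribute name and value is hypothetical, and real tracking systems use far richer signals and fuzzy matching rather than an exact hash. This only shows why a handful of unremarkable characteristics can add up to something unique.

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Hash the attributes in a canonical order into a short identifier."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical network-level signals
network = {"ip_prefix": "203.0.113", "os": "Linux", "browser": "Chrome"}

# Hypothetical behavioral signals
behavior = {"usual_hour": "22", "avg_session_minutes": "14"}

combined = {**network, **behavior}

print(fingerprint(network))   # identifier from network signals alone
print(fingerprint(combined))  # a different, more specific identifier
```

The same set of attributes always produces the same identifier, regardless of the order in which the signals were collected, and no cookie or login is required.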
That means that even if you are not using Google's search engine and even if you are not using the Chrome browser, Google's web analytics service can still track you in some way because GA is installed on 75+ million websites... some with a tremendous number of pages.
We've seen the theory today but I believe behavioral fingerprinting will completely change in the coming years once multi-modal conversational AI models use facial recognition and Voice Emotion Recognition (VER) efficiently. In this short video, I simplified multimodal large language models (LLMs).
For those still unaware, Voice Emotion Recognition is used to identify emotional changes in a human voice by analyzing various acoustic features of speech, such as pitch, intensity and tempo to "understand" our underlying emotional state.
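To give an idea of what "acoustic features" means in practice, here is a small sketch that extracts two of them, intensity and a rough pitch estimate, from a synthetic tone. The 220 Hz sine wave stands in for a voice recording, and the zero-crossing method is a deliberately naive assumption of mine. Real VER systems use far more robust signal processing and then feed such features into a trained classifier, which this sketch does not attempt.

```python
import math


def rms_intensity(samples: list[float]) -> float:
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_pitch(samples: list[float], sample_rate: int) -> float:
    """Naive pitch estimate in Hz: a pure tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)


# One second of a synthetic 220 Hz tone standing in for recorded speech
sample_rate = 16_000
tone = [
    0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
    for n in range(sample_rate)
]

print(round(rms_intensity(tone), 3))           # loudness proxy
print(zero_crossing_pitch(tone, sample_rate))  # close to 220 Hz
```

Tracking how values like these rise and fall over the course of a conversation is the raw material an emotion model would work from.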
This technology is tested by some in the security field to attempt to detect deception or stress in voice-based authentication systems. It is also used in the gaming industry because some games try to enhance the user experience. With the help of well-paid neuroscientists, gaming platforms attempt to adapt the game to the player's emotional state to keep them engaged for as long as possible.
Those who play action-packed video games have first-hand experience of how quickly and violently our emotional state can fluctuate during intense virtual situations. A few giants in the customer service industry also try to assess customer satisfaction using VER.
Keep in mind that your smartphones and laptops are all equipped with microphones and built-in cameras. The technology is already in place from a hardware standpoint. The change will happen at the software level. I hope what I'm predicting here is wrong and that legal safeguards will be put in place to protect the users' privacy. But do I believe it? No.
What are X and xAI up to?
The X platform could soon surprise everyone with its own search engine designed to become a Google competitor and of course generate billions through targeted advertising.
But what people do not fully realize is that it is likely that X (or xAI) is using a tremendous amount of user-generated written data to train their AI models. I believe X is using Spaces, which are essentially targeted vocal chat rooms on X where users feel safe enough to share their opinions, their knowledge, their frustrations, etc. None of this will be crawled or indexed. Well, at least not in the traditional sense. But I'm sure the data is being collected.
I believe X is already using vocal recognition and voice-to-text models to convert the vocal chats into text that will be used to train their upcoming AI models. That's just me: I don't have a source, and I could be completely wrong. But... why wouldn't they?
Google's search engine monopoly has never been so ripe for the picking. They've made mistakes that could have sunk any other company. But Alphabet is a group with a market capitalization of 2.03 trillion or 2,000+ billion US dollars as of today (Aug 23, 2024).
And yes, Google has 20+ years of experience in crawling, indexing and ranking content. This alone gives them a solid competitive advantage. The search giant won't go down without a fight. Google, with its vast resources and established infrastructure, will likely respond to any future challenge. Instead of aggressively pushing artificial changes that nobody wants, it may be time to try to understand human psychology.
Google became a global company thanks to sponsored advertising targeting the users' search intent. The whole business model is built on top of organic search results. If the quality of those organic results goes down, people will look for credible alternatives.
Google's advertising revenue totaled $237.86 billion in 2023, up 6% from 2022.
Advertising accounts for the majority of Google's revenue of $305.63 billion in 2023. Without organic search results, Google's cash cow would be sick. Google's search monopoly appears to be safe, at least for now, because while they kept printing money, ChatGPT alone costs between $700,000 and $1 million per day, with most of the traffic coming from underdeveloped countries. This gives OpenAI little hope of any return on investment. OpenAI has been losing billions of dollars, and they expect a $5 billion loss in 2024. They are not exactly playing in the same financial category as Google/Alphabet. And they might run out of money if this continues.
In tech, users are quick to adopt new technologies and new behaviors. The competition is always one click away. Initially, search analysts believed that AI chatbots such as ChatGPT would completely destroy informational search queries: what users search for when they want to find an answer to a specific question. This started to happen but nothing compared to what some imagined. So yes, Google still appears quite safe for now but they should not be too sure of their "superiority". Things can change fast in Tech.
Looking at the cold hard data, I can see that Google kept its leading position and only lost a fraction of its market share. I see nothing that will scare shareholders. For now, that is.
Senior SEOs in my network display diverging behaviors. Some say that their use of Google is down "50%" since they started using AI models while others haven't changed their habits.
At the early stage, it appeared that SearchGPT was primarily focused on leveraging its large language models (LLMs) without incorporating Retrieval Augmented Generation (RAG).
I believe OpenAI's final approach will involve a hybrid model, where RAG will be used to analyze crawled and indexed data that isn't yet part of the LLM's dataset OR for certain types of queries or as a backup mechanism. Nothing is set in stone online and what people can see when using the prototype isn't the final product. We'll likely be surprised by the power of advanced multi-modal LLMs combined with real-time access to data. This will likely outsmart Google's current approach with their AI overviews.
I was not given access to SearchGPT. It appeared I joined the waiting list a little too late.
Thanks to people who have access, I've been shown details, the interface, the results, etc.
I don't have any insider knowledge but I suspect OpenAI will use a cluster of crawlers to retrieve data and that a short thumbnail will be displayed to users to give them the illusion of real-time processing while the RAG system generates comprehensive "answers".
OpenAI wrote that:
OAI-SearchBot is for search. OAI-SearchBot is used to link to and surface websites in search results in the SearchGPT prototype. It is not used to crawl content to train OpenAI’s generative AI foundation models.
Regarding the User Agents, I could publish a spicy article explaining why publishers can try to block OpenAI's official web crawlers without ever fully blocking the crawlers they use :)
So, will SearchGPT change how people search and research? If OpenAI survives, it could happen over several years. For now, after the many ChatGPT hallucinations, it would be a good move to appear as credible as possible. If SearchGPT achieves this, it will play in a different category than Perplexity, which remains an answer engine.
Research engine VS search engine VS answer engine
Before we go any further, I want to clarify what RAG or Retrieval Augmented Generation is. The core idea is to combine the power of traditional search engines with the capabilities of powerful generative AI models.
RAG or Retrieval Augmented Generation works in 3 phases.
A. Retrieval
When a user submits a query, a traditional search engine such as Google retrieves relevant documents from a large corpus of text. These documents could be articles, web pages, books, or any other type of textual content (such as YouTube video subtitles).
B. Augmentation
The retrieved documents are then processed and transformed into a different format. This might involve extracting key information, summarizing the content, or converting it into a structured representation. As you can see, the nature of information started to change.
C. Generation
The AI model (often an LLM) is then tasked with generating a response to your search query. The model leverages the information from the retrieved documents to create a coherent and relevant answer. At this stage, the fundamental nature of content and information has changed. It has been deeply altered. The original authors? Gone!
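The three phases above can be sketched in a few lines. This is a deliberately simplified illustration: the corpus and its URLs are invented, retrieval is plain keyword overlap, and `generate` is a placeholder that stands in for a call to an actual language model, since no real model or API is assumed here.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Phase A: return the k sources sharing the most terms with the query."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(text)), src) for src, text in corpus.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [src for _, src in ranked[:k]]


def augment(query: str, sources: list[str], corpus: dict[str, str],
            max_chars: int = 80) -> str:
    """Phase B: condense the retrieved documents into a prompt."""
    snippets = [
        f"[{i}] {src}: {corpus[src][:max_chars]}"
        for i, src in enumerate(sources, start=1)
    ]
    return ("Answer using only these sources:\n" + "\n".join(snippets)
            + f"\nQuestion: {query}")


def generate(prompt: str) -> str:
    """Phase C: placeholder for the LLM call (hypothetical, no real model)."""
    return f"(model answer conditioned on a {len(prompt)}-character prompt)"


# Hypothetical corpus of previously crawled pages
corpus = {
    "blog.example/eeat": "E-E-A-T stands for experience, expertise and trust.",
    "blog.example/voice": "Voice search relies on speech recognition.",
    "blog.example/core": "Core updates changed organic traffic for many sites.",
}

query = "organic traffic after core updates"
sources = retrieve(query, corpus)
prompt = augment(query, sources, corpus)
print(prompt)
print(generate(prompt))
```

Notice that by phase B the original pages have already been reduced to truncated snippets, and whether the sources survive into the final answer depends entirely on how the prompt and the interface are designed.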
So, is it a good or a bad idea?
In theory, RAG will be used to better understand the context of a query and generate responses that are highly relevant to the user's needs. The claim is that AI models can be used to generate creative and innovative responses to search queries, going beyond simple factual information. And we know they went far beyond factual information thanks to serious inaccuracies and innumerable hallucinations. In a way, this is a clever trick to shift the focus from the AI model itself to the quality of the information it is fed...
How Search could change
Here's how SearchGPT and Google's upcoming evolution could change how people search online. First, thanks to Natural Language Processing (NLP), humans will search using a more conversational style instead of searching using keywords. Sure, there are still KEY WORDS but it is the Transformer's job to identify what matters using a probabilistic mathematical approach. If you don't know what a Transformer is, watch the following short video: I shared my definition of this particular type of Neural Network.
Until now, we've been prompting. But prompting isn't searching. The mindset and the objectives are different. As of today, you do not search when you use conversational AI models, you explore. At this stage, most users do not blindly trust AI models, and rightly so!
If OpenAI or xAI manages to use mainly authoritative and trustworthy sources while displaying easy to understand answers, users will start to trust the output. But we're not there yet and officially SearchGPT won't be a search engine, just an improved ChatGPT.
Long-tail keywords or conversations?
Conversational interactions go way beyond what SEOs call "long-tail keywords". I expect human-machine interaction to change drastically, not only because people are getting used to prompts but because this will soon make voice search more popular than ever before. Speech recognition, Natural Language Processing, hybrid search models combining LLMs with RAG, and smartphones all point in that direction. We are getting close but we are not there yet.
Technologically speaking, it is easy to transform the astounding amount of textual content into vocal answers. I've tested multiple approaches and it worked pretty well.
People don't buy prompts. But they will buy applications or services that can simplify their busy and noisy existence, even if they could achieve the same goal for free in 10 prompts.
It is all about simplicity and what is simpler than speaking to get a written, spoken or video answer in return? That's what I mean by a drastic change in human-machine interaction.
What could the future of advertising look like?
I bet it will be something such as short AI-generated vocal or video ads targeting the users’ search intent BEFORE delivering any decent "organic" answer. SEOs are talking about YouTube SEO or Video SEO but this currently happens at a basic level. By targeting the titles of videos, the descriptions, the thumbnails and often closed captions (subtitles), we can provide richer semantic context to bots and human users. But I'm talking about something radically different. I'm thinking of future search engines that will be able to understand your preferences and fluctuating emotions and adapt in real-time.
All of this with one clear goal: pushing you to buy, for the benefit of advertisers.
If you think nobody clicks on ads anymore, think again. First, that's not true at all. Then, ads come in different forms. They can be disguised or hidden. There's a whole industry named influencer marketing. Brands love this because they can leverage the human connection a content creator spent time building with their audience to push their products and services.
For brands, this is priceless. And yet, almost every creator has a price. I've been approached many times with less than 8,000 followers. I cannot imagine the amounts content creators with 1+ million followers are being offered. For brands, the more subtle the influencers, the better. Creators who manage to incite their audience to buy without appearing to sell anything are rare. It is called an "art". The art of selling. But what if search engines could directly offer deals to influencers when they believe their user base would love the ad?
Do you see how I'm connecting the dots?
Google Ads: malware hidden in plain sight
For years, Google Ads has had a serious problem with malware distribution and it refuses to tackle the issue. I have often spoken to people in Google's paid traffic teams, but Alphabet isn't willing to take action, apart from a few employees who have manually taken down abusive ads. I salute the employees' hard work, but massive lawsuits may hit Alphabet/Google because they are lax about malvertising distribution through their platform.
While this has been a genuine problem, if things evolve the way I expect them to, the next cybersecurity threat will come from Google allowing ads in the form of AI avatars or personas targeting your search intentions and answering in the best possible way to promote malware and more. Threat actors will use this to trick you if this ever becomes a reality. And this time, instead of "only" displaying the official domain name of a Fortune 500 brand in their Google Ads which is in itself unacceptable, they'll go a step further.
Can you trust the ads you see today? Nope! But in a few years, Google Ads might display human-like avatars who have all the characteristics designed to please, seduce and convince you to perform an action. And they'll know everything about you. It will be easy.
The next few years will redefine Predictive Analytics
If that happens, the conversion rate will be better than anything experienced before because of great targeting. You'll be fooled because search engines will use an incredibly rich dataset. The meaning of "telemetry" will change once models are able to constantly analyze the various emotions in your voice and facial expression. With enough statistically significant data, search engines might be able to extrapolate how likely you are to behave in a certain way. Humans want to believe that they make rational and logical decisions but most of what people decide is emotionally driven, whether they admit it or not!
I doubt there will be an Artificial General Intelligence (AGI) but I expect human behavior to change at a deep level. I believe we are years away from what I've described but the future could come sooner than expected through gradual adoption or breakthrough technologies.
Creativity in the Age of Content Theft
Accuracy has been a serious problem in every conversational AI model, without exception.
But there's another issue that almost nobody will want to mention until the AI bubble explodes: content theft happening at a massive scale. A few years ago, nobody would have believed this would ever be allowed. And yet, here we are, partly because the Tech space moves much faster than the legal area. Will the AI bubble explode or will trees grow to the sky? Can AI grow and improve indefinitely? If so, at what cost?
I raised the issue of content theft over and over again in my videos. Why? Because the obscure datasets used to train the AI models are made of stolen human-generated content.
The creators’ collective knowledge and hard work will end up in a giant melting pot of data without credit or real attribution apart from a few links. The original authors never gave their approval. And yet, here we are. Should I explain how AI models may fundamentally change the nature of information and content? Let me know if that's something you'd like to read!
At this early stage, RAG will introduce some latency because it requires retrieving and processing documents. If the goal is a real-time search experience, this won't work well.
Do I believe Google will react with more innovative mistakes to what they'll perceive as threats from AI-based "search engines" such as SearchGPT? Yes. The right strategic move for Alphabet/Google would have been to focus on their core strengths instead of diluting their accuracy and credibility. The Google leadership of the past years has been terrible, with no innovation, no real creativity and a surprising tolerance for unacceptable mistakes.
Things are slowly changing, but I expect this to continue as long as their ad revenue keeps going up. However, as soon as the revenue trend inverts (goes down), it will be too late and Google's monopoly will be up for grabs. And it will happen gradually.
Other giant entities have been silently crawling and scraping data for years, and they are currently training unbelievably powerful multi-modal large language models. Don't forget that many aren't based in the so-called "Western world". Soon people might be amazed and surprised when the lines between reality and artificially generated content are blurred.
For now, it is still obvious. But soon, it will become harder than ever to know what is real and what is artificial. And the greatest casualty will be Truth.
People are already easy to manipulate, but manipulating crowds will become easier than ever, on a global scale. I'm sad to write this, but it seems like a clear trajectory/trend.
When global brands do not use scraped data coming directly from websites, they leverage user-generated content on platforms such as LinkedIn or X. I do believe the X platform or xAI will launch a different type of search engine, with an emphasis on unbiased ranking.
Of course, bias can be subjective. Whatever happens, humans will witness a technological, intellectual and cognitive challenge. I forgot to eat so I'll stop here but I could not end without trolling so here's me imagining I'm Google's CEO.
YouTube (owned by Google) shadow-banned my parody video but the LinkedIn audience loved it with more than 1,200 minutes of watch time! Enjoy! :)
Search Engines Reimagined: AI, Information Retrieval, Behavioral Changes & Cyber Threats on my website (the V1 of this article I wrote hours ago).
Chief Legal Officer | Cybersecurity | Risk Management Expert | Privacy and Governance Leader | Emerging Technologies Expertise | Digital Transformation| Corporate Business Strategist | Integrator | Speaker
6 months ago: Elie Berreby well done! This is a huge accomplishment.
SEO Expert | Blogger | Onpage & Offpage Expert | Local SEO | Technical SEO | SMM Expert | Wordpress Developer | Facebook Ads Manager
6 months ago: Such a comprehensive guide! Thank you for sharing, Elie Berreby.
Digital Marketer | Cyber Security Practitioner (Ce-CSP) | CISMP | ISO 27001 | ITF+ | CCSK
6 months ago: The landscape of search is shifting fast, huh? Curious how AI will change the game for businesses. What’s your take on RAG?
Senior SEO n00b
6 months ago: I'm sorry guys: there are (WERE) typos in my LinkedIn article and because the LI interface is TERRIBLE to edit large articles with images/videos, I did the following experiment and I'll try to keep you updated regarding the stats and everything else. I just subscribed to the most expensive X plan to be able to publish articles. I never use X so this will be a test to see if the reach/visibility is different, what type of audience, etc. X messaged me to say they were reviewing my account for a "blue" checkmark after I paid. Interesting note: I just published a shortened version of this article (without the typos). It is still 5045 words (30628 characters) and there were NO issues despite the number of characters being above the official 25,000 limit. Here's the screenshot. This is my first article on X (@semking)! EDIT: typos corrected! What. A. Waste. Of. Time. With. Such. A. Bad. User. Interface.
I help Generative AI help You. Business Strategist. Content Marketing | SEO | AI Business Processes | OG Space Resources Nerd. Engineering Physicist. Master Photographer.
6 months ago: I haven’t read your tome yet, but I’ve been asking lately if anyone has done an analysis of the economic damage Google has done with updates in the past 12 months, especially re SMBs. Did you address that at all? I think it would be a great thesis project for a grad student.