Google vs. Perplexity, Whose API Reigns Supreme?

The Google Gemini team recently updated the Gemini API with a new tool, “Grounding with Google Search.” It augments your LLM request by “grounding” the response with relevant context gathered from Google search results. This is analogous to the functionality provided by Perplexity’s Sonar Online API models, so as a spiritual follow-up to my previous piece on Perplexity’s lawsuit, I thought I’d give a head-to-head review of Gemini’s Grounding with Google Search vs. Perplexity’s API.

Generative Search, or Search Augmented Generation, or “Grounding with Google Search” 🤷‍♂️

Generative Search tools are a specific implementation of Retrieval Augmented Generation (RAG), where instead of performing a search across a custom knowledge base, the information retrieval is performed over a web index containing the contents of the entire Internet (the useful parts of it, anyway). Similar to a normal RAG pipeline, the user’s query is used first to retrieve relevant content from the index, which is then included in the LLM input context. The LLM is also instructed to answer the user query using the additional search content, and the result is (usually) higher quality answers that are more relevant to the user’s query, and not limited to the model’s own knowledge from pretraining.
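To make the pipeline concrete, here is a minimal sketch of the generative-search loop described above. The `web_search` and `llm` callables are hypothetical stand-ins for a web-index query and an LLM completion, not any particular provider’s API:

```python
from typing import Callable, List


def grounded_answer(
    user_query: str,
    web_search: Callable[[str, int], List[str]],  # (query, top_k) -> page texts
    llm: Callable[[str], str],                    # prompt -> completion
) -> str:
    # 1. Retrieval: pull relevant page texts from a web index.
    documents = web_search(user_query, 5)

    # 2. Augmentation: pack the retrieved content into the prompt.
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the search results below.\n\n"
        f"Search results:\n{context}\n\n"
        f"Question: {user_query}"
    )

    # 3. Generation: the LLM answers from the grounded context,
    #    rather than relying solely on its pretraining knowledge.
    return llm(prompt)
```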

Grounding LLMs with search results provides multiple advantages, such as richer answers with more “freshness”. It can provide the model with information more recent than its knowledge cutoff date, and can surface more nuanced, expert-level information, improving answers even where the model would likely have generated a correct answer on its own. Grounded answers are also useful when you want the LLM to analyze several related pieces of content and return a summary, or an extracted element, from the search results.

There are limitations to this approach, as the LLM may overstate the certainty or correctness of an answer that is based on incomplete information, or even mis-evaluate the significance of the various sources of information it is presented with. Some critics argue that “chatbot as search” presents a technological mismatch that is fundamentally unreliable.

It’s fair to say grounding LLM answers in search results does not eliminate the potential for inaccurate or “hallucinated” answers. So with that, let’s dig in to a head-to-head comparison of Perplexity’s Search API offering, and Gemini’s “Grounding with Google Search” feature!


The Terms of Service

Overall Winner: Google

Google is the clear overall winner in the TOS comparison, reserving fewer and less permissive rights for Provider uses of API data. Next to Perplexity’s near-unfettered rights, Google’s terms of service offer superior protection for user-submitted inputs.

High-level TOS comparison of Gemini and Perplexity

Customer Allowed Uses of LLM Outputs

Winner: Google is slightly advantaged

Neither TOS allows users to copy, cache, distribute, or make derivative works from the API output. In both cases, the purpose of the API is to “receive and display” some search-augmented output, and to make no other use of it.

Google permits caching of API results to optimize the display of results (in conjunction with Google’s restrictions on displaying search results), as well as storing outputs in users’ chat histories for up to six months. At that point, you need some process that has kept track of stored chat histories containing Grounding with Google Search results and removes them. Awkward, but there is no comparable allowance in Perplexity's terms.


Ownership of Outputs

Winner: Tied

Perplexity claims ownership of API outputs:

As between the parties, Perplexity retains all right, title and interest, including all intellectual property rights, in and to the Perplexity Materials and any and all improvements, modifications or enhancements thereto, as well as all related software programs, data, documentation, specifications, descriptions, algorithms, methods, processes, techniques and know-how (the “Perplexity Property”).

This seems questionable given that the API output will often contain search content that is clearly not the property of Perplexity, and they are currently being sued for the crawling and processing activities behind their search index.

Google Gemini’s TOS do not explicitly claim ownership over the API output, but apply similar restrictions as Perplexity on downstream uses.

Both terms effectively prohibit modifying the API output through prohibitions on copying and creating derivative works. Google’s TOS are slightly more restrictive in that they also recommend displaying Links, and require displaying Search Suggestions with the Grounded Results, without modification or interspersing any other content. (In terms of messaging, Google is inconsistent here: some of their resources “recommend” displaying Search Suggestions with Grounded Results, while the TOS makes clear it is a requirement.) As a display requirement, this isn’t particularly burdensome. It’s a small graphical chip that gives users access to the search results that generated the API output. In general, users will want quick access to these references to verify the accuracy of their answers. It also feels “fair” in the sense that users are directed to the actual Google search query that generated the included content (we’ll have to wait and see whether content owners end up suing Google).


Search Suggestions Chip

In practical terms, this is only a restriction in that your application must be able to render CSS/HTML to comply. This may rule out integrating Grounding with Google Search into some legacy stacks, but overall I don’t suspect it is particularly restrictive.


Provider Reserved Rights

Winner: Google, Clear Winner

Perplexity’s terms of service reserve a ton of rights, and in my view should give developers pause. These terms should rule out any proprietary or sensitive use cases, because they don’t provide sufficient protection for data submitted as input.

Google’s terms are more protective of user inputs, but notably are not zero-retention, and Google does not offer a zero-retention option for either the Gemini API or Grounding with Google Search. The terms allow retention for debugging, testing, and monitoring for compliance, but don’t allow unfettered uses such as “improving the product” or “business purposes”.


Provider Covenants

Winner: Google

Google’s terms for Gemini’s Paid Services include a restriction against using Inputs to train models or improve the service. Perplexity’s agreement does not.

Neither agreement includes API inputs or outputs under the definition of ‘Confidential Information’.


The API Features

Overall Winner: Google

Google's implementation also wins from an API standpoint, being seamlessly integrated into the existing Gemini API endpoints and offering better citation and attribution features. Google's victory in this category is less definitive, though, due to some oddities in generated answers when low grounding scores produce strange LLM behaviors.


Accessing Search Features

Winner: Google, Clear Winner

The overall difficulty of accessing either provider’s search-grounded LLM features is similar. For Perplexity, you must write your own request (e.g., with cURL) specifying a sonar-online family model (Perplexity’s family of models that uses their grounded search tool) and including your desired parameters. It feels rather hackish, since they don’t offer their own SDK that wraps requests.
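For illustration, here is a hedged sketch of such a request in Python against Perplexity’s OpenAI-compatible chat completions endpoint. The model identifier follows their published sonar-online naming at the time of writing; check their current model list before relying on it:

```python
import requests

API_KEY = "YOUR_PERPLEXITY_API_KEY"  # placeholder

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # Model name per Perplexity's docs at the time of writing.
        "model": "llama-3.1-sonar-large-128k-online",
        "messages": [
            {"role": "user", "content": "What is NVDA's stock price today?"}
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```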

On the Google Gemini side, what they’ve done is incredibly slick. The Gemini API includes first-party tools, which currently include “code_execution” and “google_search_retrieval”. These behave similarly to function calls, except they trigger services run by Google. code_execution gives the model access to a Python runtime, similar to a slimmed-down version of the OpenAI Assistants code_interpreter agent. google_search_retrieval enables their grounded search feature. By adding a tools: string parameter, Gemini (pro, flash, or flash-8B) gains access to the tool, automatically using it when appropriate.
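A minimal sketch of enabling the tool with the google-generativeai Python SDK, assuming the string shorthand shown in Google’s documentation at the time of writing:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder

# Passing the tool name enables Grounding with Google Search; the model
# decides per request whether to actually invoke it.
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    tools="google_search_retrieval",
)

response = model.generate_content("Who won the most recent F1 race?")
print(response.text)
```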

  • Unified API endpoint

Unlike the comparable OpenAI Assistants or Perplexity sonar-online models, Gemini’s additional features are accessed from the same API endpoint, and involve calling the same model. This allows devs to access these sophisticated tools without parsing the chat history, or having to invoke different LLM code. This streamlined approach seems easier and faster to implement, with the added advantage of interacting with the same LLM the entire time.

Gemini's Grounding with Google Search feature starts by classifying the user’s query to determine how likely it is to benefit from search grounding, assigning a score between 0 (no search needed) and 1 (search required). The API then exposes a “Dynamic Retrieval” parameter that lets you specify a threshold for when Grounding with Google Search should trigger. This can help manage cost and latency, or calibrate the frequency of search grounding for your application.
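Sketching the threshold configuration, again assuming the dict-style tool config from Google’s docs at the time (queries whose predicted score meets or exceeds dynamic_threshold trigger a search):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    tools={
        "google_search_retrieval": {
            "dynamic_retrieval_config": {
                # MODE_DYNAMIC lets the classifier decide per query;
                # lower thresholds trigger search more often.
                "mode": "MODE_DYNAMIC",
                "dynamic_threshold": 0.5,
            }
        }
    },
)

response = model.generate_content("Do dark chocolate health claims hold up?")
print(response.text)
```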

Examples of Dynamic Retrieval Scores

Search Grounded LLM Model Selection

Winner: Tied

Perplexity’s API is powered by fine-tuned open-weights models that receive the grounding content and generate the results. The current sonar-online models are finetunes of Llama 3.1 at 8B, 70B, or 405B. Gemini's Grounding with Google Search tool is available with the latest gemini-1.5 models: flash-8B, flash, and pro.

I think these options are roughly equivalent. Benchmark-wise, Llama 3.1 405B has a slight lead, but you can’t send complex queries to the sonar-online variants because doing so interferes with the search retrieval. The Gemini models are multimodal and have larger context windows.

One could argue that Llama models take a slight edge based on benchmarks, or that the Gemini models are better for their modalities and context windows. I think they end up about the same, but one or the other may shine based upon your use case.


Accuracy / Performance

Winner: Perplexity, Maybe

I haven’t run anything near a benchmark for accuracy/performance, but in just a few trials I ran into answer instability on one of my tests around multi-source questions. When asked “Are there any actors who starred in both K-dramas Our Blues and Pachinko?”, Gemini’s answer was unstable across repeated runs (about 8 tries), seemingly due to low source relevance. This is a common RAG failure case, where a system is trained to answer based on retrieved content, and the retrieval stage fails to find relevant information.

Gemini Grounded Answer "No" with no Sources
Gemini Grounded Answer "Yes" with no Sources. This answer is incorrect. Koh Hansu is played by Lee Min-ho.

Questions that require multiple sources to answer are generally challenging for RAG systems because most RAG retrieval uses a single query to collect information. Perplexity’s search retrieval answered correctly, albeit based on sources that don’t actually provide a conclusive answer; it never returned zero grounding sources due to insufficient relevance, and its answers stayed stable when queried repeatedly (6 times). This may actually be a hallucination-style failure on Perplexity’s part, and Google’s approach of requiring grounding sources to pass a relevance threshold before being handed to the LLM may be technically superior, but in this scenario it significantly degrades answer quality.

Perplexity Sonar Online Answer

Gemini's output needs to account for cases where the grounding results are inconclusive; it probably shouldn’t provide a definitive answer. Outside this particular failure condition, both APIs returned similar answers for my other two questions (NVDA stock price, and dark chocolate health benefits).


Citations and Attribution

Winner: Google, Clear Winner

One of the most fascinating features of Gemini’s Grounding with Google Search is the “grounding_chunks” and “grounding_supports” metadata returned with the response. I believe this builds on their previous Semantic Retrieval beta technology, and uses a model that predicts when generated text is attributable to a particular context source.

Gemini answer with sources and Search Suggestions Chip

This allows building UIs with passage-level attribution to source material, as well as displaying the Links relied upon to generate the model’s answer.
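A hedged sketch of consuming that metadata, assuming the `response` object from the earlier grounded request and the groundingMetadata field names in Google’s documentation:

```python
# Assumes `response` comes from a grounded generate_content() call.
meta = response.candidates[0].grounding_metadata

# grounding_chunks: the web sources the answer drew on.
sources = [(chunk.web.title, chunk.web.uri) for chunk in meta.grounding_chunks]

# grounding_supports: spans of the answer mapped back to supporting chunks.
for support in meta.grounding_supports:
    seg = support.segment
    urls = [sources[i][1] for i in support.grounding_chunk_indices]
    print(f"chars {seg.start_index}-{seg.end_index} supported by {urls}")

# The Search Suggestions chip arrives as pre-rendered HTML/CSS.
print(meta.search_entry_point.rendered_content)
```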

Perplexity, through closed beta features, also provides citation support, which returns links to the top search content URLs for a given input query and includes approximate citations in the output. Perplexity’s CEO has cautioned that the citation attribution is still approximate and they are working on improving its accuracy, which makes me think they are relying on prompt-level and constrained-generation techniques to provide the links as part of the LLM output, instead of a separate predictive model like Google’s. If the attributions are LLM-generated, that’s also a reason for devs using the citations feature to lean toward the larger sonar-online variants (based on Llama 3.1 70B and 405B) rather than saving a few cents on inference with the smaller model.

Perplexity Sonar Online Large answer with sources and "attribution"

Additional (Beta) Features

Winner: Perplexity

Perplexity has a number of additional search feature parameters, such as the ability to whitelist or blacklist domains (albeit currently capped at just 3 domains, a real constraint) and a search recency filter, both of which seem like excellent additions for targeted use cases. If you want to perform searches across a specific domain list, or only consider websites updated in the past month, Perplexity supports this.

Additional beta features include options to return related questions, or images related to the search, both of which seem like features for building your own Perplexity Pro web app. It is confusing that they provide the tools to build a Pro doppelganger, given that they prohibit using the API to build a competing service.
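Here’s a hedged sketch of a request combining these parameters; the parameter names follow Perplexity’s API docs at the time of writing, so treat them as assumptions and verify against the current reference:

```python
import requests

API_KEY = "YOUR_PERPLEXITY_API_KEY"  # placeholder

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-sonar-large-128k-online",
        "messages": [
            {"role": "user", "content": "Recent LLM evaluation results?"}
        ],
        # Whitelist up to 3 domains; a leading "-" blacklists one instead.
        "search_domain_filter": ["arxiv.org", "huggingface.co"],
        # Only consider pages updated within the last month.
        "search_recency_filter": "month",
        # Beta extras: related questions and search images.
        "return_related_questions": True,
        "return_images": True,
    },
    timeout=60,
)
print(resp.json())
```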


Total Cost

Winner: Perplexity, Clear Winner

From a cost perspective, both Perplexity and Google charge a flat “search query” fee per request, as well as charging for user input and LLM output tokens. Neither charges for the cost of the search content input tokens, which are invisible to the user.

At $35 per 1,000 queries, Gemini’s search fee is 7x Perplexity’s ($5 per 1,000 queries). That’s 3.5 cents vs. 0.5 cents per request, and even these small amounts tend to dwarf the model inference costs associated with these services.
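A quick back-of-envelope comparison illustrates why the flat search fee dominates. The token counts and token prices below are purely hypothetical placeholders; only the two search fees come from the published rates discussed above.

```python
def request_cost(search_fee, in_tok, out_tok, in_price_per_m, out_price_per_m):
    """Total cost of one grounded request, in dollars."""
    return (
        search_fee
        + in_tok / 1e6 * in_price_per_m
        + out_tok / 1e6 * out_price_per_m
    )

# Hypothetical workload: 500 input tokens, 700 output tokens,
# at assumed prices of $1/M input and $3/M output tokens.
gemini = request_cost(0.035, 500, 700, 1.0, 3.0)      # $35 per 1,000 queries
perplexity = request_cost(0.005, 500, 700, 1.0, 3.0)  # $5 per 1,000 queries

print(f"Gemini: ${gemini:.4f} per request")        # ~$0.0376, mostly search fee
print(f"Perplexity: ${perplexity:.4f} per request")  # ~$0.0076
```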


Conclusions

Terms of Service - Google Clear Winner

I’ve previously written that Perplexity’s Terms of Service seemed unsuited for any type of application development, because the customer’s allowed uses of the generated content were too restrictive and the Provider’s reserved use rights were too broad. Google’s Terms of Service, despite adding display requirements, still win both on allowed uses and on reserved rights.

Gemini’s terms of service still make clear that the model output can only be used to answer user questions, but at least provide specific allowances for developer caching (30 days) to optimize the display of Grounded Results within your application, and within user chat histories for up to six months, to allow users to review the information.

API Features - Google leads by a smaller margin

Between Google’s superior implementation via a tool-use parameter, dynamic retrieval, and advanced citation/attribution features, I think Grounding with Google Search is simply the better product, despite Perplexity’s year-plus head start in the market. With beta features, they both produce similar-looking results, although Perplexity’s inability to consistently map generated text to particular citations means more confusion/legwork to verify answers against the supporting links.

Perplexity has a number of interesting search-related parameters, and is adding more. None of these seem difficult for Google to implement, and therefore they don’t provide much of a moat. They do demonstrate that Perplexity is continuously innovating on their API product, and hopefully they keep it up now that there's competition.

Gemini’s Grounding with Google Search is surprisingly expensive compared to Perplexity, which may be due to Google’s implementation ingesting many more tokens’ worth of context, and/or orchestrating multiple models to provide the attribution scores. Given how aggressively priced the rest of the Gemini API is, I find it quite striking that the search fee is SEVEN TIMES Perplexity’s rate. These aren’t Anthropic prices, or OpenAI o1 prices, but it’s still a noticeable difference. Both options are in line with the expected cost of using a top-performing LLM.

Gen AI talking heads are often critical of Google: they fumbled the ball with transformer-based generative AI after inventing the technology, still seem third (or fourth) in terms of model quality, and can’t seem to offer services as compelling as OpenAI’s or Anthropic’s, or a search feature similar to Perplexity’s. When ChatGPT’s Search feature launched, people took it as yet another sign that Google’s dominance in search may be threatened. If anything, Gemini’s Grounding with Google Search feature says otherwise, offering developers best-in-class tools to build their own search-enabled LLM applications, seamlessly integrated with the existing Gemini API.

