Take Searchers Seriously, Not Literally

Search application developers manage numerous tradeoffs, foremost the tradeoff between precision and recall. Precision measures the fraction of search results that are relevant, while recall measures the fraction of relevant documents that are retrieved. Precision is about returning “nothing but the truth”, while recall is about returning “the whole truth”.
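To make the two definitions concrete, here is a minimal sketch that computes both measures for a single query. The function name, the document ids, and the relevance judgments are all hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search application returned
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 results returned, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d5", "d6", "d7"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

Note that both measures are defined against relevance judgments, not against keyword overlap, which is the point the rest of this post builds on.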

Unfortunately, many search application developers misinterpret this tradeoff by taking a literal, reductionist approach to query understanding. These developers interpret precision as matching the exact keywords the searcher uses, rather than matching the intent behind those keywords. This attempt to respect the searcher’s intent is understandable, but it is misguided and harms the search experience.

Synonyms

This problem surfaces in the context of query expansion — specifically synonyms. In many search applications, results that exactly match query words score higher than results that match through synonym expansion.

Too many search application developers confuse probability of relevance with degree of relevance. Sometimes synonyms represent a slight drift in meaning, such as from sneakers to shoes. Often, however, they represent an equivalence subject to context. For example, the words “company” and “firm” have essentially the same meaning when they refer to commercial businesses, but both words have other meanings in different contexts. There is a big difference between a synonym retaining 80% of the meaning of the original word and there being an 80% probability of retaining all of its meaning — even if they yield essentially the same expected value.
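A little arithmetic shows why the same expected value can hide two very different situations. The sketch below assumes a hypothetical 0-to-100 relevance grade per result; the grades are invented for illustration and do not come from any real system.

```python
# Hypothetical relevance grades (0 to 100) for ten results in each scenario.
drift = [80] * 10                 # every result keeps about 80% of the meaning
ambiguous = [100] * 8 + [0] * 2   # 80% of results keep all of the meaning, 20% keep none


def mean(grades):
    """Average grade, i.e. the expected relevance of a randomly chosen result."""
    return sum(grades) / len(grades)


def share_fully_relevant(grades):
    """Fraction of results that retain the full meaning of the query."""
    return sum(g == 100 for g in grades) / len(grades)


print(mean(drift), mean(ambiguous))                                  # 80.0 80.0
print(share_fully_relevant(drift), share_fully_relevant(ambiguous))  # 0.0 0.8
```

In the first scenario no result is exactly what the searcher asked for. In the second, most results are exactly right and the remainder can be handled by resolving context, so penalizing every synonym match as if it were a partial match throws away results that are fully relevant.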

For example, consider a search on an e-commerce site for “cell phone chargers”. In this context, “cell” and “mobile” are synonyms with no loss of meaning. Therefore, the search application should treat results for “mobile” phone chargers just like results for “cell” phone chargers. Indeed, it would be a disservice to searchers and the business to not show the best phone chargers to searchers looking for them, regardless of whether they are indexed as “cell” phone chargers or “mobile” phone chargers, and regardless of which word the searcher uses in the query.
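One way to get this behavior is to map equivalent words to a single canonical token on both the query side and the document side, so that neither variant is favored in scoring. The sketch below is a toy under stated assumptions: the `CANONICAL` map, the whitespace tokenizer, and the overlap score are hypothetical stand-ins, and the plural entries substitute for real stemming.

```python
# Hypothetical, context-specific equivalences for a consumer-electronics catalog.
# The plural entries are a crude stand-in for real stemming or lemmatization.
CANONICAL = {
    "cell": "mobile",
    "cellular": "mobile",
    "chargers": "charger",
    "phones": "phone",
}


def normalize(text):
    """Lowercase, split on whitespace, and map each token to its canonical form."""
    return [CANONICAL.get(token, token) for token in text.lower().split()]


def score(query, title):
    """Fraction of canonical query tokens that appear in the canonical title."""
    query_tokens = normalize(query)
    title_tokens = set(normalize(title))
    if not query_tokens:
        return 0.0
    return sum(token in title_tokens for token in query_tokens) / len(query_tokens)


query = "cell phone chargers"
print(score(query, "Fast Cell Phone Charger"))    # 1.0
print(score(query, "Fast Mobile Phone Charger"))  # 1.0
print(score(query, "Leather Phone Case"))         # about 0.33
```

Because both variants collapse to the same token before scoring, the "cell" and "mobile" listings tie, and ranking between them can be left to signals that actually reflect quality.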

Holistic Query Intent

In contrast, searchers are not happy when a search for “cell phone” returns a flood of cell phone accessories, such as cases and chargers. Search application developers may protest that they are just following orders, returning results that exactly match the searcher’s keywords. However, searchers expect search applications to know the difference between a product and its accessories — and to recognize their intent the way a human would. People searching for cell phones want phones, not cases.

Scenarios like these make it clear that query understanding needs to be holistic rather than reductionist. At the very least, a search application should recognize the broad category or categories targeted by the query and avoid hurting precision by including out-of-category results.
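As a rough sketch of what category-aware retrieval could look like, the code below filters keyword matches by the categories predicted for the query. Everything here is an assumption for illustration: the `Product` type, the category names, and especially `QUERY_CATEGORIES`, which is a hand-written lookup standing in for a real query classifier.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    category: str


# Hypothetical classifier output: the categories each query targets.
# A real application would predict these with a trained model, not a lookup table.
QUERY_CATEGORIES = {
    "cell phone": {"Phones"},
    "cell phone case": {"Phone Accessories"},
}


def classify_query(query):
    """Return the predicted categories for a query, or an empty set if unknown."""
    return QUERY_CATEGORIES.get(query.lower().strip(), set())


def filter_by_category(query, candidates):
    """Keep only candidates in the query's predicted categories."""
    categories = classify_query(query)
    if not categories:
        return list(candidates)  # no prediction: fall back to the keyword matches
    return [p for p in candidates if p.category in categories]


# All three titles contain the keywords "cell phone", but only one is a phone.
candidates = [
    Product("Acme Cell Phone 128GB", "Phones"),
    Product("Cell Phone Case, Black", "Phone Accessories"),
    Product("Cell Phone Charger, USB-C", "Phone Accessories"),
]
for product in filter_by_category("cell phone", candidates):
    print(product.title)  # Acme Cell Phone 128GB
```

Falling back to the unfiltered keyword matches when the classifier has no prediction is a design choice that favors recall for unfamiliar queries; a stricter application might instead demote out-of-category results rather than drop them.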

Search Query vs. Search Intent

Fundamentally, search application developers need to manage precision and recall in terms of the searcher’s intent rather than the literal search query. Searchers do not care whether a search application matches their exact keywords; they care whether it matches their exact intent. Search application developers may feel that exact keyword matching improves explainability, but most searchers see those explanations as excuses.

Focusing on the holistic meaning of the query may sound like AI-powered search, favoring neural over traditional token-based retrieval. Indeed, AI can help address the reductionist errors of token-based approaches. However, that does not mean that search applications need to implement embedding-based retrieval. It may be simpler and more robust to use query classification and query similarity to understand search intent.

Summary: Think Like a Searcher, Not a Developer

Delivering effective search applications requires empathy with searchers. Focusing on literal search keywords and the computation associated with retrieval and scoring makes sense to developers but is not something that searchers even think about. Searchers expect search to just work, for search applications to understand what they mean. This expectation may be unreasonable. However, it is what searchers expect, and it is the ideal that search application developers have to strive for. Most importantly, it should frame how developers think about search problems and solutions. Search applications need to take searchers seriously, not literally.
