Take Searchers Seriously, Not Literally

Daniel Tunkelang

Query Understanding

Published: September 4, 2024

Search application developers manage numerous tradeoffs, foremost the tradeoff between precision and recall. Precision measures the fraction of search results that are relevant, while recall measures the fraction of relevant documents that are retrieved. Precision is about returning “nothing but the truth”, while recall is about returning “the whole truth”.
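To make the two definitions concrete, here is a minimal sketch that computes both measures for a single query. The function name and the document IDs are hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: four results returned, three documents truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the four results are relevant, so precision is one half. Two of the three relevant documents are retrieved, so recall is two thirds.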

Unfortunately, many search application developers misinterpret this tradeoff by taking a literal, reductionist approach to query understanding. These developers interpret precision as matching the exact keywords the searcher uses, rather than matching the intent behind those keywords. This understandable attempt to respect the searcher’s intent is misguided and harms the search experience.

Synonyms

This problem surfaces in the context of query expansion — specifically synonyms. In many search applications, results that exactly match query words score higher than results that match through synonym expansion.

Too many search application developers confuse probability of relevance with degree of relevance. Sometimes synonyms represent a slight drift in meaning, such as from sneakers to shoes. Often, however, they represent an equivalence subject to context. For example, the words “company” and “firm” have essentially the same meaning when they refer to commercial businesses, but both words have other meanings in different contexts. There is a big difference between a synonym retaining 80% of the meaning of the original word and there being an 80% probability of retaining all of its meaning — even if they yield essentially the same expected value.
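The distinction can be illustrated with a small numeric sketch. The relevance scores below are invented for illustration and assume a scale from 0 (irrelevant) to 1 (fully relevant). Both cases have the same average, but the results behind that average look very different.

```python
from statistics import mean, pvariance

# Case A (degree of relevance): every synonym match keeps 80% of the meaning.
drift = [0.8] * 10

# Case B (probability of relevance): 8 of 10 matches keep all of the meaning,
# and 2 of 10 keep none of it because the word was used in another sense.
ambiguous = [1.0] * 8 + [0.0] * 2

mean_drift, mean_ambiguous = mean(drift), mean(ambiguous)
var_drift, var_ambiguous = pvariance(drift), pvariance(ambiguous)

print(f"A: mean={mean_drift:.2f} variance={var_drift:.2f}")
print(f"B: mean={mean_ambiguous:.2f} variance={var_ambiguous:.2f}")
```

A single score penalty for synonym matches fits case A reasonably well. In case B it demotes the eight fully relevant results along with the two irrelevant ones, which is why the two situations call for different handling.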

For example, consider a search on an e-commerce site for “cell phone chargers”. In this context, “cell” and “mobile” are synonyms with no loss of meaning. Therefore, the search application should treat results for “mobile” phone chargers just like results for “cell” phone chargers. Indeed, it would be a disservice to searchers and the business not to show the best phone chargers to searchers looking for them, regardless of whether they are indexed as “cell” phone chargers or “mobile” phone chargers, and regardless of which word the searcher uses in the query.
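One way to treat the two words identically is to map both to a single canonical token in queries and in indexed text, rather than expanding the query and scoring expanded matches lower. The sketch below assumes a hypothetical synonym table and a toy scoring function. It is not a description of any particular search engine, and the mapping is only safe in a context such as phone accessories, where “cell” has one meaning.

```python
# Hypothetical synonym table: every variant maps to one canonical token.
CANONICAL = {"cell": "mobile", "cellular": "mobile", "mobile": "mobile"}


def normalize(text):
    """Lowercase, split on whitespace, and canonicalize known synonyms."""
    return [CANONICAL.get(token, token) for token in text.lower().split()]


def score(query, title):
    """Toy score: fraction of query tokens present in the title."""
    query_tokens = normalize(query)
    title_tokens = set(normalize(title))
    if not query_tokens:
        return 0.0
    return sum(token in title_tokens for token in query_tokens) / len(query_tokens)


query = "cell phone chargers"
titles = ["Fast Cell Phone Chargers", "Fast Mobile Phone Chargers"]
scores = [score(query, title) for title in titles]
print(scores)
```

Because normalization is applied to both sides, the two titles receive the same score whichever word the searcher types.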

Holistic Query Intent

In contrast, searchers are not happy when a search for “cell phone” returns a flood of cell phone accessories, such as cases and chargers. Search application developers may protest that they are just following orders, returning results that exactly match the searcher’s keywords. However, searchers expect search applications to know the difference between a product and its accessories — and to recognize their intent the way a human would. People searching for cell phones want phones, not cases.

Scenarios like these make it clear that query understanding needs to be holistic rather than reductionist. At the very least, a search application should recognize the broad category or categories targeted by the query and avoid hurting precision by including out-of-category results.
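As a minimal sketch of this idea, the example below filters keyword matches by a predicted query category. The catalog, the category names, and the lookup table that stands in for a query classifier are all hypothetical. A real application would replace the table with a trained classifier.

```python
# Hypothetical catalog; titles and categories are invented for illustration.
CATALOG = [
    {"title": "Acme cell phone, 128 GB", "category": "phones"},
    {"title": "Leather cell phone case", "category": "phone accessories"},
    {"title": "Cell phone charger, USB-C", "category": "phone accessories"},
]

# Stand-in for a query classifier: maps a query to its predicted category.
QUERY_CATEGORY = {
    "cell phone": "phones",
    "cell phone case": "phone accessories",
}


def keyword_match(query, item):
    """Literal matching: every query token must appear in the title."""
    title_tokens = set(item["title"].lower().replace(",", " ").split())
    return all(token in title_tokens for token in query.lower().split())


def search(query):
    """Keyword matches, restricted to the predicted category when one exists."""
    matches = [item for item in CATALOG if keyword_match(query, item)]
    category = QUERY_CATEGORY.get(query.lower())
    if category is None:
        return matches  # no prediction available: fall back to keyword matches
    return [item for item in matches if item["category"] == category]


results = [item["title"] for item in search("cell phone")]
print(results)
```

All three catalog items contain the tokens “cell” and “phone”, so literal matching alone would return the case and the charger as well. The category filter keeps only the phone.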

Search Query vs. Search Intent

Fundamentally, search application developers need to manage precision and recall in terms of the searcher’s intent rather than the literal search query. Searchers do not care whether a search application matches their exact keywords; they care whether it matches their exact intent. Search application developers may feel that exact keyword matching improves explainability, but most searchers see those explanations as excuses.

Focusing on the holistic meaning of the query may sound like AI-powered search, favoring neural over traditional token-based retrieval. Indeed, AI can help address the reductionist errors of token-based approaches. However, that does not mean that search applications need to implement embedding-based retrieval. It may be simpler and more robust to use query classification and query similarity to understand search intent.

Summary: Think Like a Searcher, Not a Developer

Delivering effective search applications requires empathy with searchers. Focusing on literal search keywords and the computation associated with retrieval and scoring makes sense to developers but is not something that searchers even think about. Searchers expect search to just work: they expect search applications to understand what they mean. This expectation may be unreasonable. However, it is what searchers expect, and it is the ideal that search application developers have to strive for. Most importantly, it should frame how developers think about search problems and solutions. Search applications need to take searchers seriously, not literally.
