AI for Query Understanding

In the past decade, the incredible progress in word embeddings and deep learning has fueled an interest in neural information retrieval. An increasing number of folks believe that it’s time to retire the traditional inverted indexes (aka posting lists) that search engines use for retrieval and ranking.

In their place, they advocate a model where search engines use neural networks to represent documents and queries as vectors, and then use nearest neighbor search (or more sophisticated ranking models) to retrieve and rank results.

This revolutionary approach is tempting, but, in my view, misdirected. As I argue below, and as you can learn how to implement here, the right place to focus AI efforts is query understanding.

The traditional information retrieval approach is limiting.

The neural retrieval approach is tempting because it addresses some of the most frustrating limitations of traditional information retrieval in general, and inverted indexes in particular.

The traditional retrieval and ranking approach uses a variety of factors, such as BM25, to ensure precision. These factors tend to be highly sensitive to individual keywords. For example, a search for “iphone cases” typically returns slightly different results than a search for “iphone case” — even though the search intents are almost certainly identical.
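
To make this sensitivity concrete, here is a toy BM25 scorer in Python. The corpus, parameters, and whitespace tokenization are invented for illustration; no production engine scores exactly this way.

```python
# Minimal BM25 sketch: the same intent, expressed as singular vs. plural,
# produces different scores and therefore different rankings.
import math
from collections import Counter

corpus = [
    "leather iphone case with card holder",
    "bulk pack of iphone cases",
    "a case of fresh apples",
]
docs = [doc.split() for doc in corpus]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
k1, b = 1.2, 0.75  # common default parameters

def idf(term):
    df = sum(term in d for d in docs)
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25(query, doc):
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if tf[term]:
            norm = (tf[term] * (k1 + 1)) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf(term) * norm
    return score

for q in ("iphone case", "iphone cases"):
    print(q, "->", [round(bm25(q, d), 2) for d in docs])
# "iphone case" favors the first document; "iphone cases" favors the
# second: different rankings for what is almost certainly one intent.
```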

Similarly, traditional query expansion methods, such as stemming and synonyms, as well as query relaxation, tend to take the query too literally because they focus on keywords rather than the overall query intent. As a result, they can make egregious errors, such as returning iPhone cases when someone performs a search for a “case of apples”.
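
As a small illustration of the keyword-level trap, assuming NLTK's Porter stemmer is available: stemming fixes the singular/plural mismatch, but because it operates on isolated keywords, it also gives "case of apples" the same stem as "iphone case".

```python
# Stemming normalizes keywords without regard for the query's intent.
from nltk.stem import PorterStemmer

stem = PorterStemmer().stem
for q in ("iphone case", "iphone cases", "case of apples"):
    print(q, "->", [stem(tok) for tok in q.split()])
# iphone case    -> ['iphon', 'case']
# iphone cases   -> ['iphon', 'case']
# case of apples -> ['case', 'of', 'appl']
# All three queries now share the stem "case", so keyword-level
# expansion cannot tell phone accessories from fruit crates.
```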

Embeddings and deep learning allow us to holistically represent and target intent, rather than relying on a reductionist approach based on keywords.

But applying this representation directly to ranking and retrieval skips a critical step: query understanding.

The best place to address query intent is query understanding.

Query understanding is what happens before the search engine retrieves and ranks results: it comprises the searcher’s process of expressing an intent as a query and the search engine’s process of determining that intent. Query understanding involves a combination of holistic and reductionist steps: the holistic steps consider the query as a whole, while the reductionist steps break the query down into separately analyzable parts.

Holistic query understanding with embeddings and deep learning is a great way to look beyond keywords and recognize queries that represent similar or equivalent intent. Before retrieving and ranking results, query understanding can map the query to a representation that canonicalizes the class of equivalent queries. Doing so often allows the search engine to aggregate behavioral signals, such as clicks and purchases, that would otherwise be fragmented among the various expressions of a given intent. Holistic query understanding can also classify the query into a topic or product taxonomy.
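
As a hedged sketch of what canonicalization might look like, the snippet below embeds each incoming query and snaps it to its nearest canonical query, so that behavioral signals accumulate on a single representative. The model name and the canonical query list are assumptions for illustration, not a prescribed setup.

```python
# Holistic canonicalization sketch: map each raw query to the nearest
# canonical query by embedding similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

# Hypothetical canonical queries, e.g. mined from historical search logs.
canonical = ["iphone case", "apple crate", "macbook charger"]
canonical_vecs = model.encode(canonical, normalize_embeddings=True)

def canonicalize(query: str) -> str:
    """Snap a raw query to its closest canonical query by cosine similarity."""
    vec = model.encode([query], normalize_embeddings=True)[0]
    return canonical[int(np.argmax(canonical_vecs @ vec))]

for q in ("iphone cases", "case for my iphone", "case of apples"):
    print(q, "->", canonicalize(q))
# Ideally the first two land on "iphone case" while "case of apples"
# lands on "apple crate", letting clicks and purchases aggregate per intent.
```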

Reductionist query understanding can also be valuable. In particular, recognizing entities in queries leads to a richer representation of the query than treating it as a sequence of keywords. Not only can entity recognition drive query scoping to improve precision (e.g., recognizing whether “apple” is a brand or a product type), but it can also provide context to intelligently expand or relax entities to increase recall (so that we don’t confuse iPhone cases with cases of apples).
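
Here is one illustrative sketch of entity-driven query scoping. The hand-built lexicons and the context rule are made up for the example; a real system would use a trained tagger over a curated catalog.

```python
# Reductionist sketch: tag query tokens as entities, then scope the
# query into structured filters instead of raw keywords.
BRANDS = {"apple", "samsung"}
PRODUCT_TYPES = {"case", "cases", "charger", "crate"}

def scope_query(query: str) -> dict:
    tokens = query.lower().split()
    scoped = {"brand": [], "product_type": [], "keywords": []}
    for i, tok in enumerate(tokens):
        # Context matters: "apple" immediately before a product type reads
        # as a brand ("apple case"); "of apples" stays a plain keyword.
        if tok in BRANDS and i + 1 < len(tokens) and tokens[i + 1] in PRODUCT_TYPES:
            scoped["brand"].append(tok)
        elif tok in PRODUCT_TYPES:
            scoped["product_type"].append(tok)
        else:
            scoped["keywords"].append(tok)
    return scoped

print(scope_query("apple case"))      # "apple" scoped as a brand filter
print(scope_query("case of apples"))  # "apples" stays a keyword: produce, not Apple
```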

Use AI intelligently.

Embeddings and neural networks are powerful tools, and we are only beginning to realize their potential to revolutionize search. But search is more than retrieval and ranking. Search starts with users expressing their intent through queries. Hence, the highest-leverage way to improve search is through better query understanding.

So let’s focus our AI efforts on this critical part of the search stack. By not taking queries so literally, we will literally revolutionize search!

Nicolas Fiorini

Senior Director of AI Engineering at Algolia

3 years ago

This paper is what convinced me to stay away from dense search in most cases (as has been mentioned, depending on the requirements and the scalability, this might change): https://arxiv.org/pdf/1904.09171.pdf. In most cases, really understanding the query is the most critical step or bottleneck, and I don’t think we should hope to bypass it using dense search alone. Now, using dense search after we’ve understood the query, that’s another story, but I feel we’re not there yet.

Peter Dixon-Moses

Delightful Discovery Experiences | Product Development and Engineering Partner (Search / Discovery / Relevance / GenAI)

3 years ago

Among other things, an effective dense (first-phase) retriever should promote recall ("expansionist": surfacing true positives that may have received false-negative treatment from BM25). To Daniel's point, query understanding is often used to drive filtering/pruning (via domain/behavioral knowledge), rewriting unfiltered queries into filtered ones as a ("reductionist") method of improving precision... and in this case I can see how the two approaches could work against each other. However, incorporating signal *boosts* from query understanding as a second-phase reranker seems like a complementary application of the strengths of the two approaches for existing pipelines with a (precision-focused) QU step. And to JKB's point, as training techniques improve, the hope is that much of the domain nuance from QU can be embedded in future dense retriever models to shorten the query pipeline and decrease latency.

Jo Kristian Bergum

Retrieval Evangelist

3 years ago

I don’t understand your point. Using a trained dense retriever instead of sparse (BM25) does not rule out using query understanding/classification.
