Facets of Faceted Search

Faceted search is a fascinating topic. It’s a standard feature of site search, and one could write an entire book on the subject. In this post, I’ll focus on some nuances of faceted search that I feel have been neglected in the literature.

Broad Queries vs. Ambiguous Queries

Both search engine developers and users treat facets as useful for refining broad search queries. But there’s a tendency to conflate broad queries with ambiguous queries. There’s an important distinction between the two.

Broad queries are unambiguous but underspecified. For example, the query “shirts” expresses a clear but underspecified intent: it includes shirts for men, women, and children; t-shirts and dress shirts; all colors and materials; etc.

In contrast, ambiguous queries do not express a clear intent. For example, the query “mixers” is ambiguous because it’s unclear whether “mixers” refers to kitchen appliances, sound equipment, or several kinds of industrial machines.

Facets are useful for narrowing down broad, unambiguous queries — especially when the large number of results and underspecified search intent limit the usefulness of ranking. In contrast, it’s better to address ambiguous queries with category disambiguation or some other clarification dialogue.

Finding vs. Exploring

Search queries are not the same as search intent. In particular, broad search queries do not necessarily reflect broad search intent, and that makes a big difference as to how searchers use facets.

Some searchers who type in broad queries know exactly what they’re looking for, but don’t express their narrower search intent in the search box. For example, they may type in “shirts”, even though they have a particular brand of men’s shirts in mind. This can happen for several reasons. They may not know — or may not be able to spell — the right words to express their specific intent. They may not trust the search engine to understand a more specific query — or to return all relevant results for it. Or they simply may prefer to type less — indeed, they may have been nudged to enter a short query by autocomplete. In all of these cases, facets help searchers narrow down the results for their initial broad queries to express a more specific intent.

In other cases, searchers don’t know exactly what they are looking for; rather, they only know enough to express a broad intent. For example, a searcher who doesn’t know much about shirt types, brands, or prices might search for “shirts” in order to see the options for these facets. In general, these searchers use facets as guidance to understand how the inventory is organized, what options are available, and trade-offs among those options. They are using facets to explore and discover.

Facets can serve both searchers who know what they are looking for and those who don’t. But it’s important to keep in mind that these are different use cases. In the first case, facets help searchers find more efficiently; in the second case, facets enable exploration and discovery. These two kinds of searchers tend to have very different kinds of search journeys.

Popularity, Coverage, and Utility

What about the facets themselves? What makes a particular facet useful for a particular query?

A facet for a search query should satisfy the following three properties:

  • Popularity. Facets and their values should represent result aspects that many searchers who perform that query care about, e.g., someone searching for shirts probably cares about their size and color.
  • Coverage. Facets should have high coverage among the results, e.g., shirt size has high coverage if the results are all shirts, but has lower coverage if the results also include pants and shoes.
  • Utility. Selecting a facet value should significantly (but not entirely!) reduce the number of results, and it should filter out a large fraction of top results, e.g., color is a useful facet for “shirts” but not for “white shirts”.


The simplest way to determine the popularity of facets and values for a query is to measure how often searchers who perform that query use them. This approach is simple and direct, but it suffers from presentation bias (the order in which the search interface presents facets) and sparsity (many queries don’t have enough facet usage to derive a robust distribution).

An alternative is to infer facets from queries using entity recognition, e.g., inferring the color facet from the query “black shirts”.

Another way to address sparsity is to aggregate queries by category, e.g., aggregating all shirt-related queries. But this approach requires a way to map queries to categories. And the facets won’t be useful for all queries in the category, e.g., color isn’t useful for “white shirts”.
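To make the usage-based approach concrete, here is a minimal sketch in Python. The log format (one `(query, facets_used)` pair per search session), the `min_sessions` threshold, and the `category_of` mapping are all assumptions for illustration, not a description of any particular search engine’s logging.

```python
from collections import Counter, defaultdict


def facet_popularity(log, min_sessions=3, key=lambda query: query):
    """Estimate, per group of queries, the share of sessions using each facet.

    `log` is an iterable of (query, facets_used) pairs (hypothetical format).
    `key` maps a query to the group it is counted under: the query itself
    by default, or a category when aggregating to reduce sparsity.
    Groups with fewer than `min_sessions` sessions are dropped as too sparse.
    """
    sessions = Counter()
    usage = defaultdict(Counter)
    for query, facets_used in log:
        group = key(query)
        sessions[group] += 1
        for facet in set(facets_used):
            usage[group][facet] += 1
    return {
        group: {facet: n / sessions[group] for facet, n in usage[group].items()}
        for group in sessions
        if sessions[group] >= min_sessions
    }


# Toy log: each entry is one session and the facets the searcher used.
log = [
    ("shirts", {"size", "color"}),
    ("shirts", {"size"}),
    ("shirts", set()),
    ("shirts", {"size", "brand"}),
    ("white shirts", {"size"}),
    ("linen shirts", {"color"}),
]

# Per-query popularity: the two rare queries are too sparse to report.
per_query = facet_popularity(log)

# Aggregating by a (hypothetical) query-to-category mapping.
category_of = {"shirts": "shirts", "white shirts": "shirts", "linen shirts": "shirts"}
per_category = facet_popularity(log, key=lambda q: category_of.get(q, q))
```

Note that this sketch does nothing about presentation bias, and that the category-level numbers assign color the same popularity for “white shirts” as for every other shirt query, which is exactly the limitation described above.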


Facet coverage is more straightforward, but there’s a catch: coverage is highly sensitive to the search engine’s retrieval strategy. Most search engines rely on ranking to promote relevant results to the first page, but doing so often hides irrelevant results on later pages (which is a problem when users sort the results by some other attribute, like price). Irrelevant results can drastically skew facet coverage. Hence, it’s important to compute facet coverage based on a retrieval strategy that emphasizes relevance, independent of ranking.
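As a rough illustration of how irrelevant results skew coverage, the sketch below computes coverage over everything retrieved and over the relevant results only. The result records and the `relevant` flag are made up for the example; in practice, relevance would come from whatever relevance model or retrieval filter the engine uses.

```python
def facet_coverage(results, facet):
    """Fraction of results that have a value for `facet`."""
    if not results:
        return 0.0
    with_value = sum(1 for r in results if r.get(facet) is not None)
    return with_value / len(results)


# Hypothetical results retrieved for the query "shirts".
retrieved = [
    {"title": "oxford shirt", "size": "M", "relevant": True},
    {"title": "polo shirt", "size": "L", "relevant": True},
    {"title": "t-shirt", "size": "S", "relevant": True},
    {"title": "shirt hanger", "relevant": False},
    {"title": "shirt folding board", "relevant": False},
]

# Coverage over everything retrieved, including irrelevant results.
coverage_all = facet_coverage(retrieved, "size")

# Coverage over the relevant results only, independent of ranking.
relevant_only = [r for r in retrieved if r["relevant"]]
coverage_relevant = facet_coverage(relevant_only, "size")
```

In this toy data, size covers every relevant result but only 60% of what was retrieved, so a coverage threshold applied to the full retrieved set could wrongly suppress a perfectly good facet.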

Utility, like coverage, is also sensitive to the retrieval strategy, particularly when it comes to ensuring that a facet value represents at least a meaningful fraction of relevant results. But at least it’s easy to compute what fraction of the first page a facet value filters out. If selecting a facet value leaves the first page essentially unchanged, it wastes the searcher’s time. It’s impossible to eliminate all sources of friction from the search journey, but the bare minimum the search engine can do is to ensure that every choice it suggests to the searcher, especially a facet value, meaningfully changes the search results.
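Both utility checks are cheap to sketch. The function below is one possible formulation under assumed inputs: a ranked list of result records and a hypothetical `page_size`. It reports the fraction of all results a facet value keeps and the fraction of the first page it filters out.

```python
def value_utility(ranked_results, facet, value, page_size=10):
    """Measure how much selecting `facet == value` changes the results.

    Returns the fraction of all results the value keeps, and the fraction
    of the current first page that it would filter out.
    """
    if not ranked_results:
        return {"kept_fraction": 0.0, "first_page_removed": 0.0}
    kept = sum(1 for r in ranked_results if r.get(facet) == value)
    first_page = ranked_results[:page_size]
    removed = sum(1 for r in first_page if r.get(facet) != value)
    return {
        "kept_fraction": kept / len(ranked_results),
        "first_page_removed": removed / len(first_page),
    }


# Hypothetical ranked results for "shirts" and for "white shirts".
shirts = [{"color": c} for c in ["white", "blue", "black", "white", "red", "blue"]]
white_shirts = [{"color": "white"} for _ in range(6)]

# For "shirts", white keeps a third of the results and changes half the page.
for_shirts = value_utility(shirts, "color", "white", page_size=4)

# For "white shirts", selecting white changes nothing, so it has no utility.
for_white_shirts = value_utility(white_shirts, "color", "white", page_size=4)
```

A facet value whose `first_page_removed` is close to zero, or whose `kept_fraction` is close to zero or one, is a candidate for suppression. Where exactly to set those thresholds is a judgment call that depends on the application.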

Summary

Faceted search is a simple idea, but it turns out to be quite nuanced in practice. It’s useful for broad queries, but not for ambiguous queries. It can help searchers find more efficiently, but it can also help them explore and discover. Facets should optimize for popularity, coverage, and utility; and determining these requires a retrieval strategy that emphasizes relevance. In short, faceted search is a fascinating topic with many facets!
