Seriously or Literally?

The other day, I posted about the need for search applications to take searchers seriously, not literally. This need holds even when searchers practically ask to be taken literally.

Quotation Marks

By using quotation marks, a searcher — already signaling a relatively advanced knowledge of search engines — explicitly asks the application to only retrieve results with those exact words in that order. This seems an incontrovertible case for taking the searcher literally. Or is it?

What if the quoted phrase is misspelled? Should a search for the quoted phrase “luois viutton” include results for Louis Vuitton? What about stemming and lemmatization? Should a search for “dress shirt” retrieve different results than a search for “dress shirts”, or rank them differently depending on whether the noun is singular or plural? After all, the searcher may have quoted the phrase mainly to ensure that the application does not retrieve shirt dresses.

Situations like these call for nuance. On one hand, a search application should enable searchers to retrieve results containing a seemingly misspelled word (e.g., “solr”) or a word with a particular ending. On the other hand, a Bayesian perspective argues for considering the prior. In particular, the searcher may have used quotation marks to specify the word order or to avoid the word being optionalized through query relaxation without realizing that doing so might override spelling correction.

A compromise is for the search application to show results for one query interpretation and offer a “did you mean” clarification dialogue for the other interpretation. Which interpretation should be primary? Ideally, that decision should be a Bayesian probability calculation. However, given the challenge of calculating this probability, it might be better to provide a consistent experience and always treat the literal interpretation as the primary one. Alternatively, the application can experiment with both options and learn from searcher behavior.
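To make the trade-off concrete, here is a minimal Python sketch of that decision. Everything in it is an assumption for illustration: the `Interpretation` class, the `choose_primary` function, and especially the prior and likelihood numbers, which a real application would have to estimate from its own query logs.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    query: str
    prior: float       # hypothetical: how common this query is overall
    likelihood: float  # hypothetical: P(searcher typed what they typed | this intent)


def posteriors(interpretations):
    """Normalize prior * likelihood into posterior probabilities (Bayes' rule)."""
    weights = [i.prior * i.likelihood for i in interpretations]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one interpretation needs a nonzero weight")
    return [w / total for w in weights]


def choose_primary(literal, corrected, always_literal=False):
    """Return (primary_query, did_you_mean_query).

    With always_literal=True, the literal reading is always primary,
    which is the consistent-experience policy.
    """
    p_literal, p_corrected = posteriors([literal, corrected])
    if always_literal or p_literal >= p_corrected:
        return literal.query, corrected.query
    return corrected.query, literal.query


# Made-up numbers: the misspelling is rare, the brand name is common.
literal = Interpretation('"luois viutton"', prior=0.01, likelihood=1.0)
corrected = Interpretation('"louis vuitton"', prior=0.99, likelihood=0.05)

primary, did_you_mean = choose_primary(literal, corrected)
```

With these made-up numbers, the corrected phrase becomes primary and the literal phrase becomes the “did you mean” alternative. Passing `always_literal=True` swaps them without any probability estimate at all.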

Known-Item Search

While most search queries today are short and generic (e.g., “mens shoes”), some queries are much longer — particularly when the searcher is trying to find a document or product by name.

For example, a comparison shopper might copy and paste a product name from one site into another’s search box, leading to queries like “Gaggia Brera Super-Automatic Espresso Machine, Small, 40 fluid ounces, Silver” (the unofficial sponsor of my waking hours) or “Faceted Search (Synthesis Lectures on Information Concepts, Retrieval, and Services, 5)”. Or one of my fans might search on LinkedIn for “Dan Tunkelang”.

On one hand, recognizing these queries as instances of known-item search suggests a conservative retrieval approach that emphasizes precision and takes the query literally. On the other hand, it is clear from these examples that a product, document, or person can have many name or title variants.

Ironically, query expansion and relaxation, despite their primary association with increasing recall, can be critical for matching the specific intent of known-item search queries to the results they target. For example, if someone searches for “Dan Tunkelang”, the application should return my profile page rather than a result that happens to include the words “Dan” and “Tunkelang” in its body text. Likewise, copy-and-paste product searches for my espresso maker or faceted search book should retrieve the intended result on any site that carries them, regardless of the title variant.
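As a rough illustration, the following Python sketch scores a query against a result's title field only, using token expansion (a tiny name-variant table) and relaxation (not every query token has to match). The `NAME_VARIANTS` table, the `title_match_score` function, and the sample titles are hypothetical and far simpler than what a production system would use.

```python
import re

# Hypothetical expansion table; a real one would be far larger.
NAME_VARIANTS = {"dan": {"daniel"}, "daniel": {"dan"}}


def tokens(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(token):
    """Query expansion: the token itself plus any known variants."""
    return {token} | NAME_VARIANTS.get(token, set())


def title_match_score(query, title):
    """Fraction of query tokens that match the title, after expansion.

    Unmatched query tokens lower the score instead of excluding the
    result, which is a simple form of query relaxation.
    """
    query_tokens = tokens(query)
    title_tokens = set(tokens(title))
    if not query_tokens:
        return 0.0
    hits = sum(1 for tok in query_tokens if expand(tok) & title_tokens)
    return hits / len(query_tokens)


pasted = ("Gaggia Brera Super-Automatic Espresso Machine, "
          "Small, 40 fluid ounces, Silver")
variant_title = "Gaggia Brera Super Automatic Espresso Machine in Silver"
unrelated_title = "Silver Machine Polish"

name_score = title_match_score("Dan Tunkelang", "Daniel Tunkelang")
variant_score = title_match_score(pasted, variant_title)
unrelated_score = title_match_score(pasted, unrelated_title)
```

Here the name variant matches fully despite the different first name, and the pasted product name scores much higher against the title variant than against the unrelated title, even though several of its tokens (“small”, “40”, “fluid”, “ounces”) do not appear in that variant.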

Seriously, Don’t Take Searchers Literally!

I hope these examples reinforce the need to take searchers seriously rather than literally. And we have not even touched on messier issues like handling numbers. Searchers may know exactly what they want, but they cannot ensure that their queries exactly match the content representation in the search index. So, regardless of the temptation to take searchers literally, serious search applications require query understanding.
