The Art of Search Ranking: Leveraging Solr Relevance for Impactful Searches
@Copyright 2023 Rajat Singh


This article can be treated as a continuation of my earlier article in which I explained how Solr works under the hood.

In this article, I will explain how Solr Relevance works and the general techniques for improving it. Before jumping into Solr Relevance, let's go through the query screen of the Solr admin console.

Solr Admin console:

In the image below, I have selected the Query node of the test collection.


Let's go through each part of the query section.

Request Handler: The request handler is the Solr component that processes an incoming request; by default, queries are mapped to the /select endpoint. While it's possible to configure additional handlers for different endpoints, in most cases it is not needed.

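Query (q): The main query string, written as field:value.

q=*:* // all fields : all values, i.e. match every document

start, rows: Pagination. start is the zero-based offset of the first result to return, and rows is the number of results per page.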

Query Operator (q.op): Defines the default operator for search queries, specifying whether "AND" or "OR" is used between terms when no operator is provided.

Filter Query (fq): Narrows down the search results without affecting relevancy scoring, which makes it useful for filtering on specific criteria.

//example
fq=category:"Electronics" AND price:[100 TO *]        

sort: Sorts the results by the value of a field, in ascending or descending order.

Field List (fl): Specifies the list of fields to include in the search results.

Default Field (df): Specifies the default field for queries. If a query term is not explicitly qualified with a field name, Solr searches for it in this field.

Example:

df=title // unqualified query terms are searched in the title field of each document

Writer Type (wt): The response format.

wt=json        

debugQuery: When set to true, the response includes a "debug" section with details about how the query was parsed, which filters were applied, and how the score was calculated.

defType: Selects the query parser; the default is lucene.

Highlight (hl): Used for highlighting the matched search terms in the response.

q=laptop&hl=true&hl.fl=content
//q is the query parameter, set to search for "laptop."
//hl=true enables highlighting.
//hl.fl=content specifies the field(s) to be highlighted (here, content).

Facet: Faceting categorizes and counts search results based on specific fields, providing a way to analyze data and present aggregated information to users.

1. Facet Field: The "facet.field" parameter is used to specify the fields for which you want to generate facets.

Example:
q=laptop&facet=true&facet.field=category
// Solr will provide the counts of documents that fall under different categories.        

2. Facet Query: The "facet.query" parameter lets you specify custom queries for facet counts. This is useful when you want to define specific criteria for counting certain subsets of your data.

Example:
q=laptop&facet=true&facet.query=price:[0 TO 100] AND inStock:true
// Solr will provide the count of documents that have a price between 0 and 100 and are in stock.        
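
To consume these counts in application code, here is a minimal Python sketch. It assumes Solr's default JSON layout, in which each facet field comes back as a flat list alternating value and count. The response fragment and the helper name facet_counts_to_dict are made up for illustration.

```python
def facet_counts_to_dict(flat_counts):
    """Pair up an alternating [value, count, value, count, ...] list."""
    values = flat_counts[0::2]
    counts = flat_counts[1::2]
    return dict(zip(values, counts))


# Hypothetical fragment of a response body, shaped like the facet section
# of a JSON response; the numbers are invented.
response = {
    "facet_counts": {
        "facet_queries": {"price:[0 TO 100] AND inStock:true": 7},
        "facet_fields": {
            "category": ["laptops", 12, "accessories", 5, "tablets", 0]
        },
    }
}

category_counts = facet_counts_to_dict(
    response["facet_counts"]["facet_fields"]["category"]
)
print(category_counts)
```

Facet query counts are already keyed by the query string, so they need no reshaping.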

Spatial: Spatial search in Solr involves querying for data based on geographical location or spatial coordinates.

Spell Check: A feature that improves search results by suggesting corrected or alternative spellings for query terms. This is useful for enhancing the user experience, especially when users make typographical errors or use different variants of a term.

Raw Query Parameters: Any of the above parameters can also be passed here as raw key=value pairs.

Now that we are clear about the building blocks of the Solr admin panel and querying, let's move on to Relevance.
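
The same parameters can be sent from code as a plain HTTP query string. The following Python sketch only builds the URL and sends nothing. The host and port are assumptions, and build_select_url is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed host and port; "test" is the collection used in this article.
SOLR_SELECT_URL = "http://localhost:8983/solr/test/select"


def build_select_url(query, page=1, rows=10, filters=(), fields=()):
    """Return a /select URL for the given query, page, filters, and fields."""
    params = [
        ("q", query),
        ("start", (page - 1) * rows),  # zero-based offset of the first result
        ("rows", rows),
        ("wt", "json"),
    ]
    # Each filter is sent as its own fq parameter.
    params.extend(("fq", f) for f in filters)
    if fields:
        params.append(("fl", ",".join(fields)))
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_select_url(
    "laptop",
    page=2,
    rows=10,
    filters=['category:"Electronics"', "price:[100 TO *]"],
    fields=["id", "title", "price"],
)
print(url)

# Decode the query string again to check what the server would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["fq"])
```

Sending each filter as a separate fq parameter keeps them independent of one another, and urlencode takes care of escaping quotes, brackets, and spaces.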

What is Solr Relevance?

Solr Relevance is a crucial concept in the realm of search engines. Imagine you're shopping online, looking for the perfect pair of shoes. Solr Relevance helps ensure that the results you see are not just a random list of shoes, but are tailored to what you're likely to find most interesting.

When you type in your search query, Solr works behind the scenes. It doesn't just find products with your keyword; it ranks them based on relevance. Let's say you're searching for "running shoes." Solr will consider various factors to decide which shoes should appear first. These factors may include:

  1. Keyword Matching: It checks if the product title or description contains the words "running shoes."
  2. Popularity: Solr might also consider which shoes other customers have liked and bought when searching for running shoes.
  3. Price: If you have a budget in mind, Solr can show you shoes that match your price range.

Solr Relevance can also be made to differ for two users searching for the same thing, provided the application passes user-specific boosts or filters along with the query. For instance:

  • User A might be an athlete looking for high-performance running shoes, so the query can boost those.
  • User B might be on a budget, so the query can favor affordable running shoes.

In this way, Solr Relevance can tailor the search results to each user's preferences, making the online shopping experience more personalized and efficient.

In conclusion, Solr Relevance ensures that the products you see are not just random but carefully chosen to match your needs and preferences.

How Does Solr Relevance Work?

Imagine you're on an e-commerce website, searching for a "smartphone." You type in your query and hit enter. Behind the scenes, a complex process unfolds to give you the most relevant results. This process is driven by Solr Relevance.

The Journey of a Search Query:

1. User Query: You input your search term, "smartphone." Solr starts by analyzing this query.

2. Tokenization: Solr breaks down your query into smaller units called tokens. With a typical text analyzer, "smartphone" stays a single token, while a query such as "smart phone" would become two tokens: "smart" and "phone."

3. Parsing: Solr employs a parser to understand what to do with these tokens. Here's where it gets interesting. Solr offers various parsers to handle different types of data. For text-based searches like this, it uses a query parser, which understands how to interpret your query and turn it into a structured search request.



Let's explore each parser with examples to understand how they work in Solr.

1. Standard Query Parser:

The Standard Query Parser (lucene) is the default parser for basic text queries. It's a good choice when you have simple search requirements.

Example:

Let's say you want to search for documents containing the word "apple." You would enter the query as:

q=apple        

This query will return all documents that contain the word "apple" in the default field (df).

2. DisMax Query Parser:

The DisMax Query Parser is designed for simple, user-entered phrases and tolerates input that is not valid query syntax. It's useful when you want to search across multiple fields and weight each field differently.

Example:

Imagine you're searching for smartphones, but you want to prioritize documents where the word "smartphone" appears in either the product name or description. You would use DisMax like this:

q=smartphone
defType=dismax
qf=name^2 description        

In this example, we're giving more weight (a boost factor of 2) to the "name" field compared to the "description" field.
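
To make the effect of qf boosts concrete, here is a rough Python sketch of the disjunction-max rule that gives DisMax its name: the best boosted field score wins, and the tie parameter lets the remaining fields contribute a fraction of theirs. The raw field scores are invented, and the function is a simplification for illustration, not Solr's actual implementation.

```python
def dismax_score(field_scores, boosts, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    field_scores: raw match score per field for one document
    boosts: per-field boost factors, defaulting to 1.0 when absent
    tie: fraction of the non-winning field scores added to the winner
    """
    boosted = [
        score * boosts.get(field, 1.0) for field, score in field_scores.items()
    ]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


# Hypothetical raw scores for one document matching "smartphone".
field_scores = {"name": 1.5, "description": 2.0}
boosts = {"name": 2.0}  # mirrors qf=name^2 description

print(dismax_score(field_scores, boosts))           # only the best field counts
print(dismax_score(field_scores, boosts, tie=0.1))  # others add 10% of their score
```

Without the boost, the description match would win. With name^2, the name match becomes the dominant field even though its raw score is lower.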

3. EDismax Query Parser:

EDismax is an extended version of DisMax, offering even more flexibility for advanced queries, such as support for the full Lucene query syntax (field-qualified terms, AND/OR/NOT) inside q.

Example:

Suppose you're searching for smartphones, but you also want to filter by price. You might use EDismax like this:

q=smartphone
defType=edismax
qf=name^2 description
fq=price:[200 TO 500]        

In this example, we're searching for products with the keyword "smartphone" in the name or description and filtering the results to only include products with prices between 200 and 500. Note that fq is a common parameter and works the same way with any parser.

These examples showcase how different parsers in Solr allow you to tailor your search queries to your specific needs, from simple text searches to more complex and precise searches that consider multiple factors. Each parser offers a different level of control and flexibility, ensuring that you can retrieve the most relevant results for your use case.

4. Query Execution: Solr takes the parsed query and executes it against the indexed data. It looks up the query terms in the index to find the documents that match the query's criteria.

5. Scoring: Here's where Solr's Relevance really shines. It assigns a score to each document based on how well it matches your query. By default this score reflects keyword matching (how often the terms occur in the document and how rare they are overall); signals such as document popularity only count if you add them through boosts.

6. Ranking: Solr then sorts the results by their scores, presenting the most relevant documents at the top. This is what you see on your screen when you search for "smartphone" - a list of smartphones, but with the most relevant ones appearing first.

Solr Relevance is a sophisticated system that uses parsers to understand your search queries, searches through indexed data, scores documents based on boost value and other factors, and ranks them for presentation.

Manipulation of the Search result:

There are multiple ways to manipulate search result relevancy in Solr. For this article, I will use the dismax parser as the example.

Prior to displaying the results, Solr assigns scores to all matching documents and arranges the order of results based on these scores.

Let's understand the score calculation logic.

Score Calculation In Solr:

The total score of a document is a result of a weighted combination of scores from different fields, with each field's score calculated using a specific scoring algorithm. It's not a simple sum of scores from all fields, but for our understanding we can approximate it as follows.

Total score of a document = sum of all field scores [field1Score + field2Score + …]
field1Score = boost * idf * tf

boost: a weight that can be assigned to a field at query time.

Inverse document frequency (idf): a measure of whether the term is common or rare across all documents; rarer terms score higher.

Term frequency (tf): the number of times the term occurs in the field of a given document.
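
The simplified formula above can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the idf variant is one common textbook form, the documents are invented, and Solr's real similarity (BM25 by default in recent versions) additionally applies term-frequency saturation and length normalization.

```python
import math


def field_score(term, field_tokens, all_docs_tokens, boost=1.0):
    """Score one field as boost * idf * tf, following the simplified formula."""
    tf = field_tokens.count(term)  # occurrences of the term in this field
    docs_with_term = sum(1 for tokens in all_docs_tokens if term in tokens)
    # Assumed idf variant: rarer terms receive a larger weight.
    idf = math.log(len(all_docs_tokens) / (1 + docs_with_term)) + 1
    return boost * idf * tf


# Hypothetical tokenized "name" field of three documents.
names = [
    ["smartphone", "case"],
    ["smartphone", "smartphone", "bundle"],
    ["laptop", "sleeve"],
]

scores = [field_score("smartphone", tokens, names, boost=2.0) for tokens in names]
# Rank document positions from highest to lowest score.
ranking = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

The second document mentions the term twice, so it ranks first; the third never mentions it and scores zero.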

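1. Boost/Deboost based on field value in bq:

Assign a value greater than 1 to boost matches on a field value, and a value between 0 and 1 to deboost them. Keep in mind that bq clauses add to the main query's score, so a deboost shrinks that clause's contribution rather than subtracting from the total.

Example:
is_instock:true^4000
// boost: if true, the document gets a boost factor of 4000, so this clause scores 4000*idf*tf

product_classification:"unclassified"^0.5
// deboost: if matched, the document gets a boost factor of 0.5, so this clause scores 0.5*idf*tf, a reduced score for that field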

2. Boost/Deboost based on the conditions:

if(termfreq(product_classification,"unclassified"),0,1000)        

Here I am checking whether the product_classification value is "unclassified". If it is, the document gets a boost value of 0; otherwise it gets a boost value of 1000. A function like this can be supplied as a boost function, for example through the bf parameter of dismax.

Writing boost Query in SAP Hybris:

We can create a Search Query Template in indextype and define all the search-related configurations there. It has a field for defining the boost query as well.
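
The logic of this function query can be mirrored in plain Python to see which documents receive which value. The document IDs and field values below are hypothetical, and the helpers only imitate the semantics of if() and termfreq(); they are not Solr code.

```python
def termfreq(field_tokens, term):
    """Count how often the term occurs in one document's field."""
    return field_tokens.count(term)


def conditional_boost(term_frequency, if_present=0.0, otherwise=1000.0):
    """Mirror if(termfreq(field, term), if_present, otherwise).

    A non-zero term frequency counts as true, so the first value is used.
    """
    return if_present if term_frequency != 0 else otherwise


# Hypothetical documents with their product_classification values.
docs = {
    "sku-1": ["unclassified"],
    "sku-2": ["phones"],
}

boost_values = {
    doc_id: conditional_boost(termfreq(tokens, "unclassified"))
    for doc_id, tokens in docs.items()
}
print(boost_values)
```

Unclassified products get nothing extra, while every classified product gets the same 1000 added on top, which pushes unclassified items down the result list.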

Bonus:

Below are some terms that are frequently used while working with search engines.

1. Fuzzy Search:

In a fuzzy search, the search engine not only looks for the exact query term but also considers similar terms or terms with slight differences. It's particularly useful for correcting spelling errors or accommodating variations in a term while still providing relevant search results.
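
Fuzzy matching is commonly based on edit distance. The sketch below computes the classic Levenshtein distance and filters a made-up term list by a maximum number of edits. It is an illustration of the idea, not the optimized automaton a search engine uses internally. In Solr's standard query syntax, a fuzzy term is written with a trailing tilde, for example labtop~1.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query, terms, max_edits=2):
    """Return the terms within max_edits of the query, in their original order."""
    return [term for term in terms if edit_distance(query, term) <= max_edits]


# Hypothetical indexed terms.
indexed_terms = ["laptop", "desktop", "tablet", "laptops"]
print(fuzzy_matches("labtop", indexed_terms, max_edits=1))
print(fuzzy_matches("labtop", indexed_terms, max_edits=2))
```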

2. Wildcards:

Wildcards are symbols (e.g., *, ?) that represent unknown characters or sequences of characters in a search query. They allow for more flexible and broad search patterns, similar to fuzzy search, but they don't consider the similarity of characters.
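
Python's standard fnmatch module uses the same two symbols, so it works as a quick way to get a feel for wildcard patterns. The term list is invented, and fnmatch also supports bracket expressions that are not part of the wildcard syntax described here.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, terms):
    """Return the terms that match a pattern using * and ? wildcards."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# Hypothetical indexed terms.
indexed_terms = ["laptop", "lapel", "desktop", "tap"]

# * matches any run of characters, including none.
print(wildcard_matches("lap*", indexed_terms))
# ? matches exactly one character.
print(wildcard_matches("?ap", indexed_terms))
```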

3. Phonetic Search:

Phonetic search involves finding words that sound similar or have similar pronunciation to the query term.
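
One classical way to do this is Soundex, which reduces a word to its first letter plus three digits that stand for groups of similar-sounding consonants. The following is a compact sketch of American Soundex for illustration; search engines typically offer this and more refined phonetic encoders as analysis filters.

```python
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(word):
    """Encode a word as one letter plus three digits (American Soundex)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = word[0].upper()
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")  # vowels, h, w, and y have no code
        if code and code != previous:
            encoded += code
        if ch not in "hw":  # h and w do not break a run of identical codes
            previous = code
    return (encoded + "000")[:4]  # pad or truncate to four characters


print(soundex("Robert"), soundex("Rupert"))
print(soundex("Smith"), soundex("Smyth"))
```

Words that share a code, such as the pairs above, would be treated as matches by a phonetic search.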

4. Synonym Search:

Synonym search involves considering synonyms or related terms when performing a search. It expands the search to include words with similar meanings to the query term, enhancing the comprehensiveness of the search results.
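
At its simplest, synonym search is query expansion against a lookup table. The table below is a made-up example; in a real setup the synonyms would come from a curated list maintained for the catalog.

```python
# Hypothetical synonym table.
SYNONYMS = {
    "laptop": ["notebook"],
    "tv": ["television"],
}


def expand_query(terms):
    """Return the query terms plus their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *SYNONYMS.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["cheap", "laptop"]))
```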

5. Stemming:

Stemming is a process of reducing words to their base form, allowing the search engine to match variations of a word. For example, "running" and "runs" would be stemmed to "run."
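
A deliberately naive suffix stripper is enough to show the idea. It handles the example above but will mis-stem many other words; production stemmers such as Porter's apply far more careful rules.

```python
def naive_stem(word):
    """Strip a trailing "ing" or plural "s"; a toy rule set, not a real stemmer."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # "running" -> "runn" -> "run": undo the doubled final consonant
        if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


for form in ["running", "runs", "run"]:
    print(form, "->", naive_stem(form))
```

Because the same reduction is applied to both the indexed text and the query, all three forms end up matching one another.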

Other references: You can go through the reference below to understand more about the different parsers and functions.

Function Queries | Apache Solr Reference Guide 8.3
