
The Art of Search Ranking: Leveraging Solr Relevance for Impactful Searches

Rajat Singh

Lead Developer @Arrow Electronics. Developing and Enhancing SAP E-Commerce(Hybris) Applications. Passionate Coder/Thinker. Lets Connect!

Published: September 17, 2023

This article can be treated as a continuation of my earlier article in which I explained how Solr works under the hood.

In this article, I will try to explain how Solr Relevance works and which techniques can improve it. Before jumping into Solr Relevance, let's go through the query screen of the Solr admin console.

Solr Admin console:

In the Solr admin console, I have selected the Query node of the test collection.

Let's go through each part of the query section.

Request Handler: The request handler is the Solr component that processes an incoming HTTP request. By default, queries go to the /select handler. While it's possible to configure additional handlers on other endpoints, in most cases it is not needed.

Query(q): The Query field is where we can pass a query.

q=*:* // All Fields:All Values

start, rows: Pagination. start is the zero-based offset of the first result to return, and rows is the number of results per page.

Query Operator (q.op): Defines the default query operator for search queries, specifying whether "AND" or "OR" is used when no operator is provided. The default is "OR".

Filter Query (fq): Narrows down search results without affecting relevancy scoring, making it useful for filtering based on specific criteria.

//example
fq=category:"Electronics" AND price:[100 TO *]

sort: Sorts the results by the value of a field, for example sort=price asc. Without it, results are ordered by score, highest first.

Field List (fl): Specifies the list of fields you want to include in the search results.

Default field(df): The df parameter in Solr allows you to specify the default field for queries. This parameter ensures that if a user doesn't explicitly specify a field for their query, Solr will search in the specified default field.

Example:

df=title // query terms without a field prefix are searched in the title field

Response writer (wt): the format of the response.

wt=json
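To see how these parameters fit together, here is a minimal Python sketch that assembles them into a /select URL. The host, the collection name "test", and the field names (title, price, category) are assumptions for illustration; nothing is sent to Solr, the code only builds and decodes the URL.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical local Solr instance and collection name; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "test"


def page_to_start(page, rows):
    """Convert a 1-based page number into Solr's zero-based `start` offset."""
    return (page - 1) * rows


def build_select_url(params):
    """Build a /select URL; urlencode escapes quotes, spaces and brackets."""
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


params = {
    "q": "laptop",
    "q.op": "AND",
    "fq": 'category:"Electronics" AND price:[100 TO *]',
    "sort": "price asc",
    "fl": "id,title,price",
    "df": "title",
    "start": page_to_start(page=3, rows=10),
    "rows": 10,
    "wt": "json",
}

url = build_select_url(params)
# Decode the query string again to confirm the values survive URL escaping.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["fq"][0])
```

Escaping matters here: the quotes, spaces and brackets in fq are not valid in a raw URL, which is why the sketch relies on urlencode rather than string concatenation.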

debugQuery: When set to true, the response includes a "debug" section with details about how the query was parsed, which filters were applied, and how the score was calculated.

defType: The query parser to use. The default is lucene.

Highlight (hl): Used for highlighting search terms in the results.

q=laptop&hl=true&hl.fl=content
//q is the query parameter, set to search for "laptop."
//hl=true enables highlighting.
//hl.fl=content specifies the field(s) to be highlighted(Content).
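As a rough picture of what highlighting returns, the sketch below wraps matches of a term in a text snippet. It is only an illustration in plain Python, not Solr's highlighter; the em tags are used here as an assumed default marker, and the sample sentence is made up.

```python
import re


def highlight(text, term, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive matches of `term` in marker tags."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda match: f"{pre}{match.group(0)}{post}", text)


snippet = highlight("A lightweight Laptop with a long battery life", "laptop")
print(snippet)
```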

Facet: facet allows for the categorization and counting of search results based on specific fields, providing a way to analyze data and present aggregated information to users.

1. Facet Field: The "facet.field" parameter is used to specify the fields for which you want to generate facets.

Example:
q=laptop&facet=true&facet.field=category
// Solr will provide the counts of documents that fall under different categories.

2. Facet Query: The "facet.query" parameter lets you specify custom queries for facet counts. This can be useful when you want to define specific criteria for counting certain subsets of your data.

Example:
q=laptop&facet=true&facet.query=price:[0 TO 100] AND inStock:true
// Solr will provide the count of documents that have a price between 0 and 100 and are in stock.
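The two facet types can be pictured with a small in-memory sketch. This is not how Solr computes facets internally; it only shows what the counts mean. The documents and their field values are hypothetical.

```python
from collections import Counter

# Hypothetical result set for q=laptop.
docs = [
    {"id": 1, "category": "laptops", "price": 80, "inStock": True},
    {"id": 2, "category": "laptops", "price": 950, "inStock": True},
    {"id": 3, "category": "accessories", "price": 25, "inStock": False},
    {"id": 4, "category": "accessories", "price": 40, "inStock": True},
]


def facet_field(documents, field):
    """Like facet.field: one count per distinct value of the field."""
    return Counter(doc[field] for doc in documents)


def facet_query(documents, predicate):
    """Like facet.query: a single count of documents matching a condition."""
    return sum(1 for doc in documents if predicate(doc))


category_counts = facet_field(docs, "category")
cheap_in_stock = facet_query(
    docs, lambda doc: 0 <= doc["price"] <= 100 and doc["inStock"]
)
print(dict(category_counts), cheap_in_stock)
```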

Spatial: Spatial search in Solr involves querying for data based on geographical location or spatial coordinates.

Spell check: A feature that improves search results by suggesting corrected or alternative spellings for query terms. This is useful for enhancing the user experience, especially when users make typographical errors or use different variants of a term.

Raw Query Parameters: All the above parameters can also be passed here as raw query parameters.

Now that we are clear about the building blocks of the Solr admin panel and querying, let's get to Relevance.

What is Solr Relevance?

Solr Relevance is a crucial concept in the realm of search engines. Imagine you're shopping online, looking for the perfect pair of shoes. Solr Relevance helps ensure that the results you see are not just a random list of shoes, but are tailored to what you're likely to find most interesting.

When you type in your search query, Solr works behind the scenes. It doesn't just find products with your keyword; it ranks them based on relevance. Let's say you're searching for "running shoes." Solr can consider various factors to decide which shoes should appear first. These factors may include:

Keyword Matching: It checks if the product title or description contains the words "running shoes." This is what Solr scores out of the box.
Popularity: If popularity data (for example, sales or ratings) is indexed as a field, a boost can favor shoes that other customers liked and bought.
Price: If you have a budget in mind, a filter or boost on the price field can show you shoes that match your price range.

Search results can also differ for two users searching for the same thing, provided the application passes user-specific boosts or filters with the query. For instance:

User A might be an athlete looking for high-performance running shoes, so the application boosts those.
User B might be on a budget, so the application favors affordable running shoes.

In this way, relevance tuning in Solr lets the search results be tailored to each user's preferences, making the online shopping experience more personalized and efficient.

How Does Solr Relevance Work?

Imagine you're on an e-commerce website, searching for a "smartphone." You type in your query and hit enter. Behind the scenes, a complex process unfolds to give you the most relevant results. This process is driven by Solr Relevance.

The Journey of a Search Query:

1. User Query: You input your search term, "smartphone." Solr starts by analyzing this query.

2. Tokenization: Solr breaks down your query into smaller units called tokens. With the standard tokenizer, a single word such as "smartphone" stays one token, while a query like "running shoes" becomes two tokens: "running" and "shoes."

3. Parsing: Solr employs a parser to understand what to do with these tokens. Here's where it gets interesting. Solr offers various parsers to handle different types of data. For text-based searches like this, it uses a query parser, which understands how to interpret your query and turn it into a structured search request.

Let's explore each parser with examples to understand how they work in Solr.

1. Standard Query Parser:

The Standard Query Parser (defType=lucene) is the default parser for basic text queries. It's a good choice when you have simple search requirements.

Example:

Let's say you want to search for documents containing the word "apple." You would enter the query as:

q=apple

This query will return all documents that contain the word "apple" in the default field (df).

2. DisMax Query Parser:

The DisMax Query Parser is user-friendly and allows for more complex queries. It's useful when you want to search across multiple fields.

Example:

Imagine you're searching for smartphones, but you want to prioritize documents where the word "smartphone" appears in either the product name or description. You would use DisMax like this:

q=smartphone
defType=dismax
qf=name^2 description

In this example, we're giving more weight (a boost factor of 2) to the "name" field compared to the "description" field.
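The name DisMax comes from "disjunction max": for a query term, the highest-scoring field wins, and the optional tie parameter adds a fraction of the other fields' scores. The sketch below illustrates that combination with made-up per-field scores; the numbers and the helper name dismax_score are assumptions, not Solr output.

```python
def dismax_score(field_scores, boosts, tie=0.0):
    """Combine per-field scores the DisMax way: max plus tie * the rest."""
    weighted = [
        score * boosts.get(field, 1.0) for field, score in field_scores.items()
    ]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)


# qf=name^2 description
boosts = {"name": 2.0, "description": 1.0}

# Hypothetical raw scores for the term "smartphone" in each field.
doc_a = {"name": 1.5, "description": 0.5}  # matches in both fields
doc_b = {"name": 0.0, "description": 2.0}  # matches in description only

print(dismax_score(doc_a, boosts), dismax_score(doc_b, boosts))
print(dismax_score(doc_a, boosts, tie=0.1))
```

With the boost of 2 on name, doc_a outranks doc_b even though doc_b has the higher raw description score.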

3. EDismax Query Parser:

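EDisMax (Extended DisMax) is an extended version of DisMax, offering even more flexibility for advanced queries. For example, it accepts the full Lucene query syntax (field names, AND/OR/NOT) in q and adds a boost parameter for multiplicative boosts.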

Example:

Suppose you're searching for smartphones, but you also want to filter by price. You might use EDismax like this:

q=smartphone
defType=edismax
qf=name^2 description
fq=price:[200 TO 500]

In this example, we're searching for the keyword "smartphone" in the name or description and filtering the results to only include products with prices between 200 and 500. Note that fq is not specific to EDisMax; filter queries work with any query parser.
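A filter query restricts which documents are returned but leaves their scores alone. The following sketch shows that behavior on a hypothetical list of already-scored products; the ids, prices and scores are invented for illustration.

```python
# Hypothetical documents with scores already computed for q=smartphone.
scored = [
    {"id": "p1", "price": 150, "score": 3.2},
    {"id": "p2", "price": 350, "score": 2.7},
    {"id": "p3", "price": 480, "score": 1.9},
    {"id": "p4", "price": 900, "score": 4.1},
]


def apply_price_filter(documents, low, high):
    """Like fq=price:[low TO high]: keep or drop, never change the score."""
    return [doc for doc in documents if low <= doc["price"] <= high]


def rank(documents):
    """Order documents by score, highest first."""
    return sorted(documents, key=lambda doc: doc["score"], reverse=True)


results = rank(apply_price_filter(scored, 200, 500))
print([(doc["id"], doc["score"]) for doc in results])
```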

These examples showcase how different parsers in Solr allow you to tailor your search queries to your specific needs, from simple text searches to more complex and precise searches that consider multiple factors. Each parser offers a different level of control and flexibility, ensuring that you can retrieve the most relevant results for your use case.

4. Query Execution: Solr takes the parsed query and executes it against the indexed data. It looks up the query terms in the inverted index to find the documents that match the query's criteria.

5. Scoring: Here's where Solr's Relevance really shines. It assigns a score to each document based on how well it matches your query. By default this score reflects keyword matching (term frequency, term rarity, field length); signals such as document popularity count only when you add them through boosts.

6. Ranking: Solr then sorts the results by their scores, presenting the most relevant documents at the top. This is what you see on your screen when you search for "smartphone" - a list of smartphones, but with the most relevant ones appearing first.

Solr Relevance is a sophisticated system that uses parsers to understand your search queries, searches through indexed data, scores documents based on boost value and other factors, and ranks them for presentation.
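The journey above (tokenize, look up, score, rank) can be sketched end to end with a toy inverted index. This is a deliberately simplified model: it splits on whitespace and scores by raw term counts, which is an assumption made for readability and not Solr's actual analysis or scoring.

```python
from collections import defaultdict

# Hypothetical documents: id -> text.
docs = {
    "d1": "smartphone with a large screen",
    "d2": "phone case for your smartphone smartphone accessories",
    "d3": "wireless headphones",
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_index(documents):
    """Inverted index: token -> {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Score each matching document by summed term counts, then rank."""
    scores = defaultdict(int)
    for token in tokenize(query):
        for doc_id, count in index.get(token, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = build_index(docs)
ranked = search(index, "smartphone")
print(ranked)
```

Only documents listed under the query token are touched, which is why an inverted index avoids scanning every document.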

Manipulation of the Search result:

There are multiple ways we can manipulate the search result relevancy in Solr and for this article, I will be taking the example of dismax parser.

Prior to displaying the results, Solr assigns scores to all documents and arranges the order of results based on these scores.

Let's understand the score calculation logic.

Score Calculation In Solr:

The total score of a document combines the scores from the different fields, with each field's score calculated by a scoring algorithm (BM25 is the default similarity in current Solr versions). It's not a simple sum of scores from all fields: with DisMax, the highest field score for a term counts most, and the tie parameter decides how much the other fields add. For our understanding, though, we can picture it like this:

Total score of a document ≈ field1Score + field2Score + …
field1Score = boost * idf * tf

boost: A weight that can be given to a field or clause at query time.

Inverse Document Frequency (idf): A measure of whether the term is common or rare across all documents. Rare terms get a higher idf.

Term Frequency (tf): How often the term occurs in the given field of the document.
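To make the simplified formula concrete, here is a worked sketch of boost * idf * tf on a tiny corpus. The idf smoothing used below is one classic textbook variant and is an assumption; Solr's default BM25 uses a different formula and also accounts for field length. The documents are hypothetical.

```python
import math

# Toy "name" field for three documents, already tokenized (hypothetical data).
name_field = [
    ["apple", "iphone"],
    ["apple", "apple", "charger"],
    ["samsung", "galaxy"],
]


def idf(term, field_values):
    """Smoothed inverse document frequency: rarer terms score higher."""
    containing = sum(1 for tokens in field_values if term in tokens)
    return math.log((len(field_values) + 1) / (containing + 1)) + 1.0


def field_score(term, tokens, field_values, boost=1.0):
    """Simplified per-field score: boost * idf * tf."""
    tf = tokens.count(term)
    return boost * idf(term, field_values) * tf


# Score every document for the term "apple" with a field boost of 2.
scores = [
    field_score("apple", tokens, name_field, boost=2.0) for tokens in name_field
]
print([round(score, 3) for score in scores])
```

The second document mentions "apple" twice, so its tf, and therefore its score, is double that of the first; the third does not match and scores zero.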

1. Boost/Deboost based on field value in bq:

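Assign a boost greater than 1 to boost a clause and a value between 0 and 1 to deboost it. Note that bq clauses are additive: a matching clause always adds to the score, so a boost below 1 only adds less. To push documents down, a common approach is to boost the documents that do not match instead.

Example:
is_instock:true^4000
// boost: a matching document gets a boost factor of 4000, so this clause contributes roughly 4000 * idf * tf

product_classification:"unclassified"^0.5
// deboost: a matching document gets a boost factor of 0.5, so this clause contributes roughly 0.5 * idf * tf, a reduced score for that clause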
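When sending several boost queries, bq can simply be repeated in the request. The sketch below shows one way to encode that from Python; it only builds the query string and assumes the field names from the examples above exist in your schema.

```python
from urllib.parse import urlencode, parse_qs

params = {
    "q": "smartphone",
    "defType": "dismax",
    "qf": "name^2 description",
    # A list value becomes one bq=... pair per entry when doseq=True.
    "bq": [
        "is_instock:true^4000",
        'product_classification:"unclassified"^0.5',
    ],
}

query_string = urlencode(params, doseq=True)
decoded = parse_qs(query_string)
print(query_string)
```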

2. Boost/Deboost based on the conditions:

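if(termfreq(product_classification,"unclassified"),0,1000)

Here termfreq returns how many times "unclassified" occurs in product_classification for the document. If that count is non-zero, the function yields a boost value of 0; otherwise it yields 1000. A function like this is typically passed in the bf (boost function) parameter.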
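The logic of that function query can be mirrored in a few lines of Python. The helpers termfreq and solr_if below are stand-ins written for this sketch, not Solr APIs, and the two sample documents are hypothetical.

```python
def termfreq(field_tokens, term):
    """Stand-in for Solr's termfreq(): occurrences of term in the field."""
    return field_tokens.count(term)


def solr_if(test, value_if_true, value_if_false):
    """Stand-in for Solr's if(): any non-zero test counts as true."""
    return value_if_true if test else value_if_false


def classification_boost(doc):
    """Mirror of if(termfreq(product_classification,"unclassified"),0,1000)."""
    count = termfreq(doc["product_classification"], "unclassified")
    return solr_if(count, 0, 1000)


unclassified_doc = {"product_classification": ["unclassified"]}
classified_doc = {"product_classification": ["resistors"]}
print(classification_boost(unclassified_doc), classification_boost(classified_doc))
```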

Writing boost Query in SAP Hybris:

We can create a Search Query Template on the index type and define all the search-related configurations there. It has a field for defining the boost query as well.

Bonus:

Below are some search terms that are frequently used while working with search engines.

1. Fuzzy Search:

In a fuzzy search, the search engine not only looks for the exact query term but also considers similar terms or terms with slight differences. It's particularly useful for correcting spelling errors or accommodating variations in a term while still providing relevant search results.
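Fuzzy matching is usually based on edit distance, the number of single-character changes needed to turn one term into another. In Solr query syntax this is written as a term followed by a tilde, for example labtop~1. The sketch below uses plain Levenshtein distance as an approximation; the vocabulary is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete from a
                    current[j - 1] + 1,      # insert into a
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=2):
    """Return indexed terms within max_edits of the query term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


vocabulary = ["laptop", "laptops", "desktop", "tablet"]
matches = fuzzy_match("labtop", vocabulary)
print(matches)
```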

2. Wildcards:

Wildcards are symbols (e.g., *, ?) that represent unknown characters or sequences of characters in a search query. They allow for more flexible and broad search patterns, similar to fuzzy search, but they don't consider the similarity of characters.
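Python's fnmatch module happens to use the same two symbols, so it works as a quick illustration: * matches any run of characters (including none) and ? matches exactly one. Note that fnmatch also understands bracket expressions, which are not part of the wildcard description above. The word list is hypothetical.

```python
from fnmatch import fnmatchcase

vocabulary = ["phone", "phones", "photo", "smartphone", "phase"]


def wildcard_match(pattern, words):
    """Return the words matching a pattern that uses * and ? wildcards."""
    return [word for word in words if fnmatchcase(word, pattern)]


prefix_hits = wildcard_match("pho*", vocabulary)
single_char_hits = wildcard_match("ph?se", vocabulary)
suffix_hits = wildcard_match("*phone", vocabulary)
print(prefix_hits, single_char_hits, suffix_hits)
```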

3. Phonetic Search:

Phonetic search involves finding words that sound similar or have similar pronunciation to the query term.

4. Synonym Search:

Synonym search involves considering synonyms or related terms when performing a search. It expands the search to include words with similar meanings to the query term, enhancing the comprehensiveness of the search results.
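Synonym handling can be thought of as query expansion: each token is replaced by a group of equivalent terms, and a document matching any term in the group counts as a match. The synonym table and the function expand_query below are made up for this sketch.

```python
# Hypothetical synonym table: term -> equivalent terms.
SYNONYMS = {
    "tv": {"television"},
    "laptop": {"notebook"},
}


def expand_query(query):
    """Return one sorted group of alternatives per query token."""
    expanded = []
    for token in query.lower().split():
        alternatives = {token} | SYNONYMS.get(token, set())
        expanded.append(sorted(alternatives))
    return expanded


groups = expand_query("cheap laptop")
print(groups)
```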

5. Stemming:

Stemming is a process of reducing words to their base form, allowing the search engine to match variations of a word. For example, "running" and "runs" would be stemmed to "run."
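The idea can be shown with a deliberately naive suffix stripper. Real stemmers, such as the Porter algorithm, apply many more rules; the function naive_stem below is an assumption written only to reproduce the "running" and "runs" example.

```python
def naive_stem(word):
    """Strip a trailing "ing" or plural "s"; far simpler than a real stemmer."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


stems = [naive_stem(word) for word in ["running", "runs", "run"]]
print(stems)
```

Because all three words reduce to the same stem, a query for any of them can match documents containing the others.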

Other references: You can go through the link below to learn more about the different parsers and functions.

Function Queries | Apache Solr Reference Guide 8.3
