How events are used to improve search results automatically

Algolia

The leading provider of AI search solutions, serving over 17,000 businesses and 500,000 developers globally.

Published: August 30, 2023

In the world of AI search and discovery, events are the fuel that powers accuracy and optimization. AI models heavily rely on vast amounts of high-quality event data to learn, make accurate predictions, and drive meaningful improvements.

This article explains how the machine learning models behind Algolia NeuralSearch use events to optimize search results.

Three steps of search processing

If you’re new to search, it’s worth pausing for a moment to learn how search works. Every query is processed in three steps: query understanding, retrieval, and ranking.

The first step is query understanding, in which the search engine parses and structures the query so that the input can be better understood.
Then, the engine retrieves results and orders them from most to least relevant.
Finally, any additional rules or promotions can be applied to rank or re-rank results from most to least relevant.

Historically, keyword search engines used term frequency to determine relevance and ranking for a given query. New machine learning models move beyond keyword matching to query understanding. When it comes to AI, each term is converted into a mathematical expression called a vector embedding. Queries are also vectorized. Then the machine learning models can mathematically compare a search query with a search record to understand its meaning.

Vector search is a way to use vector embeddings to find related objects that have similar characteristics using machine learning models that detect semantic relationships between objects in an index. The image above shows a simplified view of vector embeddings in 3D vector space. Real-world vectors can have hundreds of dimensions.
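As a rough illustration of the idea, the sketch below ranks records by cosine similarity between a query vector and each record vector. The three-dimensional embeddings and record names are invented for the example, and a brute-force scan stands in for whatever retrieval method a production engine actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def vector_search(query_vec, index, k=2):
    """Return the ids of the k records whose vectors are closest to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), record_id)
        for record_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record_id for _, record_id in scored[:k]]

# Toy embeddings, invented for illustration only.
index = {
    "blue shirt": [0.9, 0.1, 0.0],
    "navy sweater": [0.8, 0.2, 0.1],
    "red sneakers": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "blue top"

print(vector_search(query, index, k=2))
```

The two clothing tops score close to each other and far above the sneakers, which is the behavior the paragraph describes: semantically related objects sit near each other in vector space.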

Why is event data important for search AI?

AI search algorithms can understand the searcher’s intent, but ordering results from most to least important is harder. For example, if someone searches your online clothing site for a “blue top”, an AI search engine will understand that “top” is a synonym for “shirt” or “sweater”, but how it ranks results matters just as much: your visitors don’t want to comb through pages of content to find what they’re looking for. Events help improve that relevance.

Events can be used to determine which fields best represent the meaning of a record (and index), and with what weighting. When I say “fields” I’m referring to the fields of a record in an index, such as the example below. Each field can be assigned a “weight” that can be used to boost or bury a result for any given search query. Technically, we calculate the relationship between the query and the events (as signals) to establish the significance of each field in determining the outcome; i.e. which fields should be considered to optimize for the outcome represented by the event (e.g., a conversion).

This process trains an ‘expression’ of fields and associated weightings, which is then used to ‘vectorize’ each record. The expression must be provided for the engine to perform the vectorization process.

Can AI search be configured without events?

Technically, yes. An expression is simply a list of fields (from the record) and associated weightings (numerical values between 0.0 and 1.0). However, determining which fields to use, and with what weighting, is extremely difficult for a person. Achieving a near-optimal expression by hand is practically impossible, and even producing one that yields reasonable results poses many challenges.
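To make the shape of an expression concrete, here is a minimal sketch that represents it as a mapping from field name to weight and uses it to combine per-field embeddings into one record vector. This is an assumption for illustration, not a description of how NeuralSearch applies an expression internally: the weighted average, the `toy_embed` stand-in model, and the field names `name`, `brand`, and `description` are all hypothetical.

```python
def toy_embed(text, dims=4):
    """Hypothetical stand-in for a real embedding model: buckets words by character codes."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dims] += 1.0
    return vec

def vectorize_record(record, expression, embed=toy_embed):
    """Combine field embeddings into one vector, weighted by the expression.

    `expression` maps field names to weights between 0.0 and 1.0.
    Fields that are absent from the expression do not contribute at all.
    """
    for field, weight in expression.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {field!r} must be between 0.0 and 1.0")

    combined = None
    total_weight = 0.0
    for field, weight in expression.items():
        value = record.get(field)
        if not value or weight == 0.0:
            continue  # skip missing fields and zero-weight fields
        field_vec = embed(str(value))
        if combined is None:
            combined = [0.0] * len(field_vec)
        for i, x in enumerate(field_vec):
            combined[i] += weight * x
        total_weight += weight

    if combined is None:
        return []
    return [x / total_weight for x in combined]

# Hypothetical record and expression, invented for illustration.
record = {
    "name": "Plush hybrid mattress",
    "brand": "Acme",
    "description": "A long marketing description that the expression ignores.",
}
expression = {"name": 0.8, "brand": 0.3}

print(vectorize_record(record, expression))
```

Even in this toy form, the difficulty is visible: the output depends entirely on which fields are listed and how they are weighted, and nothing in the record itself says what those choices should be.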

Consider the following real example; an expression trained on conversions, with the record excerpt shown above.

The selected fields appear reasonable enough, as does the ordering of the weightings. However, note that the description field is not included in the expression, although to a person it may intuitively ‘best represent the meaning’ of the record. Also bear in mind that, by comparison to many customers, this is an example of a better-structured record.

Consider instead the following real example; an expression trained on clicks, with the record excerpt from another customer’s index, with (typically) messier data quality.

Again, in this example, description is not used, but neither is title. tagKeyWords and saleKeyWords include many repeated words, and both tagName and h1 contain the same information. The inclusion of variantFirmness, with a relatively high weighting, may also come as a surprise to a user.

These two examples are intended to illustrate the difficulties associated with training an optimal expression. With events, we can remove this complexity, and automatically determine which fields should be considered when training the expression, and with what associated weighting.

Why use machines to train the data?

One question we get is why we need machine learning to determine the importance of each field. I mean, you can evaluate each field and determine which ones are most important, right?

We learned this first hand when building NeuralSearch. Initially, the neural expression was being hand-crafted by our team. We had years of experience with customer datasets and search configurations. Even in those highly-capable hands, the resulting expressions were very different.

Consider the two customer examples from above:

Most of the selected fields have been appropriately identified, and in the same weighted ‘order’; however, the relative weightings are different. The nDCG@10 — a method we can use to measure the relevance for a particular query/results pair — for the expression trained by events was measured at ~0.6; the nDCG@10 for the expression configured by the human expert was measured at ~0.4. This is an extremely significant difference in search performance, to have been only affected by the expression.
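For readers unfamiliar with the metric, the following sketch computes nDCG@k for a single ranked result list. It uses the common linear-gain formulation with a log2 position discount, and it derives the ideal ordering from the same list of graded relevance judgments; both choices are assumptions for illustration, and the relevance grades are invented rather than taken from the measurements quoted above.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: relevance discounted by log2 of the position."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k=10):
    """DCG of the actual ordering divided by the DCG of the best possible ordering."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal_dcg

# Graded relevance of each returned result, in ranked order (invented values).
well_ordered = [3, 2, 2, 1, 0, 0]
poorly_ordered = [0, 1, 0, 2, 2, 3]

print(round(ndcg_at_k(well_ordered), 3))
print(round(ndcg_at_k(poorly_ordered), 3))
```

A score of 1.0 means the results are already in the best possible order; the further relevant results sit from the top, the lower the score, which is why a gap such as 0.4 versus 0.6 reflects a substantial difference in ranking quality.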

There are more significant differences between these two expressions: most of the fields selected by the human expert are not included in the event-trained expression, and the weighting scales are not close.

Additionally, NeuralSearch is continuously improving, and field weights are adjusted automatically over time. Search trends are continuously changing, new long-tail queries are created, and products and pages are added to or removed from your index. This necessitates automatic updating behind the scenes.

Current Algolia customers who already have events connected can transition to NeuralSearch seamlessly, provided they have collected sufficient data to provide feedback to the machine learning algorithms. New customers will need to set up events and generate enough data to determine the best field weights and help overcome the cold start problem.

Sign up today to join the waitlist for the self-service edition of Algolia NeuralSearch. By starting today, you can configure events and be ready to jump in with AI-powered search when it’s available!
