Use analyzers with CQL #AstraDB #GenAI #Datastax #LLM
Pankaj Gajjar
Husband|Father|Speaker|Enterprise Architect (TOGAF)|MDM(PIM/DAM/MXM) Architect|ACE(Multi Cloud)|ex-AWS CB|Lead Solution Architect @Datastax|Generative AI |AI Consulting
Analyzers process the text in a column to enable term matching on large strings. Combined with vector search, they make it easier to find relevant information in large datasets.
Instead of returning only a similarity-ranked list of results, analyzers let you restrict the results to rows that contain specific terms while still ordering them semantically by your query vector.
For example, suppose you ask an LLM “Tell me about available shoes” and the query runs as a vector search against your Astra DB table. You would get a list of several shoes with a variety of features:
SELECT * FROM products
ORDER BY vector ANN OF [6.0,2.0, ... 3.1,4.0]
LIMIT 10;
Alternatively, you can use the analyzer search to specify a keyword, such as hiking:
SELECT * FROM products
WHERE val : 'hiking'
ORDER BY vector ANN OF [6.0,2.0, … 3.1,4.0]
LIMIT 10;
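To make the difference between the two queries concrete, here is a minimal Python sketch of the idea, not of how Astra DB executes it. The rows, the three-dimensional vectors, and the `search` helper are all hypothetical, and the ranking is an exact cosine-similarity sort rather than a true approximate nearest neighbor (ANN) search.

```python
import math

# Hypothetical rows: (id, val, vector). The vectors are made-up
# illustration values, not real embeddings.
ROWS = [
    ("1", "soccer cleats", [0.9, 0.1, 0.0]),
    ("2", "running shoes", [0.2, 0.9, 0.1]),
    ("3", "hiking shoes", [0.1, 0.8, 0.6]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, term=None, limit=10):
    """Optionally keep only rows containing `term`, then rank by similarity.

    Assumption: an exact sort stands in for ANN, and a lowercase
    whitespace split stands in for the analyzer.
    """
    candidates = ROWS
    if term is not None:
        candidates = [r for r in ROWS if term.lower() in r[1].lower().split()]
    ranked = sorted(candidates, key=lambda r: cosine(r[2], query_vector), reverse=True)
    return [r[0] for r in ranked[:limit]]

query = [0.1, 0.9, 0.2]
print(search(query))                 # every row, ordered by similarity
print(search(query, term="hiking"))  # only rows containing "hiking"
```

The term filter narrows the candidate set first; the vector only decides the order of whatever survives the filter.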
An analyzed index is one where the stored values are derived from the raw column values. The stored values depend on the analyzer configuration options, which can include tokenization, filtering, and character filtering.
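As a rough illustration of "stored values derived from raw values", the Python sketch below runs text through a tokenizer and two filters. It is a conceptual stand-in only: `tokenize`, `toy_stem`, and `analyze` are hypothetical names, the lowercase step is an assumption, and `toy_stem` is a crude suffix stripper, not the Porter stemming algorithm that a real stemming filter applies.

```python
import re

def tokenize(text):
    """Split text into alphanumeric runs (stand-in for a real tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)

def toy_stem(term):
    """Crude suffix stripping for illustration; NOT the Porter algorithm."""
    if term.endswith("ing") and len(term) > 5:
        stem = term[:-3]
        # Collapse a doubled final letter: "running" -> "runn" -> "run"
        if len(stem) > 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]
    return term

def analyze(text):
    """Tokenize, lowercase, then stem: the terms an index would store."""
    return [toy_stem(token.lower()) for token in tokenize(text)]

print(analyze("Running Shoes"))
```

The same `analyze` function would be applied to the query text, which is why a query term and an indexed value can match even when their surface forms differ.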
Analyzer Operator for CQL
Example
1. Create a table:
CREATE TABLE vsearch.products
(id text PRIMARY KEY,
val text);
2. Create an SAI index with the index_analyzer option and stemming enabled:
CREATE CUSTOM INDEX vsearch_products_val_idx
ON vsearch.products(val)
USING 'org.apache.cassandra.index.sai.StorageAttachedIndex' WITH OPTIONS = {
'index_analyzer': '{
"tokenizer" : {"name" : "standard"},
"filters" : [{"name" : "porterstem"}]
}'};
3. Insert sample rows:
INSERT INTO vsearch.products (id, val)
VALUES ('1', 'soccer cleats');
INSERT INTO vsearch.products (id, val)
VALUES ('2', 'running shoes');
INSERT INTO vsearch.products (id, val)
VALUES ('3', 'hiking shoes');
4. Query to retrieve your data.
To get the row with id = '2', the only sample row that contains the term running:
SELECT * FROM vsearch.products
WHERE val : 'running';
The analyzer splits the text into case-independent terms.
To get the row with id = '3', require both terms with AND. The analyzer standardizes each query term, including its case, before performing the match:
SELECT * FROM vsearch.products
WHERE val : 'hiking' AND val : 'shoes';
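The term-matching behavior of these queries can be pictured as lookups in an inverted index. The Python sketch below is a simplified model under stated assumptions: `analyze`, `build_index`, and `match_all` are hypothetical helpers, the analyzer only lowercases and splits on whitespace (stemming is left out for brevity), and none of this reflects how SAI stores data internally.

```python
from collections import defaultdict

# The three sample rows inserted above, keyed by id.
ROWS = {"1": "soccer cleats", "2": "running shoes", "3": "hiking shoes"}

def analyze(text):
    """Simplified stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def build_index(rows):
    """Map each analyzed term to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, val in rows.items():
        for term in analyze(val):
            index[term].add(row_id)
    return index

def match_all(index, *query_terms):
    """Return sorted ids of rows containing every query term (AND)."""
    terms = [t for q in query_terms for t in analyze(q)]
    if not terms:
        return []
    matching = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matching)

INDEX = build_index(ROWS)
print(match_all(INDEX, "hiking", "shoes"))
```

Under these assumptions, requiring both hiking and shoes leaves only id 3, since it is the only row whose analyzed terms include both.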
Courtesy DataStax