Understanding Elasticsearch Indexing: Text vs. Keyword Fields
Picture this: Elasticsearch analyzes Text before it hits the Inverted Index dance floor, while Keyword, the rebel, just shows up as is.
That is the key difference. Text undergoes analysis (it is split into tokens and lowercased), while Keyword remains unaltered. This distinction significantly impacts how each behaves during queries.
How to Use Text and Keyword Fields
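When you index a document without explicit mappings, Elasticsearch maps each string dynamically as a Text field with a Keyword sub-field named keyword (capped at 256 characters via ignore_above), so every string is indexed twice. To avoid that overhead, define the mapping for your specific use case before indexing begins. The requests below assume the index text-vs-keyword already exists. Here's a straightforward guide: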
Keyword Mapping
curl --request PUT \
--url https://localhost:9200/text-vs-keyword/_mapping \
--header 'content-type: application/json' \
--data '{
"properties": {
"keyword_field": {
"type": "keyword"
}
}
}'
Text Mapping
curl --request PUT \
--url https://localhost:9200/text-vs-keyword/_mapping \
--header 'content-type: application/json' \
--data '{
"properties": {
"text_field": {
"type": "text"
}
}
}'
Multi Fields
curl --request PUT \
--url https://localhost:9200/text-vs-keyword/_mapping \
--header 'content-type: application/json' \
--data '{
"properties": {
"text_and_keyword_mapping": {
"type": "text",
"fields": {
"keyword_type": {
"type":"keyword"
}
}
}
}
}'
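With a multi-field mapping like the one above, the same value is indexed twice: the parent field is addressed by its own name, and the sub-field by the dotted path parent.sub_field. The Python sketch below walks a mapping body and lists those paths. The helper searchable_paths is hypothetical, written only for illustration, and is not part of any Elasticsearch client.

```python
def searchable_paths(properties):
    """Return {field_path: type} for each field and each multi-field sub-field."""
    paths = {}
    for name, spec in properties.items():
        paths[name] = spec["type"]
        # Sub-fields declared under "fields" are addressed as parent.sub_field.
        for sub_name, sub_spec in spec.get("fields", {}).items():
            paths[f"{name}.{sub_name}"] = sub_spec["type"]
    return paths


# Same body as the Multi Fields request in this article.
mapping = {
    "properties": {
        "text_and_keyword_mapping": {
            "type": "text",
            "fields": {"keyword_type": {"type": "keyword"}},
        }
    }
}

paths = searchable_paths(mapping["properties"])
for path, field_type in paths.items():
    print(f"{path}: {field_type}")
# text_and_keyword_mapping: text
# text_and_keyword_mapping.keyword_type: keyword
```

In practice, that means full-text queries go to text_and_keyword_mapping, while exact matches, sorting, and aggregations go to text_and_keyword_mapping.keyword_type.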
How They Work: The Indexing Process
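Text and Keyword fields are indexed differently in the Inverted Index. A Text field is passed through an analyzer (the standard analyzer by default), which splits the value into tokens and lowercases them, and each token becomes its own term. A Keyword field is stored as a single term, exactly as provided. This difference in the indexing process decides which queries can match.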
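To make the difference concrete, here is a minimal Python sketch of a toy inverted index. It is not how Elasticsearch is implemented: standard_like_analyze is only a rough stand-in for the standard analyzer (split on non-word characters, lowercase), and all function names are made up for this illustration.

```python
import re
from collections import defaultdict

SENTENCE = "The quick brown fox jumps over the lazy dog"


def standard_like_analyze(value):
    """Rough stand-in for the standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_analyze(value):
    """Keyword fields are not analyzed: the whole value is one term."""
    return [value]


def build_inverted_index(docs, analyze):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return dict(index)


docs = {1: SENTENCE}
text_index = build_inverted_index(docs, standard_like_analyze)
keyword_index = build_inverted_index(docs, keyword_analyze)

print(sorted(text_index))
# ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
print(sorted(keyword_index))
# ['The quick brown fox jumps over the lazy dog']
```

Note that the toy Text index contains the lowercase term the but not The, and the toy Keyword index contains only the full sentence. Both details matter for the queries that follow.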
Navigating Queries: Text vs. Keyword Fields
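Elasticsearch offers two main query types for strings: Match Query and Term Query. A Match Query analyzes the search text with the analyzer of the target field before looking it up, while a Term Query looks up the search text exactly as given. Understanding this is crucial for efficient searching. The examples below assume one indexed document whose keyword_field and text_field both hold the sentence "The quick brown fox jumps over the lazy dog".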
curl --request POST \
--url 'https://localhost:9200/text-vs-keyword/_search?size=0' \
--header 'content-type: application/json' \
--data '{
"query": {
"term": {
"keyword_field": "The quick brown fox jumps over the lazy dog"
}
}
}'
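Returns a hit: a Term Query does not analyze its input, and a Keyword field is indexed as a single unanalyzed term, so the two strings are compared exactly. With size=0, the response carries only the hit count, not the document itself.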
curl --request POST \
--url https://localhost:9200/text-vs-keyword/_search \
--header 'content-type: application/json' \
--data '{
"query": {
"match": {
"keyword_field": "The quick brown fox jumps over the lazy dog"
}
}
}'
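Also produces a result. A Match Query analyzes its input with the analyzer of the target field, and a Keyword field has none, so the sentence stays a single term and matches the indexed value exactly.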
curl --request POST \
--url 'https://localhost:9200/text-vs-keyword/_search?pretty' \
--header 'content-type: application/json' \
--data '{
"query": {
"term": {
"text_field": "The"
}
}
}'
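Yields no result. The Text field was analyzed at index time, so the Inverted Index holds the lowercased token "the"; the Term Query looks up "The" verbatim, and no such term exists.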
curl --request POST \
--url 'https://localhost:9200/text-vs-keyword/_search?pretty' \
--header 'content-type: application/json' \
--data '{
"query": {
"match": {
"text_field": "The"
}
}
}'
Produces a result: the Match Query runs "The" through the same analyzer as the field, which lowercases it to "the", and that term is in the Inverted Index.
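The four outcomes above can be reproduced with a small Python model. This is a sketch under simplifying assumptions, not Elasticsearch itself: the analyzers are rough stand-ins, match_query models only the default behavior where any analyzed token may match, and every name here is hypothetical.

```python
import re

SENTENCE = "The quick brown fox jumps over the lazy dog"


def analyze_text(value):
    """Stand-in for the standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value):
    """Keyword fields keep the whole value as a single term."""
    return [value]


ANALYZERS = {"text_field": analyze_text, "keyword_field": analyze_keyword}

# Terms stored per field after indexing one document holding SENTENCE.
INDEX = {field: set(analyze(SENTENCE)) for field, analyze in ANALYZERS.items()}


def term_query(field, value):
    """Look the value up verbatim, with no analysis."""
    return value in INDEX[field]


def match_query(field, value):
    """Analyze the value with the field's analyzer, then look up each token."""
    return any(token in INDEX[field] for token in ANALYZERS[field](value))


results = {
    "term on keyword_field": term_query("keyword_field", SENTENCE),
    "match on keyword_field": match_query("keyword_field", SENTENCE),
    "term on text_field": term_query("text_field", "The"),
    "match on text_field": match_query("text_field", "The"),
}
for name, hit in results.items():
    print(f"{name}: {'hit' if hit else 'no hit'}")
# term on keyword_field: hit
# match on keyword_field: hit
# term on text_field: no hit
# match on text_field: hit
```

The only miss is the Term Query against the Text field, which is why Term Queries are usually reserved for Keyword fields and Match Queries for Text fields.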