Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Since its release in 2010, it has become one of the most widely used search engines and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence.
Elasticsearch uses an inverted index, a data structure that maps each term to the documents that contain it, for efficient search. Its distributed architecture enables rapid search and analysis of massive amounts of data with near real-time performance.
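To make the inverted index idea concrete, here is a minimal sketch in Python. It is not how Lucene stores its index; the sample documents and the whitespace tokenizer are assumptions chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "quick brown fox",
    2: "quick red fox",
    3: "lazy brown dog",
}

inverted = build_inverted_index(docs)
print(inverted["quick"])  # [1, 2]
print(inverted["brown"])  # [1, 3]
```

Looking up a word is then a single dictionary access instead of a scan over every document, which is why this structure suits search.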
How to handle search in Elasticsearch
1. Index Your Data
- Create index
PUT https://localhost:9200/products
Note: The create index API uses the PUT method with an explicit index name. Elasticsearch does not generate index names for you, so POST is not an alternative here. Automatic generation applies to document IDs instead: if you POST a document to /products/_doc without an ID, Elasticsearch stores it under a generated ID.
- Define Mapping
PUT https://localhost:9200/products/_mapping
{
"properties": {
"name": {
"type": "text"
},
"price": {
"type": "float"
}
}
}
- Add Documents:
POST https://localhost:9200/products/_doc/1
{
"name": "Product A",
"price": 49.99
}
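The three requests above can also be prepared from a script. The sketch below uses only Python's standard library and builds the requests without sending them; the build_request helper is a hypothetical name, and the base URL is the local cluster assumed in the examples above.

```python
import json
import urllib.request

BASE_URL = "https://localhost:9200"  # assumed local cluster, as in the requests above


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


create_index = build_request("PUT", "/products")
define_mapping = build_request(
    "PUT",
    "/products/_mapping",
    {"properties": {"name": {"type": "text"}, "price": {"type": "float"}}},
)
add_document = build_request(
    "POST", "/products/_doc/1", {"name": "Product A", "price": 49.99}
)

for request in (create_index, define_mapping, add_document):
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
    # (a secured cluster would also need credentials and a trusted certificate)
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before touching a real cluster.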
2. Define a Query:
- Match Query: Performs a full-text search on text fields within documents. The search text is analyzed before it is matched.
POST https://localhost:9200/products/_search
{
"query": {
"match": {
"name": "Product A"
}
}
}
- Bool Query: Combines multiple queries using boolean logic.
POST /your_index_name/_search
{
"query": {
"bool": {
"must": [
{ "match": { "field1": "value1" } },
{ "match": { "field2": "value2" } }
],
"should": [
{ "term": { "field3": "value3" } }
],
"must_not": [
{ "range": { "field4": { "lt": 10 } } }
]
}
}
}
- "must": Use the "must" clause to specify conditions that every matching document has to satisfy. In the example, it requires documents to match both "field1": "value1" and "field2": "value2".
- "should": The "should" clause specifies optional conditions. Because this query also has a "must" clause, a document does not need to match "field3": "value3", but documents that do match get a higher relevance score. When a bool query has no "must" or "filter" clause, at least one "should" clause has to match by default.
- "must_not": The "must_not" clause excludes documents that match the specified conditions. In the example, it excludes documents where "field4" is less than 10.
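The way the three clauses combine can be sketched as a toy evaluator over plain Python dicts. This is only a model of the boolean logic, not Elasticsearch's implementation or scoring: bool_match and the sample documents are hypothetical, and the count of matching "should" clauses stands in for the score boost.

```python
def bool_match(doc, must=(), should=(), must_not=()):
    """Return (matches, should_hits) for a toy bool query over a dict document.

    Each clause is a predicate taking the document and returning True/False.
    should_hits stands in for the score boost from optional clauses.
    """
    if not all(clause(doc) for clause in must):
        return False, 0
    if any(clause(doc) for clause in must_not):
        return False, 0
    should_hits = sum(1 for clause in should if clause(doc))
    # With no "must" clauses, at least one "should" clause has to match.
    if not must and should and should_hits == 0:
        return False, 0
    return True, should_hits


# Predicates mirroring the clauses in the example query.
must = [lambda d: d.get("field1") == "value1", lambda d: d.get("field2") == "value2"]
should = [lambda d: d.get("field3") == "value3"]
must_not = [lambda d: "field4" in d and d["field4"] < 10]

doc_a = {"field1": "value1", "field2": "value2", "field3": "value3", "field4": 50}
doc_b = {"field1": "value1", "field2": "value2", "field4": 50}
doc_c = {"field1": "value1", "field2": "value2", "field4": 5}

print(bool_match(doc_a, must, should, must_not))  # (True, 1)
print(bool_match(doc_b, must, should, must_not))  # (True, 0)
print(bool_match(doc_c, must, should, must_not))  # (False, 0)
```

Both doc_a and doc_b match, but doc_a would rank higher because it also satisfies the optional clause; doc_c is excluded by "must_not".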
- Range Query: Searches for data within a specific range.
POST /your_index_name/_search
{
"query": {
"range": {
"field_name": {
"gte": 10, // Greater than or equal to
"lte": 100 // Less than or equal to
}
}
}
}
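As a rough illustration of what the bounds mean, the following sketch applies the same "gte"/"lte" logic (plus the strict "gt"/"lt" variants) to a plain list of numbers. The in_range helper and the sample prices are made up for this example.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True when value satisfies every bound that is provided."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


prices = [5, 10, 49.99, 100, 150]
print([p for p in prices if in_range(p, gte=10, lte=100)])  # [10, 49.99, 100]
print([p for p in prices if in_range(p, gt=10, lt=100)])    # [49.99]
```

The inclusive bounds keep the endpoints 10 and 100, while the strict bounds drop them.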
- Term Query: Searches for exact matches.
{
"query": {
"term": {
"field_name": "exact_term"
}
}
}
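NOTES:
The main difference is that the match query analyzes the search text and is meant for full-text search: it matches documents containing any of the resulting terms, ranks them by relevance score, and can handle fuzziness. The term query does not analyze the search text and looks for the exact term as stored in the index, so it suits keyword, numeric, and other structured fields. It is often placed in a filter context, where no score is computed. On an analyzed text field, a term query can miss documents because the indexed terms were, for example, lowercased.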
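The difference comes down to whether the search text is analyzed. The toy model below assumes a very simplified analyzer (lowercase, split on whitespace); real analyzers do more, and the function names here are hypothetical.

```python
def analyze(text):
    """Very rough stand-in for a text analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def term_query(indexed_terms, value):
    """Exact lookup: the value is compared to the indexed terms as-is."""
    return value in indexed_terms


def match_query(indexed_terms, text):
    """Analyze the search text first, then match if any resulting term is indexed."""
    return any(term in indexed_terms for term in analyze(text))


# Terms stored for a text field whose original value was "Product A".
indexed_terms = set(analyze("Product A"))  # {"product", "a"}

print(match_query(indexed_terms, "Product A"))  # True
print(term_query(indexed_terms, "Product A"))   # False: no such single term
print(term_query(indexed_terms, "product"))     # True
```

This is why a term query for "Product A" on a text field can come back empty even though the document clearly contains that value.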
- Fuzzy Query: Searches for similar terms
{
"query": {
"fuzzy": {
"field_name": {
"value": "search_term",
"fuzziness": 2
}
}
}
}
For example, a fuzzy query for "elephent" with a fuzziness of 2 matches terms within two single-character edits of it, such as "elephant", "elephat", "eliphant", and "elephants".
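Fuzziness is measured as an edit distance. The sketch below computes the plain Levenshtein distance (insertions, deletions, substitutions) to show why those terms fall within a fuzziness of 2. It is an illustration only: Elasticsearch's fuzzy matching also treats swapping two adjacent characters as a single edit by default, which this simplified version does not.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


for candidate in ["elephant", "elephat", "eliphant", "elephants", "elegant"]:
    distance = edit_distance("elephent", candidate)
    print(candidate, distance, "match" if distance <= 2 else "no match")
```

"elephant" is one edit away, the next three are two edits away, and "elegant" needs three edits, so it would fall outside a fuzziness of 2.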
- Wildcard Query: Searches with wildcard patterns.
{
"query": {
"wildcard": {
"field_name": "wildcard_pattern"
}
}
}
Asterisk (*) Wildcard:
- The asterisk * represents zero or more characters.
- For example, the wildcard pattern "appl*" matches documents containing terms such as "apple", "apples", and "appliance".
Question Mark (?) Wildcard:
- The question mark ? represents a single character.
- For example, if you use the wildcard pattern "gr?y" in a query, it will match documents containing terms like "grey" or "gray."
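Python's fnmatch module uses the same meaning for * and ?, so it offers a quick way to try patterns locally. This is only an analogy: fnmatch also supports [...] character sets, which are not part of the wildcard syntax described here, and the term list is invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms.
terms = ["apple", "apples", "appliance", "apply", "banana", "grey", "gray", "gry", "great"]

print([t for t in terms if fnmatchcase(t, "appl*")])  # ['apple', 'apples', 'appliance', 'apply']
print([t for t in terms if fnmatchcase(t, "gr?y")])   # ['grey', 'gray']
```

Note that "gry" does not match "gr?y", because ? stands for exactly one character.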
- Pagination
Pagination divides the search results into smaller pages that can be displayed to the user one at a time. This is commonly used in web applications to show a limited number of results per page and allow users to navigate through the pages.
To implement pagination, you can use the "from" and "size" parameters in your search request. Here's an example:
POST /your_index_name/_search
{
"from": 0, // Start from the first result
"size": 10, // Return 10 results per page
"query": {
"match": {
"field_name": "search_term"
}
}
}
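In an application, the "from" value is usually derived from a page number. A minimal sketch, assuming 1-based page numbers; page_params and paginated_search_body are hypothetical helper names.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into "from" and "size" values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


def paginated_search_body(field, text, page, page_size=10):
    """Build a match-query request body for the given page."""
    body = page_params(page, page_size)
    body["query"] = {"match": {field: text}}
    return body


print(paginated_search_body("field_name", "search_term", page=3))
# {'from': 20, 'size': 10, 'query': {'match': {'field_name': 'search_term'}}}
```

Page 3 with 10 results per page skips the first 20 hits, so "from" is 20.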
Distributed architecture of Elasticsearch
The basic architecture of Elasticsearch consists of nodes, which are the basic building blocks of a cluster.
A node is a single instance of Elasticsearch that stores data and participates in the cluster's search and indexing capabilities. Nodes can be installed on a single machine or multiple machines, depending on the size and complexity of the data being indexed.
The nodes in Elasticsearch can be classified into four types: data nodes, master-eligible nodes, ingest nodes, and client nodes (called coordinating-only nodes in recent versions).
- Data Nodes: Nodes responsible for storing and indexing data. They hold primary and replica shards and handle search and indexing operations.
- Ingest Nodes: Nodes that run ingest pipelines to pre-process documents before indexing.
- Client Nodes: Nodes that only handle client requests without storing data. They route requests to the right nodes, merge the results, and can be useful for separating client traffic from data nodes.
- Master-Eligible Nodes: Nodes that can be elected as the cluster's master. The elected master performs cluster management tasks such as creating or deleting indices, assigning shards to nodes, and tracking which nodes are part of the cluster. If the master fails, the remaining master-eligible nodes elect a new one.
Each node in Elasticsearch is assigned a unique name and communicates with other nodes in the cluster over the network. Elasticsearch uses a discovery mechanism to find and join other nodes in the cluster. Nodes are typically discovered from a configured list of seed hosts (unicast) or through cloud provider plugins; multicast discovery existed in early versions but has since been removed.
A cluster in Elasticsearch is a group of one or more nodes working together to store and manage data. When multiple nodes are connected and working together in a cluster, Elasticsearch automatically distributes data and load balances queries across all the nodes in the cluster.
Sharding is the process of breaking down a large index into smaller parts called shards, which can be distributed across multiple nodes in a cluster. Each shard is a self-contained index that can be stored and managed independently of other shards. By breaking an index into shards and distributing them across multiple nodes, Elasticsearch can handle large amounts of data and scale horizontally.
Elasticsearch can automatically balance the data across nodes in the cluster using the shard allocation feature. Each index is divided into multiple shards, and Elasticsearch can automatically distribute these shards across multiple nodes to ensure data availability and scalability.
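Which shard a document lands on is decided by hashing a routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below shows that idea in simplified form. zlib.crc32 is only a stand-in for the hash Elasticsearch actually uses, and the shard count and document IDs are made up.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Choose a shard number from a hash of the routing value."""
    # zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


num_primary_shards = 3  # hypothetical index setting
placement = {}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    placement.setdefault(pick_shard(doc_id, num_primary_shards), []).append(doc_id)

for shard, doc_ids in sorted(placement.items()):
    print(f"shard {shard}: {doc_ids}")
```

Because the same ID always hashes to the same shard, any node can work out where a document lives without a lookup table. It also hints at why the number of primary shards cannot simply be changed on an existing index: a different modulus would send existing IDs to different shards.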
In summary, an Elasticsearch cluster consists of nodes with different roles (such as data and master-eligible nodes) that work together to store and manage data, with shards distributed automatically across the nodes for scalability and reliability.