Elasticsearch

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has become one of the most widely used search engines and is commonly applied to log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.


Elasticsearch uses an inverted index, a data structure that maps each word to the documents in which it appears, for efficient search. Its distributed architecture enables rapid search and analysis of massive amounts of data with near real-time performance.
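
To make the idea of an inverted index concrete, here is a minimal sketch in plain Python. It is only an illustration of the data structure, not how Lucene implements it; the sample documents and the function names are assumptions made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, word):
    """Look up a single word; returns the IDs of documents that contain it."""
    return index.get(word.lower(), [])


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "Quick brown fox",
    2: "Lazy brown dog",
    3: "Quick red fox",
}
index = build_inverted_index(docs)
print(search(index, "brown"))  # [1, 2]
print(search(index, "fox"))    # [1, 3]
```

A lookup is a single dictionary access instead of a scan over every document, which is why this structure suits search.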


How to handle search in Elasticsearch

  1. Index Your Data


  • Create index



PUT https://localhost:9200/products        


Note: Use the PUT method to create an index, because it lets you choose the index name. The create index API does not accept POST. Automatic generation of unique values applies to documents instead: POST /products/_doc (without an ID) makes Elasticsearch generate a unique document ID.



  • Define Mapping

PUT https://localhost:9200/products/_mapping
{
  "properties": {
    "name": {
      "type": "text"
    },
    "price": {
      "type": "float"
    }
  }
}




  • Add Documents:

POST https://localhost:9200/products/_doc/1
{
  "name": "Product A",
  "price": 49.99
}
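
The three requests above can also be issued from code. The following is a rough sketch using only Python's standard library; the `build_request` helper is a hypothetical name introduced here, and the sketch only builds the requests so that it runs without a cluster. Sending them is shown in the trailing comment.

```python
import json
import urllib.request

BASE_URL = "https://localhost:9200"  # same host as in the requests above


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


create_index = build_request("PUT", "/products")
define_mapping = build_request(
    "PUT",
    "/products/_mapping",
    {"properties": {"name": {"type": "text"}, "price": {"type": "float"}}},
)
add_document = build_request(
    "POST", "/products/_doc/1", {"name": "Product A", "price": 49.99}
)

for req in (create_index, define_mapping, add_document):
    print(req.get_method(), req.full_url)

# To actually send a request against a running cluster:
#   with urllib.request.urlopen(create_index) as resp:
#       print(resp.status, resp.read().decode("utf-8"))
# A secured cluster also needs credentials and a trusted TLS certificate.
```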


  2. Define a Query



  • Match Query: Performs a full-text search on text fields within documents.


POST https://localhost:9200/products/_search
{
  "query": {
    "match": {
      "name": "Product A"
    }
  }
}


  • Bool Query: Combines multiple queries using boolean logic.


POST /your_index_name/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "field1": "value1" } },
        { "match": { "field2": "value2" } }
      ],
      "should": [
        { "term": { "field3": "value3" } }
      ],
      "must_not": [
        { "range": { "field4": { "lt": 10 } } }
      ]
    }
  }
}


- "must": All conditions in the "must" clause have to match, and they contribute to the relevance score. In the example, documents must match both "field1": "value1" and "field2": "value2".


- "should": The "should" clause specifies optional conditions. When the query also has a "must" clause, as in the example, none of them is required, but a document where "field3" equals "value3" gets a higher relevance score. When a bool query has "should" clauses but no "must" or "filter" clause, at least one "should" condition has to match.


- "must_not": The "must_not" clause excludes documents that match the specified conditions. In the example, it excludes documents where "field4" is less than 10.
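
The way the three clauses interact can be sketched with plain Python predicates. This is a simplified model for illustration only: the documents, the `matches` and `should_bonus` helpers, and the use of a simple count in place of a real relevance score are all assumptions of this sketch.

```python
def matches(doc, must=(), should=(), must_not=()):
    """Simplified bool semantics: every must passes, no must_not passes."""
    if not all(cond(doc) for cond in must):
        return False
    if any(cond(doc) for cond in must_not):
        return False
    # With no must clause, at least one should clause has to match.
    if not must and should:
        return any(cond(doc) for cond in should)
    return True


def should_bonus(doc, should=()):
    """Count matching should clauses; a stand-in for their score contribution."""
    return sum(1 for cond in should if cond(doc))


# Hypothetical documents and predicates mirroring the example query.
docs = [
    {"id": 1, "field1": "value1", "field2": "value2", "field3": "value3", "field4": 50},
    {"id": 2, "field1": "value1", "field2": "value2", "field3": "other", "field4": 20},
    {"id": 3, "field1": "value1", "field2": "value2", "field3": "value3", "field4": 5},
    {"id": 4, "field1": "value1", "field2": "nope", "field3": "value3", "field4": 50},
]
must = [lambda d: d["field1"] == "value1", lambda d: d["field2"] == "value2"]
should = [lambda d: d["field3"] == "value3"]
must_not = [lambda d: d["field4"] < 10]

hits = [d for d in docs if matches(d, must, should, must_not)]
hits.sort(key=lambda d: should_bonus(d, should), reverse=True)
print([d["id"] for d in hits])  # [1, 2]
```

Document 2 is still returned even though it fails the "should" condition; it simply ranks below document 1.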


  • Range Query: Searches for values within a specific range. In the example, gte means greater than or equal to and lte means less than or equal to.


POST /your_index_name/_search
{
  "query": {
    "range": {
      "field_name": {
        "gte": 10,
        "lte": 100
      }
    }
  }
}

        


  • Term Query: Searches for exact matches.


POST /your_index_name/_search
{
  "query": {
    "term": {
      "field_name": "exact_term"
    }
  }
}



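NOTES:

The main difference is that the match query is meant for full-text search: it analyzes the query text (for example, lowercasing it and splitting it into words), matches documents that contain any of the resulting terms, and can tolerate typos through its fuzziness parameter. The term query skips analysis and looks for the exact term as stored in the index, so it is best suited to keyword, numeric, and date fields rather than analyzed text fields. Both queries calculate a relevance score unless they run in a filter context.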
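
The effect of analysis can be sketched as follows. The `analyze` function here is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and the helper names are assumptions of this example, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def term_query(indexed_tokens, value):
    """term: the value is compared as-is against the indexed tokens."""
    return value in indexed_tokens


def match_query(indexed_tokens, text):
    """match: the query text is analyzed first; any shared token counts as a hit."""
    return any(tok in indexed_tokens for tok in analyze(text))


# A text field holding "Product A" is indexed as the tokens below.
indexed = analyze("Product A")
print(indexed)                            # ['product', 'a']
print(term_query(indexed, "Product A"))   # False
print(term_query(indexed, "product"))     # True
print(match_query(indexed, "Product A"))  # True
print(match_query(indexed, "PRODUCT b"))  # True
```

This is why a term query for "Product A" on a text field typically finds nothing, while the same value in a match query does.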


  • Fuzzy Query: Searches for similar terms

POST /your_index_name/_search
{
  "query": {
    "fuzzy": {
      "field_name": {
        "value": "search_term",
        "fuzziness": 2
      }
    }
  }
}

For example, a fuzzy query with a fuzziness of 2 for the misspelled term "elephent" matches terms that are at most two single-character edits away, such as "elephant", "elephat", "eliphant", and "elephants".
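
Fuzziness is measured as an edit distance. The sketch below computes the plain Levenshtein distance to check the terms from the example; by default Elasticsearch additionally counts a swap of two adjacent characters as a single edit, which this simplified version does not. The function names are assumptions of this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate, fuzziness=2):
    """True when the candidate is within `fuzziness` edits of the term."""
    return edit_distance(term, candidate) <= fuzziness


for candidate in ["elephant", "elephat", "eliphant", "elephants", "giraffe"]:
    print(candidate, fuzzy_match("elephent", candidate))
```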


  • Wildcard Query: Searches with wildcard patterns.

POST /your_index_name/_search
{
  "query": {
    "wildcard": {
      "field_name": "wildcard_pattern"
    }
  }
}

Asterisk (*) Wildcard:

- The asterisk * represents zero or more characters.

- For example, the wildcard pattern "appl*" matches documents containing terms like "apple", "apples", "appliance", etc.


Question Mark (?) Wildcard:

- The question mark ? represents exactly one character.

- For example, the wildcard pattern "gr?y" matches documents containing terms like "grey" or "gray".
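
The two wildcard rules can be sketched by translating a pattern into a regular expression. This is an illustration of the matching rules only, not of how Elasticsearch evaluates wildcard queries internally; the function names are assumptions of this example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (zero or more chars) and ? (exactly one char) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


for term in ["apple", "apples", "appliance", "banana"]:
    print("appl*", term, wildcard_match("appl*", term))
for term in ["grey", "gray", "gry", "greay"]:
    print("gr?y", term, wildcard_match("gr?y", term))
```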



  • Pagination

Pagination divides the search results into smaller pages that can be displayed to the user. This is commonly used in web applications to show a limited number of results per page and allow users to navigate through the pages.


To implement pagination, use the "from" and "size" parameters in your search request. In the following example, "from": 0 starts at the first result and "size": 10 returns 10 results per page:

POST /your_index_name/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "field_name": "search_term"
    }
  }
}
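
In an application, "from" is usually derived from a page number. Below is a minimal sketch of that calculation; `page_request` is a hypothetical helper, and the limit it checks reflects the default value of the index.max_result_window setting (10,000), which a given index may override.

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_request(page, size, query):
    """Build a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; consider search_after")
    return {"from": start, "size": size, "query": query}


query = {"match": {"field_name": "search_term"}}
print(page_request(1, 10, query)["from"])  # 0
print(page_request(3, 10, query)["from"])  # 20
```

For very deep result sets, the search_after parameter is generally a better fit than large "from" values.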


Distributed architecture of Elasticsearch

The basic architecture of Elasticsearch consists of nodes, which are the basic building blocks of a cluster.

A node is a single instance of Elasticsearch that stores data and participates in the cluster's search and indexing capabilities. Nodes can be installed on a single machine or multiple machines, depending on the size and complexity of the data being indexed.

The nodes in Elasticsearch can be classified into four types: data nodes, master-eligible nodes, ingest nodes, and client nodes.


  • Data Nodes: Nodes responsible for storing and indexing data. They hold primary and replica shards and handle search and indexing operations.
  • Ingest Nodes: Nodes that run ingest pipelines, which pre-process documents before indexing.
  • Client Nodes (also called coordinating-only nodes): Nodes that only handle client requests without storing data. They forward requests to the data nodes and merge the results, which makes them useful as load balancers that separate client traffic from data nodes.
  • Master-Eligible Nodes: Nodes that can be elected as the cluster's master. The elected master performs cluster management tasks such as creating or deleting indices, assigning shards to nodes, and monitoring the health of the cluster. Master-eligible nodes also take part in electing a new master when the current one fails.


Each node in Elasticsearch is assigned a unique name and can communicate with other nodes in the cluster over a network. Elasticsearch uses a discovery mechanism to find and join other nodes in the cluster. Depending on the version, the available mechanisms include unicast discovery from a configured list of seed hosts, cloud-provider discovery plugins, and, in early releases only, multicast discovery.

A cluster in Elasticsearch is a group of one or more nodes working together to store and manage data. When multiple nodes are connected and working together in a cluster, Elasticsearch automatically distributes data and load balances queries across all the nodes in the cluster.


Sharding is the process of breaking down a large index into smaller parts called shards, which can be distributed across multiple nodes in a cluster. Each shard is a self-contained index that can be stored and managed independently of other shards. By breaking an index into shards and distributing them across multiple nodes, Elasticsearch can handle large amounts of data and scale horizontally.
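
How a document ends up on a particular shard can be sketched as a hash of its routing value (by default the document ID) taken modulo the number of primary shards. The code below is a simplified illustration under that assumption: `pick_shard` is a hypothetical helper, and zlib.crc32 stands in for the hash function Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a primary shard number."""
    # zlib.crc32 is a stand-in; Elasticsearch uses its own hash function internally.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Distribute six hypothetical document IDs over three primary shards.
shards = {n: [] for n in range(3)}
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    shards[pick_shard(doc_id, 3)].append(doc_id)
print(shards)
```

Because the result depends on the number of primary shards, that number is fixed when the index is created; changing it means reindexing or using the dedicated split and shrink operations.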

Elasticsearch can automatically balance the data across nodes in the cluster using the shard allocation feature. Each index is divided into multiple shards, and Elasticsearch can automatically distribute these shards across multiple nodes to ensure data availability and scalability.

In summary, an Elasticsearch cluster consists of multiple nodes (such as data and master-eligible nodes) that work together to store and manage data, with Elasticsearch automatically distributing shards across the nodes for scalability and reliability.
