ElasticSearch

trong luong van

Senior Software Engineer | Solutions Architect

Published: April 20, 2024

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has become one of the most popular search engines and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.

Elasticsearch uses an inverted index, a data structure that maps each term to the documents that contain it, to make searches efficient. Its distributed architecture enables search and analysis of large volumes of data in near real time.
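To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is not how Lucene stores its index; the function names and sample documents are made up for illustration, and the "analysis" step is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Look up one term directly; the documents themselves are never scanned."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Product A",
    2: "Product B",
    3: "Spare part A",
}
index = build_inverted_index(docs)
print(search(index, "product"))  # [1, 2]
print(search(index, "a"))        # [1, 3]
```

The lookup cost depends on the number of distinct terms, not on the number of documents, which is why this structure suits full-text search.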

How to handle search in Elasticsearch

1. Index Your Data:

Create index



PUT https://localhost:9200/products

Note: Use the PUT method to create an index, because the index name is part of the request URL. Elasticsearch does not create an index with an auto-generated name from a POST request; automatic generation applies to document IDs, when you POST a document to /products/_doc without specifying an ID.

Define Mapping

PUT https://localhost:9200/products/_mapping

{
  "properties": {
    "name": {
      "type": "text"
    },
    "price": {
      "type": "float"
    }
  }
}

Add Documents:

POST https://localhost:9200/products/_doc/1
{
  "name": "Product A",
  "price": 49.99
}
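The three indexing steps can be summarized as plain request descriptions. The sketch below only builds the method, URL, and body for each call and prints them; it sends nothing over the network. The helper names are hypothetical, and the host is the one used in the examples in this article.

```python
import json

BASE_URL = "https://localhost:9200"  # host used in this article's examples


def create_index_request(index):
    """PUT /<index> creates an index with the given name."""
    return ("PUT", f"{BASE_URL}/{index}", None)


def put_mapping_request(index, properties):
    """PUT /<index>/_mapping defines the field types."""
    return ("PUT", f"{BASE_URL}/{index}/_mapping", {"properties": properties})


def index_document_request(index, doc_id, document):
    """POST /<index>/_doc/<id> stores a document under an explicit ID."""
    return ("POST", f"{BASE_URL}/{index}/_doc/{doc_id}", document)


requests_to_send = [
    create_index_request("products"),
    put_mapping_request(
        "products",
        {"name": {"type": "text"}, "price": {"type": "float"}},
    ),
    index_document_request("products", 1, {"name": "Product A", "price": 49.99}),
]

for method, url, body in requests_to_send:
    print(method, url, json.dumps(body) if body is not None else "")
```

Any HTTP client can send these tuples as they are; authentication and TLS settings depend on the cluster and are left out here.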

2. Define a Query:

Match Query: Performs a full-text search on text fields within documents.

POST https://localhost:9200/products/_search

{
  "query": {
    "match": {
      "name": "Product A"
    }
  }
}
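A match query analyzes the query text into terms and, by default, returns documents that contain any of them (the "operator" parameter can switch this to "and"). The sketch below imitates that behaviour locally. Lowercasing and whitespace splitting is a rough stand-in for the standard analyzer, and the function names are made up for illustration.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match(field_value, query_text, operator="or"):
    """Return True if the analyzed query terms match the analyzed field value."""
    doc_terms = set(analyze(field_value))
    hits = [term in doc_terms for term in analyze(query_text)]
    return all(hits) if operator == "and" else any(hits)


print(match("Product A", "Product A"))         # True
print(match("Product B", "Product A"))         # True: both share the term "product"
print(match("Product B", "Product A", "and"))  # False: "a" is missing
```

This is why the query above can also return "Product B": with the default operator, one shared term is enough, and the relevance score decides the ordering.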

Bool Query: Combines multiple queries using boolean logic.

POST /your_index_name/_search

{
  "query": {
    "bool": {
      "must": [
        { "match": { "field1": "value1" } },
        { "match": { "field2": "value2" } }
      ],
      "should": [
        { "term": { "field3": "value3" } }
      ],
      "must_not": [
        { "range": { "field4": { "lt": 10 } } }
      ]
    }
  }
}

- "must": Use the "must" clause to specify conditions that every matching document has to satisfy. In the example, it requires documents to match both "field1": "value1" and "field2": "value2".

- "should": The "should" clause specifies optional conditions. Because this query also has "must" clauses, a document does not need to match any "should" clause, but matching one raises its relevance score. When a bool query has no "must" or "filter" clauses, at least one "should" clause has to match. In the example, documents where "field3" equals "value3" rank higher, but the condition is not required.

- "must_not": The "must_not" clause excludes documents that match the specified conditions. In the example, it excludes documents where "field4" is less than 10.
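The way the three clauses combine can be sketched as a small local evaluator. This is an illustration of the boolean logic only, not of Elasticsearch scoring: each clause is a plain Python predicate, "filter" clauses are not modelled, and the number of matching "should" clauses stands in for the score boost.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Return (matched, should_hits) for one document.

    Each clause is a predicate that takes the document dict.
    """
    if not all(clause(doc) for clause in must):
        return False, 0
    if any(clause(doc) for clause in must_not):
        return False, 0
    should_hits = sum(1 for clause in should if clause(doc))
    # Without must clauses, at least one should clause has to match.
    if not must and should and should_hits == 0:
        return False, 0
    return True, should_hits


# Predicates mirroring the example query (equality simplifies match/term).
must = [lambda d: d.get("field1") == "value1",
        lambda d: d.get("field2") == "value2"]
should = [lambda d: d.get("field3") == "value3"]
must_not = [lambda d: "field4" in d and d["field4"] < 10]

doc_a = {"field1": "value1", "field2": "value2", "field3": "value3", "field4": 25}
doc_b = {"field1": "value1", "field2": "value2", "field4": 25}
doc_c = {"field1": "value1", "field2": "value2", "field3": "value3", "field4": 5}

print(bool_matches(doc_a, must, should, must_not))  # (True, 1)
print(bool_matches(doc_b, must, should, must_not))  # (True, 0)
print(bool_matches(doc_c, must, should, must_not))  # (False, 0)
```

doc_b still matches without "field3"; it would simply rank below doc_a.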

Range Query: Searches for data within a specific range.

POST /your_index_name/_search

{
  "query": {
    "range": {
      "field_name": {
        "gte": 10,   // Greater than or equal to
        "lte": 100   // Less than or equal to
      }
    }
  }
}
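The range operators ("gt", "gte", "lt", "lte") can be combined freely. As a small sketch, the hypothetical helpers below build a range query body and check a value against the same bounds locally, which makes the inclusive and exclusive cases explicit.

```python
def range_query(field, **bounds):
    """Build a range query body; only gt, gte, lt and lte are accepted."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    return {"query": {"range": {field: bounds}}}


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Local check with the same meaning as the range operators."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


print(range_query("field_name", gte=10, lte=100))
print(in_range(10, gte=10, lte=100))  # True: gte includes the bound
print(in_range(10, gt=10, lte=100))   # False: gt excludes the bound
```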

Term Query: Searches for exact matches.

{
  "query": {
    "term": {
      "field_name": "exact_term"
    }
  }
}

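NOTES:

The main difference is that the match query is intended for full-text search: it analyzes the query text, so it can match individual words of a field, and it supports fuzziness. The term query looks up the exact term without analyzing it, so it suits exact values such as IDs or keyword fields. Both queries produce a relevance score unless they run in a filter context.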
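The practical consequence is that a term query against a text field often finds nothing, because the indexed terms were lowercased and split while the query value was not. The sketch below imitates this. The keyword field is an assumption for illustration (the mapping in this article only defines "name" as text), and lowercasing plus whitespace splitting is a rough stand-in for the standard analyzer.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def index_text_field(value):
    """A text field stores the analyzed tokens."""
    return set(analyze(value))


def index_keyword_field(value):
    """A keyword field stores the whole value as a single term."""
    return {value}


def term_query(indexed_terms, term):
    """term: the query value is compared as-is, without analysis."""
    return term in indexed_terms


def match_query(text_field_terms, text):
    """match on a text field: the query text is analyzed first."""
    return any(token in text_field_terms for token in analyze(text))


text_terms = index_text_field("Product A")        # {"product", "a"}
keyword_terms = index_keyword_field("Product A")  # {"Product A"}

print(term_query(text_terms, "Product A"))     # False: no such token in the text field
print(term_query(text_terms, "product"))       # True
print(term_query(keyword_terms, "Product A"))  # True
print(match_query(text_terms, "Product A"))    # True
```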

Fuzzy Query: Searches for similar terms

{
  "query": {
    "fuzzy": {
      "field_name": {
        "value": "search_term",
        "fuzziness": 2
      }
    }
  }
}

For example, a fuzzy query with a fuzziness of 2 and a "search_term" of "elephent" matches indexed terms such as "elephant", "elephat", "eliphant", and "elephants", because each of them is within two single-character edits of the search term.
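Fuzziness is measured as an edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one term into another. The sketch below computes the plain Levenshtein distance for the terms in the example. Elasticsearch by default also counts a swap of two adjacent characters as one edit, which this simplified version does not do.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_match(search_term, indexed_term, fuzziness=2):
    """True if the indexed term is within the allowed number of edits."""
    return edit_distance(search_term, indexed_term) <= fuzziness


for candidate in ["elephant", "elephat", "eliphant", "elephants", "ant"]:
    print(candidate, edit_distance("elephent", candidate),
          fuzzy_match("elephent", candidate))
```

Only "ant" falls outside the limit, since it differs from "elephent" in length alone by five characters.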

Wildcard Query: Searches with wildcard patterns.

{
  "query": {
    "wildcard": {
      "field_name": "wildcard_pattern"
    }
  }
}

Asterisk (*) Wildcard:

- The asterisk * represents zero or more characters.

- For example, if you use the wildcard pattern "appl*" in a query, it will match documents containing terms like "apple," "apples," "appliance," etc.

Question Mark (?) Wildcard:

- The question mark ? represents a single character.

- For example, if you use the wildcard pattern "gr?y" in a query, it will match documents containing terms like "grey" or "gray."
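The two wildcard characters can be illustrated by translating a pattern into a regular expression and testing terms against it. This is a local sketch of the matching rules only, with hypothetical function names; it says nothing about how Elasticsearch executes wildcard queries internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (zero or more characters) and ? (exactly one) into a regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """The pattern has to cover the whole term, not just a part of it."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print([t for t in ["apple", "apples", "appliance", "pineapple"]
       if wildcard_match("appl*", t)])  # ['apple', 'apples', 'appliance']
print([t for t in ["grey", "gray", "gry", "greay"]
       if wildcard_match("gr?y", t)])   # ['grey', 'gray']
```

Note that "pineapple" is not matched by "appl*": the pattern is anchored at the start of the term, so a leading "*" would be needed for that.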

Pagination

Pagination divides the search results into smaller "pages" that can be displayed to the user. This is commonly used in web applications to show a limited number of results per page and allow users to navigate through the pages.

To implement pagination, you can use the "from" and "size" parameters in your search request. Here's an example:

POST /your_index_name/_search
{
  "from": 0,       // Start from the first result
  "size": 10,      // Return 10 results per page
  "query": {
    "match": {
      "field_name": "search_term"
    }
  }
}
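Applications usually think in page numbers rather than offsets. A minimal sketch of the conversion, assuming 1-based page numbers and a hypothetical helper name:

```python
def page_request(page, size, query):
    """Build a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    return {
        "from": (page - 1) * size,  # number of results to skip
        "size": size,               # number of results to return
        "query": query,
    }


query = {"match": {"field_name": "search_term"}}
print(page_request(1, 10, query)["from"])  # 0
print(page_request(3, 10, query)["from"])  # 20
```

By default Elasticsearch rejects requests where "from" plus "size" exceeds 10,000 (the index.max_result_window setting), so very deep pagination is better served by the "search_after" parameter.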

Distributed architecture of Elasticsearch

The basic architecture of Elasticsearch consists of nodes, which are the basic building blocks of a cluster.

A node is a single instance of Elasticsearch that stores data and participates in the cluster's search and indexing capabilities. Nodes can be installed on a single machine or multiple machines, depending on the size and complexity of the data being indexed.

The nodes in Elasticsearch can be classified into 4 types: data nodes, master-eligible nodes, ingest nodes, and client nodes.

Data Nodes: Nodes responsible for storing and indexing data. They hold primary and replica shards and handle search and indexing operations.
Ingest Nodes: Nodes that run ingest pipelines, which pre-process documents before indexing.
Client Nodes: Nodes that only handle client requests without storing data, also known as coordinating-only nodes. They serve as load balancers and can be useful for separating client traffic from data nodes.
Master-Eligible Nodes: These nodes can be elected as the master node, which performs cluster management tasks such as creating or deleting indices, assigning shards to nodes, and monitoring the health of the cluster. Master-eligible nodes also participate in the election of a new master node in the event of a failure.

Each node in Elasticsearch has a unique name and communicates with the other nodes in the cluster over the network. Elasticsearch uses a discovery mechanism to find and join other nodes in the cluster. In current versions, discovery starts from a list of seed hosts, which can come from the node settings, from a file, or from a cloud provider plugin; multicast discovery, which was available in early versions, has been removed.

A cluster in Elasticsearch is a group of one or more nodes working together to store and manage data. When multiple nodes are connected and working together in a cluster, Elasticsearch automatically distributes data and load balances queries across all the nodes in the cluster.

Sharding is the process of breaking down a large index into smaller parts called shards, which can be distributed across multiple nodes in a cluster. Each shard is a self-contained index that can be stored and managed independently of other shards. By breaking an index into shards and distributing them across multiple nodes, Elasticsearch can handle large amounts of data and scale horizontally.
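How a document ends up in a particular shard can be sketched as a hash of its routing value (the document ID by default) taken modulo the number of primary shards. Elasticsearch uses a Murmur3 hash and a slightly more involved formula; the CRC32 hash and the function name below are stand-ins chosen so the example runs with the standard library only.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Pick a shard deterministically from the routing value.

    zlib.crc32 is a stand-in for the hash function Elasticsearch uses.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Hypothetical document IDs routed across 3 primary shards.
for doc_id in ["1", "2", "3", "4", "5"]:
    print(doc_id, "-> shard", shard_for(doc_id, 3))
```

Because the shard number depends on the shard count, changing the number of primary shards would send existing documents to different shards, which is why that number is fixed when the index is created.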

Shard allocation is how Elasticsearch balances data across the cluster: it places the primary and replica shards of each index on different nodes and rebalances them as nodes join or leave, which keeps data available and lets the cluster scale.
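The core rule of allocation, that a replica never shares a node with its primary, can be shown with a simple round-robin placement. This is a deliberately simplified sketch with made-up node names; the real allocator also weighs factors such as disk usage and the number of shards already on each node.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Place every copy of each shard on a different node, round-robin."""
    if num_replicas >= len(nodes):
        raise ValueError("each copy of a shard needs its own node")
    placement = {node: [] for node in nodes}
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            placement[node].append((shard, role))
    return placement


# Hypothetical cluster: 3 data nodes, an index with 3 primaries and 1 replica each.
placement = allocate(3, 1, ["node-1", "node-2", "node-3"])
for node, shards in placement.items():
    print(node, shards)
```

With this layout, losing any single node still leaves one copy of every shard on the remaining nodes.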

In summary, an Elasticsearch cluster consists of multiple nodes (data nodes and master-eligible nodes, among others) that work together to store and manage data, with Elasticsearch automatically distributing shards across the nodes for scalability and reliability.
