
Elasticsearch Health Status Red: Lessons Learned the Hard Way

Soumil S.

Sr. Software Engineer | Big Data & AWS Expert | Spark & EMR | Data Lake(Hudi | Iceberg) Specialist | YouTuber

Published: June 13, 2020

Hello! I'm Soumil Nitin Shah, a software and hardware developer based in New York City. I completed my Bachelor's in Electronic Engineering and a double Master's in Computer and Electrical Engineering. I develop Python-based cross-platform desktop applications, web pages, software, REST APIs, databases, and much more.

In this article I will share my personal experience working with Elasticsearch and the lessons I learned the hard way. I have been working with Elasticsearch for the past 6 months, and much of that time I was dealing with a red health status caused by problems such as unassigned shards. Here I will share how you can debug the problem and get the cluster healthy again.

Step 1: Identify the Problem:

curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

Let's look at a sample response:

{
  "index" : "testing",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2018-04-09T21:48:23.293Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "XXXXXXXXX",
      "node_name" : "XXXXXXXX",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists"
        }
      ]
    }
  ]
}

In this case, the API clearly explains why the replica shard remains unassigned: "the shard cannot be allocated to the same node on which a copy of the shard already exists". Elasticsearch never places a primary and its replica on the same node, so this usually means the cluster has fewer nodes than copies of the shard (Reason 2 below).
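When the response lists many nodes, it can help to pull out only the deciders that answered NO. The following is a minimal Python sketch, assuming the response has the same shape as the sample above. The function name `blocking_deciders` is hypothetical, and the embedded JSON is a trimmed copy of that sample rather than output from a live cluster.

```python
import json


def blocking_deciders(explain):
    """Return (node_name, decider, explanation) for every decider that answered NO."""
    rows = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                rows.append(
                    (node.get("node_name"), decider.get("decider"), decider.get("explanation"))
                )
    return rows


# Trimmed copy of the sample response shown above (not live cluster output).
SAMPLE = json.loads("""
{
  "index": "testing",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "node_allocation_decisions": [
    {
      "node_name": "XXXXXXXX",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists"
        }
      ]
    }
  ]
}
""")

for node_name, decider, explanation in blocking_deciders(SAMPLE):
    print(f"{node_name}: {decider} -> {explanation}")
```

In practice the dictionary would come from parsing the body returned by the allocation explain request instead of the hard-coded sample.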

If the unassigned shards belong to an index you thought you had already deleted, or to an outdated index that you no longer need, you can delete that index (DELETE /<indexname>) to restore your cluster status to green.

The common reasons for unassigned shards are:

* Reason 1: Shard allocation is purposefully delayed

* Reason 2: Too many shards, not enough nodes

* Reason 3: You need to re-enable shard allocation

* Reason 4: Shard data no longer exists in the cluster

* Reason 5: Low disk watermark

* Reason 6: Multiple Elasticsearch versions
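Reason 2 comes down to simple arithmetic: because two copies of the same shard are never placed on one node, each shard needs at least as many data nodes as it has copies (one primary plus its replicas). Here is a minimal Python sketch of that check. The helper `unassignable_replicas` is hypothetical, and it deliberately ignores other allocation rules such as disk watermarks or allocation filtering.

```python
def unassignable_replicas(data_nodes, number_of_replicas):
    """Replica copies per shard that cannot be placed because of the same-shard rule."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must not be negative")
    copies_per_shard = 1 + number_of_replicas  # one primary plus its replicas
    return max(0, copies_per_shard - data_nodes)


# Single-node cluster with 1 replica: the replica has nowhere to go.
print(unassignable_replicas(data_nodes=1, number_of_replicas=1))
# Three nodes with 1 replica: every copy fits on its own node.
print(unassignable_replicas(data_nodes=3, number_of_replicas=1))
```

If the result is greater than zero, either add nodes or lower `index.number_of_replicas` for that index.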

Solutions

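After spending many hours on the internet and reading many Stack Overflow posts, here are some things you can try to get the cluster back to green. The requests below are written in Kibana Dev Tools console syntax; from a shell, send them with curl as in Step 1 (for example, curl -XPOST 'localhost:9200/<indexname>/_cache/clear').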

1. Clear the cache

POST /<indexname>/_cache/clear

2. Increase max allocation retries

PUT <indexname>/_settings
{
  "index.allocation.max_retries" : 10
}
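Raising `index.allocation.max_retries` changes the limit, but a shard that has already exhausted its attempts may still need an explicit retry; the cluster reroute API accepts a `retry_failed=true` query parameter for that. The Python sketch below only builds the two requests with the standard library and prints them. The base URL (taken from the curl command in Step 1), the index name `testing`, and both function names are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: same host and port as the curl command in Step 1


def max_retries_request(index, retries):
    """Build the PUT request that raises index.allocation.max_retries for one index."""
    body = json.dumps({"index.allocation.max_retries": retries}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def retry_failed_request():
    """Build the POST request that asks the cluster to retry failed allocations."""
    return urllib.request.Request(
        f"{BASE_URL}/_cluster/reroute?retry_failed=true",
        method="POST",
    )


for req in (max_retries_request("testing", 10), retry_failed_request()):
    print(req.get_method(), req.full_url)
    # To actually send it against a running cluster: urllib.request.urlopen(req)
```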

3. Delete all scroll contexts

DELETE /_search/scroll/_all

4. Increase the delayed allocation timeout

PUT /<indexname>/_settings?pretty
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}

5. Increase the number of replicas to 1, wait for some time, and then change it back to 0

PUT <indexname>/_settings
{
  "index.number_of_replicas": 1
}

PUT <indexname>/_settings
{
  "index.number_of_replicas": 0
}
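The "wait for some time" part can be made explicit with the cluster health API, which accepts `wait_for_status` and `timeout` parameters. The following Python sketch lists the three requests in order without sending them. The function name `replica_toggle_steps`, the index name `testing`, the 60s timeout, and the choice to wait for yellow (all primaries assigned) are assumptions, not something the steps above prescribe.

```python
import json


def replica_toggle_steps(index, wait_timeout="60s"):
    """Return the (method, path, body) requests for the replica toggle, in order."""
    settings_path = f"/{index}/_settings"
    # Wait until at least all primaries are assigned, or until the timeout expires.
    health_path = f"/_cluster/health/{index}?wait_for_status=yellow&timeout={wait_timeout}"
    return [
        ("PUT", settings_path, {"index.number_of_replicas": 1}),
        ("GET", health_path, None),
        ("PUT", settings_path, {"index.number_of_replicas": 0}),
    ]


for method, path, body in replica_toggle_steps("testing"):
    print(method, path, "" if body is None else json.dumps(body))
```

On a single-node cluster the new replica can never be assigned, so waiting for green would always run into the timeout; that is why this sketch waits for yellow instead.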

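6. If you manage the cluster yourself and it is not managed by AWS, try rerouting. Note that allocate_empty_primary assigns an empty primary shard, so any data that shard previously held is lost.

POST /_cluster/reroute?pretty
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "constant-updates",
        "shard": 0,
        "node": "<nodename>",
        "accept_data_loss": true
      }
    }
  ]
}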
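The reroute body can also be built programmatically, which makes sure `accept_data_loss` is serialized as a JSON boolean and that `node` carries a node name. This is a minimal Python sketch; the function name `allocate_empty_primary` and the node name `node-1` are placeholders, and the index name is the one from the request above.

```python
import json


def allocate_empty_primary(index, shard, node):
    """Build a reroute body that forces an empty primary onto the given node."""
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Python True becomes JSON true; the existing shard data is discarded.
                    "accept_data_loss": True,
                }
            }
        ]
    }


# "node-1" is a placeholder; use a real node name from your cluster.
body = allocate_empty_primary("constant-updates", 0, "node-1")
print(json.dumps(body, indent=2))
```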

As a last resort, if nothing works, delete the index or the cluster and start again.

Some suggestions that are always a good idea to follow:

Shards: how many shards do you need? About 25 GB per shard is usually a good target, so if you have 250 GB of data, go for 10 to 20 shards.
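The rule of thumb above is a one-line calculation. A minimal Python sketch, assuming the 25 GB target from the text; `primary_shards_for` is a hypothetical helper name:

```python
import math

TARGET_SHARD_GB = 25  # rule-of-thumb shard size from the text


def primary_shards_for(data_gb, target_shard_gb=TARGET_SHARD_GB):
    """Number of primary shards needed to keep each shard at or below the target size."""
    if data_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(data_gb / target_shard_gb))


print(primary_shards_for(250))  # 250 GB at 25 GB per shard -> 10
```

Passing a smaller target, for example `primary_shards_for(250, 12.5)`, gives the 20-shard end of the range.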

Replicas: how many do you need? It is usually a good idea to have 1 replica in QA and 3 replicas in PROD. [Note: the total data size grows with each replica.]
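To see how the note about data size plays out, each replica is a full copy of the primary data, so the total is the primary size times one plus the replica count. A small Python sketch under that assumption; `total_storage_gb` is a hypothetical helper, and the 250 GB figure is the example size used for the shard estimate:

```python
def total_storage_gb(primary_data_gb, number_of_replicas):
    """Approximate total storage: the primary data plus one full copy per replica."""
    if primary_data_gb < 0 or number_of_replicas < 0:
        raise ValueError("inputs must not be negative")
    return primary_data_gb * (1 + number_of_replicas)


print(total_storage_gb(250, 1))  # QA with 1 replica -> 500
print(total_storage_gb(250, 3))  # PROD with 3 replicas -> 1000
```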

Happy ELK


Md Salim Hossain

3 years ago

root@master:~# POST /logstash-2021.07.28/_cache/clear Please enter content (application/x-www-form-urlencoded) to be POSTed: what should I do?
