
Hadoop Multi Data Centre Migration

Vishal Garg

Product Owner - Intelligent Application Platforms (Agentic AI,Gen AI, NLP, AIML, Data Serving App Development, Data Fabric/Mesh, Data Storage Solutions, MLOPs, Cloud Platforms) at Ericsson |Ex-IBMer

Published: November 26, 2020

MapR Cluster Migration Via Multi Data Centre Setup

Task Description: - Migrating the MapR Hadoop cluster from one Data Centre setup to another was really challenging from the moment I started this work.

Approaches/Options: -

To migrate the existing cluster I considered two approaches.
Option A: Build a new cluster and mirror the data to it over the wire using TCP/IP. The pro of this approach is that we can build the new cluster independently and then promote the volumes. The con was the time needed to mirror more than 2 PB of user data from a live, growing cluster: the estimate was around 4-6 months at the available network speed.
Option B: Merge the two Data Centre setups and grow the existing cluster across both, followed by a drain of the old infrastructure. The pros of this approach were that end users would never know we had changed the setup underneath, and a faster turnaround time. The con was that we did not know in advance whether data would flow seamlessly between the two setups.
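To put the 4-6 month estimate in perspective, here is a rough back-of-the-envelope calculation of the average transfer rate it implies. It is only a sketch: it assumes decimal units (1 PB = 10^15 bytes), 30-day months, and a fixed 2 PB data set, whereas the real cluster held more than 2 PB and kept growing. The function name is made up for illustration.

```python
# Back-of-the-envelope check of the mirroring estimate.
# Assumptions (not from the article): 1 PB = 10**15 bytes, 30-day months,
# and a fixed 2 PB data set (the real cluster was larger and still growing).

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_PB = 10**15


def implied_throughput_mb_per_s(petabytes: float, months: float) -> float:
    """Average sustained rate (MB/s) needed to copy `petabytes` in `months`."""
    seconds = months * 30 * SECONDS_PER_DAY
    return petabytes * BYTES_PER_PB / seconds / 10**6


for months in (4, 6):
    mb_s = implied_throughput_mb_per_s(2, months)
    gbit_s = mb_s * 8 / 1000  # megabytes/s -> gigabits/s
    print(f"{months} months: ~{mb_s:.0f} MB/s (~{gbit_s:.2f} Gbit/s)")
```

Under these assumptions the estimate corresponds to a sustained rate somewhere between roughly 130 and 190 MB/s for the whole period, before accounting for new data arriving on the live cluster.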

Challenges: - These were the challenges we faced during this mammoth project after adopting option B.

The biggest challenge was that the new setup used a different network, subnet and IP range.
Customizing the DNS names and domain names of the new nodes, which followed a different naming convention from the existing HUB.
User:group setup, to ensure cluster data remains valid after cutover.
Moving the control components (ZooKeeper, ResourceManager, CLDB) without hampering the working of the existing cluster.
RPC communication between the new and old infrastructure for the time they co-exist.
DB synchronization for the metadata DBs and OpenTSDB (time series data for metrics).
Managing the SSL certificates for a secure cluster.
Getting the proxy in place and getting HA for the various components.
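The user:group point can be illustrated with a small check. This is a minimal sketch, assuming file ownership is resolved through numeric UID/GID values (as on POSIX-style file systems), so a user whose IDs differ on the new nodes would no longer match the ownership of existing data. The helper names and the sample passwd entries are hypothetical, and collecting the passwd text from each node is left out.

```python
# Sketch: compare user -> (uid, gid) mappings between an old-DC node and a
# new-DC node. Input is passwd-format text from each node; how it is
# collected (ssh, config management, directory export) is not shown.

def parse_passwd(text: str) -> dict[str, tuple[int, int]]:
    """Return {username: (uid, gid)} from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines
        users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def id_mismatches(old: dict, new: dict) -> dict[str, tuple]:
    """Users whose ids differ on the new node, or who are missing there."""
    return {
        name: (ids, new.get(name))
        for name, ids in old.items()
        if new.get(name) != ids
    }


# Hypothetical sample data, not taken from the real cluster.
old_node = (
    "mapr:x:5000:5000::/home/mapr:/bin/bash\n"
    "etl:x:6001:6000::/home/etl:/bin/bash\n"
)
new_node = (
    "mapr:x:5000:5000::/home/mapr:/bin/bash\n"
    "etl:x:7001:6000::/home/etl:/bin/bash\n"
)

print(id_mismatches(parse_passwd(old_node), parse_passwd(new_node)))
```

An empty result for every old/new node pair would suggest the ID mappings are aligned; any entry in the result is a user to fix before cutover.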

Pictorial Depiction of Solution

Final validation covers volume, data and user validations, which should ensure there is no data loss once we dissolve DC1. I preferred setting the Balancer to rapid for this phase.
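One way to picture the volume and data validation is as a before/after comparison of a per-volume inventory. The sketch below is only an illustration: the inventory format (volume name mapped to file count and byte size), the function name and the sample volumes are all assumptions, and on a live cluster the comparison would only be meaningful for data that is not being written to between the two captures.

```python
# Sketch: compare a per-volume inventory captured before the drain of DC1
# with one captured after it. The format {volume: (file_count, bytes)} is an
# assumption for illustration; fill it from whatever your tooling reports.

def diff_inventories(before: dict, after: dict) -> dict[str, list[str]]:
    """Report volumes that disappeared or whose stats changed."""
    report = {"missing": [], "changed": []}
    for volume, stats in sorted(before.items()):
        if volume not in after:
            report["missing"].append(volume)
        elif after[volume] != stats:
            report["changed"].append(volume)
    return report


# Hypothetical inventories, not real cluster data.
before = {
    "users.home": (120_000, 4_000_000_000),
    "etl.raw": (9_500, 800_000_000_000),
    "metrics": (300, 50_000_000),
}
after = {
    "users.home": (120_000, 4_000_000_000),
    "etl.raw": (9_400, 790_000_000_000),
}

print(diff_inventories(before, after))
```

Any volume listed under "missing" or "changed" would need investigation before the old data centre is dissolved.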

Risks: -

The product team does not support multi-DC setups.
A split-brain scenario can occur during the ZooKeeper migration, leading to cluster failure.
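The split-brain risk comes down to quorum arithmetic: ZooKeeper needs a strict majority of the ensemble to operate. The sketch below is a simplified model, not a description of how the migration was actually done or of ZooKeeper's election protocol. It assumes that during a changeover some servers still run with the old membership list while others already run with the new one, and asks whether two disjoint majorities could then form. The function names and server names are hypothetical.

```python
# Simplified model of split-brain risk while changing ZooKeeper membership.
# Assumption: servers may temporarily disagree on the ensemble (old vs new
# list). If a majority of the old list and a majority of the new list can be
# formed from disjoint servers, two independent quorums are possible.

def majority(n: int) -> int:
    """Smallest strict majority of an ensemble with n servers."""
    return n // 2 + 1


def disjoint_quorums_possible(old: set, new: set) -> bool:
    """True if an old-list quorum and a new-list quorum need not overlap."""
    shared = old & new
    # Servers each quorum must borrow from the shared pool after using
    # the servers that appear only in its own list.
    need_old = max(0, majority(len(old)) - len(old - new))
    need_new = max(0, majority(len(new)) - len(new - old))
    return need_old + need_new <= len(shared)


old = {"zk1", "zk2", "zk3"}

# Grow by one server: any two quorums must share a server.
print(disjoint_quorums_possible(old, old | {"zk4"}))

# Grow from three to five in one step: {zk1, zk2} and {zk3, zk4, zk5}
# are both majorities of their respective lists.
print(disjoint_quorums_possible(old, old | {"zk4", "zk5"}))
```

In this model, changing membership one server at a time keeps every old quorum overlapping every new quorum, while larger jumps can leave room for two separate majorities.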

Ankita Sen

Currently: Senior Software Developer | Former: Hadoop Admin in Analytics | ERICSSON

4 years

Vishal Garg How was the replication factor of data/volumes taken care of during migration?

PRAVEEN KT

Product Area Architect - Analytics Platform(Big Data , AI/ML, Cloud AWS)

4 years

Good one
