Hadoop Multi Data Centre Migration
Vishal Garg
Product Owner - Intelligent Application Platforms (Agentic AI,Gen AI, NLP, AIML, Data Serving App Development, Data Fabric/Mesh, Data Storage Solutions, MLOPs, Cloud Platforms) at Ericsson |Ex-IBMer
MapR Cluster Migration Via Multi Data Centre Setup
Task Description: - Migrating the MapR Hadoop cluster from one data centre setup to another was a real challenge when I started this work.
Approaches/Options: -
- To migrate the existing cluster I considered two approaches. Option A was to build a new cluster and mirror the data over the wire using TCP/IP. The pro of this approach is that we can build the new cluster independently and then promote the volumes. The con was the time needed to mirror >2 PB of user data from a live, growing cluster; the estimate was around 4-6 months at the available network speed.
- Option B was to merge the two data centre setups and grow the existing cluster, then drain the old infrastructure. The pros of this approach were that end users would never know we had changed the setup underneath, and a faster turnaround time. The con was that we did not know whether data would flow seamlessly between the two setups.
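As a rough sanity check on the mirroring estimate for Option A, the arithmetic can be sketched as below. The sustained throughput and the daily growth figure are assumptions chosen for illustration, not numbers from the actual migration, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
PB = 10**15  # decimal petabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def mirror_days(data_bytes, throughput_bits_per_s, daily_growth_bytes=0.0):
    """Days needed to mirror data_bytes while the source keeps growing.

    throughput_bits_per_s is the sustained (not peak) link rate.
    daily_growth_bytes is new data landing on the source every day.
    """
    copied_per_day = throughput_bits_per_s / 8 * SECONDS_PER_DAY
    net_per_day = copied_per_day - daily_growth_bytes
    if net_per_day <= 0:
        raise ValueError("growth outpaces the link; the mirror never catches up")
    return data_bytes / net_per_day


if __name__ == "__main__":
    # Assumed sustained rates; the real link speed is not stated in the article.
    for gbps in (1, 2, 5):
        days = mirror_days(2 * PB, gbps * 1e9, daily_growth_bytes=2 * TB)
        print(f"{gbps} Gbit/s sustained: about {days:.0f} days")
```

With no growth at all, 2 PB over a sustained 1 Gbit/s link works out to about 185 days, roughly six months, which is in line with the upper end of the estimate above. Any growth on the live cluster pushes that number up.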
Challenges: - We faced the following challenges during this mammoth project after adopting Option B.
- The biggest challenge was that the new setup used a different network, subnet and IP range.
- Customizing the DNS names and domain names of the nodes, since the new setup followed a different naming convention from the existing HUB.
- User and group setup, to ensure cluster data remained valid after cutover.
- Moving the control components (ZooKeeper, ResourceManager, CLDB) without disrupting the existing cluster.
- RPC communication between the new and old infrastructure while they co-existed.
- Synchronizing the metadata databases and OpenTSDB (time series data for metrics).
- Managing the SSL certificates for a secure cluster.
- Getting the proxy in place and setting up high availability (HA) for the various components.
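The user and group challenge above can be illustrated with a small consistency check. This is a minimal sketch that assumes file ownership is resolved through numeric UIDs and GIDs, so the same user name must map to the same IDs on old and new nodes. It compares two passwd-format dumps; the function names and the sample users are hypothetical.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a valid passwd entry
        entries[fields[0]] = (int(fields[2]), int(fields[3]))
    return entries


def id_mismatches(old_text, new_text):
    """Return users whose (uid, gid) differ, or who are missing, on the new side.

    Values are (old_ids, new_ids); new_ids is None when the user is missing.
    """
    old, new = parse_passwd(old_text), parse_passwd(new_text)
    problems = {}
    for user, ids in old.items():
        if user not in new:
            problems[user] = (ids, None)
        elif new[user] != ids:
            problems[user] = (ids, new[user])
    return problems


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real cluster.
    old_dc = "etl:x:5001:5001::/home/etl:/bin/bash\nanalyst:x:5002:5000::/home/analyst:/bin/bash\n"
    new_dc = "etl:x:5001:5001::/home/etl:/bin/bash\nanalyst:x:6002:5000::/home/analyst:/bin/bash\n"
    for user, (old_ids, new_ids) in id_mismatches(old_dc, new_dc).items():
        print(f"{user}: old={old_ids} new={new_ids}")
```

Running a check like this before any new node joins the cluster is cheaper than repairing ownership after cutover.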
Pictorial Depiction of Solution
Final validation covers volume, data and user checks, which should ensure that there is no data loss once we dissolve DC1. I set the balancer settings to rapid for this phase.
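One way to frame the "no data loss once DC1 is dissolved" check is to confirm that every container keeps enough replicas outside DC1. The sketch below works on a plain dictionary of container id to replica hosts; how that inventory is exported from the cluster is left open, and the host names, container ids and function name are all hypothetical.

```python
def containers_at_risk(replica_map, dc1_nodes, min_replicas=1):
    """Return containers with fewer than min_replicas copies outside DC1.

    replica_map: container id -> list of hosts holding a replica.
    dc1_nodes:   hosts that will be removed when DC1 is dissolved.
    The result maps each at-risk container to its surviving hosts.
    """
    dc1 = set(dc1_nodes)
    at_risk = {}
    for container_id, hosts in replica_map.items():
        surviving = [h for h in hosts if h not in dc1]
        if len(surviving) < min_replicas:
            at_risk[container_id] = surviving
    return at_risk


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    replicas = {
        2049: ["dc1-n01", "dc2-n01", "dc2-n02"],
        2050: ["dc1-n01", "dc1-n02", "dc2-n03"],
        2051: ["dc1-n02", "dc1-n03"],
    }
    dc1 = ["dc1-n01", "dc1-n02", "dc1-n03"]
    # Require at least two surviving copies before DC1 is switched off.
    print(containers_at_risk(replicas, dc1, min_replicas=2))
```

An empty result for the chosen `min_replicas` is the condition to wait for before draining the last DC1 node.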
Risks: -
- The product team doesn’t support multi-DC setups.
- A split-brain scenario can happen during the ZooKeeper (Zk) migration, leading to cluster failure.
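The ZooKeeper risk comes down to quorum arithmetic: an ensemble needs a strict majority of its voting servers to stay available. The sketch below shows which side of a DC1/DC2 network partition would retain quorum at each step of moving voters across. The step sequence is a hypothetical example, not the order used in the actual migration.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble."""
    return ensemble_size // 2 + 1


def partition_report(dc1_voters, dc2_voters):
    """Which side keeps quorum if the link between the two DCs is cut."""
    need = quorum_size(dc1_voters + dc2_voters)
    return {
        "quorum": need,
        "dc1_has_quorum": dc1_voters >= need,
        "dc2_has_quorum": dc2_voters >= need,
    }


if __name__ == "__main__":
    # Hypothetical plan: move a 3-node ensemble one voter at a time.
    for dc1, dc2 in [(3, 0), (2, 1), (1, 2), (0, 3)]:
        print(dc1, dc2, partition_report(dc1, dc2))
```

With a single shared membership list, at most one side can hold a majority. The dangerous case is when old and new servers end up with different views of who is in the ensemble, because then each group may consider itself a valid quorum.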
Currently: Senior Software Developer | Former: Hadoop Admin in Analytics | ERICSSON
4 years ago: Vishal Garg, how was the replication factor of data/volumes taken care of during migration?
Product Area Architect - Analytics Platform(Big Data , AI/ML, Cloud AWS)
4 years ago: Good one