From Panic to Precision: Tale of Migrating the Biggest RDS Instance Overnight

Hey! Welcome to Zeta Scoops — an infinite-part series on the life and times of Zetanauts revolutionizing banking technology globally.


Today, we delve into the incredible story of Pavan, the Expeditor in our SRE team, and how he migrated 21 RDS instances from 3 AWS Zones into one without losing a single business day!


Pause and think of a time when you saw a task and thought, “How on earth am I supposed to do this?” That feeling is what swept through Pavan when he found out that he needed to migrate 21 RDS (Relational Database Service) instances spread across three different AWS Zones into a single Zone. The migration involved first decrypting all the instances and then, once moved, encrypting them again. Scattered RDS instances reduce efficiency, push costs up, and create communication challenges.

Simply Put:

Folks in 21 offices (RDS) across three towns (Zones) were working on a project. Each office specialized in a specific task and contributed towards the project. Pavan had to move all these offices along with the people working in them to a single town, all while ensuring that not a single working day was lost.

All the offices had to be in one town.

For more context, the total data to be moved was about 15 TB. If 1 MB is one person, then 15 TB is 15,000,000 people.
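The arithmetic behind the analogy can be checked in a few lines. This is a minimal sketch that assumes decimal (SI) units, where 1 TB is 1,000,000 MB; the function name is made up for illustration.

```python
MB_PER_TB = 1_000_000  # decimal (SI) units: 1 TB = 10**6 MB


def people_for(terabytes: float) -> int:
    """Number of 'people' if each megabyte counts as one person."""
    return int(terabytes * MB_PER_TB)


print(f"{people_for(15):,}")  # 15,000,000
```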

Pavan was tasked with moving entire neighborhoods, buildings and people alike, in a way that no one missed a day’s worth of work. That’s bonkers!

Pavan worked out how many teams were using the same Zone, how they could be grouped into clusters, and how long it would take to migrate everything to a single Zone. The answer was a year.

So, like a true human, Pavan’s first response was logical: abject panic. Once back to his senses, Pavan created an action plan: connect with different teams, tell them about the migration, and ask for a suitable time for each team to stop working and migrate. The catch was that interdependent teams must agree to move together.

Since all the offices worked together, they must agree to move together.
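The "move together" constraint is a grouping problem: teams that depend on each other, directly or through a chain of other teams, end up in the same migration group. The sketch below shows one way to compute such groups with a simple graph walk. It is not Pavan's actual method, and the team names and dependencies are hypothetical.

```python
from collections import defaultdict


def migration_groups(teams, dependencies):
    """Group teams so that interdependent teams land in the same group.

    teams: iterable of team names.
    dependencies: iterable of (team_a, team_b) pairs that must move together.
    Returns a sorted list of sorted groups, for stable output.
    """
    neighbours = defaultdict(set)
    for a, b in dependencies:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for team in teams:
        if team in seen:
            continue
        # Depth-first walk collects every team reachable from this one.
        stack, group = [team], []
        seen.add(team)
        while stack:
            current = stack.pop()
            group.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return sorted(groups)


# Hypothetical teams and dependencies, for illustration only.
teams = ["payments", "ledger", "cards", "rewards", "reports"]
dependencies = [("payments", "ledger"), ("ledger", "cards")]
print(migration_groups(teams, dependencies))
```

Each group returned here would need a single shared downtime window, which is why long dependency chains make scheduling so hard.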

After many rounds of discussion with all the teams, Pavan concluded, “It’s not happening,” which, to be fair, is a reasonable response. The teams couldn’t stop operations for a prolonged period. The resulting downtime for client apps and platforms would have been massive, and was therefore unacceptable.

Pavan ironically had the wind knocked out of him. He was back to the drawing board.

Pavan saw no other way than to automate the process. This meant facing his tiny but present fear: coding. “I knew it was the only way to get this done,” says Pavan. He created scripts, weaving together commands and leveraging online resources for guidance. Weeks passed before Pavan had a working automation tool that could move databases from one Zone to another. The buildings and the people in them would be moved on a bullet train.

Pavan reached out to the teams working with the biggest database (biggest office), and this time around, instead of asking for days, he requested a window of only 12 hours. “This is all it would take if everything goes right,” Pavan says almost sheepishly. If migrating the largest database instance took less than 12 hours, then migrating the smaller ones would take even less time.

Almost a year’s worth of planning, producing, communicating, failures, and wins had come down to this: a solid winter evening when a new Zone would be designated and Pavan would gain closure, one way or the other.

The migration began at midnight. The first lines of code churned into the system, and somehow, Pavan could “hear the codes running.” It’s funny how excitement coupled with a minor caffeine addiction works as a mild drug. It is crucial to understand that Pavan’s automation tool wasn’t just transferring data; it decrypted the data first and then encrypted it again after the move. That’s a massive layer of complexity.
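The order of operations described here (decrypt, move, re-encrypt) can be modelled as a small pipeline. This is a toy sketch, not Pavan's tool: it makes no AWS calls, and the `Instance` class, the step functions, and the Zone labels are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    name: str
    zone: str
    encrypted: bool


def decrypt(inst: Instance) -> Instance:
    return replace(inst, encrypted=False)


def move(inst: Instance, target_zone: str) -> Instance:
    # In this toy model, only decrypted instances may be moved.
    if inst.encrypted:
        raise ValueError(f"{inst.name} must be decrypted before moving")
    return replace(inst, zone=target_zone)


def encrypt(inst: Instance) -> Instance:
    return replace(inst, encrypted=True)


def migrate(inst: Instance, target_zone: str) -> Instance:
    """Run the three steps in order and return the migrated instance."""
    return encrypt(move(decrypt(inst), target_zone))


# Hypothetical instance and Zone names, for illustration only.
source = Instance(name="db-largest", zone="zone-a", encrypted=True)
print(migrate(source, "zone-target"))
```

Keeping each step as its own function makes the ordering explicit: skipping `decrypt` in this model raises an error instead of silently moving encrypted data.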

Pavan was confident of success, but the fear of failure kept tingling at the back of his neck. Hours passed, and the finish line was finally in sight. The victory music was about to cue in when a slight niggle cropped up: a particular loading bar had stopped moving. It looked frozen. Pavan was fixated on the fine green line inside the loading bar, hoping (pleading) for it to move. And it did. Right to the end. The task was complete. The clock said 6:40 AM.

The migration of the largest database (biggest office) instance took less than 7 hours — a massive feat. The city’s biggest neighborhood was transported to a new town, and everyone went to work the next day.

Pavan successfully migrated the largest database, saved significant costs, and streamlined operations. His automation tool would later migrate all the remaining RDS instances with significantly lower downtime.

Indomitable, yet sometimes anxious, spirits like Pavan’s are why Zeta has surpassed every challenge in its path and now stands atop and alone in the Banking Tech arena. If you want to catch up with Pavan or connect with him on a fine evening, this is his LinkedIn account. If you wish to stay updated and amazed by stories from Zeta, please consider following this account or drop a follow on our Instagram page.


#WeAreZetanauts, and we’re always looking for folks to join us on our mission to build next-gen banking tech. Click here to check out career opportunities with Zeta.


