AWS S3 Migration
TrainingPeaks imports over 35 different file formats to help you analyze your workouts and to provide immediate feedback to athletes and coaches around the globe. Storage of this information is something we take seriously. Over the years TrainingPeaks has used several storage mechanisms for your workout files, including disk, database, MongoDB's Grid Filesystem (GridFS), and now Amazon S3.
Problem
To provide the reliability our customers expect from our platform, we run MongoDB in a High Availability (HA) configuration called a replica set, which duplicates data across multiple servers to guard against a single server failure. In addition, we maintain a mirrored instance that runs on a four-hour delay, plus hourly and daily backups, in case of catastrophic failure. Since moving to the Mongo GridFS implementation in 2012, TrainingPeaks has seen exponential growth in workout file storage. In 2014 we were storing over four terabytes in Mongo GridFS, with workout files going back to the year 2001. As our storage requirements grew, so too did our costs and maintenance complexities. Updates, maintenance, and backups took longer, carried more risk, and reduced our ability to innovate.
Solution
In 2014 we identified workout file storage as an area of high operational and infrastructure risk and started investigating alternatives. One alternative was Amazon's Simple Storage Service (S3), which became even more attractive after we moved the entire TrainingPeaks platform to Amazon Web Services (AWS) in August 2013. Ultimately S3 won out on a combination of reliability, simplicity, and cost, and a project was started to migrate all customer workout data from Mongo GridFS to S3.
Migration
A project like this is akin to replacing the engine of a car while going down the road at 60 miles per hour; it takes careful planning and testing. In late 2014 we started writing new incoming files to both Mongo GridFS and Amazon S3 in parallel. To ensure reliability, one of every 50 files uploaded was queued and verified as stored correctly in both places. To ensure performance, we monitored and compared the write performance. Millions of files later, with a few minor quirks worked out, we were confident enough to start migrating in our test environments.
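A minimal sketch of that dual-write pattern in Python, assuming boto3 for S3 and PyMongo's gridfs module; the bucket, database, and function names are hypothetical, and verification runs inline here rather than through a queue as described above:

```python
import hashlib
import random

import boto3
import gridfs
from pymongo import MongoClient

s3 = boto3.client("s3")
fs = gridfs.GridFS(MongoClient().trainingpeaks)  # hypothetical database name
BUCKET = "tp-workout-files-prod"                 # hypothetical bucket name

def store_workout_file(key: str, data: bytes) -> None:
    """Write the file to both stores, sampling ~1 in 50 writes for verification."""
    fs.put(data, filename=key)
    s3.put_object(Bucket=BUCKET, Key=key, Body=data)
    if random.randrange(50) == 0:
        verify(key, data)

def verify(key: str, data: bytes) -> None:
    """Confirm that both stored copies match the original bytes."""
    expected = hashlib.md5(data).hexdigest()
    s3_copy = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    mongo_copy = fs.get_last_version(filename=key).read()
    assert hashlib.md5(s3_copy).hexdigest() == expected
    assert hashlib.md5(mongo_copy).hexdigest() == expected
```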
The code to read files from S3 was implemented and deployed to our test environments behind feature flags, allowing us to switch back and forth between GridFS and S3. After thorough testing and performance monitoring, this was released to production but left disabled behind the feature flag until all old files were migrated from Mongo GridFS into S3.
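The read path then reduces to a single branch on the flag. A sketch, reusing the s3 client, fs handle, and BUCKET from the block above; feature_flags and the flag name stand in for whatever flag system gates the rollout:

```python
def read_workout_file(key: str) -> bytes:
    """Read from S3 when the flag is on, otherwise fall back to GridFS."""
    if feature_flags.is_enabled("read-workout-files-from-s3"):  # hypothetical flag API
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return fs.get_last_version(filename=key).read()
```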
The migration process ran in production for five days, migrating and verifying over 38 million workout files. It worked backwards in time to the first file uploaded for a workout, on November 22, 2001. When the migration completed, we flipped the feature flag in production and began reading workout files from S3. We continued to write files to Mongo GridFS in parallel in case any unforeseen problem required us to roll back. After a few weeks of constant monitoring without a single problem, we turned off the parallel writes to Mongo GridFS and began operating completely on Amazon S3.
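The heart of that process is a loop over GridFS ordered by upload date, newest to oldest. A simplified sketch under the same assumptions as the blocks above (batching, resumability, and error handling omitted):

```python
def migrate_all_files() -> None:
    """Walk GridFS newest-to-oldest, copy each file to S3, and verify the copy."""
    for grid_out in fs.find().sort("uploadDate", -1):
        data = grid_out.read()
        s3.put_object(Bucket=BUCKET, Key=grid_out.filename, Body=data)
        s3_copy = s3.get_object(Bucket=BUCKET, Key=grid_out.filename)["Body"].read()
        assert s3_copy == data  # verify before counting the file as migrated
```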
Cleanup of Mongo GridFS took a few more weeks. A cleanup process ran for six days deleting migrated files out of Mongo. After deleting the files, we started rolling new Mongo instances into our replica set and removing the old instances. Rolling in fresh instances is the only way to reclaim the storage space Mongo has allocated, since deleting data does not shrink its files on disk. In the end our Mongo storage space was reduced by over 92%.
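A conservative sketch of such a cleanup pass, again reusing the handles from above: each GridFS file is deleted only after S3 confirms it holds a copy.

```python
from botocore.exceptions import ClientError

def cleanup_migrated_files() -> None:
    """Delete a GridFS file only after confirming the object exists in S3."""
    for grid_out in fs.find():
        try:
            s3.head_object(Bucket=BUCKET, Key=grid_out.filename)
        except ClientError:
            continue  # no S3 copy found; keep the GridFS original
        fs.delete(grid_out._id)
```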
S3 Configuration and Use
Each of our environments (production, testing, and development) has its own S3 bucket that can only be written to by that environment. We use machine-level S3 permissions to prevent cross-talk between environments without exposing keys in configuration files, which also prevents deployment errors.
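For illustration, machine-level permissions like these can be expressed as an IAM policy attached to each environment's instance role; the bucket name below is hypothetical. Because credentials come from the role, AWS SDK clients pick them up automatically and no keys appear in configuration:

```python
import json

# Hypothetical IAM policy for the production instance role: it can read and
# write only the production bucket, so a misdeployed test box simply gets
# AccessDenied instead of touching production data.
PRODUCTION_S3_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::tp-workout-files-prod/*",
        }
    ],
}
print(json.dumps(PRODUCTION_S3_POLICY, indent=2))
```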
Files are written to S3 with a per-person key, following S3's recommended best practices to avoid hot spots in S3's keyspace that could degrade performance. Initial performance testing showed respectable 100 ms response times, and the actual times in production are well below this mark, even under constant heavy load.
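One way to build such keys (this exact scheme is an assumption, not necessarily our production one) is to hash the person ID and use the hash as a key prefix, spreading keys evenly across S3's keyspace per S3's published guidance at the time:

```python
import hashlib

def workout_file_key(person_id: int, file_id: str) -> str:
    """Build an S3 key whose hashed prefix distributes load across S3 partitions."""
    prefix = hashlib.md5(str(person_id).encode()).hexdigest()[:8]
    return f"{prefix}/{person_id}/{file_id}"

# e.g. workout_file_key(12345, "ride-2014-11-22.fit")
#   -> "827ccb0e/12345/ride-2014-11-22.fit"
```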
Our production S3 bucket is backed up to a secondary S3 bucket on a nightly basis. The backup bucket is configured with an automatic object lifecycle policy that archives objects to Amazon's Glacier data archiving service after 60 days, providing a secure, durable copy of our customer data.
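A policy like this can be applied with a single API call; a sketch using boto3 with a hypothetical backup bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Archive every object in the (hypothetical) backup bucket to Glacier
# once it is 60 days old.
s3.put_bucket_lifecycle_configuration(
    Bucket="tp-workout-files-backup",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-to-glacier-after-60-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [{"Days": 60, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```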
Summary
At TrainingPeaks we take your data as seriously as you take your workouts. Our migration to S3 is a behind-the-scenes detail you shouldn't have to worry about, and one more way we provide the best training and analysis platform to help you meet your next challenge.