Migration Story: Moving High Scale Data and Compute from AWS to Azure

This article was authored by Tomer Rosenthal, Ilana Kantorov, and Ari Bornstein on the Microsoft Developer Blog.

Background

We recently worked with Emedgene, a company that participated in the Microsoft Partners Program, on developing a next-generation genomics intelligence platform that incorporates advanced artificial intelligence technologies to streamline interpretation and evidence presentation. With this platform, healthcare providers will be able to deliver individualized care to more patients through improved diagnostic yield.

Emedgene continues to grow and, given their positive collaboration experience with Microsoft engineers, decided to migrate their solution from AWS to Azure with support from Microsoft. This code story demonstrates how to migrate the compute resources to Azure, transfer more than 100 TB of blob storage and handle application secrets without embedding the Azure SDK in the application code.

Architecture & Migration

A key part of Emedgene’s architecture is the provisioning of new EC2 Spot instances to execute computationally heavy analytics processes that take large sets of genomics data in S3 as input. Metadata for each analytics job is enqueued for processing by the EC2 instances. The number of instances is dynamic and varies with the number of messages in the queue.

Compute: EC2 instances are provisioned by another EC2 instance that monitors a Redis queue for new jobs.

Data: The genomic datasets can comprise over one million individual files with a cumulative size of up to 100 TB. To perform very fast analytics, Emedgene copies the relevant sets of files from S3 to each instance’s attached disks when the instance is provisioned, gaining higher throughput and lower network latency between the instances and the data. In Azure, we copy these files from Azure Data Lake Store to the VM’s attached disks in the same way.
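As a sketch of what that copy step can look like on the Azure side, the snippet below maps a set of Data Lake Store paths onto a VM's attached disk, preserving the dataset layout. The mount point, file names, and helper function are hypothetical, not from the article; the actual transfers could then be performed by a multithreaded downloader such as the one in the azure-datalake-store SDK.

```python
import os

def plan_copies(remote_files, local_root="/mnt/resource/genomics"):
    """Map each Azure Data Lake Store path to a destination path on the
    VM's attached disk, preserving the dataset's directory layout.
    (local_root is a hypothetical mount point for the attached disk.)"""
    return [(rpath, os.path.join(local_root, rpath.lstrip("/")))
            for rpath in remote_files]

# Hypothetical dataset listing; a real job would read this from the
# queue message or a dataset manifest.
files = ["/genomes/set1/sample1.bam", "/genomes/set1/sample1.bam.bai"]
for rpath, lpath in plan_copies(files):
    print(rpath, "->", lpath)
```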

To support native scalability without an additional application module (the AWS solution required Redis queues and extra EC2 instances), we used Virtual Machine Scale Sets (VMSS). VMSS enables us to monitor an Azure Service Bus queue for messages and provision new instances when the queue reaches a certain threshold. Once the application finishes its task, it invokes a script (Self Destroy Instance) that deletes its VM instance from the scale set. The script can be invoked in a Docker container for maximum flexibility in the deployment process.
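A minimal sketch of what a Self Destroy Instance step might look like, assuming the VM learns its own identity from the Azure Instance Metadata Service and that the Azure CLI is installed and authenticated on the instance; the resource names and helper functions here are illustrative, not the article's actual script.

```python
import json
import subprocess
import urllib.request

# Azure Instance Metadata Service endpoint (reachable from inside the VM).
METADATA_URL = ("http://169.254.169.254/metadata/instance/compute"
                "?api-version=2019-06-01&format=json")

def get_instance_metadata():
    """Ask the metadata service which resource group, scale set, and
    instance this VM belongs to."""
    req = urllib.request.Request(METADATA_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def build_delete_command(resource_group, vmss_name, instance_id):
    """Assemble the Azure CLI call that removes one instance from a VMSS."""
    return ["az", "vmss", "delete-instances",
            "--resource-group", resource_group,
            "--name", vmss_name,
            "--instance-ids", instance_id]

def self_destroy():
    meta = get_instance_metadata()
    # For VMSS instances the VM name has the form "<vmss-name>_<instance-id>".
    instance_id = meta["name"].rsplit("_", 1)[-1]
    cmd = build_delete_command(meta["resourceGroupName"],
                               meta["vmScaleSetName"],
                               instance_id)
    subprocess.run(cmd, check=True)
```

Deleting the instance through the management plane (rather than just shutting the VM down) is what keeps the scale set's instance count in step with the remaining work in the queue.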

Note: We considered working with Azure Batch low-priority VMs, but scaling with Azure Service Bus and custom VM images were not fully supported.

The DevOps flow

The Continuous Integration / Continuous Deployment (CI/CD) process is managed with Jenkins. While Jenkins provides a lot of flexibility, we needed a way to provision and manage Azure resources in the pipeline. To do this, we used Azure CLI 2.0, but we also needed to propagate results, such as resource names and paths, from one command to the next.

For example, a CLI command may return JSON output containing a dynamically generated “name” property that we want to propagate to another command.
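As a sketch of that propagation step (with made-up output values), the snippet below parses a command's JSON output and pulls out the property to feed into the next command; in a shell pipeline, the Azure CLI's built-in `--query` JMESPath option together with `-o tsv` achieves the same thing.

```python
import json

# Hypothetical JSON output of an Azure CLI command such as `az vm create`.
cli_output = """
{
  "name": "analytics-vm-01",
  "location": "westeurope",
  "provisioningState": "Succeeded"
}
"""

def extract_property(raw_json, prop):
    """Pull one property out of a CLI command's JSON output so it can be
    passed as an argument to the next command in the pipeline."""
    return json.loads(raw_json)[prop]

vm_name = extract_property(cli_output, "name")
# vm_name ("analytics-vm-01") can now be interpolated into the next
# command, e.g. `az vm show --resource-group my-rg --name <vm_name>`.
```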

Visit the Microsoft Developer Blog for the rest of the article.
