From Apache Spark to Ray: How Amazon Saved $100 Million with This Switch

Imagine a company so large that even the smallest performance improvements can lead to millions in savings. At Amazon, where data pipelines power everything from logistics to customer insights, even a minor tweak in efficiency can transform operations.

Recently, Amazon made a game-changing shift in its data processing strategy.

Amazon's Business Data Technologies (BDT) team migrated its data processing jobs from Apache Spark to Ray to tackle the challenges of exabyte-scale data.

Here's how they saved $100 million a year:

The Problem


[Image credit: Amazon]

  1. Merge Operation Struggles: As Amazon's data grew, "merge" operations became increasingly slow and unreliable, taking days or weeks to complete (the operation itself is illustrated in the sketch after this list).
  2. Compaction Job Delays: Spark's performance lagged at exabyte scale. Traditional Spark jobs took too long to finish, and options for scaling them further were limited.
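For intuition, a merge (or compaction) pass folds a log of row-level changes into a single deduplicated snapshot of a table. The Python sketch below is purely illustrative; the `compact` function and record layout are hypothetical, not Amazon's implementation:

```python
# Illustrative only: fold an ordered log of row-level changes
# (upserts and deletes) into a base table keyed by primary key,
# keeping just the latest version of each row.

def compact(base: dict, deltas: list) -> dict:
    table = dict(base)
    for change in deltas:
        if change["op"] == "delete":
            table.pop(change["pk"], None)
        else:  # upsert: insert or overwrite the row
            table[change["pk"]] = change["row"]
    return table

base = {1: {"sku": "A", "qty": 3}}
deltas = [
    {"op": "upsert", "pk": 2, "row": {"sku": "B", "qty": 1}},
    {"op": "upsert", "pk": 1, "row": {"sku": "A", "qty": 5}},
    {"op": "delete", "pk": 2},
]
print(compact(base, deltas))  # -> {1: {'sku': 'A', 'qty': 5}}
```

The same fold, run over billions of rows per table, is what made slow merge jobs so costly at Amazon's scale.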

The Solution

  1. Moving to Ray: BDT tested Ray and discovered it could handle datasets 12 times larger than Spark could, while being 91% more cost-efficient. Ray's advanced task orchestration and zero-copy shuffling contributed to better resource usage and faster processing speeds (see the sketch just after this list).
  2. Serverless Design: BDT switched to a serverless architecture, running Ray on EC2 and using DynamoDB and other AWS services for job tracking and management (a brief job-tracking sketch appears below).
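To make the Ray model concrete, here is a minimal sketch, assuming only `pip install ray` and `numpy`; the function name and data are invented for illustration and are not Amazon's code. It shows the two mechanics credited above: task orchestration via remote functions, and the shared-memory object store that lets tasks on the same node read large arrays without copying them:

```python
import numpy as np
import ray

ray.init()  # starts a local Ray runtime; on a cluster this attaches to it

@ray.remote
def sum_partition(partition: np.ndarray) -> float:
    # Tasks on the same node read the array directly from Ray's
    # shared-memory object store (zero-copy for numpy arrays).
    return float(partition.sum())

# Put one large array into the object store once...
data_ref = ray.put(np.ones(10_000_000, dtype=np.float64))

# ...then fan out tasks that all share it; Ray schedules them across
# the cluster and returns futures that resolve to each task's result.
futures = [sum_partition.remote(data_ref) for _ in range(8)]
print(sum(ray.get(futures)))  # 8 tasks x 10M ones -> 80000000.0
```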



[Image credit: Amazon]
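The job-tracking half of that serverless design can be pictured with a small boto3 sketch. The table name, key schema, and status values are hypothetical stand-ins, not Amazon's actual schema:

```python
import time
import boto3

# Hypothetical DynamoDB table with partition key "job_id" (string).
dynamodb = boto3.resource("dynamodb")
jobs = dynamodb.Table("compaction-jobs")

def record_job_status(job_id: str, status: str) -> None:
    # Each compaction run writes its latest state so a serverless
    # control plane can retry or resume work without a long-lived driver.
    jobs.put_item(Item={
        "job_id": job_id,
        "status": status,  # e.g. "PENDING", "RUNNING", "SUCCEEDED"
        "updated_at": int(time.time()),
    })

record_job_status("inventory-table/partition-0042", "RUNNING")
```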

Results


  • 82% more efficient: Ray sped up compaction, completing jobs in a fraction of the time Spark required.
  • $100 million savings: By shifting to Ray, Amazon reduced its computational costs by $100 million annually.
  • 250,000 vCPU-years saved: Ray cut compute needs by roughly 250,000 vCPU-years every year (at typical EC2 on-demand rates of about $0.04 to $0.05 per vCPU-hour, that lines up with the $100 million figure).
  • Improved reliability: Ray's reliability rose from 85% to 99.15%, closely approaching Spark's 99.91%.
  • Reduced memory usage: Ray ran at about 55% of server memory, improving utilization for large-scale operations.

Ray's speed and scalability allowed Amazon to meet its massive data processing needs while dramatically improving both cost and operational performance.

Future Outlook

Ray is a strong contender for large-scale data operations, particularly for solving specific, complex problems. The team is working on adapting Ray's compaction algorithm to integrate with Apache Iceberg, with support expected in 2025. Ray's flexibility makes it a valuable tool for organizations willing to invest in tailored solutions to challenging, costly problems.

