How We Saved 20x Storage Costs by Fixing Our AWS S3 Glacier Implementation: A Deep Dive

Managing IoT data at scale is quite the adventure - from device registration to storage, processing, and historical analysis, each step brings its own set of challenges. Recently, our team dove into a storage optimization project that started as a simple cost-cutting exercise but ended up teaching us valuable lessons about AWS S3 storage management.

Quick disclaimer: This was very much a "fix-it" operation. Sure, we could have designed things better from the start, but let's be honest - sometimes you inherit decisions or make choices that need revisiting. Here's our story of turning a storage mess into a win for our infrastructure.


The Architecture

Our initial setup was straightforward:

  • Hundreds of IoT devices streaming sensor data (temperature, humidity, noise levels) to an AWS IoT endpoint over MQTT
  • An AWS IoT rule routing each payload (100-300 bytes) to:

a. A Kinesis Data Stream for real-time processing

b. An S3 bucket for long-term data retention, where each payload became a single S3 object

  • A bucket lifecycle policy that automatically transitioned objects older than 90 days to Glacier Flexible Retrieval - a decision that would later prove problematic
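
For reference, the rule itself is tiny. Here's roughly what it looks like as a boto3 call - the bucket name and rule ID are illustrative, but the 90-day transition to Glacier Flexible Retrieval is the part that later came back to bite us:

import boto3

s3 = boto3.client("s3")

# Illustrative bucket name and rule ID; the 90-day GLACIER transition is the
# decision discussed in this post.
s3.put_bucket_lifecycle_configuration(
    Bucket="sensor-data-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-after-90-days",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    # GLACIER == S3 Glacier Flexible Retrieval
                    {"Days": 90, "StorageClass": "GLACIER"}
                ],
            }
        ]
    },
)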


The Wake-Up Call

The true cost of our storage decisions remained hidden among our overall AWS spending until we needed to integrate historical data with our reporting system. Like many teams, we turned to Amazon Athena for these ad-hoc queries. That's when things got interesting.

A deep dive into AWS Cost Explorer revealed alarming spikes in S3 costs during Athena queries. The culprits? Request-Tier2 and Request-Tier3 charges. But the real shock came when we examined our bucket metrics:

  • GlacierObjectOverhead: ~57 GB
  • GlacierS3ObjectOverhead: ~13.3 GB
  • StandardStorage: ~2.7 GB
  • GlacierStorage: ~666 MB

Talk about bad storage management!

To put this in perspective: we were storing only about 3.5 GB of actual data while paying for roughly 70 GB of pure metadata overhead - a staggering 20x multiplier (a quick estimator follows the breakdown below). Here's why: for every object transitioned to Glacier, AWS adds:

a. 32 KB of index/metadata storage (charged at Glacier rates)

b. 8 KB of S3 metadata storage (charged at Standard rates)
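
To make that concrete, here is a quick back-of-the-envelope estimator. The helper and the example object count are ours; the 32 KB and 8 KB figures are the per-object overheads described above:

# Per-object overhead added when an S3 object is transitioned to Glacier.
GLACIER_INDEX_OVERHEAD = 32 * 1024  # bytes, billed at Glacier rates
S3_METADATA_OVERHEAD = 8 * 1024     # bytes, billed at Standard rates


def glacier_overhead_gb(num_objects: int) -> tuple[float, float]:
    """Return (glacier_billed_gb, standard_billed_gb) of pure overhead."""
    to_gb = 1 / (1024 ** 3)
    return (
        num_objects * GLACIER_INDEX_OVERHEAD * to_gb,
        num_objects * S3_METADATA_OVERHEAD * to_gb,
    )


# Example: two million tiny objects carry ~61 GB of Glacier-billed overhead and
# ~15 GB of Standard-billed overhead before a single byte of sensor data is stored.
print(glacier_overhead_gb(2_000_000))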

The math was painful: with approximately 10 million objects, we were:

  1. Paying for massive metadata overhead
  2. Incurring frequent Glacier retrieval costs
  3. Getting zero benefit from Glacier storage due to our small object sizes

Three key realizations emerged:

  1. Our transition to Glacier was unnecessary for our use case
  2. The overhead costs far outweighed any storage savings
  3. We needed to revert everything to Standard storage for our reporting needs


The Solution: A Three-Step Migration

Since AWS doesn't provide a direct path from Glacier to Standard storage, we developed a three-phase approach using several AWS services. Here's how we did it:


Phase 1: Object Discovery

  • Implemented S3 Inventory Reports to catalog our bucket (configuration sketch at the end of this phase)

a. Generated daily reports of all objects and their storage classes

b. Configured CSV format output to a separate bucket

Puzzle piece number one - S3 inventory report

  • Used AWS Glue Crawler to create a table schema
  • Queried the inventory with Athena to identify Glacier objects:

SELECT bucket, key FROM "sensor_inventory"."data" WHERE storage_class = '"GLACIER"';
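
For completeness, the inventory report from the first step can be configured programmatically as well. A minimal boto3 sketch - the bucket names, prefix, and configuration ID are hypothetical:

import boto3

s3 = boto3.client("s3")

# Hypothetical source bucket, destination bucket, and configuration ID.
s3.put_bucket_inventory_configuration(
    Bucket="sensor-data-archive",
    Id="daily-inventory",
    InventoryConfiguration={
        "Id": "daily-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # StorageClass is the field we filter on later in Athena.
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::sensor-inventory-reports",
                "Format": "CSV",
                "Prefix": "sensor-data",
            }
        },
    },
)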


Phase 2: Glacier Restoration

  • Used the CSV output from our Athena query as input for S3 Batch Operations
  • Initiated bulk restore operations with a 14-day restoration period (see the job sketch below)

a. The extended period gave us buffer time for unexpected issues

b. Files would temporarily return to Standard storage

We chose 14 days to give us enough time in case something else popped up
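
Under the hood, the batch restore boils down to a single CreateJob call against the S3 Control API. Here's a hedged boto3 sketch - the account ID, role, manifest location, and ETag are placeholders, while the 14-day expiration and bulk retrieval tier mirror the choices above:

import uuid

import boto3

s3control = boto3.client("s3control")

response = s3control.create_job(
    AccountId="123456789012",                     # placeholder account ID
    ConfirmationRequired=False,
    ClientRequestToken=str(uuid.uuid4()),         # idempotency token
    Priority=10,
    RoleArn="arn:aws:iam::123456789012:role/s3-batch-ops",  # placeholder role
    Operation={
        "S3InitiateRestoreObject": {
            "ExpirationInDays": 14,       # how long the restored copy stays readable
            "GlacierJobTier": "BULK",     # cheapest tier; STANDARD restores faster
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],  # matches the Athena query output
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::sensor-inventory-reports/glacier-objects.csv",
            "ETag": "replace-with-manifest-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::sensor-inventory-reports",
        "Prefix": "batch-restore-reports",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "FailedTasksOnly",
    },
)
print("Restore job started:", response["JobId"])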


Phase 3: Storage Class Migration

  • Launched another S3 Batch Operation using the same CSV
  • Configured in-place copy operations (per-object equivalent sketched after this list):

a. Source: Restored objects

b. Destination: Same bucket/key

c. New storage class: Standard

  • Important note: This process updates the "Last modified date" of all objects. If your application logic depends on this timestamp, plan accordingly.
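
Per object, the batch copy is equivalent to an in-place CopyObject that only changes the storage class. A minimal sketch, with a hypothetical bucket and key:

import boto3

s3 = boto3.client("s3")

bucket = "sensor-data-archive"               # hypothetical bucket
key = "device-42/2021/03/17/reading.json"    # hypothetical key

# Copy the (restored) object onto itself, changing only the storage class.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    StorageClass="STANDARD",
    MetadataDirective="COPY",   # keep user metadata; Last-Modified still changes
)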


The Plot Twist: Versioning Complications

Just when we thought we were done, the bucket metrics told a different story - our storage usage hadn't decreased. After some investigation, we discovered an overlooked detail: bucket versioning was enabled. Our copy operations had created new Standard versions while preserving the old Glacier versions as noncurrent. Back to the drawing board!

Fortunately, AWS had already thought of this scenario. The solution was straightforward: S3 Lifecycle Rules. We added a simple rule to clean up noncurrent versions:
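
A rule along these lines does the job - a minimal boto3 sketch, with a hypothetical bucket name and retention window. One caveat worth repeating: this call replaces the bucket's entire lifecycle configuration, so any rules you still want must be included in the same request.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name and retention window. This call REPLACES the whole
# lifecycle configuration, so include any other rules you want to keep.
s3.put_bucket_lifecycle_configuration(
    Bucket="sensor-data-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                # Permanently delete noncurrent versions one day after they
                # become noncurrent (the old Glacier versions in our case).
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
            }
        ]
    },
)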

Phew! One caveat: lifecycle rules can take up to 48 hours before they are executed, so don't expect the metrics to drop immediately.

With this final piece in place, our storage optimization was truly complete. The noncurrent Glacier versions were cleaned up automatically, and our metrics finally showed the expected reduction in storage usage.


Lessons Learned

Our storage optimization journey revealed several key insights that might help others avoid similar challenges:

  • Size Matters

i. Small objects (under 128 KB) are rarely cost-effective for Glacier storage

ii. AWS now enforces this best practice by preventing transitions for objects under 128 KB

iii. Always consider object size distribution when planning storage lifecycles

  • Hidden Costs

i. Storage class metadata overhead can dwarf actual storage costs

ii. Each Glacier object carries 40 KB of overhead (32 KB + 8 KB)

iii. Regular cost analysis is crucial, especially as data volumes grow

  • Best Practices

i. Analyze your data patterns before implementing lifecycle rules

ii. Consider consolidating small objects before archival

iii. Document your storage decisions and their rationale

iv. Regularly review AWS's latest service updates and constraints

  • Process Insights

i. S3 Inventory Reports are invaluable for large-scale storage analysis

ii. Always check versioning settings before bulk operations

iii. Plan for sufficient restoration time when working with Glacier

iv. Test your migration process with a small subset first

This experience taught us that while AWS provides powerful storage options, their effective use requires understanding both the technical details and economic implications. What started as a costly oversight became a valuable learning opportunity, leading to better storage management practices across our organization.

Remember: sometimes the best way to learn cloud best practices is to clean up after not following them. As AWS's recent 128KB transition constraint shows, even cloud providers learn and adapt their services based on customer experiences like ours.
