
How AWS S3 E-Tags Work

Marco Rizk

Staff Engineer @ Synapse Analytics | Building & Scaling AI Products

Published: February 29, 2024

Many use cases would run more efficiently if files were downloaded only when they have been updated or changed. In AWS S3, one of the most useful object properties for this is its ETag, a checksum-like value that AWS uses to verify that files arrive intact on upload or download.

Comparing the ETag of a local file with the ETag reported by S3 can be tricky because of how AWS calculates it. Here is a summary of my findings after going through their documentation, code base and assorted blog posts.

Small Files

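Small files are usually uploaded in a single request, and in that case the ETag is the hex MD5 digest of the file's contents. (This holds for unencrypted and SSE-S3-encrypted objects; objects encrypted with SSE-KMS or SSE-C get an ETag that is not the MD5 digest.)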
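A minimal Python sketch of the single-request case, assuming a plain (non-SSE-KMS) upload: it streams the file through `hashlib.md5` in blocks and returns the hex digest, which should equal the ETag S3 reports once the surrounding double quotes are stripped. The function name `single_part_etag` is illustrative, not from any AWS SDK.

```python
import hashlib


def single_part_etag(path: str, block_size: int = 1024 * 1024) -> str:
    """Hex MD5 of a file: the ETag S3 reports for a single-request upload
    (assumes no SSE-KMS / SSE-C encryption)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in blocks so large files never have to fit in memory at once
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()
```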

Large Files

For larger files, AWS tools use multipart upload, and this is where ETag calculation gets tricky. The ETag of a multipart upload is calculated as follows:

Split the file into chunks of a fixed chunk size (this is important later); only the last chunk may be smaller
Upload each chunk and calculate its MD5 digest
Concatenate the raw (binary) MD5 digests of all chunks
Calculate the MD5 hash of the concatenated digests and append "-" followed by the number of parts
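The four steps above can be sketched in Python as follows. This is an illustration assuming the uploader used the same `chunk_size` for every part except the last; `multipart_etag` is a hypothetical helper name, not part of any AWS SDK.

```python
import hashlib


def multipart_etag(path: str, chunk_size: int) -> str:
    """S3-style multipart ETag for a file split into chunk_size-byte parts."""
    part_digests = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Keep the raw 16-byte digest of each part, not its hex string
            part_digests.append(hashlib.md5(chunk).digest())
    # MD5 of the concatenated digests, then "-" and the number of parts
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Note that a file uploaded as a one-part multipart upload still gets a "-1" suffix, so its ETag differs from the plain MD5 of the same file.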

So far so good, but if you don't know which chunk size was used, there is trouble ahead: many different chunk sizes split a file into the same number of parts, so the part count alone does not tell you the chunk size. Finding a small set of likely chunk sizes, and computing the candidate ETag for each, is crucial to make this comparison possible. After a decent amount of reading, debugging and monitoring the browser's network tab, here are the most commonly used values:

8388608 (8 MiB), used by the AWS CLI and Boto3
15728640 (15 MiB), used by s3cmd
17179870, used by S3 Browser Console
Multiples of 1 MiB (1048576 bytes), used by common uploaders
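Putting the pieces together, here is a hedged Python sketch that tries each candidate chunk size from the list above. It uses the part count after the dash to skip chunk sizes that could not have produced that many parts, which keeps the search small. The 1 to 64 MiB range used for "multiples of 1 MiB" and the names `CANDIDATE_CHUNK_SIZES`, `s3_style_etag` and `matches_remote_etag` are assumptions made for this example.

```python
import hashlib
import os
from typing import Optional

MIB = 1024 * 1024

# Chunk sizes from the list above plus whole multiples of 1 MiB.
# The 1..64 MiB range is an assumption; widen it if your uploader differs.
CANDIDATE_CHUNK_SIZES = sorted(
    {8388608, 15728640, 17179870, *(n * MIB for n in range(1, 65))}
)


def s3_style_etag(path: str, chunk_size: Optional[int] = None) -> str:
    """Plain MD5 ETag if chunk_size is None, otherwise the multipart ETag."""
    with open(path, "rb") as f:
        if chunk_size is None:
            md5 = hashlib.md5()
            for block in iter(lambda: f.read(MIB), b""):
                md5.update(block)
            return md5.hexdigest()
        # Raw 16-byte digest of each part, in upload order
        digests = [hashlib.md5(part).digest()
                   for part in iter(lambda: f.read(chunk_size), b"")]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def matches_remote_etag(path: str, remote_etag: str) -> bool:
    """True if the plain MD5 or some candidate chunk size reproduces remote_etag."""
    etag = remote_etag.strip('"')  # S3 returns ETags wrapped in double quotes
    if "-" not in etag:
        return s3_style_etag(path) == etag
    part_count = int(etag.rsplit("-", 1)[1])
    size = os.path.getsize(path)
    for chunk_size in CANDIDATE_CHUNK_SIZES:
        # Integer ceiling division: only sizes giving exactly part_count parts can match
        if (size + chunk_size - 1) // chunk_size != part_count:
            continue
        if s3_style_etag(path, chunk_size) == etag:
            return True
    return False
```

The remote value can come from Boto3, for example `boto3.client("s3").head_object(Bucket=bucket, Key=key)["ETag"]` with your own bucket and key; if `matches_remote_etag` returns True, the download can be skipped. A False result only means none of the tried chunk sizes reproduced the ETag, not that the file has definitely changed.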

Finally, I summarized all of this in a GitHub gist. I hope it makes someone's day easier.
