How AWS S3 E-Tags Work

Many use cases run more efficiently if files are downloaded only when they have changed. In AWS S3, one of an object's important properties is its ETag, a checksum-like value that AWS uses to check that a file arrived complete on upload or download.

Comparing a local file's ETag with the ETag AWS reports can be tricky because of how AWS calculates it. Here is a summary of my findings from going through their documentation, code base, and assorted blog posts.

Small Files

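Small files are uploaded in a single request, and the ETag is the hex-encoded MD5 digest of the file contents. This holds for unencrypted objects and objects encrypted with SSE-S3; with SSE-KMS or SSE-C the ETag is not an MD5 digest of the data.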
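As a minimal sketch of the single-request case, the local value to compare against can be computed with Python's standard hashlib. The function name single_part_etag and the read_size parameter are my own illustrative choices, not part of any AWS tool.

```python
import hashlib


def single_part_etag(path, read_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in blocks to limit memory use.

    Assumption: the object was uploaded in a single request without
    SSE-KMS or SSE-C, so its ETag is the plain MD5 of the contents.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for block in iter(lambda: f.read(read_size), b""):
            md5.update(block)
    return md5.hexdigest()
```

Note that S3 returns the ETag wrapped in double quotes in its responses, so strip them before comparing.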

Large Files

For larger files AWS uses multipart upload, and this is where ETag calculation gets tricky. The ETag of a multipart object is calculated as follows:

  • Split the file into chunks of a fixed chunk size (the exact size matters later)
  • Upload each chunk and calculate its MD5 digest
  • Concatenate the raw (binary) MD5 digests of all chunks, in order
  • Calculate the MD5 hash of the concatenated digests, hex-encode it, and append "-" followed by the number of parts
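The steps above can be sketched locally as follows. This is an illustration under the assumption that every part except the last has exactly chunk_size bytes; the function name multipart_etag is hypothetical.

```python
import hashlib


def multipart_etag(path, chunk_size):
    """Compute a multipart-style ETag for a local file.

    Assumption: the uploader used a fixed chunk_size for every part
    except possibly the last one.
    """
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Keep the raw 16-byte digest, not the hex string
            digests.append(hashlib.md5(chunk).digest())
    # MD5 of the concatenated binary digests, then "-<number of parts>"
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

The key detail is that the per-part digests are concatenated as bytes. Concatenating their hex strings instead gives a different result.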

So far so good, but if you don't know the chunk size that was used, there is trouble ahead, since many chunk sizes correspond to the same number of parts. Finding a small set of candidate chunk sizes from which to calculate possible ETag values is crucial to making this comparison feasible. After a decent amount of reading, debugging, and watching browser network tabs, here are the most commonly used values (in bytes):

  • 8388608 (8 MiB), used by the AWS CLI and Boto3
  • 15728640 (15 MiB), used by s3cmd
  • 17179870, used by the S3 browser console
  • Multiples of 1 MB, used by other common uploaders


Finally, I summarized all of this in a GitHub gist. I hope it makes someone's day easier.

