Don’t Ignore Response Codes

On a dark, gloomy day in Melbourne, with rain pouring down our office windows, the team was debating whether to run a performance test for AWS S3 and CloudFront. In the end, we agreed that it was a good idea.

The objectives of the test were as follows:

  1. Identify any configuration-related issues with CloudFront and S3.
  2. Determine whether these services can handle the anticipated load.

At this point you might wonder: CloudFront can handle 100,000 requests/sec per distribution and S3 can handle 5,500 GET requests/sec per prefix, so why the need? Well, so did we, until we discovered an issue none of us had anticipated. Left unfixed, it would have impacted end-user experience (and sales).

To test this scenario:

  • Static content (product images, JS, CSS, etc.) from our existing production environment was copied to the new infrastructure.
  • Existing CDN logs from production were used to design the application simulation model (ASM) for testing the new infrastructure.
  • Static content URLs were extracted from the access logs and replayed against CloudFront and S3 according to the ASM.
  • The test scenario was created in JMeter.
  • Load generation and analysis were done through OctoPerf.
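The URL extraction step can be sketched roughly as follows. This is a minimal illustration, not the script we actually used: the log layout (a quoted `"GET /path HTTP/1.1"` request field), the list of static extensions, and the sample lines are all assumptions. Note that the path's original case is preserved, since replaying the URLs exactly as customers requested them is the point.

```python
import re
from collections import Counter

# Assumed log layout: a quoted request field such as "GET /img/a.jpg HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
# Assumed set of static extensions; adjust to the content actually served.
STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".js", ".css")


def extract_static_urls(log_lines):
    """Count how often each static URL path appears in the log lines."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        path = match.group(1).split("?", 1)[0]  # drop any query string
        # Compare the extension case-insensitively, but keep the path as requested.
        if path.lower().endswith(STATIC_EXTENSIONS):
            counts[path] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [01/Jan/2020:00:00:01 +0000] "GET /Images/Shoe.JPG HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Jan/2020:00:00:02 +0000] "GET /css/site.css?v=3 HTTP/1.1" 200 128',
    '1.2.3.4 - - [01/Jan/2020:00:00:03 +0000] "POST /api/cart HTTP/1.1" 200 64',
    '1.2.3.4 - - [01/Jan/2020:00:00:04 +0000] "GET /Images/Shoe.JPG HTTP/1.1" 200 512',
]
url_counts = extract_static_urls(sample)
```

The request counts per URL give a rough weighting that could feed a replay model such as the ASM.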

Below is the HTTP response code pie chart generated at the end of the test. Nothing too exciting, except that 7.6% of the responses were HTTP 404 and the remaining 92.4% were HTTP 200.

[Image: pie chart of HTTP response codes, 92.4% HTTP 200 and 7.6% HTTP 404]
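A response-code breakdown like this can also be computed from raw results. The sketch below assumes a JMeter-style CSV results file with a `responseCode` column; our analysis was done in OctoPerf, so this is only an illustration of the idea, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def response_code_share(csv_file):
    """Return {response_code: percentage of all samples} from a JMeter-style CSV."""
    counts = Counter(row["responseCode"] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.items()}


# Hypothetical results, for illustration only.
sample_results = io.StringIO(
    "timeStamp,elapsed,label,responseCode,success\n"
    "1577836800000,35,/img/a.jpg,200,true\n"
    "1577836800100,41,/img/b.jpg,200,true\n"
    "1577836800200,12,/img/C.jpg,404,false\n"
    "1577836800300,38,/css/site.css,200,true\n"
)
shares = response_code_share(sample_results)
for code, pct in sorted(shares.items()):
    print(f"HTTP {code}: {pct:.1f}%")
```

Any share of non-200 codes that is higher than what production shows is worth a closer look before moving on to response times.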

At this point, I started thinking: I don’t see ~7.5% 404s for static content in our production environment. Plus, the data used in testing is an exact copy of production data, so there is no reason to see such a high failure rate.

I raised my finding with the team, and within an hour of investigation we identified the root cause. The real issue was how Windows and Linux handle upper and lower case in file paths.

The existing production architecture runs on Windows. The Windows file system isn't case sensitive, so it treats names that differ only in upper or lower case as the same file, whereas the Linux file system (with the S3 bucket mounted) is case sensitive.

From a customer's point of view, this means they would receive a 404 for product images (as an example), because the requested image URL path resolves to a location in the S3 bucket that does not exist. For example, the following is a response received from CloudFront for one of the static requests. Notice the "N" in the file path: the real folder in the S3 bucket ends with "n" and NOT "N".

[Image: CloudFront response for a static request whose file path ends in "N" instead of "n"]

Since the Windows file system is not case sensitive, it resolved to the path that already existed.
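This kind of mismatch can be caught before a load test by comparing the requested paths with the object keys that actually exist. The sketch below is one possible check, not something we ran at the time; the function name and the sample paths are hypothetical, and the list of existing keys is assumed to come from a bucket listing.

```python
def find_case_mismatches(requested_paths, existing_keys):
    """Return {requested_path: actual_key} for paths that match only when case is ignored."""
    exact = set(existing_keys)
    # Index existing keys by their lower-cased form for case-insensitive lookup.
    by_lower = {}
    for key in existing_keys:
        by_lower.setdefault(key.lower(), key)

    mismatches = {}
    for path in requested_paths:
        if path in exact:
            continue  # exact match: fine on a case-sensitive store
        actual = by_lower.get(path.lower())
        if actual is not None:
            mismatches[path] = actual  # would be a 404 on a case-sensitive store
    return mismatches


# Hypothetical paths, for illustration only.
existing = ["images/product-n/shoe.jpg", "css/site.css"]
requested = ["images/product-N/shoe.jpg", "css/site.css", "js/missing.js"]
mismatches = find_case_mismatches(requested, existing)
```

Paths such as `js/missing.js` that do not exist under any casing are deliberately left out, since they would fail on Windows as well.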

The takeaways from this post are:

  1. During your analysis, don't ignore non-200 HTTP response codes. They can tell you a lot about your system's health.
  2. There is merit in using production data to performance test your system.
  3. You don't always have to run an end-to-end test. Understand the objective of the test and design your tests accordingly.

--------------------------------------------------------------------------------------------------------------

Thanks for reading!

If you enjoyed this article, feel free to share it on social media.

Say Hello on: Linkedin | Twitter

Peter England

Performance Engineer

4 years ago

I agree don't ignore the response codes but you could have made the discovery of the 404 error through F12 dev tools or fiddler?

Cristian Camilo Moyano Bautista

Senior Performance Engineer at Sonatype

4 years ago

Excellent, thank you for sharing, I'd add something based on my experience: including response assertions or MD5 (or any other as appropriate) helps mitigate those associated risks

Andrew Testa

Principal Performance Engineer at Sky

4 years ago

Would there be any reasons for ignoring non 2xx responses?

Test without assertion or checkpoints is jr subject...

Guillaume Betaillouloux

Co-founder at OctoPerf.com

4 years ago

I think I recognize this screenshot
