“No” to Testing in Production!?… “Yes” to Observability?
Cat being scared of testing in prod! Photo by Mikhail Vasilyev on Unsplash


A few days ago, I was browsing a real estate website and found a very curious listing… it was a TEST LISTING! On the customer-facing website! Of course, I looked around for more! And, of course, I found more…

From this article, you will learn:

  • whether testing in production is a good idea
  • what the possible risks are
  • what the alternatives are
  • how to do it safely

"Test Property Listings". The URL and favicon were removed intentionally (the point of the article is not guilt & shame).


1 - Production

Let’s start with a question! What is “production”?

An environment where your software is deployed and used by ACTUAL customers/users.

For example, if there is a website and it is public (as in “can be visited by typing a URL in your browser”), then you could say that this is a PRODUCTION website (generally speaking, there might be some exceptions, of course).


2 - Let’s test there, shall we?!

So the question becomes:

Do you need to do testing IN PROD?

Short answer: yes, sort of!

Long answer: it depends (haha, of course). It depends on what you mean by the word “testing” and on your goals for doing it.

If your definition of testing is similar to this:

Testing is interacting with a product in order to learn and discover information about it.

…then yes, in some cases you need to! Wouldn’t you like to KNOW things about your production?!


3 - D A N G E R

We have established that you need to KNOW about the current state of production, but is testing it actually dangerous?

It could be! Here are some examples of when this testing could cause trouble:

  • you trigger a runtime bug that crashes a service and causes a surprise incident/outage (aka unintentionally killing prod)
  • you “pollute” and “corrupt” production data. Taking this real estate website as an example, you create test listings worth “billions of dollars”. If you run any analytics reports on your data or have a feature like “average selling price in the suburb XYZ”, these test listings COULD AFFECT the result. Then some people might make wrong decisions based on these “fake/biased results” (customers looking to buy or rent property and weighing how attractive a deal is, external analysts, product managers, etc.)
  • you might create security risks (for the sake of testability, test users often get hacks to bypass 2FA and CAPTCHA, or share a weak password, creating a potential vector to attack/enter the system)
  • you might create compliance risks (in some cases, having test data and test users in prod can have legal consequences, like failing an audit and losing your compliance status!)
  • …and more

Of course, it is not always a “caps lock dangerous” kind of danger! It could be totally fine to do this! But you need to know what you are doing, why, and WHAT THE RISK IS!

For example, running an automated post-deployment “smoke UI test” COULD BE a perfectly valid (and mostly safe) example of an acceptable scenario for testing in prod.
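To make the “mostly safe” part concrete, here is a minimal sketch of a post-deployment smoke check in TypeScript. It is a simplification under stated assumptions: it checks pages over HTTP instead of driving a real browser (a true UI smoke test would use a browser automation tool), and the `SmokeCheck` shape, the `runSmokeChecks` function and the paths are hypothetical. The property that matters is that it only reads: it sends GET requests and creates nothing, so it cannot leave test listings behind.

```typescript
// Hypothetical shape of one smoke check: a page that must answer 200
// and contain a given piece of text.
interface SmokeCheck {
  path: string;
  expectedText: string;
}

interface SmokeResult {
  path: string;
  ok: boolean;
  reason?: string;
}

// The minimal subset of the fetch API this sketch needs, so a fake can be injected in tests.
type Fetcher = (url: string) => Promise<{ status: number; text(): Promise<string> }>;

// Read-only checks: GET requests only, nothing is created or modified in production.
async function runSmokeChecks(
  baseUrl: string,
  checks: SmokeCheck[],
  fetcher: Fetcher,
): Promise<SmokeResult[]> {
  const results: SmokeResult[] = [];
  for (const check of checks) {
    try {
      const res = await fetcher(baseUrl + check.path);
      if (res.status !== 200) {
        results.push({ path: check.path, ok: false, reason: `status ${res.status}` });
        continue;
      }
      const body = await res.text();
      const ok = body.includes(check.expectedText);
      results.push({ path: check.path, ok, reason: ok ? undefined : "expected text missing" });
    } catch (err) {
      // Network errors count as failures instead of crashing the whole run.
      results.push({ path: check.path, ok: false, reason: String(err) });
    }
  }
  return results;
}
```

In a pipeline you could pass Node's built-in `fetch` (Node 18+) as the `fetcher` argument, for example `runSmokeChecks("https://your-site.example", [{ path: "/", expectedText: "..." }], fetch)`; injecting it keeps the check itself testable without touching production.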

Did you notice that “COULD BE” is capitalised? That is because it is not a universal truth! An alternative approach is to rely more on observability (aka o11y)!


4 - Observability!

Observability seagull. Photo by Raphael Andres on Unsplash

So what is this “observability”? It is a property of a system that characterises how well it can be observed: the ability to look inside it and find the answer to your question. I like how Thejas Ramashekar described it here:

You have sensors thrown all over the forest and you don't even know what you want to do with them. But when somebody says "hey, what's the tiger count?", you say "select count('animals') from sensors where shape=~/tiger/ and sound=~/growl/"

How does this relate to the “UI smoke test”? Well, if you have deployed SENSORS to answer various questions, you can set up DASHBOARDS with WIDGETS (visual representations of these questions) that will ALERT you when something is NOT WORKING as expected.

Some examples:

  • how many users have logged in during the last minute
  • what is the error rate on your API endpoints
  • average number of items customers add to the basket
  • successful payment rate
  • …etc.
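As an illustration of how such a widget turns into an alert, here is a minimal TypeScript sketch for the “error rate on your API endpoints” example. In practice this logic lives in your observability tool's alert rules; the `RequestEvent` shape, the one-minute window and the 5% threshold are hypothetical values chosen only for this example.

```typescript
// One observed API request, as a sensor might record it (hypothetical shape).
interface RequestEvent {
  timestampMs: number;
  endpoint: string;
  status: number;
}

// Share of requests in the window (nowMs - windowMs, nowMs] that returned a 5xx status,
// across all endpoints. Returns null when there were no requests at all.
function errorRate(events: RequestEvent[], nowMs: number, windowMs: number): number | null {
  const recent = events.filter(
    (e) => e.timestampMs > nowMs - windowMs && e.timestampMs <= nowMs,
  );
  if (recent.length === 0) return null; // no traffic means no answer, not "all good"
  const errors = recent.filter((e) => e.status >= 500).length;
  return errors / recent.length;
}

// Fires when the error rate is above the threshold (5% is an arbitrary example value).
function shouldAlert(rate: number | null, threshold = 0.05): boolean {
  return rate !== null && rate > threshold;
}
```

Evaluated every minute, a rule like this reports a broken endpoint as soon as real users hit it, without anyone creating test data.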

You get answers to these questions in real time!

If you have good observability tooling, you no longer RELY ON your smoke test in production! Alerting will notify you FASTER than a slow and unreliable UI test. (When a UI test fails, you will very likely re-run it first, then try to debug “what’s wrong”, and only then, 20 minutes later, panic and ring the bell.)

If an alert COULD HIGHLIGHT the same information within a few minutes (and more reliably!) while your UI test couldn’t… maybe… you don’t need that automated test!?

Exception: situations when your customers generate little traffic!

During low-traffic periods, alerts can be useless (blind). For example, your target audience might be homogeneous and not always active: everyone is in the same timezone and only uses your product during business hours, so outside those hours there is little or no traffic for the sensors to measure.

You need to generate traffic to bring the sensors back to life! Even artificial traffic, like the traffic from your tests, will do! But even in this case, the UI smoke test is a way to enable alerting, not so much a testing thing!
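A minimal sketch of this blind spot, assuming a hypothetical rule that an error rate computed from fewer than 20 requests is not trustworthy: the alert reports `no_data` instead of a misleading `ok`, and adding synthetic requests (for example, from a scheduled smoke test) gives it enough samples to evaluate again. The names and threshold values are illustrative, not taken from any particular tool.

```typescript
type AlertState = "ok" | "alerting" | "no_data";

// Request and error counts for one evaluation window (hypothetical shape).
interface WindowStats {
  requests: number;
  errors: number;
}

// Hypothetical minimum sample size: below this, the error rate is too noisy to trust.
const MIN_REQUESTS = 20;

function evaluate(stats: WindowStats, threshold = 0.05): AlertState {
  // A quiet night would otherwise look exactly like a healthy system.
  if (stats.requests < MIN_REQUESTS) return "no_data";
  return stats.errors / stats.requests > threshold ? "alerting" : "ok";
}

// Adds synthetic probe traffic (e.g. smoke test runs) to the real traffic of the same window.
function withSyntheticTraffic(real: WindowStats, synthetic: WindowStats): WindowStats {
  return {
    requests: real.requests + synthetic.requests,
    errors: real.errors + synthetic.errors,
  };
}
```

Here the synthetic requests serve the alert, not a test report: their only job is to make `evaluate` return something other than `no_data` when real customers are asleep.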

5 - Do it safely!

Sign telling you to look for better ways! Photo by Joshua Hoehne on Unsplash

Despite all that, there might be situations when you need to test in prod! So you need to make sure you do it in a safe manner.

Here are some tips on how this COULD BE achieved:

Use canary releases and feature flags

Enable new code only for a “certain test user” or under “certain test conditions” (like a secret cookie value). This way, if something goes wrong, the impact is limited to your test user! And you can turn the whole new feature off by flipping the flag from “true” to “false”!
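Here is a minimal sketch of such a flag check in TypeScript. The `FlagConfig` shape, the `new_listing_feed` cookie name and the user IDs are hypothetical; real projects usually read this from a feature-flag service or config store. It covers only the feature-flag part, not canary releases (routing a small share of real traffic to the new version).

```typescript
// Hypothetical flag configuration for one new feature.
interface FlagConfig {
  enabled: boolean; // global kill switch: false turns the feature off for everyone
  allowedUserIds: Set<string>; // e.g. your dedicated test user(s)
  secretCookieValue?: string; // optional opt-in cookie value for testers
}

interface RequestContext {
  userId?: string;
  cookies: Record<string, string>;
}

// Hypothetical cookie name testers set to opt in.
const FLAG_COOKIE = "new_listing_feed";

function isFeatureOn(flag: FlagConfig, ctx: RequestContext): boolean {
  if (!flag.enabled) return false; // kill switch wins over everything else
  if (ctx.userId !== undefined && flag.allowedUserIds.has(ctx.userId)) return true;
  // Only compare cookies if a secret is configured, so an absent secret never matches.
  return flag.secretCookieValue !== undefined && ctx.cookies[FLAG_COOKIE] === flag.secretCookieValue;
}
```

Everyone who is neither an allowed test user nor carrying the secret cookie keeps seeing the old code path, so a bug in the new one only reaches your tester.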

Do it in an environment BEFORE prod

Look again at the real estate website screenshots: is there a good reason to do “How would a property listing look in the feed?” kind of testing in prod? This testing should be done in an appropriate dev/test environment with suitable tools (component testing or visual testing).

Rely on observability and alerting

We already discussed this above. Rely on it more! But if you are testing in prod for whatever reason, keep an eye on “the radars” to make sure you didn’t break anything “at scale”!


The end

If you read to the end, you are the best! If you enjoyed it, don’t forget to react or leave a comment. This helps the algorithm show it to other relevant people. The more people see it and react, the more motivated I am to write. I write more, and YOU benefit from reading it.


Are you a manual tester? Or a beginner automation engineer? Want to be better? Want to learn more about test automation and not sure how to go on this journey? I can help!

I coach people on test automation within the JavaScript ecosystem. UI/API/Unit/Performance/Contracts, you name it… You can book a free first consultation to talk about your needs and goals here: https://ivanandcode.com/coaching

Ivan invites you to click on the link!



More articles by Ivan Karaman
