Unlocking Performance Engineering outcomes with realistic data!!

There are scenarios where production gets hammered even after countless performance tests.

Performance in production spirals out of control as concurrency increases, yet during the load tests it only ever showed a gradual staircase pattern.


Have you ever wondered why?

There could be many potential causes like:

  1. Workload mismatch
  2. Shared traffic from other apps
  3. Concurrent access by ETL jobs
  4. Unrealistic data being inserted

But we will focus on the "role of data" being used in load tests.

Let's understand data first

Data is characterised by 3 properties:

  1. Volume - the amount of data, in terms of size or record count, e.g. running a test over a 10 GB data volume holding records for 10k customers.
  2. Velocity - the rate at which data changes; it covers both new customers being added and existing records being updated.
  3. Variation - the hardest property to get right and hence the one most commonly skipped in test data. It represents realistic variation across records (a quick profiling sketch follows this list).
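
As a rough illustration, a few queries are enough to put numbers on all three properties. The sketch below is a minimal Python example; the SQLite file, the "products" table, its columns and the 1% threshold are all assumptions to adapt to your own schema:

    import sqlite3

    # Minimal profiling sketch, assuming a local SQLite copy of the test
    # dataset with a hypothetical "products" table that has "price" and
    # "updated_at" (ISO date text) columns.
    conn = sqlite3.connect("testdata.db")
    cur = conn.cursor()

    # Volume: how much data is there?
    rows = cur.execute("SELECT COUNT(*) FROM products").fetchone()[0]

    # Velocity: how much of it changed recently?
    changed = cur.execute(
        "SELECT COUNT(*) FROM products WHERE updated_at >= date('now', '-1 day')"
    ).fetchone()[0]

    # Variation: how many distinct values does a key column carry?
    distinct_prices = cur.execute(
        "SELECT COUNT(DISTINCT price) FROM products"
    ).fetchone()[0]

    print(f"Volume:    {rows} rows")
    print(f"Velocity:  {changed} rows changed in the last day")
    print(f"Variation: {distinct_prices} distinct prices across {rows} rows")
    if rows and distinct_prices / rows < 0.01:  # assumed threshold
        print("Warning: price variation looks unrealistically low")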

Let's take an example:

You have to test an eCommerce site where each product is represented by an object along the following lines (the exact fields here are a hypothetical sketch):
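
    # Hypothetical product object; field names are illustrative only.
    product = {
        "product_id": "P-000001",  # in the flawed dataset, the only field that varies
        "name": "Classic Cotton T-Shirt",
        "category": "Apparel",
        "price": 499.00,
        "stock": 100,
        "rating": 4.2,
    }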

But apart from the product id, all the other field values remain the same for the next 10k products.

In this scenario, even though you created or replicated a 10 GB data volume from somewhere, you don't understand it completely (and that's not your fault!), and that's exactly where characterising the data along the above 3 aspects gets skipped.

But what does it cost to do it, and to NOT do it?

To understand the data, it's necessary to:

  1. Look at how the data was generated - was it produced by a database script, an API, or some automation script? Each process can be fine, but you need to understand how it works and which constraints were overlooked.
  2. Check how the data represents actual objects - you cannot set 200 products to the same price, or to unrealistically low or high prices. Why? Because nobody would actually buy them that way. One reason this matters: filters and sort functions both hit search and database queries hard, and dummy or unrealistic data distorts how those queries behave (a quick validation sketch follows this list).
  3. Get the data validated with architects and product managers (they are the best people to set the right expectations for variation in the end product).
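
To make point 2 concrete, here is a minimal pre-test sanity check that flags fields with suspiciously low variation before any load is applied. The helper name, field list and 1% threshold are assumptions you would tune for your own domain:

    from collections import Counter

    def variation_report(records, fields, min_ratio=0.01):
        """Flag fields whose distinct-value ratio suggests dummy data."""
        total = len(records)
        for field in fields:
            distinct = len(Counter(r[field] for r in records))
            ratio = distinct / total if total else 0.0
            status = "OK" if ratio >= min_ratio else "SUSPICIOUS"
            print(f"{field}: {distinct} distinct / {total} rows "
                  f"({ratio:.2%}) -> {status}")

    # Usage: 10k products where only product_id varies, as in the example above.
    records = [
        {"product_id": f"P-{i:06d}", "name": "Classic Cotton T-Shirt", "price": 499.00}
        for i in range(10_000)
    ]
    variation_report(records, ["product_id", "name", "price"])

Run against that dataset, product_id comes out fine while name and price are flagged as suspicious, which is exactly the kind of red flag worth raising with your architects and product managers before the test run.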

So the cost of doing it is effort and time!!

What's the cost of NOT doing it? It could be outcomes like these:

  1. Load tests that pass while production still gets hammered
  2. Query, filter and sort behaviour that the tests never realistically exercised
  3. Performance that degrades in a gentle staircase during tests but runs away with concurrency in production
So, with a brief example, we went through the significance of data in load testing: its impact, and the cost of doing it (and of NOT doing it) thoroughly.

You might be able to sail your boat as usual with dummy data, but only as long as there are more critical flaws in your production that overshadow the under-utilisation of realistic data.

Thanks, Happy reading :)

Hope you like it and somewhat concur with what I'm trying to emphasize over here.






