Unlocking Performance Engineering outcomes with realistic data!!

There are scenarios where production gets hammered even after countless performance tests.

Performance in production spirals out of control as concurrency increases, yet during the load tests it only ever showed a gradual staircase pattern.


Have you ever wondered why?

There could be many potential causes like:

  1. Workload mismatch
  2. Shared traffic from other apps
  3. Concurrent access by ETL jobs
  4. Unrealistic data being inserted

But we will focus on the "role of data" being used in load tests.

Let's understand data first

Data is characterised by 3 properties:

  1. Volume - the amount of data, in terms of size or record count, e.g. running a test over a 10 GB data volume holding records for 10k customers.
  2. Velocity - the rate at which data changes; it covers both new customers being added and existing records being updated.
  3. Variation - the hardest property to get right and hence the one most commonly skipped in test data. It represents realistic variation across records (a quick profiling sketch follows this list).
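
As a rough illustration, a few queries are enough to put numbers on all three properties. The sketch below is a minimal Python example; the SQLite file, the "products" table, its columns and the 1% threshold are all assumptions to adapt to your own schema:

    import sqlite3

    # Minimal profiling sketch, assuming a local SQLite copy of the test
    # dataset with a hypothetical "products" table that has "price" and
    # "updated_at" (ISO date text) columns.
    conn = sqlite3.connect("testdata.db")
    cur = conn.cursor()

    # Volume: how much data is there?
    rows = cur.execute("SELECT COUNT(*) FROM products").fetchone()[0]

    # Velocity: how much of it changed recently?
    changed = cur.execute(
        "SELECT COUNT(*) FROM products WHERE updated_at >= date('now', '-1 day')"
    ).fetchone()[0]

    # Variation: how many distinct values does a key column carry?
    distinct_prices = cur.execute(
        "SELECT COUNT(DISTINCT price) FROM products"
    ).fetchone()[0]

    print(f"Volume:    {rows} rows")
    print(f"Velocity:  {changed} rows changed in the last day")
    print(f"Variation: {distinct_prices} distinct prices across {rows} rows")
    if rows and distinct_prices / rows < 0.01:  # assumed threshold
        print("Warning: price variation looks unrealistically low")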

Let's take an example:

You have to test an eCommerce site where each product is represented by an object along the following lines (the exact fields here are a hypothetical sketch):
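
    # Hypothetical product object; field names are illustrative only.
    product = {
        "product_id": "P-000001",  # in the flawed dataset, the only field that varies
        "name": "Classic Cotton T-Shirt",
        "category": "Apparel",
        "price": 499.00,
        "stock": 100,
        "rating": 4.2,
    }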

But apart from the product id, all the other field values remain the same for the next 10k products.

In this scenario, even though you created or replicated a 10 GB data volume from somewhere, you don't understand it completely (and that's not your fault!), and that's exactly where characterising the data along the above 3 aspects gets skipped.

But what does it cost to do it, and to NOT do it?

To understand the data, it's necessary to:

  1. Look at how the data was generated - was it produced by a database script, an API, or some automation script? Each process can be fine, but you need to understand how it works and which constraints were overlooked.
  2. Check how the data represents actual objects - you cannot set 200 products to the same price, or to unrealistically low or high prices. Why? Because nobody would actually buy them that way. One reason this matters: filters and sort functions both hit search and database queries hard, and dummy or unrealistic data distorts how those queries behave (a quick validation sketch follows this list).
  3. Get the data validated with architects and product managers (they are the best people to set the right expectations for variation in the end product).
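
To make point 2 concrete, here is a minimal pre-test sanity check that flags fields with suspiciously low variation before any load is applied. The helper name, field list and 1% threshold are assumptions you would tune for your own domain:

    from collections import Counter

    def variation_report(records, fields, min_ratio=0.01):
        """Flag fields whose distinct-value ratio suggests dummy data."""
        total = len(records)
        for field in fields:
            distinct = len(Counter(r[field] for r in records))
            ratio = distinct / total if total else 0.0
            status = "OK" if ratio >= min_ratio else "SUSPICIOUS"
            print(f"{field}: {distinct} distinct / {total} rows "
                  f"({ratio:.2%}) -> {status}")

    # Usage: 10k products where only product_id varies, as in the example above.
    records = [
        {"product_id": f"P-{i:06d}", "name": "Classic Cotton T-Shirt", "price": 499.00}
        for i in range(10_000)
    ]
    variation_report(records, ["product_id", "name", "price"])

Run against that dataset, product_id comes out fine while name and price are flagged as suspicious, which is exactly the kind of red flag worth raising with your architects and product managers before the test run.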

So the cost of doing it is effort and time!!

What's the cost of NOT doing it? It could be outcomes like these:

  1. Load tests that pass while production still gets hammered
  2. Query, filter and sort behaviour that the tests never realistically exercised
  3. Performance that degrades in a gentle staircase during tests but runs away with concurrency in production
So, with a brief example, we went through the significance of data in load testing: its impact, and the cost of doing it (and of NOT doing it) thoroughly.

You might be able to sail your boat as usual with dummy data, but only as long as there are more critical flaws in your production that overshadow the under-utilisation of realistic data.

Thanks, Happy reading :)

Hope you like it and somewhat concur with what I'm trying to emphasize over here.






