Global Performance Testing - part 2: Test design
By Jakub (Kuba) Dering



Designing performance tests

Performance test design is an extremely important part of your work, because your tests need to accurately represent the load profile expected on the production system. Now that you have the non-functional requirements defined, as well as the list of critical services, you can translate them into actionable test scripts that simulate the load expected in production. The more cases you cover with your tests, the less time you'll spend troubleshooting a live product - and there are important aspects of load generation that are easy to miss. Test design is not limited to the actions taken by the virtual user; the architecture of your test scripts is equally important. Since most performance testers work with user-oriented test scripts, I'll cover some of these subjects using a web store as an example.

Test development

Whether you build your tests out of action blocks in a user interface or write dozens of lines in a scripting language, remember that the code you produce is just as important as the application code you are testing, and it is prone to the same problems developers face: bugs, maintainability issues and portability issues. "But I'm not a developer" - you might say. Sooner or later you may find yourself spending more time fixing your existing code than implementing new tests or analyzing your test results. Following the same software development principles as the developers helps you avoid the most common programming mistakes. If you're not familiar with coding principles, start with the most common ones, like DRY, YAGNI and KISS. Clean code makes it easier to move tests between testing tools, and establishing common development rules in testing teams lets multiple testers work on the same project. Programming is complex enough - don't make it harder with code that has to be deciphered every time you read it.
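To make the DRY point concrete, here is a minimal sketch of a shared helper reused across scenarios. It assumes a Python-based tool in the Locust style; the endpoints, credentials and class names are invented for the example.

    # locustfile.py - illustrative only; paths and payloads are assumptions
    from locust import HttpUser, task, between


    def login(client, username, password):
        # One shared login helper: fix it once and every scenario benefits (DRY).
        return client.post("/api/login", json={"user": username, "password": password})


    class ShopperUser(HttpUser):
        wait_time = between(5, 15)

        def on_start(self):
            # Reuse the shared helper instead of copy-pasting the login request
            # into every user class.
            login(self.client, "test_user", "test_password")

        @task
        def browse_catalogue(self):
            self.client.get("/api/products")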

Your tests performance matters

Requirements of thousands of transactions per second are nothing new in modern services. A typical performance test uses a few load generators to simulate the load of thousands of machines at once, and if your tests use too many resources, your project's hardware requirements grow with them. This matters if you work with shared testing infrastructure, like Performance Center, and of course for hardware costs. A typical performance test consists of three parts: user initialization, the action loop and user teardown. The most common mistake is putting too many runtime calculations and operations into the action block. Before writing any line of code, I ask myself whether that line even belongs in the block I'm about to put it in. Try limiting your action loop to I/O calls and assertions - caching and static data are your friends. It's good practice to measure the performance of the tests themselves: mock the services you plan to access, create as many threads as you can, and see how many resources you consume just preparing your input data.
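As a sketch of what that separation can look like (again assuming a Locust-style Python script, with made-up file names and endpoints): static data is loaded once per load generator, per-user setup happens in initialization, and the action itself is reduced to one I/O call plus an assertion.

    import csv
    import random
    from locust import HttpUser, task, constant

    # Heavy preparation runs once per load generator process, not once per request.
    with open("products.csv", newline="") as f:
        PRODUCT_IDS = [row["id"] for row in csv.DictReader(f)]


    class ShopperUser(HttpUser):
        wait_time = constant(1)

        def on_start(self):
            # Per-user setup (login, tokens) belongs in initialization, not in the loop.
            self.client.post("/api/login", json={"user": "load_user", "password": "secret"})

        @task
        def view_product(self):
            # The action block: pick cached data, make one call, assert on the result.
            product_id = random.choice(PRODUCT_IDS)
            with self.client.get(f"/api/products/{product_id}",
                                 name="/api/products/[id]",
                                 catch_response=True) as response:
                if response.status_code != 200:
                    response.failure(f"unexpected status {response.status_code}")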

Not all your users have to render a browser

While measuring rendering times is an important topic, you don't have to render a browser for every single virtual user you simulate. First of all, it costs a great deal of resources - a single browser instance needs at least one dedicated core to measure load and rendering correctly and accurately. To save yourself a few hundred cores, run a handful of users through your web driver of choice and let the rest perform the same actions over the HTTP protocol. This way you can assess the impact of your load on the rendering side without oversubscribing your resources.
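One way to express that split, sticking with the hypothetical Locust example (the Selenium driver and the 99:1 ratio are assumptions for illustration, not a recommendation):

    from locust import HttpUser, User, task, between


    class ProtocolUser(HttpUser):
        # The bulk of the load: plain HTTP calls, cheap enough to run in thousands.
        weight = 99
        wait_time = between(5, 15)

        @task
        def open_home_page(self):
            self.client.get("/")


    class BrowserUser(User):
        # A handful of real browsers, used only to measure rendering under that load.
        weight = 1
        wait_time = between(30, 60)

        def on_start(self):
            from selenium import webdriver  # or your web driver of choice
            self.driver = webdriver.Chrome()

        def on_stop(self):
            self.driver.quit()

        @task
        def open_home_page(self):
            # Assumes --host is set so self.host points at the tested site.
            self.driver.get(f"{self.host}/")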

More threads, higher waits

It's common practice to record user actions through a proxy service and capture the wait times as pauses between calls. The result is a single iteration of the actions performed by your virtual users: opening the website, logging in, browsing some pages, placing an order (for a shop) and logging out. But this is unlikely to be what users do in real life, and configuring your scenario this way may unintentionally underutilize some resources. When designing test scenarios, review real end-user behaviour as well - this is where Real User Monitoring and observability come in handy: you can check the actual pauses users make between calls and which actions they typically invoke during a session. Remember that your end users don't focus solely on browsing your website - they probably have a few chat windows and tabs open, just like you do right now. Some users don't even log out and rely on the application to time out the session.

If your virtual users act too fast and in tight loops, you'll probably achieve the required throughput and volumes but miss a very important factor - memory utilization. The number of concurrently open sessions is a KPI rarely mentioned explicitly in the non-functional requirements you receive, and it's probably the main difference you'll notice when comparing memory utilization in production against your tests under comparable load. If it's not captured correctly, the live application may struggle to complete garbage collection, which looks like a memory leak and can eventually make the application unusable.
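Here is a sketch of what more realistic pacing and session behaviour can look like in the same hypothetical Locust script; every number below is a placeholder that should really come from your RUM or observability data.

    import random
    from locust import HttpUser, task, between


    class RealisticShopper(HttpUser):
        # Real users pause between clicks - often tens of seconds, not milliseconds.
        wait_time = between(20, 90)

        def on_start(self):
            self.client.post("/api/login", json={"user": "load_user", "password": "secret"})

        @task(10)
        def browse(self):
            self.client.get("/api/products")

        @task(2)
        def place_order(self):
            self.client.post("/api/orders", json={"product_id": 42, "quantity": 1})

        @task(1)
        def end_session(self):
            # Only some users bother to log out; the rest abandon the session and
            # let the server time it out - which is what keeps session memory occupied.
            if random.random() < 0.3:
                self.client.post("/api/logout")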

Focus on deliverables

When designing your tests, it's important to know what the final outcome of each test will be and whether it brings any value to the SDLC. One misconception I see concerns the very purpose of performance tests. Testers often focus on generating load against the tested environment, casually checking hardware utilization on the target machines, error rates and response times. These are valuable metrics and say a lot about the state of the test, but collecting them doesn't require anyone staring at graphs - it can easily be automated. The real value of the tests comes from an actionable report containing the purpose of each test, risks, recommendations and findings - in most cases you won't find that information in the graphs monitoring your hardware. Not every detail can be aggregated and grouped by your APM tool, so expand your reporting with the data from your performance testing tools - remember that they give you a unique, end-user perspective of the application, something you cannot capture from the application directly. I covered data-oriented reporting in my first article.
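The "easily automated" part can be as simple as a pass/fail gate evaluated when the test finishes. Locust, for instance, lets you set the process exit code from a quitting hook; the thresholds below are placeholders and should come from your non-functional requirements.

    from locust import events


    @events.quitting.add_listener
    def enforce_slas(environment, **kwargs):
        # Fail the run (non-zero exit code) if error rate or p95 breach the SLAs.
        stats = environment.stats.total
        if stats.fail_ratio > 0.01:
            environment.process_exit_code = 1
        elif stats.get_response_time_percentile(0.95) > 800:
            environment.process_exit_code = 1
        else:
            environment.process_exit_code = 0

With a gate like this wired into a pipeline, nobody has to watch dashboards to learn that a run went red, and your time goes into the analysis and recommendations instead.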

Know your audience

The reports you generate will probably land on many people's desks, and your role is to make sure each audience gets only the data it needs. From a business perspective, it's important to present the data in a way that matches the non-functional requirements document. Make sure your tests cover all the KPIs and critical services mentioned there - this matters for further decision-making, and incomplete data will delay software delivery because you'll most likely have to repeat the tests. Spare the technical details, unless there is an immediate blocker or a breached SLA - then it requires some explanation. The technical report, however, should be discussed with the developers and architects - they'll mostly be interested in hardware utilization, spikes during execution or the exceptions observed during the tests. It's worth building a single dashboard that captures the most critical resources, so you can watch it during the test and extract the same data for reporting purposes. Most often you'll end up with two report sets, but that doesn't mean you have to run two sets of tests. Modern testing tools allow grouping and filtering of transactions in the generated reports, and your task is to make sure you can distinguish the two when extracting them.
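Keeping the two audiences separable usually starts in the script itself. In Locust, for example, the name= parameter rolls dynamic URLs up into a single named transaction and tags mark groups of tasks; the names and tags below are, again, only an illustration.

    from locust import HttpUser, task, tag, between


    class ReportFriendlyUser(HttpUser):
        wait_time = between(10, 30)

        @tag("business-critical")
        @task
        def place_order(self):
            # name= groups "/api/orders/123", "/api/orders/124", ... into one
            # transaction - the level of detail a business report wants to see.
            self.client.post("/api/orders", json={"product_id": 42}, name="Place order")

        @tag("technical")
        @task
        def health_probe(self):
            self.client.get("/internal/health", name="Health check")

At run time, --tags and --exclude-tags restrict execution to a subset of tagged tasks; for reporting, the transaction names are what you group and filter on when extracting the business and technical report sets from the same run.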









Roland Mees

315 ppm CO2 - 9,9 °C in 1957, 420 ppm CO2 - 11,8 °C in 2023 (= +1,9 °C)

2 yr

This is a very good post, thank you Jakub (Kuba)!

Prasanna Deshpande

Performance Engineer

2 yr

Nice and informative article!! Thanks for sharing!!

Lloyd Watts

AI / Machine Learning Researcher, Founder/CEO/Chief Scientist at Neocortix and Audience, Engineering Fellow at Femtosense, Caltech Ph.D.

2 yr

Jakub Dering - Fantastic article. Great guidance.

Noemi K?cicka

Senior Advisor at Santander Bank Polska S.A.

2 yr

Good job!
