JMeter is my bottleneck!!!*

Yes, you read that right - I'm guilty of using JMeter for high-volume, low-latency testing. SLAs for most of my tests are counted in tens of thousands of TPS, combined with sub-100ms latency. I use Apache Kafka and MQ protocols, along with some REST services, to simulate end-user behaviour. There's very little frontend testing, but that is coming soon too.

But JMeter is widely used by many organizations - why would you complain?

Maybe it's just me (and I really hope I'm not the only one here), but any time I try using features by the book, they almost always hit me hard. There are so many features, presumably supported and well documented - but never designed to work at scale. And by scale I mean < 50K TPS - which is not a crazy high volume compared to some well-known companies out there.

JMeter is my choice mainly for flexibility - my tests are fairly complex and the communication protocols change pretty often. A batch of messages may get sent over MQ, with an acknowledgment on a second queue waiting to be picked up, while the response for each message from the batch waits on a Kafka topic (or not, depending on the scenario). Of course, I have to measure every single one of them and assert the response afterwards. I rely on my own custom libraries with extra metrics - to make sure my measurements truly cover the last mile.
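For a sense of what such a wrapper looks like, here is a minimal sketch of a custom Java Request sampler that owns its own timing. The send/awaitResponse helpers are hypothetical stand-ins for your own MQ/Kafka client code, not part of JMeter:

    import org.apache.jmeter.config.Arguments;
    import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
    import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
    import org.apache.jmeter.samplers.SampleResult;

    // Minimal sketch: a sampler that measures the full send/await round trip
    // itself, so the reported elapsed time ends at the last mile, not at the
    // moment the send call returns.
    public class RoundTripSampler extends AbstractJavaSamplerClient {

        @Override
        public Arguments getDefaultParameters() {
            Arguments args = new Arguments();
            args.addArgument("topic", "responses.topic");  // illustrative defaults
            args.addArgument("timeoutMs", "5000");
            return args;
        }

        @Override
        public SampleResult runTest(JavaSamplerContext ctx) {
            SampleResult result = new SampleResult();
            result.setSampleLabel("mq-kafka-round-trip");
            long timeoutMs = Long.parseLong(ctx.getParameter("timeoutMs"));

            result.sampleStart();                      // clock starts before the send
            try {
                String correlationId = send();         // your producer code goes here
                String response = awaitResponse(correlationId, timeoutMs,
                        ctx.getParameter("topic"));    // ...and your consumer code here
                result.sampleEnd();                    // stop the clock before asserting
                result.setSuccessful(response != null);
                result.setResponseMessage(response != null ? "ok" : "timed out");
            } catch (Exception e) {
                result.sampleEnd();
                result.setSuccessful(false);
                result.setResponseMessage(e.toString());
            }
            return result;
        }

        // Hypothetical placeholders - wire these to your own messaging clients.
        private String send() { return "corr-1"; }
        private String awaitResponse(String id, long timeoutMs, String topic) { return "ok"; }
    }

Packaged as a jar in lib/ext, this shows up under the Java Request sampler, and every leg of the batch can get its own labelled, asserted timing.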

My tests are extremely modular - every single code repetition has to be well justified to be accepted into the repository, because I believe performance tests are code, no less than any other piece of logic you include in your application. They're subject to code reviews and periodic refinement. They're also extremely verbose - a combination of operations from different countries, each with a different load profile, plus special activities at some odd hours. The thread group count (user paths, vuser scripts, you name it) for a single test script is currently over 60, and it's still growing.

Now, here are the things that hurt me the most and how I solved (or plan to solve) them.

Distributed testing is broken

The current distributed test setup is pretty straightforward - there is a master node and slave nodes (load generators). The master node forwards the test scripts to its slaves and periodically collects the test results (sampler results) to write them to a report file. The slaves are not aware of each other - they cannot communicate in any way (at least by default) - and they're exact clones of each other: differences can only be applied in the properties files, and they won't differ per test scenario, for example.

With a fairly complex test scenario, I want certain users to run on only a few machines (sometimes even just one), and for others, I don't care where they're running as long as I can generate my throughput.

Having an exact clone on each machine creates another problem - they're EXACTLY the same. If everything in my test is a parameter - like the number of threads, the throughput, or the number of entries allocated from the data file - I have to change those every time I change the number of hosts dedicated to the test. Forget adding another host at runtime, because you don't want to re-run everything even when you can see your environment is not even sweating. Data distribution is easily mitigated with Redis - but the thread/throughput counts I'll hate forever.
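One way to soften the thread/throughput problem is to never hard-code a count in the plan: keep everything behind JMeter properties with defaults, and override them per host at launch. A sketch - the property names are illustrative, while __P and -J are standard JMeter features:

    In the test plan:
      Number of Threads:   ${__P(threads,10)}
      Target throughput:   ${__P(tput,600)}
      Data file offset:    ${__P(offset,0)}

    Per-host launch in non-GUI mode, each generator as its own process:
      jmeter -n -t plan.jmx -Jthreads=50 -Jtput=1200 -Joffset=0 -l host1.jtl
      jmeter -n -t plan.jmx -Jthreads=50 -Jtput=1200 -Joffset=50000 -l host2.jtl

It doesn't make the counts host-aware, but at least re-balancing becomes a command-line change rather than a plan edit.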

But these are still functional problems - right? Past roughly 20K TPS, we noticed a significant drop in throughput on the distributed setup, which turned out to be the result of heavy polling by the master node on each of the slaves - due to the massive amount of data produced for the reports. That was the day we moved completely away from the distributed setup; now each test (or set of tests) runs as a separate process.

"And now you have to run with 20 TPS, after that a short spike of 200TPS"

If you're familiar with that phrase (or a similar one), you most likely know the pain of setting up your test scenario to replicate the exact load profile of your application. Setting up the pacing - and I mean accurate pacing, not just an approximation - is not a trivial task; in fact, with the default JMeter implementation, it's the most time-consuming part of my test execution. There are some brilliant plugins to help achieve that, like https://jmeter-plugins.org/wiki/ThroughputShapingTimer/, but they don't fully support parametrizing your test at scale either. Ideally, once the test is written, the architects would prepare an Excel sheet with their NFRs, and it would get translated directly into a performance test scenario. JMeter is far from that - but hopefully the performance community contributes to the functional aspect of load profiling (or I will, next week).
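To make "accurate pacing" concrete: the common mistake is sleeping a fixed interval after each sample, which drifts by the sample's own duration. A schedule-based pacer fires at fixed absolute times instead - a rough Java sketch, not what the plugin does internally:

    // Each virtual user calls await() before sending; deadlines advance on a
    // fixed grid, so slow responses don't silently lower the achieved rate.
    public final class Pacer {
        private final long intervalNanos;
        private long nextDeadline;

        public Pacer(double perThreadTps) {
            this.intervalNanos = (long) (1_000_000_000L / perThreadTps);
            this.nextDeadline = System.nanoTime();
        }

        public void await() throws InterruptedException {
            long sleepNanos = nextDeadline - System.nanoTime();
            if (sleepNanos > 0) {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            }
            // Advance from the deadline, not from "now": a late sample makes
            // the pacer catch up instead of shifting the schedule backwards.
            nextDeadline += intervalNanos;
        }
    }

With 20 TPS across 10 threads, each thread gets a Pacer at 2 TPS; a spike to 200 TPS is just a new interval - exactly the kind of knob an NFR spreadsheet could drive.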

Reports are broken

Don't get me wrong - they get the job done for 1-2 tests, but there is no way to embed them anywhere in a meaningful way, and forget data extraction or lookups (HTML files, really?). I don't see the point of shoveling data between my reporting tool, Excel, and some Confluence page; also, seeing preliminary results during test execution saves a lot of time, especially if you've misconfigured your test. After a few (dozen) rounds of tests, we dropped the built-in reports completely in favor of something else. InfluxDB? Existing plugins also limit you to some aggregated charts - and as soon as you need anything more advanced, you end up forwarding your logs to your log aggregator and creating custom reports on your own. Also, that's your only viable option for tests in the pipeline.
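For what it's worth, the stock Backend Listener can stream live results out of JMeter - pointing it at InfluxDB looks roughly like this (a standard JMeter 5.x feature; the URL and tag values below are illustrative):

    Backend Listener implementation:
      org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient
    influxdbUrl   = http://influx.example.com:8086/write?db=jmeter
    application   = payments-nft
    measurement   = jmeter
    summaryOnly   = false
    samplersRegex = .*
    percentiles   = 90;95;99

From there, Grafana (or whatever sits on top of your aggregator) gives you live dashboards during the run and queryable history after it - which is also what makes pipeline runs reportable at all.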

JMeter is a Java Application

No, really - this might be a shocker for you, so take a deep breath and think of every challenge you face tweaking your Java web application. Now, when was the last time you checked whether you face similar issues with your JMeter tests? Long GC pauses? Allocations outside of TLAB? Young/old gen memory utilization? The good news is that, because it's a Java application, you can apply the same monitoring you have for your application under test - and if you don't have any, you'd better start today; you might learn something about the accuracy of your tests.
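Checking is cheap: JMeter's startup script honours the HEAP and JVM_ARGS environment variables, so the usual JVM diagnostics apply directly to the load generator (the flags below are standard JDK 11+ unified logging options; the sizes are illustrative):

    HEAP="-Xms4g -Xmx4g" \
    JVM_ARGS="-Xlog:gc*,safepoint:file=jmeter-gc.log:time,uptime" \
      ./jmeter -n -t plan.jmx -l results.jtl

A generator that spends 200ms in a GC pause adds 200ms to whatever samples were in flight - the log tells you whether your p99 belongs to the system under test or to JMeter itself.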

It's free, right? Right?

Yes, you don't pay for the tool - but if you plan to use it at scale, it takes time to set things up and manage your tests properly, and to use it right it really takes a team of experts to fill in the blanks - that is, test configuration, monitoring, and reporting. What it really shines at is concurrent code execution and measurement.

Before you start using this old, rusty, and ever-problematic tool, first read the list of things it can do - but then ask yourself whether YOU can do them with it.

Arijit Chanda

Senior Member Of Technical Staff at Salesforce

11 months

As open-source software, JMeter offers flexibility. The use case mentioned in your blog could have been approached differently, perhaps by creating a custom wrapper. It's important to shift the perspective from viewing a tool as a bottleneck to recognising it as part of the solution.

Juan Manuel T.

Software Quality Assurance Manager | Sr. Software Performance Engineer | Quality First. Always.

11 months

This will always be the trade-off of “free” vs licensed tools: you save money on licensing in exchange for providing your own technical expertise to leverage the tool. I’d say you start with the tools which can help you get set up, and once you know exactly what your pain points are you can move to a licensed tool that most perfectly tailors to your needs. Problem? Now you have to learn that tool, even when the learning curve is shortened by your foundational knowledge. I feel the real problem is that after so many years of it being an open-source tool, the community is not as active or interested in sharing their niche custom solutions. For example, a customer I worked with had on-the-fly load generators spinning up in a highly configurable CI/CD distributed testing pipeline; I have never seen their “framework” made available in public - even though it is based on open-source tools, they just don’t share it. In any case, you’d have to be devops, developer, architect, and tester to build your IDP with JMeter, whereas if you pay for a licensed product you outsource those hats and are just a performance engineer. In the end nothing is free, you just pay with a different currency.

To process (manage) the large volume of files and the number of calls per second, I think I will use the "Rotating JTL Listener" plugin - a Listener that allows writing a sequence of JTL files, limited by the number of samples per file: https://github.com/Blazemeter/jmeter-bzm-plugins/blob/master/rotating-listener/RotatingListener.md. Then I will not have a single large file but several smaller ones. Then I would calculate on a granularity of 1 second if the test duration is at least 5 minutes, or a granularity of 0.1 second if the test duration is less than 2 minutes. I will get a consolidated result over 1 second of the number of calls on the same URL, the min, max, and average times, and the 90th percentile. And I will make the graphs and result table with the granularity of 1 second or 0.1 second. Some results, like the 90th percentiles, will be statistically wrong, but that's not a big deal.

For 50,000 requests per second and a 10 GB result file, there are not many tools that can manage and analyze it. Yes, standard Apache JMeter does not know how to handle these 10 GB volumes natively.
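A sketch of that consolidation pass over the rotated files (assumes JMeter's default CSV JTL layout - timeStamp,elapsed,label,... with a header row; the naive comma split ignores quoted fields, which is fine as long as labels contain no commas):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.stream.Stream;

    // Roll rotated JTL files up into 1-second buckets per label: count, min,
    // max and average elapsed time. Percentiles would need the raw values
    // kept per bucket - hence the caveat about the 90th percentile above.
    public class JtlRollup {
        public static void main(String[] args) throws IOException {
            Map<String, long[]> buckets = new TreeMap<>(); // "second|label" -> {count,sum,min,max}
            try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(Paths.get(args[0]), "*.jtl")) {
                for (Path file : files) {
                    try (Stream<String> lines = Files.lines(file)) {
                        lines.skip(1).forEach(line -> {
                            String[] c = line.split(",", -1);
                            long second = Long.parseLong(c[0]) / 1000; // ms -> s bucket
                            long elapsed = Long.parseLong(c[1]);
                            long[] b = buckets.computeIfAbsent(second + "|" + c[2],
                                    k -> new long[]{0, 0, Long.MAX_VALUE, 0});
                            b[0]++; b[1] += elapsed;
                            b[2] = Math.min(b[2], elapsed);
                            b[3] = Math.max(b[3], elapsed);
                        });
                    }
                }
            }
            buckets.forEach((k, b) -> System.out.printf(
                    "%s count=%d avg=%.1f min=%d max=%d%n",
                    k, b[0], (double) b[1] / b[0], b[2], b[3]));
        }
    }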
