Speedscale Review - Are load tests relics from the past?


Every once in a while I get approached by a company targeting the domain I create content in: performance testing. This time, it’s Speedscale. They have a product that proposes flipping the entire performance engineering process upside down, which definitely caught my attention. Just a heads up: this article is not sponsored by anyone. I’ve spent over a month trying out this tool with various technology stacks and applications, so here’s my review. Also, before you take my word for anything I’m saying, they have a free trial you can test for yourself - which I strongly encourage you to do.

But before we start the review, let’s talk about the problems they’re trying to solve and how performance testing teams tackle them today. Have you ever wondered why all the load testing tools rely so heavily on correlation, parameterization and load shaping?


Test Planning

The current process of test development starts with analysis and prediction of the upcoming workload. We analyse the usage patterns of all the services and determine any seasonality and peaks that occur on the existing system. We talk to the business stakeholders to adjust these numbers for any upcoming events and anticipated changes in the user base, launches of new services, and anything else that is worth testing or that the business has problems with.

It’s also the right moment to ask about any challenges the organisation faces.

Then we proceed with test planning and forge the gathered data and insights into a test plan. We choose the injection points and define the services that will be tested; all of that will serve as our test scope. It includes the components we’re testing, the test strategy, test scenarios, injection tools, load profiles, volumes, data and environment requirements, test criteria and expected results. All of that to make sure we focus on the critical path and cover the main aspects of the business expectations.

After we have determined our test scope, we proceed with script development. For web applications, script development usually comes down to recording your browser session via a proxy and then correlating and parameterising your test to make sure each virtual user represents a separate end-user entering your application. We add some randomness to the test scripts to make sure all users are not exactly equal. For example, if most of your end-users are just browsing your page and only 5% are actually buying something, you’d want that reflected in your test script as well so you don’t end up over-saturating certain services. User pauses are also extremely important. Real users will take their time to read what’s on their screen before taking further actions, and this will have an impact on your system utilisation even if the user doesn’t do anything at this very moment. As you can see, end-user simulation can get complex very quickly.
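To make this concrete, here is a minimal sketch of the kind of script this process produces, using Locust, a popular open-source load testing tool. The endpoints, traffic mix and pause ranges are illustrative, not taken from any real application:

```python
# Minimal user-journey sketch with Locust (pip install locust).
# Endpoints, weights and pause ranges are illustrative.
from locust import HttpUser, task, between

class ShopUser(HttpUser):
    # Real users pause to read the page; model it as 5-15 s of think time.
    wait_time = between(5, 15)

    @task(19)  # ~95% of users only browse
    def browse(self):
        self.client.get("/catalog")

    @task(1)   # ~5% of users actually buy something
    def buy(self):
        self.client.get("/catalog")
        self.client.post("/cart", json={"item": "dalmatian", "qty": 1})
        self.client.post("/checkout")
```

The task weights reproduce the browse/buy ratio, and the think time keeps each virtual user occupying the system the way a human would.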

For APIs, test scenarios are usually easier to develop. Web APIs typically don’t have the context of a user session - the most common style, REST, holds information about your object states, and as long as the object exists and the user is authorised to modify it, a single atomic request can do so. That’s why, when it comes to API testing, the load profiles are way easier to get right. You just need to know the address or ID of a specific object, and you can successfully simulate a correct saturation of your system with a multi-threaded implementation.
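As a rough illustration of how small such a multi-threaded implementation can be (the URL, object IDs and worker count are made up):

```python
# Sketch: saturating a REST endpoint with N concurrent workers.
# URL and object IDs are placeholders for your own service.
import concurrent.futures
import requests

BASE = "https://api.example.com/pets"

def update_pet(pet_id: int) -> int:
    # Each call is atomic: as long as the object exists and the caller
    # is authorised, a single request is a complete "scenario".
    resp = requests.put(f"{BASE}/{pet_id}", json={"status": "sold"}, timeout=10)
    return resp.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(update_pet, range(1, 1001)))

print(sum(s == 200 for s in statuses), "of", len(statuses), "requests succeeded")
```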

Benefits and necessities of the current process

A friend of mine recently asked me why we can’t just replay all the load from production. This sounds like a sensible thing to do at first glance, but here’s what we have always considered over the years of performance engineering, and why we develop our load tests the way we do:

  • Security

Technically you could just record all your network traffic with Wireshark, store all the packets in a file and re-send them anywhere you want. There are, however, a few issues you’ll face. The first one is encryption. On an encrypted network, packets are encrypted with session keys negotiated for every connection, so just re-sending these packets will result in a lot of garbage being sent in.

But suppose you’ve got all the certificates required and you can extract the payloads - what happens next?

Authentication

Modern authentication architectures are designed to keep the user authorised for the shortest time possible. Authentication usually happens via some cookie or header, which expires after a short period. So for every request you send, you’ll need to make sure the user is still authorised.

Data Ownership

Systems also make sure you can only modify the data you own, so if you use one user’s credentials and try to modify an object that doesn’t belong to that user, you’ll most likely get an error. That’s why most load tests either persist the data mapped to a specific user, or we make the tests completely idempotent and extract the IDs at runtime.

CSRF protection

To protect users from various attacks, modern websites also issue various tokens and values with expiration dates. These, too, have to be dynamically extracted between requests.
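To illustrate why a raw replay breaks on all three counts, here is a minimal sketch of the per-session extraction work a script has to do; the URLs, field names and header names are hypothetical:

```python
# Sketch: why raw replay fails -- tokens must be re-extracted per session.
# URLs, field names and header names here are hypothetical.
import re
import requests

session = requests.Session()

# 1. Authenticate: the server hands back a short-lived bearer token.
login = session.post("https://shop.example.com/login",
                     json={"user": "perf01", "password": "secret"})
token = login.json()["access_token"]          # expires quickly by design
session.headers["Authorization"] = f"Bearer {token}"

# 2. Fetch the form and pull the CSRF token out of the HTML.
page = session.get("https://shop.example.com/checkout")
match = re.search(r'name="csrf_token" value="([^"]+)"', page.text)
csrf = match.group(1)

# 3. Only now can the recorded POST be replayed with fresh values.
session.post("https://shop.example.com/checkout",
             data={"csrf_token": csrf, "item_id": 42})
```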

  • Resources

Most applications under test are browser-based, and browsers tend to handle a lot of things for users. The HTTP protocol is stateless, and for your convenience all the user data, authentication and flow continuity is gracefully handled by your browser. As soon as you try to simulate a large user base, you’ll learn you have to limit the resource usage per user. That’s why our scripts typically keep the correlation logic to a bare minimum. Some tools are better at it, some worse - but that’s the reason we spend so much time on test development. The business logic is often pretty straightforward to code; all this effort goes into shrinking your per-user footprint from a multi-core, RAM-hungry browser to a single thread that spends most of its time waiting - but that’s a story for another day.

  • Simple Test Scalability

Now that we have simplified the application user to its most prime form, shaping the test workload is as straightforward as it gets. Using pacing and arrival rates, we can invoke the desired concurrency and saturation on the system under test. The tester defines how many threads are required for a specific scenario, and we can measure the response times of particular transactions from the end-user’s perspective while the system is saturated to the desired level.
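The arithmetic behind this is Little’s Law: concurrency equals arrival rate times the time each iteration takes. A quick back-of-the-envelope calculation (the numbers are illustrative):

```python
# Back-of-the-envelope thread count via Little's Law:
# concurrency = arrival_rate * time_per_iteration.
target_tps = 100          # desired transactions per second
response_time = 0.8       # seconds the system spends answering
think_time = 4.2          # scripted user pauses per iteration

iteration_time = response_time + think_time      # 5.0 s per full loop
threads_needed = target_tps * iteration_time     # 100 * 5.0 = 500

print(f"{threads_needed:.0f} virtual users sustain {target_tps} tps")
```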

Very often we’re testing the application for future workload profiles that our production system hasn’t seen yet. They may even include services we haven’t developed yet.

  • Predictability

Since you have designed your test scripts, you know exactly what response to expect. The intention of each request is embedded in your test scenario. This means you can add many advanced assertions given the context of your requests, and comparing two user iterations is as easy as comparing two sticks. Since you have full control over the virtual user, you don’t need to spend much time on whether a request failed during your test. It’s usually a question of why.

  • Test Simplicity

If your load tests are limited to a few general, non-complex scripts, they’re easier to maintain. The test development process isn’t free, and as your application evolves, your tests have to adapt to these changes. This means you have to keep them as simple as possible to avoid re-engineering your whole test suite. Modern load testing tools let you automate the correlation process, so test updates come down to re-recording your user session, and most of the parameters will be re-populated. But it still requires effort to validate your test scripts, and very few companies actually validate these scripts until they need to execute them - usually late in the SDLC. If you find yourself updating your scripts while everyone is waiting for your test results, you’re blocking the pipeline.

So as you can see, it’s a lot of stuff to look after to have a decent load test plan with working test scripts. And this doesn’t even touch on environment availability, analysis, planning and so on.



Speedscale Review

Now, what if you COULD just record your production traffic and run it on your performance or pre-production environment?

Remember the process I told you about?


- First, we analyze the usage patterns

- We create a test plan

- Simplify the load usage by creating the test scripts and test scenarios

- Run the test

- Measure the outcomes

- Troubleshoot any problems

- Deliver the report with recommendations

If you already have a load profile that actually represents your production state, you start by selecting a busy period of your application and simply replay it on your integration environment. You already know the load profile is representative of your actual product usage, and within minutes you’ve got yourself a fully working regression test you can run anywhere.

Alright, maybe it’s not that easy and it takes a bit of prep time. You still have to define a few correlations and parameters here and there, but you don’t have to do it every time you want to run a new test. You just do it per endpoint - and these rules apply to every environment you have, from a small development setup to production.

The way it works is that when you install Speedscale on your Kubernetes environment, it installs sidecars next to your application pods. These sidecars record the traffic going in and out of your application, and the requests get persisted in the cloud. The magic of correlation and parameterization happens during the transformation stage: once you replay your traffic, Speedscale replaces all the relevant and unique fields with the values specified in your transforms.





Let’s look at the main app.


Services Tab

The Services tab shows you the instrumented applications with their inbound and outbound traffic. Here you can see the throughput of these transactions for each service you have instrumented.

Here I have a custom implementation of the JPetstore with some artificial traffic already generated. You can see that Speedscale has detected my backends and the incoming traffic to the application. It also detected my MySQL database.


JPetStore Service Detected



There’s the peak load I want to reproduce - I select the time range, hit replay and select all the test parameters I’m interested in.


Saving a snapshot

If I want to re-use a specific workload often, I can create a snapshot and persist it for longer. For now I just want to replay whatever was executed in this environment, without any mocks, and at a 1:1 scale. I choose the target environment, hit “play” and the test starts.


Replaying the traffic from user interface

You can monitor the latency and the assertion pass rate. By default, Speedscale runs assertions on all the standard fields - they are customisable, but very generic.



A nice add-on is that you get a peek into the payload and response of every single request. I suppose that might take up a lot of storage if you really need large-scale tests. For this case, you can select a low data mode, which stores only the request data and status codes, cutting down your space requirements. The live report also shows you statistics for hardware usage and latency per endpoint. That’s the bare minimum to validate the test state in case you want to stop it.

New service testing

That covers regression testing - but what if the service is under development and you want to assess its scalability in a larger environment? Well, similar to proxy recording in a traditional load testing tool, you open your application page and execute your test scenario.

My test scenario is to order 101 Spotted Dalmatians for a certain fashion designer.


I have already instrumented my Speedscale app to group my requests by Session ID so I can easily turn the session into a snapshot. Each snapshot consists of request-response pairs. This data is used later for a variety of features we’ll talk about in a moment.

After the test is completed, the analysis module runs all assertions on the applicable transactions, and you can review them easily.



Transforms

Transforms Configured for my JPetstore service

It took me a while to configure my Speedscale operator before I could fully record my session the way I’ve shown you. The correlation and parameterization you may be familiar with is replaced by transforms. Transformations are applied to both your requests and responses. You get to reshape your payload using various functions: the list includes the most common data type selectors like JSON and XML, more generic ones like Regex or Substring, support for payload compression, and loading and storing local variables. It’s also the place where you can redact sensitive data in case you want to move your production load to a lower-tier environment.
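Speedscale’s transforms are configured through its UI rather than written as code, but conceptually a transform chain behaves like this Python sketch; the field names and steps are my own illustration, not Speedscale’s syntax:

```python
# Conceptual sketch of a transform chain (not Speedscale's actual syntax):
# select a JSON field, stash it in a variable, redact sensitive data.
import json

variables: dict[str, str] = {}

def transform_request(raw_body: str) -> str:
    payload = json.loads(raw_body)
    # "JSON selector" step: pull out the order ID for later requests.
    variables["order_id"] = str(payload["order"]["id"])
    # Redaction step: never carry real card numbers into a test environment.
    payload["payment"]["card_number"] = "4111111111111111"
    return json.dumps(payload)

def transform_followup(raw_body: str) -> str:
    # Variable substitution step: re-inject the captured order ID.
    return raw_body.replace("${order_id}", variables["order_id"])
```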

The interesting transforms are the ones labelled “Smart”. I know the AI fever is everywhere and suddenly any tool without AI is worthless, but this implementation makes sense to me. First of all, it’s none of those LLMs or generative AI.

The Smart Replace function creates an index of your request-response pairs and performs a lookup on recognised values. This means that if your data set has low or finite cardinality, chances are you won’t even have to look at how your requests are built or how your mock would respond, which takes away the need for payload parameterization. I haven’t tried this feature much, but the demos and implementation showcases look very promising. What I did try more extensively is parameterization with CSV file content, and it works just like it would in other load testing tools.
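My mental model of Smart Replace is a lookup table from recorded values to their replay-time equivalents. The sketch below is my own reconstruction of that idea, not Speedscale’s implementation:

```python
# Sketch of the lookup idea behind "Smart Replace" as I understand it
# (my reconstruction, not Speedscale's implementation): index values seen
# in recorded traffic, then swap them for their replay-time equivalents.
recorded_to_live: dict[str, str] = {}

def learn(recorded_value: str, live_value: str) -> None:
    # Pair each value observed at record time with the one seen at replay.
    recorded_to_live[recorded_value] = live_value

def smart_replace(payload: str) -> str:
    # Rewrite every recognised recorded value before the request is re-sent.
    for old, new in recorded_to_live.items():
        payload = payload.replace(old, new)
    return payload

learn("session-abc123", "session-xyz789")
print(smart_replace('{"session": "session-abc123", "item": 42}'))
# -> {"session": "session-xyz789", "item": 42}
```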

All these transform functions allow you to shape your requests and cover some basic scenarios. The main limitation of this approach is that you can’t really “develop” a complex scenario:

  • there are no for or while loops
  • you can’t wait for some element to become available
  • each request becomes “a” request
  • assertions can be made on successful calls, but you can’t really assert a transaction’s success in the context of your virtual user (not that most of you do that anyway)

So the main question is: if you’re limited to simple scenarios, what is Speedscale really suitable for?

APIs

Web services and backends are typically atomic in nature: session context rarely exists, and each call is a modification of an object’s state, rarely subject to locks. If you’ve ever designed a test for an API, you’ll know its complexity is far smaller, and most of the requirements really come down to throughput and latency.

But do we always have load tests for our APIs? Do we break down our tests per component and load-test them one by one?


Sample service architecture


In most cases, the answer is no. To optimize the most expensive load testing resource - your time - load tests are usually structured to hit the application from one point. This way you get the biggest test coverage with minimal effort and time, and you don’t have to maintain and execute all the tests at once.

This seems like a good idea until, once or twice, you learn that you’ve been testing too late in the cycle and there’s no more time to fix performance defects before your release. Then it’s time to introduce “shift left”, which essentially means testing at the earliest stages of the development cycle. I’m not sure if you’ve ever seen an integrated environment at an “early stage”, but it’s rarely functioning: features are not ready (and I’m not talking about performance), the infrastructure undergoes major changes, some components require maintenance, and so on.

This is where service mocking comes into play.



Mocked services


Even if a functionality is not ready, you can easily replace it with a mocked service until it’s fixed. Using the recorded traffic, you can train a mock to mimic the component’s logic and continue with your main test, while concurrently testing and iterating on fixes for a component that has been failing during your test. That’s extremely important during the early stages of development, because the biggest challenge of shifting left is dependency and availability management, not running the tests. The more complex your architecture is, the more challenges you’re going to have managing that, while the expectations of you remain the same: deliver the test results as fast as possible.
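To show what “training a mock from recorded traffic” boils down to, here is a minimal record/replay mock in plain Python. The recorded pairs and port are made up, and Speedscale builds the equivalent automatically from snapshots:

```python
# Minimal sketch of a record/replay mock: answer each request with the
# response captured for the same method and path. Illustrative only.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Request/response pairs as a traffic recorder might have captured them.
RECORDED = {
    ("GET", "/pets/101"): (200, b'{"id": 101, "breed": "dalmatian"}'),
    ("GET", "/pets/102"): (404, b'{"error": "not found"}'),
}

class ReplayMock(BaseHTTPRequestHandler):
    def do_GET(self):
        # Unknown paths fall through to 501, flagging gaps in the recording.
        status, body = RECORDED.get(("GET", self.path), (501, b"{}"))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8080), ReplayMock).serve_forever()
```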



Supported technology stacks

List of supported technology stacks

The list of supported technologies for both mocking and traffic replay is really impressive. Looking at the list, I believe most of your microservices will be supported out of the box.

k8s native

You may all know the fuss of setting up your load generators. Some vendors force you to use their own cloud setup, with all the security requirements that entails. Some load generators are pretty static, and teams end up fighting over them. Some people like to run the tests from their laptops. I don’t judge.

Speedscale runs your load tests inside your k8s cluster. New pods are created to act as your injection point. This, in theory, should mimic your network latency pretty well - that is, if you’re not simulating end users.

Other use cases

That’s more or less it for standard load testing, but what are the other potential use cases for the traffic replay feature? Here are the ones I thought of - cases where 1:1 traffic replication is desired and developing the scenarios by hand takes a lot of time.

Finding a session and creating a snapshot

When running manual tests, how often have you had to specify steps to reproduce a specific bug? It usually involves a tester finding a bug, reproducing it manually again to make sure the issue is reproducible, writing down the steps to reproduce, and then the developer following the same steps, usually with different tooling, to replicate the issue in another environment. That’s a lot of time for both parties. What if you could just filter out the tester’s session, trace all the calls that were executed and replay them on a local environment?

Creating a snapshot from a high load

Or when you have to investigate a problem on your production system and you wish it had been running constantly at debug logging level, because you can’t pinpoint the root cause easily. Now you can literally set your load testing environment to debug and replay the same load there. Developing a test that closely mimics the production load or certain conditions can be a really challenging task, and given that such tests are not executed every day, it might simply not be worth it. In that case, just select the problematic period, capture all the traffic and replay it using your predefined transforms.

What could be improved

If you asked me what the tool is lacking, I’d probably point out the limitations in web application traffic replication. Right now it’s clearly designed for API and backend testing, and to make it a full-fledged load testing tool, support for more complex test scenarios is a must. I would label Speedscale more as a complementary solution to your existing load tests. If your goal is to increase your test coverage or your vectors of load injection, this might be a good fit for you - and even faster to implement if your load test already exercises this vector indirectly.

The reporting and analysis part isn’t the most verbose. It’s good for initial test assertion, but that’s about it. I also don’t like the idea of reporting an assertion pass rate. Normally a failed transaction is a failed transaction, and I don’t care if the headers look alright when the user has been served an error. The assertion per transaction should be a binary result, not 80% OK.

Because you’re constrained by generic assertions, you may find yourself troubleshooting problems you wouldn’t normally have with your load testing tool. With standard load tests, the only problems you face come from your test design or the application. If you choose to replay your production load, you’ll also have to learn to distinguish production problems, as well as the occasional transform and Smart Replace issues that might occur here and there. This application is clearly designed for companies and teams with extremely strong analytical skills, so if you don’t have those, it might not be a good fit for you. You should expect higher error rates compared to your standard load tests, and I guess you’re expected to accommodate a little bit of chaos. If you have ever troubleshot and inspected production load, you’ll know what I mean.

Outro

Before you go, I just wanted to remind you that I have only tested this tool in my private lab, at a very low scale. While I try to be as accurate as I can, I can also make mistakes, and you really shouldn’t trust random people on LinkedIn. The folks at Speedscale let you test your app for 30 days for free, so you can see for yourself. I think the idea of traffic replay is useful and innovative, as it can take away the pain of shaping your load every time - it seems great for problem replication and rapid reaction to expected or unexpected outages. Most importantly, it may let you focus on solving performance problems sooner.

Stéphane Mader

Senior-PM(NeoLoad)@Tricentis - Associate@TimeForThePlanet

3 months ago

Great evaluation, thank you! And, yes, have been looking at SpeedScale as well and thinking that the "transformation" for complex non-API scenarios is where SpeedScale reaches its limits. Thibaud Bussière Bruno Duval Julian Ho

Rebecca Clinard

Performance Engineering | Observability | DataScience | OpenTelemetry | Technical Evangelist

3 months ago

Very thorough!! I had checked out this company last year. The name SpeedScale is misleading, no? It implies the solution would either help to scale applications or improve their speed. It does not. It’s a testing/scripting/replay framework to mimic production traffic.

Andrew Lee

Just a Performance Engineer with a bit of Cost Optimisation and Observability thrown in

3 months ago

Very interesting, but a point solution for in fact one of the relatively easier performance test cases of API testing? Or am I missing something

Dirk Loosen

Director at IT Ecology

3 months ago

Thanks Jakub Dering , a very worthwhile read. I will check out the tool too.

Matthieu Leroux-Huet

Performance Engineer | Solving Performance Testing at scale

3 months ago

That's going straight to my reading list. Thanks.
