Know Your Transports - Performance testing guide for integrated systems
Disclaimer: This article is aimed at performance testers testing applications integrated via queues and topics. If you're using HTTP transports, your performance testing tool is probably already doing the right thing.
Introduction
While supporting performance testers from various projects as part of a Centre of Excellence, and during interviews, I have seen many implementations of tests that use queues and topics as the main transport channel. The producer-consumer relation is not an easy subject, and it often gets neglected for the sake of a faster time-to-test; in the process, testers lose important information in observability and end-to-end measurements. In this article I'll explain a few approaches to Kafka and MQ message processing that get the most out of your reporting, and I hope you'll consider them next time you prepare a test using queues and topics.
Problem Description
A typical process for both queue and topic transports consists of two steps: sending a request (producer) to one queue and, when required, getting a response back from another (consumer). Unlike an HTTP call, where the user receives the response on the same socket it sent the request to, without closing the connection, the two steps can be separated, and the thread that sends the message does not have to be the one that picks up the corresponding response. It does, however, require two separate connections to the designated queues. A standard implementation would look like this:
The application is responsible for picking up messages from the request queue and putting a response on the reply queue as soon as it has finished processing and the response is ready. With this diagram, we can already determine what we count as the end-to-end response time. This will become important for the later discussion. Note that the response time experienced by the client is not fully visible to the application: the application can only measure from the moment the message was successfully put on the request queue until it has put a response on the reply queue. The put and get times on the customer side are completely invisible to the application you're testing. If they are not measured in the right place, you'll see a difference as soon as customers start using your application, but by then it may be too late to address the issues that arise.
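To make the two measurement scopes concrete, here is a minimal sketch of the idea, assuming in-memory `queue.Queue` objects as stand-ins for the real request and reply queues (a real test would use MQ or Kafka client libraries instead). The client-side clock encloses the put, the processing, and the get, while the application can only observe its own slice:

```python
import queue
import threading
import time

# In-memory stand-ins for the request and reply queues (hypothetical;
# a real test would use MQ/Kafka client libraries here).
request_queue: "queue.Queue[str]" = queue.Queue()
reply_queue: "queue.Queue[str]" = queue.Queue()

def application() -> float:
    """Application under test: reads a request, 'processes' it, writes a reply.
    Returns the only duration the application itself can observe."""
    msg = request_queue.get()
    app_start = time.monotonic()          # app-visible clock starts here
    time.sleep(0.05)                      # simulated processing
    reply_queue.put(f"reply-to:{msg}")
    return time.monotonic() - app_start   # app-visible clock stops here

durations = {}
app = threading.Thread(target=lambda: durations.update(app=application()))
app.start()

# Client side: the end-to-end clock starts before the put and stops after the get.
client_start = time.monotonic()
request_queue.put("request-1")            # put time is invisible to the app
response = reply_queue.get(timeout=2)     # get time is invisible to the app
durations["client"] = time.monotonic() - client_start
app.join()

# The client-side (end-to-end) time always encloses the app-visible time.
print(durations["client"] >= durations["app"], response)
```

The gap between the two durations is exactly the put and get overhead that stays invisible if you only measure inside the application.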
Common test implementations
Fire-and-forget
One of the test implementations I've seen, and it was not an isolated case, was the use of a virtualized service listening to the messages on the reply queue, and a limited number of producers constantly producing a stable throughput.
While errors were measured on the application side (with verbose logging turned on), no assertion was done on the response messages, no reconciliation could be done between the messages sent and received, and there was definitely no way to measure response times end-to-end. The risks of this implementation include (to name a few): functionally broken responses going unnoticed, silently lost messages, and a complete lack of client-side response time data.
Separate producers from consumers
The scenario is simple: you create a group of producers and a group of consumers. The producers generate the messages and send them to the request queue, and the consumers pick them up on the other side. This is the most common implementation I've seen and the default behavior of built-in libraries. It gives you flexibility in loading the right throughput against the application, as the sending threads do not wait for a response, the pacing times relate only to the iteration duration, and the consumers do the validation and assertion of the responses. What it's missing, however, is the correlation between a request sent and a response received, and because of that you can't measure the end-to-end time.
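A minimal sketch of this pattern, again assuming an in-memory `queue.Queue` as a stand-in for the transport (the application is collapsed into an immediate echo for brevity). The counts reconcile and the content is asserted, but nothing ties a given response back to its request:

```python
import queue
import threading
import time

# A single queue.Queue stands in for the reply queue/topic (hypothetical).
reply_queue: "queue.Queue[str]" = queue.Queue()
sent_count = 0
received = []

def producer(messages: int, pacing_s: float) -> None:
    """Sends at a fixed pace; never waits for any response."""
    global sent_count
    for i in range(messages):
        # In a real test this would go to the request queue; here the
        # 'application' is collapsed into an immediate echo onto the reply queue.
        reply_queue.put(f"msg-{i}")
        sent_count += 1
        time.sleep(pacing_s)              # pacing controls the throughput

def consumer(expected: int) -> None:
    """Validates responses, but cannot tell which request produced which
    response, so no end-to-end time can be derived here."""
    for _ in range(expected):
        msg = reply_queue.get(timeout=2)
        assert msg.startswith("msg-")     # content assertion only
        received.append(msg)

p = threading.Thread(target=producer, args=(5, 0.01))
c = threading.Thread(target=consumer, args=(5,))
p.start(); c.start(); p.join(); c.join()
print(sent_count, len(received))          # counts reconcile
```

Notice that the consumer never sees a send timestamp, which is precisely why this layout cannot report end-to-end times on its own.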
Dual-connection thread
This is the most common configuration I use for MQ communication. At the beginning of the thread I establish two connections: one to the request queue and another to the reply queue. The thread runs in a loop, sending a request and immediately blocking, trying to capture the response on the reply queue. In pseudo-code, it looks like this:
// connect to your queue manager
MQConnection connection = MQ.connect()
// create the queue connections up front
Queue putQueue = connection.accessQueue("request queue name")
Queue getQueue = connection.accessQueue("reply queue name")
timeout = 1000 // the timeout value can be acquired from the client systems
// this is your thread loop (or user action)
while True:
    startTransaction()
    messageId = generateUniqueID()
    putQueue.sendMessage("This is a sample message", messageId)
    response = getQueue.receiveMessage(messageId, timeout)
    if response:
        success = assertResponse(response)
        endTransaction(success)
    else:
        endTransaction(failure)
        break
// remember to close your connections
putQueue.close()
getQueue.close()
connection.close()
The dual-connection thread guarantees precise measurement of the end-to-end timings and lets you validate the response in the context of the message sent, without switching thread context or forwarding the message anywhere. The end-to-end time can now be reported directly from the load generator, representing the response times your clients can expect on their side. Of course, no solution is perfect, and there are two drawbacks compared to the examples mentioned above: each thread is blocked while it waits for its response, so sustaining a given throughput costs more threads, and the pattern doesn't fit transports where producers and consumers must be separated, as with Kafka.
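The dual-connection loop above can be sketched as a runnable example, assuming in-memory `queue.Queue` objects as stand-ins for the two queue connections and a background echo thread as the application. One simplification to be aware of: a real MQ client would fetch the reply by correlation ID, while here the single client thread and FIFO ordering guarantee the match:

```python
import queue
import threading
import time
import uuid

# In-memory stand-ins for the two queue connections (hypothetical names).
put_queue: "queue.Queue[tuple]" = queue.Queue()
get_queue: "queue.Queue[tuple]" = queue.Queue()

def application() -> None:
    """Echoes each request back onto the reply queue with the same ID."""
    while True:
        msg_id, payload = put_queue.get()
        if msg_id is None:                 # sentinel: shut down
            return
        get_queue.put((msg_id, f"processed:{payload}"))

threading.Thread(target=application, daemon=True).start()

timings = []
for _ in range(3):                         # the thread loop (or user action)
    message_id = str(uuid.uuid4())
    start = time.monotonic()               # startTransaction()
    put_queue.put((message_id, "sample message"))
    resp_id, payload = get_queue.get(timeout=1)   # block until the reply
    elapsed = time.monotonic() - start     # endTransaction(): end-to-end time
    assert resp_id == message_id           # response matched to its request
    timings.append(elapsed)

put_queue.put((None, None))                # stop the echo 'application'
print(len(timings))
```

Each entry in `timings` is a true client-side end-to-end measurement, taken on the load generator itself.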
Inter-thread communication
If you're required to split the producers from the consumers, as in the case of Kafka transports, it takes some more effort to get full end-to-end measurements. The basis for this approach is described in the section "Separate producers from consumers", but we need to make sure the producers and consumers can communicate with each other, so we can correlate the requests with the replies. The scripts start by defining common variables, accessible to both thread groups. They need to be defined up front, and since you're accessing the same data blocks, you need to make sure the data structures you use are thread-safe, to avoid data races:
// a set of message IDs your consumer knows to act upon
ConcurrentSet<String> expectedResponses
// a map of responses received from the topic, keyed by message ID
ConcurrentMap<String, String> receivedResponses
Now, the producer code. It's similar in form to the dual-connection thread: we're still blocking the thread until we receive the response, only this time the response is delivered through the map of received responses:
// initialization of the producer
Producer producer
timeout = 1000
while True:
    messageId = generateUniqueID()
    startTransaction()
    // notify the consumer group we're expecting a message with this ID
    // (register before sending, so the response can't slip past us)
    expectedResponses.add(messageId)
    // start timestamp to calculate the timeout
    startTime = time.now()
    producer.sendMessage("Sample Test message", messageId)
    response = None
    // wait for the response until we time out
    while time.now() - startTime < timeout:
        // check whether the response has been received
        response = receivedResponses.get(messageId)
        if response:
            break
        sleep(shortInterval) // avoid a busy spin
    if response:
        success = assertResponse(response)
        endTransaction(success)
        // remember to remove the entries from the shared collections
        // to avoid leaks
        receivedResponses.remove(messageId)
    else:
        // notify the consumer group not to wait for this message anymore
        expectedResponses.remove(messageId)
        endTransaction(failed)
The last piece of the puzzle is the consumer code. The consumer polls all messages published to the topic since its last poll, but we only process the messages whose IDs are present in "expectedResponses":
Consumer consumer
while True:
    // the consumer polls the messages from the topic; polling returns
    // a list of messages
    Message[] responses = consumer.poll()
    for response in responses:
        if response.id in expectedResponses:
            receivedResponses.put(response.id, response.payload)
            expectedResponses.remove(response.id)
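The whole scheme, both thread groups plus the shared collections, can be sketched as a runnable example. As before, an in-memory `queue.Queue` stands in for the topic and the application is collapsed into an identity echo; a `threading.Lock` plays the role of the concurrent collections:

```python
import queue
import threading
import time
import uuid

# Shared, thread-safe state (a lock guards the set/map updates).
lock = threading.Lock()
expected_responses = set()                 # IDs the consumer should act upon
received_responses = {}                    # message ID -> response payload
topic: "queue.Queue[tuple]" = queue.Queue()  # stand-in for the reply topic
results = []

def producer(iterations: int, timeout_s: float = 1.0) -> None:
    for _ in range(iterations):
        message_id = str(uuid.uuid4())
        with lock:
            expected_responses.add(message_id)    # register BEFORE sending
        start = time.monotonic()                  # startTransaction()
        topic.put((message_id, "sample test message"))  # stand-in for the send
        response = None
        while time.monotonic() - start < timeout_s:
            response = received_responses.get(message_id)
            if response is not None:
                break
            time.sleep(0.001)                     # avoid a busy spin
        if response is not None:
            results.append(time.monotonic() - start)   # end-to-end time
            received_responses.pop(message_id, None)   # avoid leaks
        else:
            with lock:
                expected_responses.discard(message_id) # stop the consumer waiting
            results.append(None)                       # failed transaction

def consumer(stop: threading.Event) -> None:
    while not stop.is_set():
        try:
            message_id, payload = topic.get(timeout=0.1)  # 'poll'
        except queue.Empty:
            continue
        with lock:
            if message_id in expected_responses:
                received_responses[message_id] = payload
                expected_responses.discard(message_id)

stop = threading.Event()
c = threading.Thread(target=consumer, args=(stop,))
c.start()
producer(iterations=3)
stop.set(); c.join()
print(len(results), all(r is not None for r in results))
```

Each non-None entry in `results` is a correlated, client-side end-to-end measurement, even though the send and the receive happen on different threads.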
And... there you go! You can tune the number of producer and consumer threads for the best performance, and you can now measure the times end-to-end and correlate each request with its response at runtime.
Final thoughts
I hope this helps you understand the importance of the right test architecture. The more accurate your tests are, the less time you'll spend troubleshooting problems on production.