Know Your Transports - Performance testing guide for integrated systems
By Jakub (Kuba) Dering

Disclaimer: This article is for performance testers who test applications integrated via queues and topics. If you're using HTTP transports, your performance testing tool is probably doing it right already.

Introduction

While supporting performance testers on various projects as part of a Centre of Excellence, and during interviews, I see many different implementations of tests that use queues and topics as the main transport channel. The producer-consumer relationship is not an easy subject, and it is often neglected for the sake of a shorter time-to-test. In the process, testers lose important information about observability and end-to-end measurements. In this article I'll explain a few approaches to Kafka and MQ message processing that get the most out of your reporting, and I hope you'll consider them the next time you prepare a test using queues and topics.

Problem Description

A typical process for both queue and topic transports consists of two steps: sending a request (producer) to one queue and, when required, getting a response back from another (consumer). Unlike an HTTP call, where the client receives the response on the same socket it sent the request on, without closing the connection, the two steps can be separated, and the thread that sent the message is not required to pick up the corresponding response. It does, however, require two separate connections to the designated queues. A standard implementation would look like this:

[Image: no alternative text available]

The application is responsible for picking up messages from the request queue and putting a response on the reply queue as soon as it has finished processing and the response is ready. With this diagram, we can already determine what counts as end-to-end response time, which will become important later in the discussion. Note that the response time experienced by the client is not fully visible to the application: it can only measure from the moment the message was successfully put on the request queue until the moment it put the response on the reply queue. Put and get times on the customer side are completely invisible to the application you're testing. If you don't measure in the right place, you'll see a difference as soon as customers start using your application, and by then it may be too late to address the issues that arise.

[Image: no alternative text available]
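To make the gap between the two views concrete, here is a minimal sketch in Python. The class name, field names, and timestamp values are all hypothetical, chosen only for illustration; they are not taken from any real measurement.

```python
from dataclasses import dataclass


@dataclass
class MessageTimings:
    """Timestamps in seconds for one request/reply round trip (hypothetical values)."""
    client_put_start: float   # client starts putting the request
    request_enqueued: float   # request is available on the request queue
    reply_enqueued: float     # application has put the reply on the reply queue
    client_get_end: float     # client has finished getting the reply

    def application_time(self) -> float:
        # What the application can measure: queue-to-queue processing only.
        return self.reply_enqueued - self.request_enqueued

    def end_to_end_time(self) -> float:
        # What the client experiences: put + processing + get.
        return self.client_get_end - self.client_put_start


t = MessageTimings(
    client_put_start=0.000,
    request_enqueued=0.040,
    reply_enqueued=0.240,
    client_get_end=0.300,
)
print(f"application view: {t.application_time():.3f}s")
print(f"client view:      {t.end_to_end_time():.3f}s")
```

With these made-up numbers the application would report about 0.2 s while the client waits about 0.3 s; the difference is the put and get time that only the load generator can see.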

Common test implementations

Fire-and-forget

One test implementation I've seen, and it was not an isolated case, used a virtualized service listening for messages on the reply queue, together with a limited number of producers generating a constant, stable throughput.

While errors were measured on the application side (with verbose logging turned on), no assertions were made on the response messages, the messages sent could not be reconciled against the messages received, and there was no way to measure end-to-end response times. The risks of this implementation are (to name a few):

  • missing replies
  • no assurance of message delivery
  • response times could be much higher than reported by the application if the queue manager is saturated
  • responses could be malformed, or could present a different state than the application

Separate producers from consumers

The scenario is simple: you create a group of producers and a group of consumers. Producers generate the messages and send them to the request queue, and consumers pick up the responses on the other side. This is the most common implementation I've seen, and it is the default behavior of built-in libraries. It gives you flexibility in applying the right throughput to the application, because the sending threads do not wait for a response and pacing depends only on iteration duration, while the consumers validate and assert the responses. What it's missing, however, is the correlation between the request sent and the response received, and because of that you can't measure the end-to-end time.
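The following is a minimal sketch of this layout, assuming in-memory `queue.Queue` objects as stand-ins for the request and reply queues and a trivial `application` thread in place of the system under test. All function names and the `STOP` sentinel are hypothetical and exist only to keep the demo self-contained.

```python
import queue
import threading

request_queue: queue.Queue = queue.Queue()   # stand-in for the request queue
reply_queue: queue.Queue = queue.Queue()     # stand-in for the reply queue
STOP = "STOP"                                # sentinel used only to end this demo


def application() -> None:
    """Stand-in for the system under test: replies to every request."""
    while (msg := request_queue.get()) != STOP:
        reply_queue.put(f"reply to {msg}")
    reply_queue.put(STOP)


def producer(count: int) -> None:
    """Sends requests without waiting for any reply."""
    for i in range(count):
        request_queue.put(f"request {i}")


def consumer(results: list) -> None:
    """Validates replies, but has no idea when each request was sent."""
    while (msg := reply_queue.get()) != STOP:
        results.append(msg.startswith("reply to "))


def run_demo(count: int = 5) -> list:
    results: list = []
    threads = [
        threading.Thread(target=application),
        threading.Thread(target=consumer, args=(results,)),
    ]
    for thread in threads:
        thread.start()
    producer(count)
    request_queue.put(STOP)
    for thread in threads:
        thread.join()
    return results


validated = run_demo()
print(f"{len(validated)} replies validated, {sum(validated)} passed")
```

The consumer can assert on content and count replies, but nothing in it knows the send time of the matching request, which is exactly the missing correlation described above.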

Dual-connection thread

This is the configuration I use most often for MQ communication. At the start of the thread I establish two connections, one to the request queue and one to the reply queue. The thread runs in a loop, sending a request and then immediately blocking while it tries to pick up the response from the reply queue. In pseudo-code, it looks like this:

// connect to your queue manager
MQConnection connection = MQ.connect()

// open both queue connections up front
Queue putQueue = connection.accessQueue("request queue name")
Queue getQueue = connection.accessQueue("reply queue name")
timeout = 1000 // timeout can be acquired from client systems

// this is your thread loop (or user action)
while True:
    startTransaction()
    messageId = generateUniqueID()
    putQueue.sendMessage("This is a sample message", messageId)
    // block until the reply with the same ID arrives or the timeout expires
    response = getQueue.receiveMessage(messageId, timeout)
    if response:
        success = assertResponse(response)
        endTransaction(success)
    else:
        endTransaction(failure)
        break

// remember to close your connections
putQueue.close()
getQueue.close()
connection.close()
             


The dual-connection thread gives you precise end-to-end timings and lets you validate the response in the context of the message sent, without switching thread context or forwarding the message anywhere, both of which also cost time. The end-to-end time can now be reported directly from the load generator, representing the response times your clients can expect on their side. Of course, no solution is perfect, and there are two drawbacks compared to the examples mentioned above:

  • The number of active connections can be excessive compared to customer connections. This is typically not a problem for a queue manager, but it is worth remembering if you compare active connections on the queue manager between production systems and your tests.
  • The pacing of your load becomes highly dependent on the application's response times, and you'll find yourself tweaking your thread and connection configuration more often, depending on average response times.
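As a runnable illustration of the dual-connection loop, here is a minimal Python sketch that simulates the reply queue in memory. `InMemoryReplyQueue`, `fake_application`, and `run_iterations` are hypothetical names invented for this example; a real test would use the MQ client's own put and get-by-correlation-ID calls instead.

```python
import threading
import time
import uuid


class InMemoryReplyQueue:
    """Hypothetical stand-in for a reply queue that supports get-by-correlation-ID."""

    def __init__(self) -> None:
        self._replies: dict = {}
        self._cond = threading.Condition()

    def put(self, message_id: str, payload: str) -> None:
        with self._cond:
            self._replies[message_id] = payload
            self._cond.notify_all()

    def receive(self, message_id: str, timeout: float):
        """Blocks until the matching reply arrives; returns None on timeout."""
        with self._cond:
            arrived = self._cond.wait_for(
                lambda: message_id in self._replies, timeout
            )
            return self._replies.pop(message_id) if arrived else None


def fake_application(reply_queue, message_id, payload, delay=0.01) -> None:
    """Simulates the system under test answering after `delay` seconds."""
    def work() -> None:
        time.sleep(delay)
        reply_queue.put(message_id, payload.upper())
    threading.Thread(target=work, daemon=True).start()


def run_iterations(iterations: int = 3, timeout: float = 1.0) -> list:
    reply_queue = InMemoryReplyQueue()
    samples = []
    for _ in range(iterations):
        message_id = uuid.uuid4().hex
        start = time.perf_counter()                          # start of transaction
        fake_application(reply_queue, message_id, "ping")    # the "put"
        response = reply_queue.receive(message_id, timeout)  # the blocking "get"
        elapsed = time.perf_counter() - start                # end of transaction
        samples.append((response == "PING", elapsed))
    return samples


samples = run_iterations()
for ok, elapsed in samples:
    print(f"passed={ok} end_to_end={elapsed:.4f}s")
```

Because the same thread sends and receives, the elapsed time covers put, processing, and get in one measurement, at the cost of the thread being idle while it waits.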

Inter-thread communication

If you're required to split the producers and consumers, as is the case with Kafka transports, it takes more effort to get full end-to-end measurements. The basis for this approach is described in the section "Separate producers from consumers", but we need to make sure the producers and consumers can communicate with each other so that we can correlate requests with replies. The scripts start by defining common variables accessible to both thread groups. They need to be defined up front, and since both groups access the same data, the data structures you use must be thread-safe to avoid data races:


// a set of message IDs the consumer knows it should act upon
ConcurrentSet expectedResponses
// a map of responses received from the topic, keyed by message ID
ConcurrentMap<StringID, String> receivedResponses
        


Now, the producer code. It's similar in form to the dual-connection thread: we still block the thread until we receive the response, only this time the response is delivered through the map of received responses:


// initialization of the producer
Producer producer()
timeout = 1000

while True:
    messageId = generateUniqueID()
    startTransaction()
    // start timestamp used to calculate the timeout
    startTime = time.now()

    // notify the consumer group that we're expecting a message with our ID;
    // register before sending so that a fast reply is not discarded
    expectedResponses.add(messageId)
    producer.sendMessage("Sample Test message", messageId)

    // wait for the response until we time out
    response = null
    while (response is null and time.now() - startTime < timeout):
        // check whether the response has been received
        response = receivedResponses.get(messageId)

    if response:
        success = assertResponse(response)
        endTransaction(success)
    else:
        // tell the consumer group to stop waiting for this message
        expectedResponses.remove(messageId)
        endTransaction(failed)
    // remove the entry from the shared collections to avoid memory leaks
    receivedResponses.remove(messageId)
    

        

The last piece of the puzzle is the consumer code. The consumer polls all messages published to the topic since its last poll, but we only process the messages whose IDs appear in "expected responses":

Consumer consumer()

while True:
    // the consumer polls the topic; polling returns a list of messages
    Message[] responses = consumer.poll()
    for response in responses:
        if response.id in expectedResponses:
            receivedResponses.put(response.id, response.payload)
            expectedResponses.remove(response.id)
            

And... there you go! You can tune the number of producer and consumer threads for the best performance, measure end-to-end times, and correlate each request with its response at runtime.
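For readers who want something runnable, here is a minimal Python sketch of the same idea. It is a variation rather than a literal translation: instead of polling a shared map in a loop, each waiting producer gets a `threading.Event` that the consumer sets when the reply arrives. The `topic` queue is an in-memory stand-in for a reply topic, and `pending`, `consumer_loop`, and `send_and_wait` are hypothetical names made up for this example.

```python
import queue
import threading
import time
import uuid

topic: queue.Queue = queue.Queue()   # in-memory stand-in for the reply topic
pending: dict = {}                   # message_id -> [Event, payload]
pending_lock = threading.Lock()      # guards every access to `pending`


def consumer_loop(stop: threading.Event) -> None:
    """Polls the topic and hands each reply to the producer waiting for it."""
    while not stop.is_set():
        try:
            message_id, payload = topic.get(timeout=0.05)
        except queue.Empty:
            continue
        with pending_lock:
            slot = pending.get(message_id)
        if slot is not None:         # ignore replies nobody is waiting for
            slot[1] = payload
            slot[0].set()


def send_and_wait(payload: str, timeout: float = 1.0):
    """Registers the ID, 'sends' the message, then blocks until reply or timeout."""
    message_id = uuid.uuid4().hex
    slot = [threading.Event(), None]
    with pending_lock:
        pending[message_id] = slot   # register before sending
    start = time.perf_counter()
    # Fake application: the "reply" is simply the reversed payload.
    topic.put((message_id, payload[::-1]))
    arrived = slot[0].wait(timeout)
    elapsed = time.perf_counter() - start
    with pending_lock:
        pending.pop(message_id, None)  # always clean up shared state
    return (slot[1] if arrived else None), elapsed


stop_flag = threading.Event()
consumer_thread = threading.Thread(
    target=consumer_loop, args=(stop_flag,), daemon=True
)
consumer_thread.start()
reply, seconds = send_and_wait("abc")
stop_flag.set()
consumer_thread.join()
print(reply, f"{seconds:.4f}s")
```

Waiting on an event avoids a busy loop on the load generator, which matters when many producer threads wait at once; the trade-off is slightly more bookkeeping per message.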

Common mistakes in tests

  • Connections established too often - MQ and Kafka connections are typically established when the application starts and are held open for its entire runtime. Make sure you simulate the same behavior in your tests. Repeatedly re-opening the same connections can incur a performance penalty on these components.
  • Lack of transaction measurements - By using an API, you have a golden opportunity to measure the put and get times as separate transactions. You don't have to include these numbers in your final report, but they'll help you determine the level of saturation of the transport components. If you see a delay on your load generator, there's a good chance your application faces the same delay on the other side.
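One way to capture put and get times as separate transactions is a small timing helper wrapped around each call. The sketch below assumes Python and uses `time.sleep` in the hypothetical `fake_put` and `fake_get` functions as placeholders for the real client calls; the `transaction` helper and the `timings` store are likewise illustrative names, not part of any load testing tool.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)   # transaction name -> list of durations in seconds


@contextmanager
def transaction(name: str):
    """Records how long the wrapped block took under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


def fake_put() -> None:
    time.sleep(0.002)   # placeholder for the real put call


def fake_get() -> None:
    time.sleep(0.005)   # placeholder for the real blocking get call


# Nested transactions: put and get are reported separately and as part of the total.
with transaction("end_to_end"):
    with transaction("put"):
        fake_put()
    with transaction("get"):
        fake_get()

for name, values in timings.items():
    print(f"{name}: {values[0] * 1000:.2f} ms")
```

Keeping the put and get samples alongside the end-to-end figure makes it easier to tell whether a slow iteration came from the transport or from the application.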

I hope this helps you understand the importance of the right test architecture. The more accurate your tests are, the less time you'll spend troubleshooting problems in production.
