Navigating the Depths of Spring Boot Transactions: Unraveling Anti-Patterns and a Real-World Showcase

When reflecting on software engineering careers, one challenge developers encounter again and again is the perplexing error "Connection is not available, request timed out after 30000ms."

Our initial instinct often leads us to believe that the database is no longer accessible, prompting us to swiftly contact our #DevOps or #SRE colleagues for assistance. However, after their thorough investigation, they often return with the verdict that the database is operating as expected, and there are no platform-related issues at hand.

Subsequently, the situation can become quite daunting, and it might even lead to some humorous hair-pulling moments :D

Let's delve into two real-world examples that shed light on the challenges Spring Boot applications may face when establishing connections with #Amazon #Redshift or #PostgreSQL, and how to approach debugging them. In one scenario, the culprit is overnight #ETL jobs that temporarily sever all connections to free up the connection pool. Moments later, these connections are once again made available.

Another common situation involves long-running queries, or other work performed inside the same transaction, that keeps connections checked out well past the 30-second mark, so requests waiting for a free connection run into exactly the timeout above. This presents a formidable challenge.

Spring Boot and Redshift Connection Woes

Upon investigating the issue, it became evident that the Spring Boot application was failing to release connections. The connection pool, configured with #HikariConfig, seemed to be retaining these connections inexplicably. Our application was configured with two data sources: one for #PostgreSQL and the other for the Redshift database.

Our initial scrutiny focused on the Hikari pool configuration, which appeared as follows:

  • poolConfig.minimumIdle = 50
  • poolConfig.maximumPoolSize = 50
  • poolConfig.leakDetectionThreshold = 55000

This identical configuration was applied to both the PostgreSQL and Redshift data sources.
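
For orientation, here is a minimal Kotlin sketch of what such a pool might look like when wired up programmatically; the JDBC URL, credentials, and function name are placeholders, not our actual configuration:

import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import javax.sql.DataSource

// Minimal sketch of one of the two pools; the Redshift data source would look
// the same, just with its own JDBC URL and credentials.
fun postgresDataSource(): DataSource {
  val poolConfig = HikariConfig().apply {
    jdbcUrl = "jdbc:postgresql://db-host:5432/app" // placeholder URL
    username = "app_user"                          // placeholder credentials
    password = "secret"
    minimumIdle = 50                 // equal to maximumPoolSize -> a fixed-size pool
    maximumPoolSize = 50
    leakDetectionThreshold = 55_000L // warn if a connection is held for more than 55 s
    // connectionTimeout is left at its 30_000 ms default -- the source of the
    // "timed out after 30000ms" error when no connection becomes free in time
  }
  return HikariDataSource(poolConfig)
}

The leakDetectionThreshold of 55 seconds is worth highlighting: HikariCP logs a warning when a connection stays out of the pool longer than that, which is often the first concrete hint that connections are being retained rather than the database being down.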

In an attempt to replicate the issue with the PostgreSQL database, we intentionally stopped the database instance in our development environment. It became evident that the Spring Boot app was unable to establish a connection during this period, but as soon as the database was up and running again, we could successfully query the data from PostgreSQL.

It was apparent that the ETL job responsible for terminating DB connections did so in a way our setup did not handle gracefully, leading to the issue on our end.

Navigating Amazon EC2 Checks

Amazon EC2 performs automated checks on every running EC2 instance, assessing both hardware and software conditions. These status checks are inherent to EC2 and cannot be disabled or removed.

Despite these checks, our app's issues remained undetected.

As a workaround, since we could not modify the ETL configuration but were allowed to reboot the app during the night, we leveraged #AWS features. Specifically, for those still running the EC2 approach with a load balancer and an Auto Scaling group, you may be familiar with Load Balancer (ELB) health checks within the Target Groups. Enabling ELB health checks is a potential solution.

Of course, your Spring Boot app should expose the #actuator health endpoint for the target group to probe. In #Terraform, you can enable ELB health checks with the following attribute:

module "api_autoscaling" {
  ...
  health_check_type = "ELB"
  ...
}        

This configuration makes the Auto Scaling group honor both EC2 status checks and ELB health checks, so an instance whose app stops responding on its health endpoint is swiftly replaced, which effectively gave us the nightly "reboot" we needed.
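
Spring Boot typically auto-configures a database health indicator when a DataSource is present, so pointing the target group's health check path at /actuator/health may already be enough. If you want the check to reflect pool behavior more directly, a custom HealthIndicator is one option; the sketch below is only an illustration, and the class name and validation timeout are made up:

import org.springframework.boot.actuate.health.Health
import org.springframework.boot.actuate.health.HealthIndicator
import org.springframework.stereotype.Component
import javax.sql.DataSource

// Hypothetical sketch: report DOWN when a connection cannot be obtained and
// validated quickly, so the ELB health check (pointed at /actuator/health)
// fails and the Auto Scaling group replaces the instance.
@Component
class PoolHealthIndicator(private val dataSource: DataSource) : HealthIndicator {

  override fun health(): Health =
    try {
      dataSource.connection.use { conn ->
        if (conn.isValid(2)) Health.up().build() // 2-second validation timeout
        else Health.down().withDetail("reason", "connection validation failed").build()
      }
    } catch (ex: Exception) {
      // e.g. "Connection is not available, request timed out after 30000ms"
      Health.down(ex).build()
    }
}

Assuming the actuator web endpoints are exposed, the target group's health check path would point at /actuator/health, and the health_check_type = "ELB" attribute above ties the two together.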

Spring Boot Long-Running Transactions

To tackle long-running transactions, it's crucial to establish robust #monitoring and #observability tools. They let us look deep into the transaction flow and provide valuable insights for troubleshooting and optimizing performance. In the context of distributed systems, incorporating #OpenTelemetry becomes indispensable, allowing us to monitor and observe the entire system's behavior. While I used #NewRelic for this showcase, the market offers an array of exceptional tools, and you might have your own favorites.
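
If you instrument manually rather than relying on an agent, wrapping a suspicious operation in an explicit span makes the slow part show up as its own segment in the trace. Here is a minimal sketch using the OpenTelemetry API; the helper and the scope name are hypothetical, and the SDK is assumed to be configured elsewhere:

import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.StatusCode

// Hypothetical helper: run a block inside a manual span so long-running work
// appears as a distinct segment in whatever backend receives the trace data.
fun <T> traced(spanName: String, block: () -> T): T {
  val tracer = GlobalOpenTelemetry.getTracer("com.example.app") // placeholder scope name
  val span = tracer.spanBuilder(spanName).startSpan()
  return try {
    span.makeCurrent().use { block() }
  } catch (ex: Exception) {
    span.recordException(ex)
    span.setStatus(StatusCode.ERROR)
    throw ex
  } finally {
    span.end()
  }
}

Usage would look like traced("redshift-report-query") { reportRepository.runReport() } -- both names being placeholders -- and the segment then carries its own timing in the trace view.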

Within the New Relic platform, the "Monitor" navigation panel features a "Transactions" view, a treasure trove of information. Here, you can access a list of all your transactions, thoughtfully sorted by various metrics:

  • Slowest average response time
  • Most time-consuming
  • Highest error rate
  • and more

As a diligent software engineer, it's wise to incorporate a morning routine of checking these transactions, employing different sorting options to proactively identify potential issues before they escalate into production incidents.

When you delve into the details of specific transactions, you'll uncover a wealth of information about their historical performance, including:

  • Average response time
  • Median response time
  • 95th percentile response time
  • Apdex score
  • Average error rate
  • Average throughput
  • and more

It's important to note that these calculations are context-sensitive, depending on the time frame you select. New Relic provides a range of time frame options, from as short as 30 minutes to custom intervals. Keep in mind that averages might sometimes yield false positives based on the chosen time frame.
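
As a quick illustration with made-up numbers: if an endpoint serves 990 requests at roughly 100 ms and 10 requests at roughly 20 s within the selected window, the average response time is (990 × 0.1 s + 10 × 20 s) / 1000 ≈ 0.3 s. That single figure simultaneously hides ten users who each waited 20 seconds and makes an otherwise healthy endpoint look sluggish, and shrinking or widening the time frame shifts it further. This is why the percentiles and the per-transaction details below deserve a look alongside the averages.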

For me, the most vital section of the transaction information page is the "Transaction Details" block. Here, you can dive deep into each transaction, examining its performance history. This section is a gold mine for software engineers, as it provides a comprehensive overview, from transaction summaries to intricate database queries. You can uncover essential details, identify slow components, measure segment execution times in milliseconds, and more.

When you navigate to the "Trace Details" section, you'll encounter something akin to this:

(Screenshot: Trace Details, with the method's name professionally blurred)

In the provided screenshot, New Relic grants us the ability to delve deep into Spring Boot traces, where we can scrutinize all the intricate details. Unfortunately, the screenshot I've shared might be incomplete (my apologies for that), but it does offer a glimpse of the journey. The starting point in the Spring Boot trace, not visible in the screenshot, is the REST endpoint request definition, such as /api/endpoint-name(GET). This is followed by the step: Java/javax.servlet.ServletRequestListener/requestInitialized.

I won't delve too deeply into filters in this discussion and will bypass the steps leading up to the DispatcherServlet/service phase. In the world of Spring, all incoming requests are channeled through a single servlet known as the DispatcherServlet (front controller). The front controller pattern is fundamental in web application development, where one servlet receives all incoming requests and dispatches them to various components within the application.

It's essential to grasp that the DispatcherServlet serves as the "gateway" to all other controllers and endpoints. Also, keep in mind that servlet filters, such as the Spring Security filter chain, are invoked before the DispatcherServlet handles the request.

Now, for the most intriguing part (cue the drumroll). As we can observe, the next step involves establishing a database connection even before reaching our controller. This is indicated by the line Java/com.zaxxer.hikari.HikariDataSource/getConnection, which occurs prior to the call to Java/com.xxxx.Controller/get.

When examining the code, don't be surprised if you come across a @Transactional annotation attached to the controller's method itself, as shown in the example below:

@RestController
@RequestMapping("/api")
class Controller {

  @GetMapping("/endpoint-name")
  @Transactional(readOnly = true) // the transaction (and its DB connection) spans the whole request
  fun getSomething(): ResponseObject = TODO("body elided")
}

While it's technically possible to make the controller @Transactional, it's a common recommendation to limit transactional behavior to the service layer. The persistence layer should generally not be transactional.

This separation of concerns is essential. The controller's primary role is to handle incoming requests, extract parameter data, invoke one or more service methods, and combine the results into a response sent back to the client.

The rationale behind avoiding transactional controllers is not merely a matter of technical feasibility. By making the controller transactional, you effectively tie up the database connection until the end of the transaction. In highly distributed applications with a significant volume of calls per minute, this approach can lead to connection exhaustion. Simply increasing the number of connections is not a sustainable solution, as it introduces its own set of challenges, a topic worthy of its own discussion.
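
One common refactoring keeps the controller thin and moves the transaction boundary into the service. The sketch below illustrates the shape only; SomethingService, SomethingRepository, loadSomething, and the placeholder types are hypothetical names, not the actual code:

import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController

// Placeholder types so the sketch is self-contained.
data class ResponseObject(val value: String)

interface SomethingRepository {
  fun loadSomething(): ResponseObject
}

@Service
class SomethingService(private val repository: SomethingRepository) {

  // The transaction starts here and ends when this method returns, so the
  // connection goes back to the pool at commit time, not at the end of the request.
  @Transactional(readOnly = true)
  fun fetchSomething(): ResponseObject = repository.loadSomething()
}

@RestController
@RequestMapping("/api")
class Controller(private val service: SomethingService) {

  // No @Transactional here: the controller only delegates and assembles the response.
  @GetMapping("/endpoint-name")
  fun getSomething(): ResponseObject = service.fetchSomething()
}

With this shape, the database connection is held only for the duration of the actual database work inside the service method rather than for the whole HTTP request.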

This practice of making controllers transactional might be considered an anti-pattern and is generally best avoided. We typically aim for transactions to complete within a few hundred milliseconds at most. However, there can be exceptions, and it's important to remember that this article does not delve into the concept of Single Responsibility, which is another critical aspect of design and architecture.

Now, I'd like to pose a question to you and the readers: How much time do you invest in Monitoring and Observability, both as individuals and within your organizations? Your insights and experiences would be invaluable to the ongoing conversation.
