JVM Monitoring
RadhaKrishna Prasad
Performance Engineering SME | SRE | Corporate Trainer - Performance Engineering | Cloud Performance Testing | Chaos and Resilience | Observability | DevOps
Definitions:
Monitoring
Observing performance data in real time to find and correct resource, throughput, or response time problems.
Trending
The analysis of data with the intention of identifying noticeable patterns.
Forecasting
The projection of those identified patterns on business growth patterns to understand the impact on business processes.
Capacity Planning
The response to forecasts that ensures the integrity of business processes.
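As a toy illustration of trending and forecasting (not taken from any particular tool), a least-squares line fitted to historical load figures can be projected forward onto future periods. The class and data below are invented for the example:

```java
// Minimal trend/forecast sketch: fit y = a + b*x by least squares over
// historical samples, then project the line to a future period.
// All names and numbers are illustrative, not from any product.
public class LoadForecast {
    // Returns {intercept a, slope b} of the best-fit line through (i, y[i]).
    public static double[] fitLine(double[] y) {
        int n = y.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += i; sy += y[i]; sxx += (double) i * i; sxy += i * y[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;
        return new double[] { a, b };
    }

    // Projects the fitted trend to a future period index.
    public static double forecast(double[] y, int futureIndex) {
        double[] ab = fitLine(y);
        return ab[0] + ab[1] * futureIndex;
    }

    public static void main(String[] args) {
        // Monthly peak requests/sec over six months (made-up data).
        double[] observed = { 100, 110, 121, 130, 142, 150 };
        System.out.printf("Projected month 12 load: %.1f req/s%n",
                forecast(observed, 11));
    }
}
```

Capacity planning then compares the projected load against measured capacity to decide when to add resources.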
Capacity/Load model:
Typical WAS/J2EE Application Components:
What kinds of Problems does JVM Monitoring Help Solve?
Request / Transaction problems
- Slow or Hung requests
- Intermittent performance problems
- Correlation to remote EJB containers, CICS, IMS, MQ
Memory leaks
- Monitor JVM heap size, memory usage, garbage collection patterns, and heap snapshots
Resource monitoring
- Connection pools, JDBC, thread pools, etc.
Problem Recreation
- Provides production data for hard-to-recreate problems via integration with the monitoring tool in use.
How is it doing today and how will it do tomorrow?
- Historical and Trending reports
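The heap and GC figures mentioned above can also be sampled in-process through the standard java.lang.management API; the sketch below simply reads and prints them (formatting is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Samples current heap usage and cumulative GC activity via standard
// JMX platform beans, available in any modern JVM.
public class JvmSampler {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("Heap used: %d MB of %d MB committed%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20);

        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and time are cumulative since JVM start.
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}
```

Polling these values periodically and plotting them over time is the basis of the historical and trending reports mentioned above.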
Questions to Ask when troubleshooting:
- Is the problem repeatable or recreated?
- Did it ever work?
- If it did, what changed: configuration, an additional installation, a product upgrade, etc.?
- Does the environment matter, e.g., does it work in test/development but not in production?
- What is the topology of the environment?
- What external systems are involved?
- Any connectivity (firewall) or security issues, e.g., authentication failures or expired passwords?
- Are there any workload considerations?
- Is the problem happening under heavy workloads?
- Network or bandwidth issues?
- Is there a pattern to the problem e.g. every Monday morning at 10 AM?
This is a generic troubleshooting checklist that applies to any kind of problem, not just to the WebSphere environment. Most errors trace back to configuration changes made by hand or to a process that was not followed.
Monitoring Levels
- Monitor Vertical levels, not Horizontal levels
Monitoring On Demand
- Change monitoring level as needed without restarting either the applications or the application servers
- No need to pinpoint specific classes or methods in advance (i.e., no need to designate what needs to be monitored)
“Level 1” – Request Level - Production
- 100% of System Resource information
- 100% of incoming requests/transactions
“Level 2” – Component Level – Problem Determination
- View major application events (EJBs, servlets, JDBC, JNDI, etc.)
“Level 3” – Method Level - Tracing
- Adds method trace information for problem determination and performance analysis.
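One way an agent can switch levels on demand is to keep the active level in a shared, volatile field that instrumented code checks on every request, so skipped levels cost only an integer comparison. The sketch below (all names invented, not any product's actual mechanism) illustrates the idea:

```java
// Sketch of on-demand monitoring levels: instrumentation reads a volatile
// level per request, so the level can change without restarting anything.
// MonitorConfig and describe() are illustrative names, not a real API.
public class MonitorConfig {
    public static final int REQUEST = 1, COMPONENT = 2, METHOD = 3;
    private static volatile int level = REQUEST; // safe cross-thread reads

    public static void setLevel(int newLevel) { level = newLevel; }
    public static int getLevel() { return level; }

    // Called by instrumented code; detail grows with the active level.
    public static String describe(String request) {
        StringBuilder sb = new StringBuilder("request:" + request);
        if (level >= COMPONENT) sb.append(" +components");
        if (level >= METHOD) sb.append(" +methods");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("/checkout")); // Level 1 detail only
        setLevel(METHOD);                          // e.g. via admin console
        System.out.println(describe("/checkout")); // Level 3 detail added
    }
}
```

Because the level is a single shared field, no classes or methods need to be designated in advance; raising the level simply makes the already-present instrumentation record more.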
Using the Tool Efficiently:
Everyone assumes they need method-level data for every transaction in production
- What would you do with that much data?
- Gain Application/Transaction Understanding in Test/QA, workload understanding in Production
- Use Traps and Alerts to find anomalies and collect detailed data
Test/QA
- Use L2/L3 for Transaction/Application Analysis
- Top Methods Used (L3)
- Most CPU Intensive methods (L3)
- Top Slowest Methods (L3)
- Transaction Component (L2) Trace
- Transaction Method (L3) Trace
- SQL Profile (L2)
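An L3 "top slowest methods" view boils down to aggregating per-method elapsed time and sorting by the total. A minimal hand-rolled sketch of that aggregation (not the tool's actual implementation):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Aggregates call counts and total elapsed nanoseconds per method name,
// then reports the slowest method by total time. Purely illustrative.
public class MethodProfiler {
    private static final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    // Instrumented code would call this with System.nanoTime() deltas.
    public static void record(String method, long nanos) {
        totalNanos.computeIfAbsent(method, k -> new LongAdder()).add(nanos);
        calls.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    public static String slowest() {
        return totalNanos.entrySet().stream()
                .max(Comparator.comparingLong(
                        (Map.Entry<String, LongAdder> e) -> e.getValue().sum()))
                .map(Map.Entry::getKey).orElse("none");
    }

    public static void main(String[] args) {
        record("Cart.total", 2_000_000);  // 2 ms (made-up timings)
        record("Db.query", 50_000_000);   // 50 ms
        record("Db.query", 40_000_000);   // 40 ms
        System.out.println("Slowest method: " + slowest());
    }
}
```

The same aggregation, keyed by SQL text instead of method name, gives the L2 SQL profile.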
Some Performance Tuning tips to improve performance:
Here are a few other things we can try to help improve performance. Please note that these suggestions are given without detailed knowledge of the environment, architecture, or open issues.
- Increase web container max keep-alives.
- Increase web container thread pool.
- Increase database connection pool.
- Adjust maximum and minimum heap sizes.
- Disable explicit garbage collection.
- Enable concurrent I/O at O/S level.
- Pre-compile JSPs.
- Increase the priority of the app server process at OS level.
- If there are many short-lived objects, tuning the NewSize and MaxNewSize JVM parameters (young-generation sizing) may help.
- Raising operating system resource limits (AIX, Solaris) may help improve performance.
- Enable dynamic caching, if possible.
- Creating new indexes or reorganizing existing ones can improve the performance of database-intensive transactions.
- Adjusting the prepared statement cache size may also help.
- Adjust O/S TCP parameters such as tcp_time_wait_interval and tcp_fin_wait_2_flush_interval.
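Several of the tips above are JVM startup flags (for example -Xms/-Xmx for heap sizing and -XX:+DisableExplicitGC on HotSpot). After a tuning change, it is worth confirming which flags the running JVM actually received; the snippet below just prints them via the standard runtime bean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Prints the JVM arguments in effect, so tuning flags such as -Xmx,
// -XX:NewSize, or -XX:+DisableExplicitGC can be verified after a change.
public class JvmArgsCheck {
    public static void main(String[] args) {
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        System.out.println("JVM: " + rt.getVmName());
        for (String arg : rt.getInputArguments()) {
            System.out.println("  " + arg);
        }
        // Effective max heap, regardless of what was requested on the command line.
        System.out.printf("Max heap: %d MB%n",
                Runtime.getRuntime().maxMemory() >> 20);
    }
}
```

This catches the common case where a flag was edited in the wrong server profile and never reached the process.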
- Verify the system, Java, and app server runtime environment.
- Check server statistics: compare key performance metrics side by side.
- Validate throughput vs. response time: quantify application scalability.
  - Request rate during stress runs.
- Throughput vs. garbage collection (GC): tune the JVM to minimize GC frequency.
- Throughput vs. total GC time: avoid paging (it has a large effect on end-user response time).
- Throughput vs. heap size after GC: a good indicator of potential memory leaks.
- WebSphere resource utilization analysis: verify the application does not over-tax app server resources.
- Check average CPU time per transaction: based on threads running application classes in the workload mix.
  - Transactions with very high CPU in a spike interval.
- Check top methods used: identify hot methods by call count.
  - CPU consumption for each method.
  - High average response time per method.
- Memory analysis reporting: a quick check to detect the presence of a leak.
  - Memory leak: average heap size after GC vs. requests.
  - Memory leak: average heap size after GC vs. live sessions.
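The "heap size after GC" figure used in the leak checks above is exposed per memory pool as collection usage in the standard management API. A sketch of sampling it (the repeated-sampling loop and thresholds are left out):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Sums heap usage as measured immediately after the last collection of
// each heap pool. A steadily rising value under constant load suggests
// a memory leak, since live data is growing even after GC.
public class AfterGcHeap {
    public static long usedAfterLastGc() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage u = pool.getCollectionUsage(); // null if unsupported
            if (u != null) used += u.getUsed();
        }
        return used;
    }

    public static void main(String[] args) {
        System.gc(); // hint only; ensures at least one collection on most JVMs
        System.out.printf("Heap used after last GC: %d KB%n",
                usedAfterLastGc() >> 10);
    }
}
```

Plotting this value against request count or live session count gives exactly the two leak charts listed above.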
Thanks for reading the article :) Happy learning.