Troubleshooting a memory leak on a MySQL InnoDB Cluster
Taylor Chu
Sr. Cloud Support DBE at Amazon Web Services (AWS) | RDS Core SME / EBS SME / RDS MySQL SME / Aurora MySQL SME / Aurora Infrastructure SME
A 3-node MySQL InnoDB Cluster suddenly exhausted 256 GB of memory plus swap space on the primary node, while the replicas remained normal and the workload had not changed.
We checked that the number of database connections was not increasing. The customer wants to know whether there is a problem with the MySQL database itself.
The user was concerned about NUMA issues but confirmed that innodb_numa_interleave is not enabled.
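Both points are easy to sanity-check from the server itself; a minimal sketch using standard status and system variables (nothing instance-specific assumed):
-- Connection count should be stable if connections are not the cause
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- Confirm NUMA interleaving is off, as the user stated
SHOW GLOBAL VARIABLES LIKE 'innodb_numa_interleave';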
Troubleshooting Steps
MySQL 8.0 enables the Performance Schema by default, so we can turn on the memory instruments in setup_instruments to monitor thread memory usage.
Enable:
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%';
Disable:
UPDATE performance_schema.setup_instruments SET ENABLED = 'NO' WHERE NAME LIKE 'memory/%';
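Keep in mind that memory instruments enabled at runtime only account for allocations made after the change. A quick sketch to confirm the instruments are actually on:
-- Verify the memory instruments are now enabled
SELECT NAME, ENABLED
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'memory/%'
LIMIT 10;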
We can use the following queries to investigate memory status:
"High number of bytes used event"
select EVENT_NAME, HIGH_NUMBER_OF_BYTES_USED from performance_schema.memory_summary_global_by_event_name where HIGH_NUMBER_OF_BYTES_USED > 0 order by 2 desc limit 10;
"Per Thread_id current allocated memory"
select thread_id,user,current_allocated from sys.memory_by_thread_by_current_bytes limit 5;
"Total current allocated memory"
select sum(CURRENT_NUMBER_OF_BYTES_USED) from performance_schema.memory_summary_global_by_event_name;
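The raw byte total is hard to read at a glance; as a small sketch, the sys schema's format_bytes() helper makes it human-readable:
-- Same total, formatted for humans
SELECT sys.format_bytes(SUM(CURRENT_NUMBER_OF_BYTES_USED)) AS total_current_alloc
FROM performance_schema.memory_summary_global_by_event_name;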
"Event current allocated memory"
select event_name,current_alloc from sys.memory_global_by_current_bytes;
"Per Host Summary"
SELECT NOW() AS "Per Host Summary";
SELECT IFNULL(host, 'mysqld_background') AS host,
       current_count_used AS curr_count,
       sys.format_bytes(current_number_of_bytes_used) AS curr_alloc,
       count_alloc,
       sys.format_bytes(sum_number_of_bytes_alloc) AS total_alloc,
       count_free,
       sys.format_bytes(sum_number_of_bytes_free) AS total_free
FROM performance_schema.memory_summary_by_host_by_event_name
ORDER BY current_number_of_bytes_used DESC;
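A per-user breakdown works the same way; a sketch using the sys schema (this view is already sorted by current bytes, descending):
-- Top users by currently allocated memory
SELECT user, current_allocated, total_allocated
FROM sys.memory_by_user_by_current_bytes
LIMIT 5;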
Analysis
From the above output we can confirm a MySQL memory leak, because the host has no more than 256 GB of memory, yet the group_rpl/THD_applier_module_receiver thread had used 609.67 GiB and the memory/mysqld_openssl/openssl_malloc event accounted for 215 GB.
In other words, MySQL had allocated more memory than physical RAM and swap space combined, which is exactly the condition that triggers OOM kills.
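To zero in on just those two consumers, the global summary can be filtered by event name; a sketch, assuming the event names reported above:
-- Current allocation and high-water mark for the two suspicious events
SELECT EVENT_NAME,
       sys.format_bytes(CURRENT_NUMBER_OF_BYTES_USED) AS current_alloc,
       sys.format_bytes(HIGH_NUMBER_OF_BYTES_USED) AS high_watermark
FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE '%THD_applier_module_receiver%'
   OR EVENT_NAME LIKE '%openssl_malloc%';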
Searching the MySQL bug database with the keywords THD_applier_module_receiver and mysqld_openssl/openssl_malloc turns up Bug #97293. However, that bug only says the issue was related to libssl/OpenSSL and MySQL versions earlier than 8.0.18; there is no mention of which version fixed it.
The user's MySQL version is 8.0.26, and for security reasons we cannot simply disable OpenSSL for Group Replication.
So we searched the release notes for the keyword "memory leak" to look for similar bugs, and found the following entry:
SSL-related code was revised to avoid a potential memory leak. (Bug #31933295)
The release notes' bug list can also be searched easily for "memory leak" entries in the same way.
Suggestions
Upgrade to the latest version, 8.0.32, because that release also fixed another 3-node-cluster issue: "Group Replication: In a 3 node cluster, all nodes were killed due to running out of memory. Subsequently, after all nodes were restarted successfully, attempting to bring the cluster back online caused the node that had been the primary to hang." (Bug #108339, Bug #34564856)
Another workaround is to disable the Group Replication SSL-related parameters:
Current setting:
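As a minimal sketch, assuming the replication traffic runs on a trusted, isolated network, the relevant variables can be inspected and then switched off (Group Replication must be stopped on the node before changing them):
-- Inspect the current Group Replication SSL settings
SHOW GLOBAL VARIABLES LIKE 'group_replication_ssl%';
SHOW GLOBAL VARIABLES LIKE 'group_replication_recovery_use_ssl';
-- Disable SSL for group and recovery traffic, then restart Group Replication
STOP GROUP_REPLICATION;
SET GLOBAL group_replication_ssl_mode = 'DISABLED';
SET GLOBAL group_replication_recovery_use_ssl = OFF;
START GROUP_REPLICATION;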