Troubleshooting a memory leak on a MySQL InnoDB Cluster

A three-node MySQL InnoDB Cluster suddenly exhausted the 256 GB of memory and the swap space on its primary node. The replicas were normal, and the workload had not changed.

A check showed that the number of database connections was not increasing. The customer wanted to know whether there was a problem with the MySQL database itself.
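
A connection check like this can be run from any SQL client. The statements below are a minimal sketch, not necessarily the exact commands used in this case. They read the standard connection status counters and group the current foreground sessions by account.

```sql
-- Current and peak connection counters (standard server status variables).
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Threads_connected', 'Threads_running', 'Max_used_connections');

-- Foreground sessions grouped by account and host, to spot one source
-- whose session count keeps growing between samples.
SELECT PROCESSLIST_USER AS user,
       PROCESSLIST_HOST AS host,
       COUNT(*)         AS sessions
FROM performance_schema.threads
WHERE TYPE = 'FOREGROUND'
  AND PROCESSLIST_ID IS NOT NULL
GROUP BY PROCESSLIST_USER, PROCESSLIST_HOST
ORDER BY sessions DESC;
```

Running these a few minutes apart and comparing the results shows whether the connection count is stable.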


The user was concerned about NUMA issues but confirmed that innodb_numa_interleave is not enabled.
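
The setting can be confirmed directly on the server. This is a small illustrative check, not a statement taken from the original investigation.

```sql
-- OFF (the default) means InnoDB does not request the interleaved NUMA
-- memory policy when allocating the buffer pool.
SHOW GLOBAL VARIABLES LIKE 'innodb_numa_interleave';
```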


Troubleshooting Steps

MySQL 8.0 enables the Performance Schema by default, so we can enable the memory instruments in setup_instruments to monitor memory usage per thread.

Enable:
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME LIKE 'memory/%';

Disable:
UPDATE performance_schema.setup_instruments SET ENABLED = 'NO' WHERE NAME LIKE 'memory/%';        
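
To confirm that the UPDATE took effect, the instrument state can be counted as sketched below. Keep in mind that memory allocated before an instrument was enabled is not included in the counters, and that runtime changes to setup_instruments do not persist across a server restart.

```sql
-- How many memory instruments are currently enabled vs. disabled.
SELECT ENABLED, COUNT(*) AS instruments
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'memory/%'
GROUP BY ENABLED;
```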


We can use the following queries to investigate memory status:

"High number of bytes used event"

select EVENT_NAME, HIGH_NUMBER_OF_BYTES_USED from performance_schema.memory_summary_global_by_event_name where HIGH_NUMBER_OF_BYTES_USED > 0 order by 2 desc limit 10;        

"Per Thread_id current allocated memory"

select thread_id,user,current_allocated from sys.memory_by_thread_by_current_bytes limit 5;        
select sum(CURRENT_NUMBER_OF_BYTES_USED) from performance_schema.memory_summary_global_by_event_name;        

"Event current allocated memory"

select event_name,current_alloc from sys.memory_global_by_current_bytes;        

"Per Host Summary"

SELECT NOW() AS "Per Host Summary";
SELECT IFNULL(host, 'mysqld_background') AS host,
       current_count_used AS curr_count,
       sys.format_bytes(current_number_of_bytes_used) AS curr_alloc,
       count_alloc,
       sys.format_bytes(sum_number_of_bytes_alloc) AS total_alloc,
       count_free,
       sys.format_bytes(sum_number_of_bytes_free) AS total_free
FROM performance_schema.memory_summary_by_host_by_event_name
ORDER BY current_number_of_bytes_used DESC;


Analysis

From the information above, we can confirm that MySQL has a memory leak, because the figures it reports exceed what the machine has: the operating system has no more than 256 GB of memory.

Yet the group_rpl/THD_applier_module_receiver thread reported 609.67 GiB in use, and the memory/mysqld_openssl/openssl_malloc event reported 215 GB.

This means MySQL has allocated more memory than the OS memory plus swap space can provide, which can cause OOM issues.
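
To narrow this down further, the per-thread memory summary can be joined to the thread list so that only Group Replication threads are shown, and the OpenSSL event can be sampled on its own. The queries below are a sketch built on the standard Performance Schema tables. The 'thread/group_rpl/%' name pattern is an assumption based on the thread name reported above.

```sql
-- Memory currently attributed to Group Replication threads, by event.
SELECT t.THREAD_ID,
       t.NAME AS thread_name,
       m.EVENT_NAME,
       sys.format_bytes(m.CURRENT_NUMBER_OF_BYTES_USED) AS current_alloc
FROM performance_schema.threads AS t
JOIN performance_schema.memory_summary_by_thread_by_event_name AS m
  ON m.THREAD_ID = t.THREAD_ID
WHERE t.NAME LIKE 'thread/group_rpl/%'
  AND m.CURRENT_NUMBER_OF_BYTES_USED > 0
ORDER BY m.CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 10;

-- Sample the OpenSSL allocation event; run it repeatedly and compare.
SELECT NOW() AS sampled_at,
       EVENT_NAME,
       COUNT_ALLOC,
       COUNT_FREE,
       sys.format_bytes(CURRENT_NUMBER_OF_BYTES_USED) AS current_alloc
FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME = 'memory/mysqld_openssl/openssl_malloc';
```

If current_alloc keeps climbing between samples under a steady workload while COUNT_FREE lags far behind COUNT_ALLOC, that is consistent with a leak rather than a temporary spike.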

Searching the MySQL bug database for the keywords THD_applier_module_receiver and mysqld_openssl/openssl_malloc turns up Bug #97293. However, that report only says the problem is related to libssl/OpenSSL and MySQL versions earlier than 8.0.18. It does not mention which version fixed the issue.


The user's MySQL version is 8.0.26, and for security reasons we cannot disable OpenSSL for Group Replication.
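
Before matching a running cluster against bug reports and release notes, it helps to confirm the version on every member. A minimal sketch, assuming it is run on a member of the group:

```sql
-- Version of the server this session is connected to.
SELECT VERSION();

-- State, role, and version of each Group Replication member.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE, MEMBER_VERSION
FROM performance_schema.replication_group_members;
```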


So we searched the release notes for the keyword "memory leak" to see whether any similar bugs had been fixed, and found the following entry:

SSL-related code was revised to avoid a potential memory leak. (Bug #31933295)

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-28.html


The complete release notes, which are easy to search for "memory leak" fixes, are available here:

https://downloads.mysql.com/docs/mysql-8.0-relnotes-en.pdf


Suggestions

Upgrade to the latest version, 8.0.32. It includes the SSL-related fix from 8.0.28 and also fixes another three-node cluster issue: "Group Replication: In a 3 node cluster, all nodes were killed due to running out of memory. Subsequently, after all nodes were restarted successfully, attempting to bring the cluster back online caused the node that had been the primary to hang. (Bug #108339, Bug #34564856)"

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-32.html

Another workaround is to disable the SSL-related Group Replication parameters:

group_replication_ssl_mode

group_replication_recovery_use_ssl
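
The current values of these two parameters can be reviewed with a read-only statement such as the one below. It is only a sketch for inspection. As noted earlier, disabling SSL was not acceptable for this user for security reasons.

```sql
-- Show the current Group Replication SSL settings without changing them.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('group_replication_ssl_mode',
                        'group_replication_recovery_use_ssl');
```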

Thomas Verhoeven

DevEx Java Software Engineer with a passion for DevOps

1 week ago

We seem to have the same issue with OpenSSL even though we are using MySQL 8.0.39.
