High Performance MySQL [Ch.2&3]

So last week we talked about chapter 1 of High Performance MySQL, which discussed MySQL Architecture and History. This week I will discuss chapters 2 and 3, titled Benchmarking MySQL and Profiling Server Performance, respectively.

Before starting a discussion on the internals of MySQL or any optimization techniques, it's essential to learn how to benchmark a server's performance and how to profile for performance bottlenecks. Chapter 2 gives a good introduction to benchmarking strategies, techniques, and tools, while chapter 3 discusses profiling MySQL queries and diagnosing intermittent problems.

Benchmarking

During development and testing, when there is no noticeable workload for the system to perform, it becomes very hard to predict how the system would behave under load, and it's not feasible to estimate its capacity. Benchmarking lets you emulate fictional circumstances that help you observe the system's behavior under load, measure its capacity, validate assumptions made about the operational aspects of the system, reproduce edge cases and bad behaviors to see how the system reacts, test for performance, plan for growth, and identify scalability bottlenecks.

The problem with benchmarking is that it's not real: the workload used in benchmarking is, as mentioned before, "fictional", and usually very simple compared to real-life workloads, which are by nature nondeterministic, varying, and too complex to model. Many factors make benchmarking unrealistic, for example:

  • The data size, as it might not be easy to emulate a production-sized dataset during benchmarking.
  • The distribution of data and the varying nature of the operations performed, as benchmarking data might lead to a distribution of executed queries that doesn't match what really happens in production and thus yields misleading results.
  • The pace at which queries are sent to the system, as a naive benchmarking tool might fire queries at the system as fast as it can in an unrealistically short window, causing the system to behave badly for no realistic reason.

It's very tricky to do capacity planning using benchmarking. The book mentions an example: suppose you benchmark your current database server and a new server, and find that the new server can do 40 times more transactions per second than the old one. It would be foolish to conclude that the new server will support 40-fold business growth, because as the business grows 40-fold, its complexity is expected to grow well beyond that factor: queries get more complex, more features are added, and so on.

So now that we know what benchmarking can offer and what its shortcomings are, let's take a look at some strategies the book proposes for benchmarking.

Benchmarking Strategies

There are two primary benchmarking strategies: benchmarking the whole application (full-stack) and benchmarking only MySQL (single-component). Usually you would want to benchmark the application as a whole, because you care more about the entire application's performance than about MySQL alone; this includes the web server itself, networking, any application-level optimizations, etc. But in some cases, you want to compare two ways of writing a query or two different schema designs, in which case benchmarking just MySQL is quite useful, since benchmarking the whole application is often a tricky and expensive process to set up.

Before we design the benchmark suite, we should determine the goal of the benchmark: basically, what question are we trying to answer? Do we want to know the best CPU for the server? Or the best schema design to use? It's essential to set a clear goal for the benchmark to achieve. There are common metrics that we would usually want to measure, for example:

  • Throughput: Usually defined as the number of transactions per unit of time. There are very common benchmarks, like TPC-C, that many database vendors work hard to do well on. These benchmarks measure online transaction processing, which is the most suitable measure for multi-user interactive applications.
  • Response time: Measures the time a task takes to complete, in whatever time unit fits the application. Raw response time measurements are rarely useful on their own; you would probably need aggregations like max, min, average, median, or percentiles to get any insights. Max and min response times are quite useless, as they are the easiest to be skewed by intermittent events like a temporary lag or a cache hit. The average is also quite misleading, because a few outliers will end up skewing it. Percentiles are a good way to get a reasonable idea of the response time; for example, if the 95th percentile is 200ms, the task finishes in 200ms or less 95% of the time (a sketch of computing a percentile over raw measurements follows this list).
  • Concurrency: Measures how many concurrent tasks can run at the same time. This is highly dependent on the nature of what you are measuring. For example, if you wish to measure how many concurrent users can browse a web page at the same time, it's misleading to count the concurrently running queries on the server, since thousands of users might incur only a very moderate number of concurrent queries; that's the nature of a web page's usage pattern (users take time to perform actions, they are not automatic). But if, for example, you are measuring the concurrency of a worker that inserts data into the database, it makes sense to assume an automatic workload. A more accurate way of measuring concurrency is to consider only the number of threads doing work at a given instant (which doesn't really reflect how many end users can use the system concurrently; it just bounds the maximum number of concurrent users, assuming they all performed some database action at the same time). The book mentions that chapter 11 discusses this in more detail.
  • Scalability: Measures how the system performs under a changing workload. The basic idea is that a linearly scalable system should get twice as much work done if we double its resources. Of course, not all systems are linearly scalable; in fact most are not. Instead they exhibit diminishing returns and degraded performance as the workload increases, even if resources increase. This type of benchmarking is essential and complements other benchmarks; for example, if we only benchmark response time with a single-threaded setup and get good results, we will have no idea how the system will perform under any degree of concurrency.
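To make the percentile idea concrete, here is a minimal sketch in SQL, assuming a hypothetical bench_results table with one response_ms measurement per row (the table and column names are mine, not from the book):

-- Hypothetical table of raw benchmark measurements, one row per request.
SELECT COUNT(*) INTO @n FROM bench_results;
SET @p95_offset = LEAST(FLOOR(@n * 0.95), @n - 1);
-- LIMIT/OFFSET can't take user variables directly, so use a prepared statement.
PREPARE p95 FROM
  'SELECT response_ms FROM bench_results
    ORDER BY response_ms LIMIT 1 OFFSET ?';
EXECUTE p95 USING @p95_offset;
DEALLOCATE PREPARE p95;

The idea is simply to sort the raw measurements and skip past the fastest 95% of rows; the same pattern works for any percentile.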

Benchmarking Tactics

Before we move on to more detailed benchmarking tactics, let's consider some very common mistakes that might render an entire benchmark's results useless and possibly lead to making the wrong decisions:

  • Using only a subset of the real data, such as using 1GB of data to benchmark an application that's expected to handle 10GB. This might seem like a good way to cut corners, but it will not reflect the real system's behavior.
  • Using biased data while benchmarking, such as randomly generated data whose distribution doesn't match production, which will skew the results.
  • Using irrelevant scenarios, such as a single-user scenario for a multi-user system, or benchmarking a distributed system on a single machine, etc.
  • ...

Merely avoiding these mistakes will produce better-quality, actionable benchmark results.

The next step after setting the goal for a benchmark is to decide whether to use a standard benchmark or design one of your own. If you are using a standard benchmark, make sure it matches your use case; for example, don't use TPC-H, which is an ad hoc decision-support benchmark, to measure an OLTP (online transaction processing) system. If you are designing your own benchmark, the following is a somewhat general process for setting it up:

  • Take a snapshot of production data and make sure to have it ready for future restores in case the benchmark needed to be repeated.
  • Instead of just running a set of queries against this dataset, it's a good idea to log queries on production for a long period, or for multiple periods during representative time windows, to cover all system activities, and use this log to simulate a normal production load (a capture sketch follows this list).
  • Record results and plot them to make it easier to draw conclusions.
  • Try to automate the process, or record it, to be able to execute it in exactly the same way in case you need to repeat the benchmark with a different setting. This is crucial to be able to compare different settings against the same load.
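For the log-and-replay step above, one way to capture queries is MySQL's general query log. A minimal sketch (these are real MySQL system variables; the file path is only an example):

-- Capture every statement the server receives during a representative window.
-- Watch disk usage: this log grows very quickly on a busy server.
SET GLOBAL log_output = 'FILE';
SET GLOBAL general_log_file = '/var/log/mysql/capture.log';
SET GLOBAL general_log = 1;
-- ...later, once a representative window has been captured:
SET GLOBAL general_log = 0;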

A question presents itself: for how long should a benchmark run? For a benchmark to be useful, it has to run for a meaningful period of time that allows the system to reach its steady state, which could take a long time. Before a benchmark starts, you would probably need to warm up the system; during the warm-up, you might see fluctuations that eventually diminish as the system starts exhibiting consistent behavior. As a rule of thumb, keep running the benchmark until the system has looked steady for at least as long as the initial warm-up appeared to take.

Another question is: what data should be captured during a benchmark? It's always a good idea to record as much data as you can, in the rawest form possible. Ending up with extra data after the benchmark concludes is a lot better than running a 24-hour benchmark only to find data missing. Recording raw data is a good idea because you can manipulate it in many different ways and still have the original if you get any new ideas.

Benchmarking Tools

The book mentions a variety of benchmarking tools that can be used to automate the benchmarking process. I won't mention them all, but will mention one in particular, called mysqlslap, which is a single-component benchmarking tool (it only benchmarks MySQL). mysqlslap lets you create a test schema, specify the number of connections, run a test load of SQL queries, or even have it generate random SELECTs based on the provided schema, and it reports timing information. Complete documentation of mysqlslap can be found in the official MySQL docs.
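As a taste of what an invocation looks like, here is a hedged sketch; the flags shown are real mysqlslap options, while the counts are arbitrary placeholders:

# Generate a test schema plus a random query load, run it at two
# concurrency levels, and repeat 3 times so timings can be averaged.
mysqlslap --user=root --password \
          --auto-generate-sql \
          --concurrency=25,50 \
          --iterations=3 \
          --number-of-queries=1000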

If benchmarking is something you do very frequently, it's a good idea to pick a small set of tools, learn them well, build your scripts to run them, format and plot their results, and generally make them suit your needs.

This concludes chapter 2, now onto chapter 3.

Profiling Server Performance

As the book's authors say, profiling server performance is a way to answer the three most common questions asked about a database server's performance:

  • Finding out if the server is doing its work optimally.
  • Finding out why a certain query takes so much time.
  • Troubleshooting intermittent problems that might appear as stalls or temporary system freezes happening sporadically.

The book defines optimization as reducing response time, so naturally, it makes sense to understand exactly why the server needs a certain amount of time to respond to a query, in as detailed a way as possible. This enables us to know whether the server is doing work that can be eliminated, by perhaps rewriting the query, adding an index, etc., or whether everything the server is doing is actually necessary and it's the best we can do, which saves us a lot of time before we start optimizing something that's already at its maximum speed.

Another important rule the book mentions is to carefully pick your battles, that is, to learn how to decide which queries to optimize and which not to. For example, if a query only runs 5% of the time, any performance improvement, however significant, will improve overall performance by at most 5%, so even minor improvements to queries that run 50% of the time will be more worthy of our time.

What is Profiling?

Profiling a task is not as simple as measuring how much time the task takes to conclude; rather, it requires measuring how much time the task takes broken down into the individual child tasks it needs to execute. This structure is useful for constructing call graphs and understanding what the server actually does to execute a query. It's also useful to be able to aggregate similar tasks into single items and sort tasks so that the important ones bubble up.

To make this clearer, let's take a look at a simplified profile produced for a SELECT query against a table called InvitesNew:

[Image: a simplified profile produced for a SELECT query against the InvitesNew table]

The table above shows, for clarity, a subset of the columns the profile would normally have. The first column shows the rank, an indication of how important a task is (importance here is defined as the task's share of the total response time of all tasks); the second column is the aggregated response time of all instances of a certain task; the third column shows the percentage of the total execution time that all instances of this task take, which determines its rank; the fourth column shows the number of instances of this task; the sixth column shows the average response time per instance of the task; and the seventh column shows the task itself.

There are two types of profiling: execution-time profiling, which is useful if tasks are CPU-bound and spend the majority of their execution time doing work, and wait analysis, which is useful for I/O-bound queries that spend the majority of their execution time waiting to access resources or read/write data.

Instrumentation

All of the above considered profiling queries only at the MySQL level. In many cases, profiling the application might be easier and more useful, as it can surface problems early. Instrumentation means having profiling active in every operation the application performs. The book mentions a tool called New Relic, which is great for profiling a web server as it shows the response time of each endpoint broken down into finer-grained tasks. This type of profiling can help figure out whether an endpoint is slow because of a database query or because of something else, which is quite useful before jumping ahead and profiling the database itself. It also lets you know if you need to optimize anything at all, and which queries are more worthy of your time.

If you are looking to profile a mobile application, make sure to check Instabug APM as it enables you to create custom traces around operations that you think are worth measuring among a load of other useful features. You can read more about it here.

Profiling MySQL Queries

Profiling MySQL queries can be done on two levels:

  • Profiling the Server's Workload: This approach is useful if the application has no instrumentation in place, so you don't actually know which queries to optimize; it helps identify the trouble spots that need attention. MySQL has a slow query log, which logs queries slower than a certain threshold controlled by the long_query_time variable; setting it to zero logs every query on the server, and the resulting log can later be used to build a profile of the whole server (a configuration sketch follows this list). It's worth noting that logging queries incurs an extra I/O cost, which according to the authors is negligible for I/O-bound workloads but noticeable for CPU-bound ones. Logging all queries can also eat up disk space quickly, so you need a mechanism in place to regularly clean up those logs. There are lots of third-party tools that do a pretty good job at this.
  • Profiling a Single Query: If you already have instrumentation in the application, or you did a whole-server profile before, you probably have a list of troublesome queries that need optimizing. But to optimize an individual query, you need to understand why it takes a certain amount of time and on which tasks. MySQL has a really useful command called SHOW PROFILE for profiling individual queries. To enable profiling, set the profiling variable to 1; then you can issue queries to the server, use SHOW PROFILES to see all recorded profiles, and SHOW PROFILE FOR QUERY <id> to check the profile of a certain query.
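As referenced above, a minimal sketch of configuring the slow query log to capture every statement (these are real MySQL system variables; the file path is only an example):

-- A threshold of 0 seconds logs every query; new connections pick up
-- the global value. Adjust the path for your server.
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 0;
SET GLOBAL slow_query_log = 1;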

Single Query Profiling Example

The book shows an example of how useful a basic command such as SHOW PROFILE can be. After performing SET profiling=1; to enable profiling, we issue the following query:

SELECT * FROM lists;

Then if we run SHOW PROFILES, we get a list of all the available query profiles which is only 1 in this case because we only ran 1 query. The output of which looks like this:

+----------+-------------+---------------------+
| Query_ID |  Duration   |        Query        |
+----------+-------------+---------------------+
|        1 | 0.16767900  | SELECT * FROM lists |
+----------+-------------+---------------------+

We can then inspect the profile of query with ID=1 by running SHOW PROFILE FOR QUERY 1; which produces a detailed table of all the operations it needed to execute. A truncated version of this table is shown below:

[Image: truncated output of SHOW PROFILE FOR QUERY 1, listing each execution state and its duration]

The problem with the raw output is that it contains repeated tasks (for example, five separate entries for checking permissions), which clutters the output and makes it difficult to process. However, we can run normal SQL queries against the profiling table, just as on any other table, to format it in whichever way we wish. For example:

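-- @query_id must first be set to the ID reported by SHOW PROFILES
-- (1 for our example query)
SET @query_id = 1;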
SELECT STATE,
       SUM(DURATION) AS Total_R,
       ROUND(100 * SUM(DURATION) /
             (SELECT SUM(DURATION)
                FROM INFORMATION_SCHEMA.PROFILING
               WHERE QUERY_ID = @query_id), 2) AS Pct_R,
       COUNT(*) AS Calls,
       SUM(DURATION) / COUNT(*) AS "R/Call"
  FROM INFORMATION_SCHEMA.PROFILING
 WHERE QUERY_ID = @query_id
 GROUP BY STATE
 ORDER BY Total_R DESC;

This produces a much more organized result, as shown below:

[Image: the grouped profile, with STATE, Total_R, Pct_R, Calls, and R/Call columns]

As we can see, the produced result is similar to the profile we inspected before, but a lot more useful: all 5 instances of the "checking permissions" operation are now grouped into a single entry, and so on. This lets us easily see that the query spent half of its time copying data to a temporary table. It also shows which operations are worth optimizing: if we had suspected that sorting was making the query slow, we can now see that sorting the result set takes only 6% of the total time and thus isn't worth optimizing.

SHOW PROFILE is quite powerful, and it can show a lot more data beyond just response times: block I/Os, context switches, CPU utilization, IPCs, memory pressure, page faults, swaps, etc. For full documentation, refer to the official MySQL docs.
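For instance, type specifiers can be appended to the command to pull in extra columns (CPU and BLOCK IO below are real specifiers; query ID 1 is our example query):

-- Adds CPU time and block I/O counts alongside each state's duration
SHOW PROFILE CPU, BLOCK IO FOR QUERY 1;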

It's worth mentioning that SHOW PROFILE isn't the only profiling method MySQL natively supports. There is also SHOW STATUS, which shows status counters and can offer more insight, complementing the profile we generated above. Full documentation for SHOW STATUS is in the MySQL docs.
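A common pattern, sketched below against our example lists table, is to zero the session counters, run the query under investigation, and then inspect the counters it moved (Handler_* and Created_* are real MySQL status variable families):

FLUSH STATUS;          -- reset the session-level counters to zero
SELECT * FROM lists;   -- the query under investigation
-- Handler_* counters show row operations; Created_* shows temporary tables
SHOW SESSION STATUS WHERE Variable_name LIKE 'Handler%'
   OR Variable_name LIKE 'Created%';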

Optimization via Profiling

Having obtained the query profile with its detailed breakdown of tasks, the authors say the optimization should be obvious (makes me feel bad about myself xD), but I tend to agree to some extent. The profile can show you, for example, that a query creates a temporary table, and then you can focus your efforts on what causes the temporary table to be created and how to get around that. It can show other problems, like unindexed reads, lock contention, etc., which helps direct your optimization efforts even if you don't yet know how to eliminate them.

Profiling Case Study

The book ends chapter 3 with an interesting case study of a system the authors were brought in to diagnose. The problem was as follows:

The customer complained that once every day or two, the server rejected connections with a max_connections error, lasting from a few seconds to a few minutes in a highly sporadic manner.

The authors mention that this customer hadn't attempted to fix the issue at all, which made their lives easier. Their argument is that it's easier to debug a system that has not been tampered with or subjected to unknown changes (no weird trial-and-error debugging attempts).

The debugging process went as follows:

  1. Get as much information about the system as possible, starting with basics such as which MySQL version is running, what the storage engine is, what the server specs are, etc. This helps in selecting the proper tools to debug the problem and in spotting known problems with certain versions.
  2. Took a quick look and found nothing wrong with the queries.
  3. Installed their diagnostic toolkit and set it up to trigger on Threads_connected, since the connections problem would most likely be tied to that (see the status queries after this list). The average was 15 threads, but during an incident the number would climb to several hundred.
  4. They recorded a wide range of statistics and profiles that I won't show here to keep this article short, but you can see them in the book.
  5. One of the important stats they recorded was read/write activity, which showed weird behavior: the server was writing hundreds of megabytes of data per second to disk. After some calculations, it appeared the server was doing around 150MB of writes per second, while the whole database was only 900MB and the workload was almost exclusively SELECTs.
  6. They noticed that the main InnoDB thread was trying to flush dirty pages, which only appears in the status output when flushing is delayed. They knew this version of InnoDB suffered from a problem called furious flushing, which happens when InnoDB fails to spread out the flushing of dirty pages evenly and instead forces a checkpoint that flushes a lot of data all at once (sometimes called a checkpoint stall). This seemed like a plausible explanation, but it wasn't the problem. They verified that by running SHOW STATUS and observing the Innodb_buffer_pool_pages_flushed counter, which wasn't increasing much. They mention this possibility to highlight the need to prove conclusions before getting carried away with what seems familiar.
  7. After more extensive debugging focused on the disk issue, the cause turned out to be a storm of bad queries arriving all at once, saturating the disk and driving up its latency. The problem was at the application level: the application cache would flush, and many instances of the application would then try to repopulate it at the same time, causing what's known as a cache stampede.
  8. The authors then mention that proper instrumentation at the application level, or logging of such events (cache flush/cache repopulate), would have exposed the problem easily and enabled the developers to solve it without a specialized database consultation, as the database wasn't really the problem after all; the application was.
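To make steps 3 and 6 concrete, these are the status counters in question (both are real MySQL status variables; the exact trigger thresholds the authors used aren't reproduced here):

-- Step 3: the trigger condition (average was ~15; spiked to several hundred)
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- Step 6: ruling out furious flushing; if this counter isn't advancing
-- quickly, InnoDB isn't flushing aggressively
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';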

References

High Performance MySQL, 3rd Edition, by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko

Old articles

[Chapter 1] MySQL Architecture and History

