Hyper-Threading - for the lay person.
Craig Watts
Believes Support is more exciting, dynamic and much more interesting than Implementation. But doesn't understand why others disagree.
As a Solution Architect who specialises in System Performance, one of the joys of what I do is trying to explain technical concepts in a relatively non-technical manner. I call it a joy, although it could easily be called a challenge: the less technical the recipient, the greater the challenge.
It's about being an interpreter, be that describing how the business users are using the solution in order to assist technical and development teams, or vice versa, interpreting the underpinnings of the solution in such a way that it doesn't sound like a random list of TLAs (Three Letter Acronyms) when described to functional users or management.
For this my preferred approach is generally to use analogies, often bearing no real-world resemblance to the matter being discussed but simply there to provide a point of reference that is hopefully understood. Today's topic is Hyper-Threading. Yes, I know that's a little random, but I'm fairly sure at some point this week I'm going to be asked to explain some recent comments about its impact upon system performance, specifically in a Hyper-Threaded SQL Server environment. What better place to test out a new analogy than here, where the potential audience is a true mix of the highly technical and the purely functional?
So what is Hyper-Threading? Before we get to the analogy, let's see what Intel have to say on the topic.
Intel® Hyper-Threading Technology is a hardware innovation that allows more than one thread to run on each core. More threads means more work can be done in parallel.
How does Hyper-Threading work? When Intel® Hyper-Threading Technology is active, the CPU exposes two execution contexts per physical core. This means that one physical core now works like two “logical cores” that can handle different software threads. The ten-core Intel® Core™ i9-10900K processor, for example, has 20 threads when Hyper-Threading is enabled.
Two logical cores can work through tasks more efficiently than a traditional single-threaded core. By taking advantage of idle time when the core would formerly be waiting for other tasks to complete, Intel® Hyper-Threading Technology improves CPU throughput (by up to 30% in server applications).
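Intel's i9-10900K example above is just simple doubling arithmetic. As a minimal sketch (the function name is mine, purely for illustration):

```python
# Minimal sketch: the logical-core arithmetic Intel describes above.
# With Hyper-Threading active, each physical core exposes two
# execution contexts, so the OS sees twice as many "logical cores".

def logical_cores(physical_cores: int, hyperthreading: bool = True) -> int:
    """Logical cores the OS schedules onto, given the physical core count."""
    return physical_cores * 2 if hyperthreading else physical_cores

# The i9-10900K from the quote: 10 physical cores -> 20 threads.
print(logical_cores(10))                        # -> 20
print(logical_cores(10, hyperthreading=False))  # -> 10
```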
Sounds great, doesn't it? The potential of 30% increased throughput with the flick of a switch. All good so far, but there's a catch. Well, there are actually a few. One of the keys to successfully utilising Hyper-Threading is the timely release of the physical core, allowing the virtual queues to process. Another challenge is the metrics: theoretically, 50% CPU usage could be 100% of the physical cores. I'd generally suggest it's closer to 70%, but it's a theory, and theoretically each in-use virtual core could have been allocated its own physical core. There is also the time taken to release the physical core: if a longer-running process is utilising the physical core, the other processes sit in the virtual queue for longer, waiting around for physical availability. That is one of the reasons it's suggested that, for Hyper-Threading to really be effective, no process should take more than 200 milliseconds to execute. Essentially, get in and out quickly and release the physical core for the next process.
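To make the metrics catch concrete, here's a minimal sketch in Python (the function and the pairing assumptions are mine, not a real monitoring API): with two logical cores per physical core, the same OS-level utilisation reading maps to a whole range of possible physical-core occupancy.

```python
# Hedged sketch of the metrics catch described above: with two logical
# cores per physical core, a "50% CPU" reading from the OS could mean
# anywhere from half the physical cores busy (both threads of each)
# to every physical core busy (one thread on each).
import math

def physical_busy_range(logical_busy: int, physical_cores: int):
    """Min and max physical cores that could be occupied when
    `logical_busy` of the 2 * physical_cores logical cores are in use."""
    # Best case: busy threads pair up, two per physical core.
    minimum = math.ceil(logical_busy / 2)
    # Worst case: each busy thread sits on its own physical core.
    maximum = min(logical_busy, physical_cores)
    return minimum, maximum

# 10 physical cores, 10 of 20 logical cores busy: the OS reports 50%
# CPU, but physically anywhere from 5 to all 10 cores are occupied.
print(physical_busy_range(10, 10))  # -> (5, 10)
```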
Let's bring this into the non-IT world. The challenge here is what else could be used to describe virtual and physical cores while also explaining congestion. Then I had an epiphany: motorways. They certainly know about congestion, and you could argue they have more than the occasional deadlock. In particular UK motorways, which I had the joy of experiencing for most of the 1990s. Anyone who's experienced the M25 at rush hour, or an 8-hour tailback on the M1 Southbound due to snow, knows exactly how congested they can get. That said, this analogy focuses on the congestion caused by the merging of motorways.
For today's analogy we're going to pick on Birmingham, where multiple motorways converge, but for the purposes of our example we'll focus on three: the M5 from Exeter, the M40 from London and the M6 Northbound. Each motorway has three lanes, with the M5 and the M40 representing our virtual cores (6 vCPUs) and the M6 representing our physical cores (3 CPUs). There are a couple of other assumptions to be made. Firstly, lane restrictions apply, meaning that if you come from the middle lane on the M5 you have to use the middle lane of the M6. Secondly, we're assuming every driver can seamlessly merge (yes, I know that last one is a pretty wild assumption).
Each of the feeding motorways (M5 and M40) is at about 40% capacity; the traffic seamlessly merges onto the M6, which is now running at 80% capacity. All good, with a bit of wriggle room. So let's create some congestion: let's assume that half the traffic on the M5 is heavy haulage. For the purposes of our example, that represents a long-running query. Now the gap to fit into (CPU idle time) is reduced and everything moves a little more slowly. Smaller gaps in turn cause our feeder motorways to build longer queues to enter the M6.
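The heavy-haulage effect can be sketched as a toy queue in Python (the numbers and names are mine, purely illustrative, not a SQL Server model): every task stuck behind the long-running one inherits its runtime as waiting time.

```python
# Toy sketch (assumptions mine): one physical core shared by a queue of
# tasks that all arrive at once and run in order. Durations are in
# milliseconds; a single long-running "heavy haulage" task at the head
# delays everything queued behind it.

def waits(durations_ms):
    """Return how long each task waits before it gets the core."""
    clock, result = 0, []
    for duration in durations_ms:
        result.append(clock)   # time spent waiting behind earlier tasks
        clock += duration
    return result

short_only = [50, 50, 50, 50]     # everyone in and out within 200 ms
heavy_first = [2000, 50, 50, 50]  # one long-running query up front

print(waits(short_only))   # -> [0, 50, 100, 150]
print(waits(heavy_first))  # -> [0, 2000, 2050, 2100]
```

Under the 200-millisecond guideline from earlier, the first queue drains almost immediately; in the second, one two-second task makes every task behind it wait at least two seconds, which is exactly the merging gap the trucks take away.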
Maybe we have an incident on the M6, which for our analogy would be database blocking. The queues on the feeders start to grow even longer. Then another SQL Server setting can come into play: MAXDOP (Maximum Degree of Parallelism), which in our scenario is what enforces the lane control. If the MAXDOP setting is 1, all traffic has to stay in lane; with anything other than 1, there is the potential for our heavy haulage trucks to commandeer all three lanes of the motorway if required. So now, due to one rather large truck on the M6, you don't have one ever-increasing queue on each of the M5 and M40, but could potentially be backing up across all six lanes of feeder traffic. Not really a situation anyone wants.
It's not all doom and gloom for Hyper-Threading, as it truly can achieve up to a 30% throughput uplift; you just have to ensure it has the right load interacting with it. Continuing with the motorway analogy: if everyone drove a sports car and all heavy haulage was sent via the rail network, what a wonderfully efficient system we would have. That is one of the goals of any System Performance initiative: getting sports-car performance from processes which, quite simply, drive like a truck. By achieving that you can fully take advantage of your additional capacity.
Now if only we could teach everyone to merge properly, wouldn't that make the traditional Bank Holiday gridlock all the more manageable?