Vectorization Part 2 – Why and What?
New challenges in the financial markets, driven by changes in market structure and by regulations and accounting rules such as Basel III, EMIR, Dodd-Frank, MiFID II, Solvency II, IFRS 13, IFRS 9, and FRTB, have increased demand for higher-performance risk and analytics. Problems like XVA require orders of magnitude more calculations for accurate results. This demand has put a focus on how to get the most out of the latest generation of hardware.
This is the second in a series of blog posts on Vectorization, a key tool for dramatically improving the performance of code running on modern CPUs.
In my last blog I covered how CPUs have evolved and how software must leverage both Threading and Vectorization to get the highest performance possible from the latest generation of processors.
In this blog I cover the why and what of Vectorization.
Why Vectorize?
Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values (vector) at one time.
Modern CPUs provide direct support for vector operations where a single instruction is applied to multiple data (SIMD). For example, a CPU with a 512-bit register can hold 16 32-bit single-precision floats (or 8 64-bit doubles) and apply one instruction to all of them at once, which is up to 16 times faster than processing one value per instruction. Combining this with threading on multi-core CPUs can lead to orders-of-magnitude performance gains.
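The arithmetic behind that figure is simply the register width divided by the element width. A minimal sketch, where the function name simd_lanes is made up for illustration:

```c
#include <stddef.h>

/* Number of elements ("lanes") that fit in one vector register.
   simd_lanes is an illustrative name, not part of any library. */
size_t simd_lanes(size_t register_bits, size_t element_bits)
{
    return register_bits / element_bits;
}
```

Under these assumptions, simd_lanes(512, 32) gives 16 lanes for single-precision floats and simd_lanes(512, 64) gives 8 lanes for doubles. The lane count is an upper bound on the speedup from SIMD alone; memory bandwidth and other factors usually keep real gains below it.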
The following code adds two vectors.
for (i = 0; i < 4; i++)
    c[i] = a[i] + b[i];
In a serial calculation, the individual vector (array) elements are added one after another, and the extra width of the vector registers in modern CPUs goes unused.
In a vectorized calculation, all elements of the vector (array) can be added in one calculation step, provided they fit in a single vector register.
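As a rough sketch of what "one calculation step" can look like in source code, the following uses the vector type extension supported by GCC and Clang. This is an assumption about the toolchain: the extension is not standard C, and other compilers would need intrinsics or auto-vectorization instead. The names float4 and add4 are illustrative only.

```c
#include <string.h>

/* GCC/Clang extension (not standard C): a 16-byte vector of 4 floats. */
typedef float float4 __attribute__((vector_size(16)));

/* Adds four floats from a and b and stores the four sums in c. */
void add4(const float *a, const float *b, float *c)
{
    float4 va, vb, vc;

    memcpy(&va, a, sizeof va);   /* load 4 floats without alignment assumptions */
    memcpy(&vb, b, sizeof vb);
    vc = va + vb;                /* one element-wise vector addition */
    memcpy(c, &vc, sizeof vc);   /* store the 4 results */
}
```

On hardware with 128-bit SIMD registers the compiler can typically map the addition to a single vector instruction; on other targets it falls back to whatever the hardware offers.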
What kind of problem is vectorizable?
Not all code can take advantage of vectorization. The problem set must be amenable to a vectorized solution. Vectorization works best on problems that require the same simple operation to be performed on each element in a data set. So, first of all, look for a loop. The prototypical example is the one used above: the addition of each element in an array.
for (i = 0; i < count; i++)
    c[i] = a[i] + b[i];
But many other primitive operators can also be vectorized. The kinds of matrix transformation seen in linear algebra are usually good candidates for vectorization. The good news is that the finance domain provides many problem sets that are suitable.
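One such linear algebra primitive is the scale-and-add operation (often called axpy). The sketch below is written so that a compiler has a good chance of vectorizing it automatically. The function name and signature are illustrative, and the restrict qualifiers encode an assumption that x and y never overlap.

```c
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i] for every element.
   restrict promises the compiler that x and y do not alias,
   which removes one common obstacle to auto-vectorization. */
void axpy(size_t count, double alpha,
          const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < count; i++)
        y[i] = alpha * x[i] + y[i];
}
```

Whether the loop is actually vectorized depends on the compiler and flags. With GCC, building at -O3 and adding -fopt-info-vec reports the loops that were vectorized; with Clang, -Rpass=loop-vectorize gives a similar report.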
Issues that impact Vectorizing your code
There is a range of issues that can impact the effectiveness of vectorization. Some of the more common ones include:
1. Loop Dependencies (Avoid read-after-write)
for (i = 1; i < end; i++)
    f[i] = f[i-1] + b[i-1];
2. Indirect Memory Access (Use loop index directly. Seek unit loop stride)
for (i = 0; i < end; i++)
    c[idxC[i]] = a[i] + b[i];
3. Non ‘Straight line’ code (function calls, conditions, unknown loop count)
for (i = 0; i < CalcEnd(); i++)
{
    if (DoJump())
        i += CalcJump();
    c[i] = a[i] + b[i];
}
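For contrast, here is a minimal sketch of a loop shaped to avoid all three issues. It is not a rewrite of the examples above, which compute different things; the function name masked_add and the mask array are made up for illustration.

```c
#include <stddef.h>

/* Adds a[i] + b[i] into c[i] wherever mask[i] is non-zero;
   elsewhere c[i] keeps its previous value. */
void masked_add(float *restrict c, const float *restrict a,
                const float *restrict b, const int *restrict mask,
                size_t count)
{
    /* The trip count is a plain value known on loop entry,
       not a function call evaluated on every iteration. */
    for (size_t i = 0; i < count; i++) {
        /* Every access uses the loop index directly (unit stride),
           and no iteration reads a value written by another one. */
        float sum = a[i] + b[i];

        /* The condition is a per-element select rather than a jump
           that changes the loop index. */
        c[i] = mask[i] ? sum : c[i];
    }
}
```

A loop in this form gives the compiler the best chance of vectorizing, although the outcome still depends on the compiler, flags, and target CPU. A genuine recurrence such as the one in item 1 cannot be fixed by reshaping the loop alone and needs a different algorithm.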
Resources
Vectorization, Kirill Rogozhin, Intel, March 2017
Vectorization of Performance Dies for the Latest AVX SIMD, Kevin O’Leary, Intel, Aug 2016
A Guide to Vectorization with Intel® C++ Compilers, Intel, Nov 2010
Vectorization Codebook, Intel, Sep 2015
The Free Lunch Is Over - A Fundamental Turn Toward Concurrency in Software, Herb Sutter, March 2005
Recipe: Using Binomial Option Pricing Code as Representative Pricing Derivative Method, Shuo Li, Intel, June 2016