Vectorization Part 3 - Implementation
New challenges in the financial markets, driven by changes in market structure, regulations and accounting rules such as Basel III, EMIR, Dodd-Frank, MiFID II, Solvency II, IFRS 13, IFRS 9 and FRTB, have increased the demand for higher-performance risk and analytics. Problems like XVA require orders of magnitude more calculations for accurate results. This demand for higher performance has put a focus on how to get the most out of the latest generation of hardware.
This is the third in a series of blogs on Vectorization, a key tool for dramatically improving the performance of code running on modern CPUs. Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at once. Modern CPUs provide direct support for such vector operations, where a single instruction is applied to multiple data (SIMD).
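To make this concrete, here is a minimal sketch (the function and its names are illustrative, not taken from any particular library) of the kind of loop a vectorizing compiler can map onto SIMD instructions, because every iteration is independent of the others:

#include <vector>

// Each iteration is independent, so the compiler can apply one SIMD
// instruction to several elements at a time instead of processing them one by one.
void scale_add(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& out, float k)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = k * a[i] + b[i];
}

With AVX2, for example, a 256-bit register holds eight 32-bit floats, so each vector instruction processes eight elements of this loop.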
In my last blog I covered the why and what of Vectorization.
In this blog I cover the practicalities of implementing Vectorization.
Alternatives
There is a range of alternatives and tools for implementing Vectorization, varying in complexity, flexibility and future compatibility.
Source: Intel
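Broadly, the options range from letting the compiler auto-vectorize plain loops, through portable compiler hints such as OpenMP SIMD pragmas, to explicit SIMD intrinsics and hand-written assembly. The sketch below shows the two ends of that spectrum; it is an illustration under stated assumptions (an AVX-capable CPU, OpenMP SIMD support in the compiler, and an element count that is a multiple of 8), not a recommendation of one approach over the other.

#include <immintrin.h>   // AVX intrinsics

// Portable end of the spectrum: a plain loop plus a hint; the compiler
// picks the instructions (requires OpenMP SIMD support, e.g. -fopenmp-simd).
void add_hinted(const float* a, const float* b, float* c, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Explicit end of the spectrum: AVX intrinsics give full control but tie the
// code to one instruction set. Assumes n is a multiple of 8.
void add_avx(const float* a, const float* b, float* c, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);                // load 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));    // add and store 8 floats
    }
}

In practice most teams start at the portable end and only drop down to intrinsics for hotspots where the compiler cannot do the job on its own.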
Intel’s 6-Step Program for Vectorization
The simplest way to implement vectorization is to start with Intel’s 6-step process. It uses Intel’s tools to provide a clear path for transforming existing code into modern, high-performance software that takes full advantage of multicore and manycore processors.
Step 1. Measure baseline release build performance
The starting point is a reference release build. A release build is important because:
1. The compiler will optimize your code
2. You need to have a baseline to measure how vectorization is improving performance
Ideally, you should also set a performance goal so you know when you are done.
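As a minimal sketch of such a baseline (the build flags are typical examples and run_workload is a hypothetical entry point standing in for the code being measured):

// Build an optimized release binary first, for example:
//   g++  -O3 -march=native -o app main.cpp     (GCC)
//   icpc -O3 -xHost        -o app main.cpp     (Intel C++ compiler)
#include <chrono>
#include <cstdio>

double run_workload();   // hypothetical entry point for the code being measured

int main()
{
    const auto start = std::chrono::steady_clock::now();
    const double result = run_workload();
    const auto stop = std::chrono::steady_clock::now();

    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    std::printf("result=%f  baseline=%.2f ms\n", result, ms);   // record as the baseline
    return 0;
}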
Step 2. Determine hotspots
Tools like Intel’s performance profiler, VTune Amplifier XE, can be used to profile your application and find the most time-consuming areas of code, or “hotspots”. Identifying hotspots helps focus effort on the optimizations that will generate the most benefit.
Intel VTune Amplifier XE
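If a full profiler is not to hand, even rough manual instrumentation can confirm where the time goes. The sketch below is purely illustrative (and no substitute for VTune): it accumulates wall-clock time per labelled region so the dominant regions stand out.

#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock time per labelled region; inspect totals() at the
// end of the run to see which regions dominate.
struct ScopedTimer
{
    static std::map<std::string, double>& totals()
    {
        static std::map<std::string, double> t;
        return t;
    }

    explicit ScopedTimer(std::string label)
        : name(std::move(label)), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto stop = std::chrono::steady_clock::now();
        totals()[name] +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }

    std::string name;
    std::chrono::steady_clock::time_point start;
};

// Usage (calibrate() is a stand-in for your own code):
//   { ScopedTimer t("calibration"); calibrate(); }
// then loop over ScopedTimer::totals() and print each region's total.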
Step 3. Determine loop candidates
Compiler reports such as Intel's Compiler Optimization Report can tell you which loops are suitable for vectorization and which were actually vectorized. Loops in hotspots that are not automatically vectorized can often be modified, using various techniques, so that they can be.
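For example (the report flags shown are the Intel and GCC options I would reach for here; check your own compiler's documentation), a vectorization report typically separates loops like these two:

// Typical report flags:
//   icpc: -qopt-report=5 -qopt-report-phase=vec
//   g++:  -fopt-info-vec-missed

// Loop-carried dependency: each iteration reads the previous result, so the
// report will flag this loop as not vectorizable as written (assumes n >= 1).
float running_sum(float* a, int n)
{
    for (int i = 1; i < n; ++i)
        a[i] += a[i - 1];
    return a[n - 1];
}

// Independent iterations: a straightforward candidate the report usually marks
// as vectorized, possibly with a runtime check that in and out do not overlap.
void square_all(const float* in, float* out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * in[i];
}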
Step 4. Analyse specific hotspot code to measure performance gains
Tools like Intel's Advisor can be used to measure the potential benefit of vectorizing specific code, helping to focus effort where it will deliver the maximum gain.
Intel Advisor
Step 5. Implement Vectorization Recommendations
Implement the recommendations for vectorizing the code, whether by re-ordering code, adding compiler hints or using other methods.
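As a hedged sketch of two common changes that such recommendations boil down to (these are generic examples, not Intel's exact wording): telling the compiler that arrays do not alias, and keeping branches out of the inner loop.

// 1. No-alias promise: __restrict (an extension accepted by icc, gcc, clang
//    and MSVC) lets the compiler vectorize without runtime overlap checks.
void saxpy(float* __restrict y, const float* __restrict x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// 2. Hoist the condition out of the loop so the inner loop is straight-line,
//    vectorizable arithmetic rather than a per-element branch.
void scale_or_shift(float* data, int n, bool scale, float k)
{
    if (scale) {
        for (int i = 0; i < n; ++i) data[i] *= k;
    } else {
        for (int i = 0; i < n; ++i) data[i] += k;
    }
}

Other typical fixes include switching to unit-stride, structure-of-arrays data layouts and moving non-vectorizable calls out of the hot loop.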
Step 6. Repeat
The process is iterative and should be repeated until the desired performance is reached.
Resources
Vectorization, Kirill Rogozhin, Intel, March 2017
Vectorization of Performance Dies for the Latest AVX SIMD, Kevin O’Leary, Intel, Aug 2016
A Guide to Vectorization with Intel® C++ Compilers, Intel, Nov 2010
Vectorization Codebook, Intel, Sep 2015
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Herb Sutter, March 2005
Recipe: Using Binomial Option Pricing Code as Representative Pricing Derivative Method, Shuo Li, Intel, June 2016