Accelerating AI processing. By: Jose Segadaes
José Segadães
Electronics and Telecommunications Engineer | Artificial Intelligence enthusiast | Laboratory Equipment | Medical Devices |
Because of the huge processing power required by artificial intelligence deep neural networks, we need alternatives to processing them with traditional microprocessors (CPUs, central processing units) and must move to specialized, dedicated chips, namely AI accelerators. AI accelerator chips are capable of much faster processing of this data. At present a number of different chip technologies are being used to accelerate AI algorithm processing, namely GPUs (graphics processing units), FPGAs (field-programmable gate arrays) and, more recently, dedicated AI inference accelerator chips. GPUs, FPGAs and AI inference accelerators are being used extensively in data centers, but they are also quickly being introduced at the edge, that is, in end-point devices such as smartphones.
I will give a brief description of each as follows:
GPUs – Specialized microprocessor-based hardware used in computers to manipulate images and video. The mathematics underlying deep neural networks is similar to that of image manipulation, both being dominated by matrix arithmetic, which makes GPUs well suited to AI algorithm processing.
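To make that similarity concrete, a fully connected neural-network layer is essentially one matrix multiplication, the same bulk arithmetic GPUs already perform for graphics. A minimal NumPy sketch (the layer sizes and random data are illustrative assumptions, not any particular model):

```python
import numpy as np

# A fully connected layer is a matrix multiply plus a bias and an activation.
# Shapes are illustrative: a batch of 32 flattened 28x28 images, 128 outputs.
batch = np.random.rand(32, 784)        # input activations
weights = np.random.rand(784, 128)     # learned layer weights
bias = np.random.rand(128)

outputs = np.maximum(0, batch @ weights + bias)   # matmul + bias + ReLU
print(outputs.shape)                   # (32, 128)
```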
FPGAs – Chips whose internal digital circuitry can be configured for a specific application using a hardware description language such as Verilog. FPGAs can be configured to process AI algorithms, and the resulting speed is high as well. With FPGAs comes flexibility, since you can always re-configure and upgrade the same device in the future.
AI inference chips – While GPUs and FPGAs perform far better than traditional microprocessors for AI processing, AI inference accelerator chips can improve processing efficiency by a factor of roughly 100 compared to traditional microprocessors. AI inference chips are ASICs (application-specific integrated circuits): they are physically designed and manufactured specifically for AI algorithm processing.
Let’s look at what constitutes an AI inference accelerator. It is comprised of both hardware and software. The hardware consists of MACs (multiply-and-accumulate units), on-chip SRAM (internal memory), off-chip DRAM (external memory), control logic and interconnect. The software comprises algorithms, performance estimation and code generation.
HARDWARE
MACs – The hardware that executes the mathematical processing of the AI algorithms (a minimal sketch of this operation follows the list below).
SRAM – On-chip (internal) memory where processing instructions and variables are stored.
DRAM – Off-chip (external) memory; for large applications, data may have to be stored outside the chip.
Control logic – Hardware that orchestrates the flow of data within the chip.
Interconnect – The data and address buses that move data between the MACs and the on-chip memory.
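As noted above for the MACs, here is a minimal plain-Python sketch of the multiply-and-accumulate operation a single MAC unit performs. The loop form is purely illustrative; a real accelerator runs many MACs in parallel in silicon:

```python
def mac_dot_product(inputs, weights):
    """Multiply-and-accumulate: the core operation performed by MAC units.
    Each step is one multiply plus one accumulate, i.e. two operations."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += x * w          # 1 multiply + 1 add = 2 operations
    return acc

# One output neuron of a layer reduces to one such dot product.
print(mac_dot_product([0.5, 1.0, -2.0], [0.1, 0.2, 0.3]))
```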
SOFTWARE
Algorithms – The input to the AI inference chip’s software is a deep neural network model. To run it properly, the chip needs an algorithm that tells it how to orchestrate the resources feeding the model.
Estimation – This part of the software estimates the time it will take to process the input data, for example an image (a toy estimate of this kind is sketched after this list).
Code – The programming code that gets loaded onto the chip to execute the deep neural network model.
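To make the estimation step concrete, here is a hypothetical back-of-the-envelope latency estimate: sum the MAC operations of each layer and divide by the rate at which the chip's MAC units can perform them. The layer MAC counts, MAC-unit count, clock frequency and utilisation figure below are all illustrative assumptions, not any vendor's numbers:

```python
# Hypothetical back-of-the-envelope latency estimator (all numbers illustrative).
layers = [
    {"name": "conv1", "macs": 118_013_952},   # MAC operations in the layer
    {"name": "conv2", "macs": 448_920_576},
    {"name": "fc",    "macs": 4_096_000},
]

num_mac_units = 4096       # MAC units on the chip (assumption)
clock_hz = 1.0e9           # 1 GHz clock (assumption)
utilisation = 0.6          # assume 60% of peak is actually achieved

macs_per_second = num_mac_units * clock_hz * utilisation
total_macs = sum(layer["macs"] for layer in layers)
print(f"Estimated time per image: {total_macs / macs_per_second * 1e3:.2f} ms")
```

Real estimation software would also model the data movement between SRAM and DRAM, which this sketch ignores.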
The peak performance of an AI inference accelerator is measured in TOPS (trillions of operations per second): peak TOPS = (number of MAC units × clock frequency in Hz × 2) ÷ 10¹², since each MAC performs two operations (one multiply and one accumulate) per clock cycle.
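As a worked example of that formula (the chip parameters are hypothetical):

```python
# Worked example of the peak-TOPS formula; chip parameters are hypothetical.
num_mac_units = 4096                 # MAC units on the chip
clock_hz = 1.0e9                     # 1 GHz clock
ops_per_mac = 2                      # one multiply + one accumulate per cycle

peak_tops = num_mac_units * clock_hz * ops_per_mac / 1e12
print(f"Peak performance: {peak_tops:.1f} TOPS")   # prints 8.2 TOPS
```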
In essence, more and more dedicated AI inference accelerator chips will find their way into accelerating the processing of AI algorithms.