Accelerating Computing of the Future
Having the right tool for the job is always important. Some tools, like the Wenger Giant Swiss Army Knife, provide so many functions that they would make MacGyver drool. But if you had to tighten a loose screw in a narrow space, a Phillips screwdriver would do the best job. For decades, the Swiss Army knife of the computing world, the CPU, has laid the foundation for all computing infrastructure. However, today's computing problems require more single-purpose tools that provide efficient computing power.
Our Computing Legacy
Until 2008, gains in CPU performance seemed as if they would continue forever, but since then, clock stall has been a factor in modern CPU design. With clock stall, CPU single-thread performance and frequency diverged from their previous exponential rise and "stalled." Constraints on fabrication cost and power consumption have pushed the processor industry to focus on power efficiency instead of single-thread performance. As shown in the bottom curve of the chart below, this push began the era of multi-core processors and parallel computing as the path to performance gains.
Given these trends, the emergence of big data, and the armies of developers trained in CPU software development, companies have built out massive data centers around huge populations of CPUs. In 2013, the Department of Energy estimated that U.S. data centers consumed more than 2% of all U.S. electricity; in 2016, that number could be as high as 3%. Most striking of all, one-third of data centers' electrical consumption goes to cooling. Like the Giant Swiss Army Knife turning a screw, multipurpose CPUs are performing computing tasks inefficiently and creating massive amounts of waste heat. Computing therefore needs efficient single-purpose tools, like the screwdriver, to increase performance by making computing more efficient.
What’s in the Tool Kit?
In the computing tool kit, there are four basic device types that form the foundation of modern computing:
- Central Processing Units (CPUs)
- Graphics Processing Units (GPUs)
- Field Programmable Gate Arrays (FPGAs)
- Application Specific Integrated Circuits (ASICs)
The CPU's main design objective has always been flexibility, i.e., ease of programming. To enable that flexibility, CPUs are built around control circuitry that executes sequences of commands defined by the CPU's instruction set. Arithmetic Logic Units (ALUs) perform arithmetic and logic operations on data in the CPU's memory. When programs are compiled, they are converted into a list of instructions and memory addresses, from which the CPU executes the program. For decades, software developers have taken this model for granted.
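The fetch-decode-execute model described above can be sketched in a few lines. The toy instruction set below (LOAD/ADD/STORE with a single accumulator) is purely illustrative, not a real ISA:

```python
# Minimal sketch of a CPU's fetch-decode-execute cycle.
# The opcodes and the "compiled" program are illustrative only.

def run(program, memory):
    """Execute a list of (opcode, operand) pairs against a memory dict."""
    acc = 0  # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        op, arg = program[pc]       # fetch and decode
        if op == "LOAD":            # copy a memory cell into the accumulator
            acc = memory[arg]
        elif op == "ADD":           # add a memory cell to the accumulator
            acc += memory[arg]
        elif op == "STORE":         # write the accumulator back to memory
            memory[arg] = acc
        pc += 1                     # sequential control flow
    return memory

# The expression c = a + b "compiled" into an instruction list:
mem = run([("LOAD", "a"), ("ADD", "b"), ("STORE", "c")],
          {"a": 2, "b": 3, "c": 0})
print(mem["c"])  # 5
```

The control circuitry is the `while` loop and the `if/elif` dispatch; the ALU is the arithmetic inside each branch.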
GPUs have taken the multi-core trend to the extreme by packing thousands of smaller, simpler cores onto a single device. Additionally, GPUs have large amounts of ALU resources compared to CPUs. GPUs have commonly been used in graphics add-in cards and have been especially popular with the gaming industry. In 2006, GPUs began to be adopted for accelerated computing because of their large quantities of ALU resources and their ability to massively parallelize operations. The use of GPUs for accelerated computing has since been explored for many applications.
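The GPU's data-parallel model can be made concrete with a small sketch: one kernel is applied independently to every element, which is what lets thousands of ALUs work at once. The kernel and launch helper below are hypothetical illustrations (simulated sequentially here), not a real GPU API:

```python
# Sketch of the data-parallel model GPUs exploit: the same small kernel
# runs on every data element with no dependence between elements.

def saxpy_kernel(i, a, x, y):
    # One "thread" of work: the same operation on a different element.
    return a * x[i] + y[i]

def gpu_style_launch(kernel, n, *args):
    # On a GPU these n invocations would run concurrently across many
    # ALUs; here we merely simulate the execution model in sequence.
    return [kernel(i, *args) for i in range(n)]

x, y = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
print(gpu_style_launch(saxpy_kernel, 3, 2.0, x, y))  # [12.0, 24.0, 36.0]
```

Because each element is independent, the speedup scales with the number of ALUs available, which is exactly what graphics workloads (and many scientific ones) need.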
FPGA architectures are vastly different from GPUs and CPUs. Before FPGAs, digital designers were forced to wire together dozens (even hundreds) of individual logic chips to realize a design. With the invention of the configurable logic fabric, FPGAs made those hard-wired logic-chip designs obsolete. Configurable logic created flexible digital design capabilities not possible with the previous hard-wired approach. Today, FPGAs are used heavily in communications, but they are also making inroads into the computing market.
Because of the CPU's legacy, armies of software developers do not know how to program FPGAs. FPGAs are programmed in a Register Transfer Level (RTL) language such as Verilog or VHDL. RTL languages describe the operations performed on signals as they pass through the logic fabric. Unlike traditional software, which follows a behavioral (control-flow) paradigm, RTL development uses a data-flow paradigm that requires different methods for development and debugging. The Block RAM on an FPGA is also orders of magnitude smaller than the memory available to a CPU.
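The paradigm difference can be illustrated with a toy model of clocked registers: on every clock edge, all register stages update simultaneously from the previous stage's outputs, rather than one statement executing at a time. The stage logic below is an arbitrary example, not real RTL:

```python
# Toy model of the register-transfer view of hardware: each clock edge,
# every register captures a value computed from the previous stage.
# All stages advance at once, unlike sequential software statements.

def clock_cycle(regs, stream):
    # Stage 1 takes new input; stages 2 and 3 apply example
    # combinational logic (+1 and *2) to the prior stage's output.
    new_input = stream.pop(0) if stream else 0
    return [new_input, regs[0] + 1, regs[1] * 2]

regs = [0, 0, 0]     # three pipeline registers, all cleared at reset
stream = [5, 7]      # input samples arriving on successive clocks
for _ in range(3):
    regs = clock_cycle(regs, stream)
print(regs)  # [0, 8, 12]
```

Note that a value entering the pipeline takes one clock per stage to reach the end; that latency-for-throughput trade is central to RTL design and has no direct analog in sequential software.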
The most significant advantage that FPGAs have over CPUs and GPUs is computing efficiency. In many applications, higher efficiency means faster operations that consume less energy. This efficiency advantage is a triple win: it increases performance while reducing both operational and capital expenditures, since it generally takes less hardware to do the same job. Finally, a unique advantage that FPGAs bring to the table is that RTL designs can be used directly to build an ASIC.
ASICs, although single-purpose, have the highest computing efficiency of all. An ASIC is very similar in function to an FPGA; however, an ASIC cannot be reprogrammed. The design is burnt (wired) directly into silicon and packaged as a chip. Because of the startup costs, ASICs make the most sense in applications where large volumes or large performance increases are needed. Most importantly, FPGA designs can be ported to ASIC designs to increase computing efficiency further.
Tool Comparison
One area of research that has produced comparative performance analyses of these tools is bioinformatics. In bioinformatics, DNA sequence alignment is a particularly challenging problem, and researchers have tried CPU, GPU, and FPGA implementations to increase the performance of alignment algorithms. The figure below, taken from Hasan, shows spider graphs with five evaluative parameters for CPU, GPU, and FPGA implementations. The further a point lies from the center of the graph, the better the implementation is for that parameter.
As discussed previously, CPUs are the most flexible (i.e., easiest) for development, GPUs are less flexible than CPUs, and FPGAs are the least flexible. While FPGAs have a flexibility disadvantage, their performance per watt and per cost (euro) exceeds that of both CPUs and GPUs. These advantages are so significant that Hasan ranked FPGAs as the best future prospect for hardware acceleration in bioinformatics. Hasan's prediction was correct: since the publication in 2011, bioinformatics hardware vendors using both FPGA- and ASIC-accelerated hardware have emerged, and the computing industry has come to the same conclusions.
The Future Toolkit: FPGAs + CPUs
The power efficiency of FPGAs has been noticed by the computing industry and has led to some interesting recent developments. The most significant is Intel's acquisition of top FPGA vendor Altera for $16.7 billion. The motivation: to help Intel maintain dominance in the data center market, where application accelerators (FPGAs/ASICs) meet data center performance and power-usage requirements. Similarly, Altera's rival, Xilinx, has formed a partnership with IBM. IBM's new OpenPOWER architecture supports accelerators in a unique way through the Coherent Accelerator Processor Interface (CAPI), which gives accelerators direct access to the memory of the POWER8 processor. Finally, Xilinx and Altera both now offer devices that combine FPGA fabric and ARM processors on the same chip.
Designing with Tools of the Future
With CPUs and FPGAs working more closely together, accelerated computing of the future will require both software developers and FPGA developers. To be most effective, software developers will need to identify the primitives of their solutions that need acceleration; the FPGA designer will then turn these primitives into an accelerated RTL design. Through a seamless software application programming interface (API) and hardware driver, software developers can keep relying on their familiar paradigms while FPGA developers give them access to accelerated primitives. In some cases, the RTL designs will be made into ASICs.
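One way such an API layer could look is a thin dispatch wrapper: the software developer calls a familiar function, and the driver boundary decides whether hardware handles it. The names here (`AcceleratedLibrary`, `SoftwareFallback`) and the zlib fallback are hypothetical illustrations, not any vendor's API:

```python
# Hypothetical sketch of an accelerator-aware API: software developers
# call compress() as usual; dispatch to hardware (when present) is
# hidden behind the driver boundary.

import zlib

class SoftwareFallback:
    """Pure-software backend used when no accelerator is installed."""
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

class AcceleratedLibrary:
    """Public API: same call signature whether or not an FPGA driver
    is plugged in, so application code never changes."""
    def __init__(self, driver=None):
        # A real FPGA driver object could be passed here; we fall
        # back to software when none is available.
        self.backend = driver or SoftwareFallback()

    def compress(self, data: bytes) -> bytes:
        return self.backend.compress(data)

lib = AcceleratedLibrary()           # no FPGA present: software fallback
out = lib.compress(b"hello" * 100)
print(len(out) < 500)                # True: repetitive input shrinks
```

The design point is that acceleration becomes a deployment detail rather than a code change, which is what lets software developers keep their existing paradigms.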
Conclusions
With a revolutionary legacy, the CPU still has an important role in computing, but it will need specialized applications on FPGAs to create the next jump in performance. Augmenting the CPU's computing-efficiency limitations with FPGAs is the best choice. FPGA developers are far less common than software developers, and demand for their skills will increase. Additionally, the need for new development paradigms for working with the new CPU-FPGA interfaces and dual-device chips will create a number of unique new challenges. Having the right tools is important, and together the CPU and the FPGA create the right mix of capabilities to accelerate computing into the next generation.
About AHA
For almost three decades, Advanced Hardware Accelerators (AHA) has developed ASICs and FPGA solutions for the data compression and communications industry. AHA offers both ASICs and PCIe boards for GZIP and SSL acceleration. Using our hardware, customers have optimized workloads to reduce capital and operational expenditures. AHA is a fully capable development group for creating ASICs, boards, and FPGA core designs for a variety of applications. Visit us at www.aha.com.
Credits:
Laiq Hasan and Zaid Al-Ars (2011). An Overview of Hardware-Based Acceleration of Biological Sequence Alignment, Computational Biology and Applied Bioinformatics, Prof. Heitor Lopes (Ed.), ISBN: 978-953-307-629-4, InTech, Available from: https://www.intechopen.com/books/computational-biology-and-appliedbioinformatics/an-overview-of-hardware-based-acceleration-of-biological-sequence-alignment