Accelerating Computing of the Future
Having the right tool for the job is always important. Some tools, like the Wenger Giant Swiss Army Knife, provide so many functions that they would make MacGyver drool. But if you had to tighten a loose screw in a narrow space, a Phillips screwdriver would do the best job. For decades, the Swiss Army knife of the computing world, the CPU, has laid the foundation for all computing infrastructure. However, today's computing problems require more single-purpose tools that provide efficient computing power.
Our Computing Legacy
Until 2008, gains in CPU performance seemed as if they would continue forever, but since then, clock stall has been a factor in modern CPU design. With clock stall, CPU single-thread performance and frequency diverged from their previous exponential rise and "stalled." Constraints on fabrication cost and power consumption have pushed the processor industry to focus on power efficiency instead of single-thread performance. As shown in the bottom curve of the chart below, this push began the era of multi-core processors and parallel computing as the path to performance gains.
Given these trends, the emergence of big data, and the armies of developers trained in CPU software development, companies have built out massive data centers around huge populations of CPUs. In 2013, the Department of Energy estimated that U.S. data centers consumed more than 2% of all U.S. electricity; in 2016, that number could be as high as 3%. Most striking of all, one-third of data centers' electrical consumption goes to cooling. Like the Giant Swiss Army Knife turning a screw, multipurpose CPUs are performing computing tasks inefficiently and creating massive amounts of waste heat. Computing therefore needs efficient single-purpose tools, like the screwdriver, to increase performance by making computing more efficient.
What’s in the Tool Kit?
In the computing tool kit, there are four basic device types that form the foundation of modern computing:
- Central Processing Units (CPUs)
- Graphics Processing Units (GPUs)
- Field Programmable Gate Arrays (FPGAs)
- Application Specific Integrated Circuits (ASICs)
The CPU's main design objective has always been flexibility, i.e., ease of programming. To enable that flexibility, CPUs are built around control circuitry that executes sequences of commands defined by the CPU's instruction set. Arithmetic Logic Units (ALUs) perform arithmetic and logic operations on data in the CPU's memory. When programs are compiled, they are converted into a list of instructions and memory addresses, from which the CPU executes the program. For decades, software developers have taken this model for granted.
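The fetch-decode-execute model described above can be sketched in a few lines. The toy instruction set below (LOAD/ADD/STORE with a single accumulator) is purely illustrative, not a real ISA:

```python
# Minimal sketch of a CPU's fetch-decode-execute cycle.
# The opcodes and the "compiled" program are illustrative only.

def run(program, memory):
    """Execute a list of (opcode, operand) pairs against a memory dict."""
    acc = 0  # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        op, arg = program[pc]       # fetch and decode
        if op == "LOAD":            # copy a memory cell into the accumulator
            acc = memory[arg]
        elif op == "ADD":           # add a memory cell to the accumulator
            acc += memory[arg]
        elif op == "STORE":         # write the accumulator back to memory
            memory[arg] = acc
        pc += 1                     # sequential control flow
    return memory

# The expression c = a + b "compiled" into an instruction list:
mem = run([("LOAD", "a"), ("ADD", "b"), ("STORE", "c")],
          {"a": 2, "b": 3, "c": 0})
print(mem["c"])  # 5
```

The control circuitry is the `while` loop and the `if/elif` dispatch; the ALU is the arithmetic inside each branch.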
GPUs have taken the multi-core trend to the extreme by packing thousands of smaller, simpler cores onto a single device. Additionally, GPUs have large amounts of ALU resources compared to CPUs. GPUs have commonly been used in graphics add-in cards and have been especially popular with the gaming industry. In 2006, GPUs began to be adopted for accelerated computing because of their large quantities of ALU resources and their ability to massively parallelize operations. The use of GPUs for accelerated computing has since been explored for many applications.
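The GPU's data-parallel model can be made concrete with a small sketch: one kernel is applied independently to every element, which is what lets thousands of ALUs work at once. The kernel and launch helper below are hypothetical illustrations (simulated sequentially here), not a real GPU API:

```python
# Sketch of the data-parallel model GPUs exploit: the same small kernel
# runs on every data element with no dependence between elements.

def saxpy_kernel(i, a, x, y):
    # One "thread" of work: the same operation on a different element.
    return a * x[i] + y[i]

def gpu_style_launch(kernel, n, *args):
    # On a GPU these n invocations would run concurrently across many
    # ALUs; here we merely simulate the execution model in sequence.
    return [kernel(i, *args) for i in range(n)]

x, y = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
print(gpu_style_launch(saxpy_kernel, 3, 2.0, x, y))  # [12.0, 24.0, 36.0]
```

Because each element is independent, the speedup scales with the number of ALUs available, which is exactly what graphics workloads (and many scientific ones) need.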
FPGA architectures are vastly different from GPUs and CPUs. Before FPGAs, digital designers were forced to wire together dozens (even hundreds) of individual logic chips to realize a design. With the invention of the configurable logic fabric, FPGAs made those hard-wired logic-chip designs obsolete. Configurable logic created flexible digital design capabilities not possible with the previous hard-wired approach. Today, FPGAs are used heavily in communications, but they are also making inroads into the computing market.
Because of the CPU's legacy, armies of software developers do not know how to program FPGAs. FPGAs are programmed in a Register Transfer Level (RTL) language such as Verilog or VHDL. RTL languages describe the operations performed on signals as they pass through the logic fabric. Unlike traditional software, which follows a behavioral (control-flow) paradigm, RTL development uses a data-flow paradigm that requires different methods for development and debugging. The Block RAM on an FPGA is also orders of magnitude smaller than the memory available to a CPU.
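The paradigm difference can be illustrated with a toy model of clocked registers: on every clock edge, all register stages update simultaneously from the previous stage's outputs, rather than one statement executing at a time. The stage logic below is an arbitrary example, not real RTL:

```python
# Toy model of the register-transfer view of hardware: each clock edge,
# every register captures a value computed from the previous stage.
# All stages advance at once, unlike sequential software statements.

def clock_cycle(regs, stream):
    # Stage 1 takes new input; stages 2 and 3 apply example
    # combinational logic (+1 and *2) to the prior stage's output.
    new_input = stream.pop(0) if stream else 0
    return [new_input, regs[0] + 1, regs[1] * 2]

regs = [0, 0, 0]     # three pipeline registers, all cleared at reset
stream = [5, 7]      # input samples arriving on successive clocks
for _ in range(3):
    regs = clock_cycle(regs, stream)
print(regs)  # [0, 8, 12]
```

Note that a value entering the pipeline takes one clock per stage to reach the end; that latency-for-throughput trade is central to RTL design and has no direct analog in sequential software.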
The most significant advantage that FPGAs have over CPUs and GPUs is computing efficiency. In many applications, higher efficiency means faster operations that consume less energy. This efficiency advantage is a triple win: it increases performance while reducing both operational and capital expenditures, since it generally takes less hardware to do the same job. Finally, a unique advantage that FPGAs bring to the table is that RTL designs can be used directly to build an ASIC.
ASICs, although single-purpose, have the highest computing efficiency of all. An ASIC is very similar in function to an FPGA; however, an ASIC cannot be reprogrammed. The design is burnt (wired) directly into silicon and packaged as a chip. Because of the startup costs, ASICs make the most sense in applications where large volumes or large performance increases are needed. Most importantly, FPGA designs can be ported to ASIC designs to increase computing efficiency further.
Tool Comparison
One area of research that has produced comparative performance analyses of these tools is bioinformatics. In bioinformatics, DNA sequence alignment is a particularly challenging problem, and researchers have tried CPU, GPU, and FPGA implementations to increase the performance of alignment algorithms. The figure below, taken from Hasan, shows spider graphs with five evaluative parameters for CPU, GPU, and FPGA implementations. The further a point lies from the center of the graph, the better the implementation is for that parameter.
As discussed previously, CPUs are the most flexible (i.e., easiest) for development, GPUs are less flexible than CPUs, and FPGAs are the least flexible. While FPGAs have a flexibility disadvantage, their performance per watt and per cost (euro) exceeds that of both CPUs and GPUs. These advantages are so significant that Hasan ranked FPGAs as the best future prospect for hardware acceleration in bioinformatics. Hasan's prediction was correct: since the publication in 2011, bioinformatics hardware vendors using both FPGA- and ASIC-accelerated hardware have emerged, and the computing industry has come to the same conclusions.
The Future Toolkit: FPGAs + CPUs
The power efficiency of FPGAs has been noticed by the computing industry and has led to some interesting recent developments. The most significant is Intel's acquisition of top FPGA vendor Altera for $16.7 billion. The motivation: to help Intel maintain dominance in the data center market, where application accelerators (FPGAs/ASICs) meet data center performance and power-usage requirements. Similarly, Altera's rival, Xilinx, has formed a partnership with IBM. IBM's new OpenPOWER architecture supports accelerators in a unique way through the Coherent Accelerator Processor Interface (CAPI), which gives accelerators direct access to the memory of the POWER8 processor. Finally, Xilinx and Altera both now offer devices that combine FPGA fabric and ARM processors on the same chip.
Designing with Tools of the Future
With CPUs and FPGAs working more closely together, accelerated computing of the future will require both software developers and FPGA developers. To be most effective, software developers will need to identify the primitives of their solutions that need acceleration; the FPGA designer will then turn these primitives into an accelerated RTL design. Through a seamless software application programming interface (API) and hardware driver, software developers can keep relying on their familiar paradigms while FPGA developers give them access to accelerated primitives. In some cases, the RTL designs will be made into ASICs.
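One way such an API layer could look is a thin dispatch wrapper: the software developer calls a familiar function, and the driver boundary decides whether hardware handles it. The names here (`AcceleratedLibrary`, `SoftwareFallback`) and the zlib fallback are hypothetical illustrations, not any vendor's API:

```python
# Hypothetical sketch of an accelerator-aware API: software developers
# call compress() as usual; dispatch to hardware (when present) is
# hidden behind the driver boundary.

import zlib

class SoftwareFallback:
    """Pure-software backend used when no accelerator is installed."""
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

class AcceleratedLibrary:
    """Public API: same call signature whether or not an FPGA driver
    is plugged in, so application code never changes."""
    def __init__(self, driver=None):
        # A real FPGA driver object could be passed here; we fall
        # back to software when none is available.
        self.backend = driver or SoftwareFallback()

    def compress(self, data: bytes) -> bytes:
        return self.backend.compress(data)

lib = AcceleratedLibrary()           # no FPGA present: software fallback
out = lib.compress(b"hello" * 100)
print(len(out) < 500)                # True: repetitive input shrinks
```

The design point is that acceleration becomes a deployment detail rather than a code change, which is what lets software developers keep their existing paradigms.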
Conclusions
With a revolutionary legacy, the CPU still has an important role in computing, but it will need specialized applications on FPGAs to create the next jump in performance. Augmenting the CPU's computing-efficiency limitations with FPGAs is the best choice. FPGA developers are far less common than software developers, and demand for their skills will increase. Additionally, the need for new development paradigms for working with the new CPU-FPGA interfaces and dual-device chips will create a number of unique new challenges. Having the right tools is important, and together the CPU and the FPGA create the right mix of capabilities to accelerate computing into the next generation.
About AHA
For almost three decades, Advanced Hardware Accelerators (AHA) has developed ASICs and FPGA solutions for the data compression and communications industry. AHA offers both ASICs and PCIe boards for GZIP and SSL acceleration. Using our hardware, customers have optimized workloads to reduce capital and operational expenditures. AHA is a fully capable development group for creating ASICs, boards, and FPGA core designs for a variety of applications. Visit us at www.aha.com.
Credits:
Laiq Hasan and Zaid Al-Ars (2011). An Overview of Hardware-Based Acceleration of Biological Sequence Alignment, Computational Biology and Applied Bioinformatics, Prof. Heitor Lopes (Ed.), ISBN: 978-953-307-629-4, InTech, Available from: https://www.intechopen.com/books/computational-biology-and-appliedbioinformatics/an-overview-of-hardware-based-acceleration-of-biological-sequence-alignment