VHPC

Do we need next gen HPCs through high end connectivity and unique core abilities and novel algorithms to achieve the earliest possible time to market?

The benefits of process technology scaling are diminishing. The gains from a new technology node or solution alone are often inadequate to justify the development costs of a next-generation device, forcing more aggressive innovation at the architectural and system levels. With the recent explosion of data and the surge of ML and AI applications such as Copilot, the need for high-performance compute has been increasing steeply. Given the high cost of advanced lower-nanometer wafer technology nodes and the continually changing requirements seen during the pandemic, developing ASICs for these markets is challenging.

Next-gen user experiences demand superior efficiency, higher bandwidth, excellent processing, outstanding scalability, and compact integrated ICs. Other essentials include a compact, high-performance HPC compute brain and low-latency storage that surpass traditional technologies. On the development front, satisfying such rigorous bandwidth requirements means planning support for Small Form-factor Pluggable (SFP), Flex Ethernet (FlexE), and other emerging interface standards.

Additionally, ADAS and AD video streaming functions have fueled bandwidth demands, reinforced by broad customer acceptance of always-connected devices and smart infrastructure. In line with this change, demand for supporting areas such as cloud services continues to gain traction, and demand for connectivity components, smart antennas, and modules is also rising steeply.

What is happening behind the scenes?

Changes in semiconductor process frequency scaling have pushed the standard computing element to become progressively parallel. The semiconductor industry continues to explore alternatives to the classic CPU in domain-specific architectures, including those previously consigned to vector-based processing (DSPs, GPUs), fully parallel programmable hardware (FPGAs), and other discrete extreme-performance segments. This brings new dimensions of thinking, new approaches, and new structures with unique configurations into the architecture.

The needs driving the change away from the traditional CPU-based compute model point toward other options, such as a heterogeneous compute platform.

Such a platform should address a few basic aspects:

1. Software Programmability—The ability to quickly develop optimized applications through software-abstracted toolchains.

2. Acceleration—High performance across a wide range of applications, from artificial intelligence, smart network interface cards, and high-density storage to 5G wireless, self-driving cars, advanced modular radar, and terabit optical networks.

3. Dynamically Adaptable Reconfiguration—The ability to reconfigure the hardware to accelerate new loads within milliseconds.

What do we need?

A fully adaptive SaaP/-S and HaaP/-S model with a heterogeneous compute platform that combines Scalar Engines, Adaptable Engines, and Intelligent Engines to achieve dramatic performance improvements of up to 100X over today's fastest HPC implementations. The new architecture should dramatically improve ease of use in the current environment and should provide a fully integrated, memory-mapped platform programmed through a unified toolchain. A Versatile High Performance Compute platform (VHPC) could be the right fit. Traditional hardware developers can still port their existing RTL to VHPC via the traditional RTL entry flow.

Current Challenges

Recent technical challenges in the semiconductor process prevent scaling of the traditional "one size fits all" CPU scalar compute engine. Changes in semiconductor process frequency scaling have forced the standard computing element to become increasingly parallel. As a result, both solution developers and the semiconductor industry are exploring alternative, domain-specific architectures, including those currently relegated to specific extreme-performance segments such as vector-based processing (DSPs, GPUs) and fully parallel programmable hardware (FPGAs).

Which architecture is best for which task?

  1. Scalar Processing Elements (SPE): Very efficient at complex algorithms with varied decision trees and a broad set of libraries, but limited in performance scaling.
  2. Vector Processing Elements (VPE): More efficient at a narrower set of parallelizable compute functions, but they suffer latency and efficiency drawbacks because of an inflexible memory hierarchy.
  3. Programmable logic: Can be precisely customized to a particular compute function, which makes it best for latency-critical real-time applications (e.g., ADAS) and irregular data structures, but algorithmic changes have traditionally taken hours to compile rather than minutes.

Dynamic Reconfiguration

Certain cost sensitive, real-time applications can benefit from utilizing the device's inherent programmability to multiplex one set of programmable hardware between multiple logical functions with sub-millisecond Adaptable Engine partial reprogramming time. In the Data Center, this allows Versal VHPC devices to perform a much wider array of functions traditionally performed by a CPU when compared to a more limited vector processor like a GPU.
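As a rough illustration of the trade-off, the time-multiplexing idea can be sketched in a few lines. The `schedule` helper and the 0.5 ms reconfiguration cost below are hypothetical assumptions for illustration, not vendor tooling:

```python
# Conceptual sketch (hypothetical API, not vendor tooling): time-multiplexing
# one reconfigurable region between several logical functions, charging a
# fixed partial-reconfiguration cost whenever the loaded function changes.

RECONFIG_MS = 0.5  # assumed sub-millisecond partial reprogramming time

def schedule(jobs):
    """jobs: list of (function_name, run_ms). Returns total elapsed ms."""
    elapsed, loaded = 0.0, None
    for func, run_ms in jobs:
        if func != loaded:        # swap in a new accelerator configuration
            elapsed += RECONFIG_MS
            loaded = func
        elapsed += run_ms
    return elapsed

total = schedule([("crypto", 2.0), ("crypto", 2.0), ("compress", 3.0)])
print(f"total: {total} ms")  # one reconfig is amortized over back-to-back jobs
```

Grouping jobs that use the same function back-to-back is what makes the multiplexing pay off: only two reconfigurations are charged for the three jobs above.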

One of the biggest advantages of programmable logic is the ability to reconfigure memory hierarchy and thus to optimize for different compute loads. For example, even within the scope of neural networks focused on image recognition, the memory footprint and compute operations per image vary widely depending on the algorithm. Programmable memory hierarchy allows the programmable logic to be adjusted to optimize compute efficiency for each network it supports.
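To make the variation concrete, here is a minimal sketch of how per-layer compute and weight-memory footprint can be estimated for a convolutional layer. The layer dimensions and INT8 weight assumption are illustrative, not taken from any specific network:

```python
# Illustrative sketch: estimating per-layer compute and weight-memory footprint
# of a convolutional layer, to show why different image-recognition networks
# stress the memory hierarchy differently.

def conv2d_cost(in_h, in_w, c_in, c_out, k, stride=1, bytes_per_weight=1):
    """Return (MACs, weight_bytes) for one conv layer, same padding assumed."""
    out_h, out_w = in_h // stride, in_w // stride
    macs = out_h * out_w * k * k * c_in * c_out             # multiply-accumulates
    weight_bytes = k * k * c_in * c_out * bytes_per_weight  # e.g., INT8 weights
    return macs, weight_bytes

# Hypothetical 3x3 layer on a 224x224 feature map, 64 -> 128 channels:
macs, wb = conv2d_cost(224, 224, 64, 128, 3)
print(f"{macs / 1e9:.2f} GMACs, {wb / 1e6:.2f} MB of weights")
```

A layer like this is compute-heavy with a tiny weight footprint, while fully connected layers invert that ratio; a programmable memory hierarchy lets the hardware be tuned to whichever profile the network presents.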

Advanced Driver Assistance Systems (ADAS)

Today's automotive ADAS/AD systems demand an increasing number of HD cameras. Compute requirements scale with pixel count, which means an image from an HD camera (1920x1080) requires significantly more compute than a data-center-standard image in the 200x200 to 400x400 range. The AI processing performance required for sensor fusion is relatively uncertain compared to the AI performance required for perception. The Convolutional Neural Network (CNN), a form of deep learning commonly used in computer vision, enhances performance via sensor fusion. While the AI processing required to address some aspects of sensor fusion may be lightweight, other aspects are quite complex.
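Since compute scales with pixel count, the gap can be quantified with simple arithmetic. The 224x224 data-center input size below is an assumption chosen for illustration:

```python
# Back-of-envelope sketch: if per-frame compute scales with pixel count,
# compare an HD automotive camera frame against a typical data-center input.

hd_pixels = 1920 * 1080   # HD camera frame
dc_pixels = 224 * 224     # common data-center CNN input size (assumed)

ratio = hd_pixels / dc_pixels
print(f"An HD frame has ~{ratio:.0f}x the pixels of a 224x224 input")
```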

With all the innovation occurring in the automotive space, it is important to choose a processing-device portfolio that offers HW & SW portability and scalability across multiple platforms. Such a portfolio covers a wide power range, from xW windscreen-mounted front-camera designs to xxW in-cabin central modules to xxxW+ liquid-cooled boot-mounted HPCs, all with the same programming model.

What about connectivity?

1. High-speed data processing via highly integrated multi-core compute with dedicated DSP & AI engines, enabling up to ADAS Level 5

2. Memory bandwidth reaching 300 GB/s and 500 GB/s for Level 4 and Level 5, respectively

3. High performance and high bandwidth distributed across chipsets, required to achieve the necessary TOPS for ADAS Levels 3-5

System Demand

Response time is a critically important processing-performance factor when considering vehicles traveling at automotive speeds. At 100 km/h (roughly 60 mph), even tiny microsecond-level differences in the reaction time of different ADAS systems can have a significant impact on a system's effectiveness.
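The point can be made concrete with back-of-envelope arithmetic; the latency values below are illustrative, not measured figures:

```python
# Back-of-envelope sketch: distance a vehicle covers during a given
# processing latency at 100 km/h (~60 mph).

speed_kph = 100
speed_mps = speed_kph * 1000 / 3600   # ~27.8 m/s

for latency_ms in (1, 10, 100):
    distance_m = speed_mps * latency_ms / 1000
    print(f"{latency_ms:>3} ms of latency -> {distance_m:.2f} m traveled")
```

Even a 10 ms difference in end-to-end reaction time corresponds to roughly a quarter of a meter of travel, which is why low, deterministic latency matters so much in ADAS pipelines.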

AD technology demands multiple neural networks, which might need to be chained together in sequence to perform complex tasks, worsening the issues with GPU execution that depends on high batch sizes. Therefore, the AI Edge series needs to be optimized to operate at extremely high efficiency at low batch sizes.

High-reliability and thermally constrained automotive systems should be designed up front to mitigate single- and multi-event upset effects and to operate at temperatures up to 125°C. Combined with a focus on Machine Vision (MV) and Machine Learning (ML), this heritage of reliability and quality carries over to ADAS and future AD technology.

VHPC merges a power-efficient Scalar Engine with multi-core Cortex processors, programmable I/O, and low-latency, intelligent AI Engines that enable higher functional safety. Additionally, it enables AI-enhanced AD solutions with extreme machine learning performance compared with today's FPGA-based ASIL-C/D certified ADAS solutions.

Furthermore, the ability to reprogram the entire device via OTA software and hardware updates improves in-field adaptability, which adds customer value and enables next-gen user experiences.

Summary

A next-gen architecture should tightly integrate programmable fabric, CPUs, and software-programmable acceleration engines into a single device, enabling higher levels of software abstraction and more rapid development of hardware accelerators that solve next-generation problems. AI Engines represent a new class of High Performance Computing (HPC). An AI Engine integrated within a Versal-class device can be optimally combined with the processor subsystem (PS) and programmable logic (PL) to implement high-complexity systems in a single package. AI Engines deliver three to eight times better silicon-area compute density than traditional programmable-logic DSP and ML implementations while reducing power consumption by half.
