Do we need next-generation HPCs, with high-end connectivity, unique core capabilities, and novel algorithms, to achieve the earliest possible time to market?
The pandemic pushed compute demand heavily upward just as the benefits of process-technology scaling are diminishing. The gains from a new technology node alone are often inadequate to justify the development costs of a next-generation device, forcing more aggressive innovation at the architectural and system levels. With the recent explosion of data and the surge of ML and AI applications, the need for high-performance compute has been rising steeply. Given the high wafer costs of advanced nanometer process nodes and continually changing requirements, developing ASICs for these markets is challenging.
Next-generation user experiences demand superior efficiency, higher bandwidth, excellent processing, outstanding scalability, and compact, highly integrated ICs. Other essentials include a compact high-performance compute (HPC) brain and low-latency storage that surpass traditional technologies. On the development front, to satisfy the market's rigorous bandwidth requirements, designs must plan support for Small Form-factor Pluggable (SFP), Flex Ethernet (FlexE), and other emerging interface standards.
Additionally, ADAS and AD video-streaming functions have fueled bandwidth demands, reinforced by broad customer acceptance of always-connected devices and smart infrastructure. In line with this change, demand for supporting areas such as cloud services continues to gain traction, and demand for connectivity components, smart antennas, and modules is also rising steeply.
What is happening behind the scenes?
Changes in semiconductor process frequency scaling have pushed the standard computing element to become progressively parallel. The semiconductor industry continues to explore alternatives among domain-specific architectures, including those previously consigned to vector-based processing (DSPs, GPUs), fully parallel programmable hardware (FPGAs), and other discrete extreme-performance segments. This brings new dimensions of thinking, new approaches, and new structures with unique configurations into the architecture.
The needs driving the change away from the traditional CPU-based compute model point toward other options, such as a heterogeneous compute platform.
Such a platform should address a few basic aspects:
1. Software Programmability—The ability to quickly develop optimized applications through software-abstracted toolchains.
2. Acceleration—Performance across a wide range of applications: artificial intelligence, smart network interface cards, high-density storage, 5G wireless, self-driving cars, advanced modular radar, and terabit optical networks.
3. Dynamically Adaptable Reconfiguration—The ability to reconfigure the hardware to accelerate new workloads within milliseconds.
What do we need?
A fully adaptive SaaP/-S and HaaP/-S model built on a heterogeneous compute platform that combines Scalar Engines, Adaptable Engines, and Intelligent Engines could achieve dramatic performance improvements of up to 100X over today's fastest HPC implementations. The new architecture should deliver a dramatic improvement in ease of use within the current environment, providing a fully integrated, memory-mapped platform programmed through a unified toolchain. A Versatile High-Performance Compute platform (VHPC) could be the right fit. Traditional hardware developers can still port their existing RTL to the VHPC via the traditional RTL entry flow.
Current Challenges
Recent technical challenges in semiconductor processes prevent further scaling of the traditional "one size fits all" CPU scalar compute engine. Changes in semiconductor process frequency scaling have forced the standard computing element to become increasingly parallel. As a result, both solution developers and the semiconductor industry are exploring alternative, domain-specific architectures, including those currently relegated to specific extreme-performance segments such as vector-based processing (DSPs, GPUs) and fully parallel programmable hardware (FPGAs).
Which architecture is best for which task?
Dynamic Reconfiguration
Certain cost-sensitive, real-time applications can benefit from utilizing the device's inherent programmability to multiplex one set of programmable hardware between multiple logical functions, with sub-millisecond Adaptable Engine partial reprogramming time. In the Data Center, this allows Versal VHPC devices to perform a much wider array of functions traditionally performed by a CPU when compared to a more limited vector processor like a GPU.
One of the biggest advantages of programmable logic is the ability to reconfigure memory hierarchy and thus to optimize for different compute loads. For example, even within the scope of neural networks focused on image recognition, the memory footprint and compute operations per image vary widely depending on the algorithm. Programmable memory hierarchy allows the programmable logic to be adjusted to optimize compute efficiency for each network it supports.
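To make that variation concrete, here is a back-of-the-envelope comparison of two well-known image-classification networks. The parameter and compute figures are approximate, commonly cited values (exact numbers vary by model variant and input resolution), and the fp16 weight assumption is illustrative:

```python
# Approximate, commonly cited figures for two image-classification networks
# at 224x224 input. Illustrative only; exact values vary by variant.
NETWORKS = {
    "ResNet-50":   {"params_m": 25.6, "gflops": 4.1},
    "MobileNetV2": {"params_m": 3.5,  "gflops": 0.6},
}

def weight_footprint_mb(params_millions, bytes_per_weight=2):
    """Weight memory in MiB, assuming fp16 (2 bytes) storage."""
    return params_millions * 1e6 * bytes_per_weight / 2**20

for name, net in NETWORKS.items():
    print(f"{name}: {weight_footprint_mb(net['params_m']):.1f} MiB weights, "
          f"~{net['gflops']} GFLOPs per image")
```

Even between just these two networks, weight footprint and per-image compute differ by roughly 7x, which is exactly the kind of spread a reconfigurable memory hierarchy can exploit.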
Advanced Driver Assistance Systems (ADAS)
Today’s automotive ADAS/AD systems demand an increasing number of HD cameras. Compute requirements scale with pixel count, which means an image from an HD camera (1920x1080) requires significantly more compute than a data-center-standard image in the 200x200 to 400x400 range. The AI processing performance required for sensor fusion is relatively uncertain compared to the AI performance required for perception. Convolutional Neural Networks (CNNs), a form of deep learning commonly used in computer vision, enhance performance via sensor fusion. While the AI processing required to address some aspects of sensor fusion may be lightweight, other aspects are quite complex.
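As a rough illustration of that pixel-count scaling, assuming (to first order) that convolutional compute grows linearly with the number of input pixels:

```python
# First-order estimate: per-frame convolution cost scales ~linearly
# with pixel count, so the pixel ratio approximates the compute ratio.
hd_pixels = 1920 * 1080   # automotive HD camera frame
dc_small  = 224 * 224     # typical data-center classification input
dc_large  = 400 * 400     # upper end of the 200x200-400x400 range

print(f"HD vs 224x224: {hd_pixels / dc_small:.1f}x more pixels")
print(f"HD vs 400x400: {hd_pixels / dc_large:.1f}x more pixels")
```

A single HD frame carries roughly 13x to 41x the pixels of a data-center-sized input, before accounting for frame rate or the number of cameras.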
With all the innovation occurring in the automotive space, it is important to choose a processing-device portfolio that offers HW & SW portability and scalability across multiple platforms. Such a portfolio covers a wide power range, from xW windscreen-mounted front-camera designs to xxW in-cabin central modules to xxxW+ liquid-cooled boot-mounted HPCs, all with the same programming model.
What about connectivity?
High-speed data processing via highly integrated multi-core compute and dedicated DSP & AI engines enables up to ADAS Level 5:
- Memory bandwidth reaching 300 GB/s and 500 GB/s for Level 4 and Level 5, respectively
- High-performance, high-bandwidth compute, distributed across chipsets, required to achieve the necessary TOPS for ADAS Levels 3-5
System Demand
Response time is a critically important processing-performance factor for vehicles traveling at road speeds. At 100 km/h (60 mph), even small differences in the reaction times of different ADAS systems can have a significant impact on a system's effectiveness.
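A quick calculation makes the stakes concrete; the conversion is simple kinematics, and the 10 ms figure below is an illustrative latency difference, not a measured one:

```python
def distance_traveled_m(speed_kph, delta_t_ms):
    """Extra distance the vehicle covers during delta_t_ms of added latency."""
    speed_mps = speed_kph / 3.6          # km/h -> m/s
    return speed_mps * delta_t_ms / 1000.0

# At 100 km/h, every 10 ms of added reaction latency costs
# roughly a quarter meter of stopping margin.
print(f"{distance_traveled_m(100, 10):.2f} m")
```

At 100 km/h the vehicle moves about 28 m every second, so latency differences accumulate into meaningful distance well before a human would notice them.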
AD technology demands multiple neural networks, which may need to be chained together in sequence to perform complex tasks, worsening the latency issues of GPU execution that depends on high batch sizes. The AI Edge series therefore needs to be optimized to operate at extremely high efficiency at low batch sizes.
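The way batching inflates end-to-end latency through a chain of networks can be sketched with a toy queueing model. The stage timings and the 30 fps camera rate below are illustrative assumptions, not measurements of any real device:

```python
def stage_latency_ms(batch, arrival_rate_hz, setup_ms, per_item_ms):
    """Toy model: wait for the batch to fill, then process it."""
    fill_ms = (batch - 1) / arrival_rate_hz * 1000.0
    compute_ms = setup_ms + batch * per_item_ms
    return fill_ms + compute_ms

def pipeline_latency_ms(batch, stages, arrival_rate_hz):
    """Latency through chained stages, each re-batching its input."""
    return sum(stage_latency_ms(batch, arrival_rate_hz, s, p)
               for s, p in stages)

# Three chained networks, each as (setup_ms, per_item_ms); 30 fps camera.
stages = [(5, 2), (4, 3), (6, 2)]
for batch in (1, 8):
    print(f"batch={batch}: "
          f"{pipeline_latency_ms(batch, stages, 30):.0f} ms end-to-end")
```

Even in this crude model, batch-8 execution pays hundreds of milliseconds just waiting for batches to fill at each stage, which is why edge inference favors high efficiency at batch sizes near one.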
High-reliability, thermally constrained automotive systems should be designed up front to mitigate single- and multi-event upset effects and to operate at temperatures up to 125°C. Combined with a focus on Machine Vision (MV) and Machine Learning (ML), this establishes the heritage of reliability and quality required for ADAS and future AD technology.
VHPCs merge a power-efficient Scalar Engine with multi-core Cortex processors, programmable I/O, and low-latency, intelligent AI Engines that enable higher functional safety. They also enable AI-enhanced AD solutions with far higher machine-learning performance than today's FPGA-based ASIL-C/D-certified ADAS solutions.
Furthermore, the ability to reprogram the entire device via over-the-air (OTA) software and hardware updates improves in-field adaptability, adding customer value and enabling next-generation user experiences.
Summary
Next-generation architectures should tightly integrate programmable fabric, CPUs, and software-programmable acceleration engines into a single device, enabling higher levels of software abstraction and more rapid development of the hardware accelerators that solve next-generation problems. AI Engines represent a new class of high-performance computing. The AI Engine, integrated within a Versal-class device, can be optimally combined with the processor subsystem (PS) and programmable logic (PL) to implement high-complexity systems in a single package. AI Engines deliver three to eight times better silicon-area compute density than traditional programmable-logic DSP and ML implementations while reducing power consumption by half.