Vol.9: What shall we do with all the PCIe speed in Embedded?
PCI Express keeps evolving to ever-increasing bit rates. The current specification under development, PCIe 7.0, will introduce 128 giga-transfers per second (GT/s) per lane; that's more than 50x the original PCIe speed of 2.5 GT/s. How can such sheer IO bandwidth be put to use in system designs?
Even if we look at what's offered by today's silicon, PCIe gen. 5 at 32 GT/s per lane, it's really a lot of bandwidth, especially as PCIe interfaces are deployed x4, x8, or x16 lanes wide. A gen. 5 x16 link can transport up to 64 GByte/s; that's roughly 10x the data rate of an uncompressed 8K UHD stream at 60 fps. Wow. That's great for high-end graphics and streaming applications. IO- and acceleration-heavy workloads in datacenter and cloud environments can benefit, too.
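To put a number on that comparison, here's a back-of-the-envelope sketch. It assumes 128b/130b line encoding for gen. 5 and uncompressed 24-bit RGB video; both are my assumptions, and further TLP/DLLP protocol overhead is ignored:

```python
# Back-of-the-envelope: PCIe gen 5 x16 bandwidth vs. an 8K UHD 60 fps stream.
# Assumptions: 128b/130b line encoding (used since gen 3), uncompressed
# 24-bit RGB video, no TLP/DLLP protocol overhead.

GT_PER_S = 32e9          # PCIe gen 5: 32 GT/s per lane
LANES = 16
ENCODING = 128 / 130     # 128b/130b line code

lane_bytes = GT_PER_S * ENCODING / 8   # payload bytes/s per lane
link_bytes = lane_bytes * LANES        # ~63 GB/s for a x16 link

video_bytes = 7680 * 4320 * 3 * 60     # 8K UHD, 24 bpp, 60 fps: ~6 GB/s

print(f"x16 gen 5 link: {link_bytes / 1e9:.1f} GB/s")
print(f"8K @ 60 fps:    {video_bytes / 1e9:.1f} GB/s")
print(f"ratio:          {link_bytes / video_bytes:.1f}x")
```

Running this yields roughly 63 GB/s against 6 GB/s, which is where the 10x figure comes from.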
If we look at industrial use cases, even the most demanding AI inferencing is typically limited by compute and neural network processing power, not by IO. Doubling IO bandwidth from PCIe gen 4 to gen 5 does not yield significant improvements. Well, that's only true if you look at it as an isolated feature.
At system level, the number of available PCIe lanes on mobile and workstation processors like Intel's Core and AMD's Ryzen families is limited. And there are two classes of PCIe interfaces: higher-bandwidth interfaces tightly coupled into the on-chip fabrics close to the processor cores and cache memories, and lower-performance interfaces that are spun off from the chipset portion of the silicon. The latter not only have limitations with respect to throughput and access latency but are also less deterministic: because these interfaces share chip fabrics with interfaces such as USB, SATA and even serial ports, latency jitters much more than on the PCIe links close to the CPU.
Typically, around 20 high-performance PCIe lanes are supported on those processors (only the more expensive, higher-wattage server processors have high-speed PCIe in abundance). Four of those 20 lanes are usually used to interface an NVMe SSD, leaving 16 lanes for platform IO. Industrial data processing, in a generalized architecture, will include data ingest (let's call that a digitizer for simplicity), heterogeneous processing (e.g. the combination of x86 cores and GPU resources on a PCIe card), and then the presentation of results, be it as an image on a display, the sharing of results over a network interface, or direct control actions.
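As a sketch of that lane budget (the device list and the x8/x8 split are illustrative assumptions, not a specific board's layout):

```python
# Illustrative lane budget for a ~20-lane client platform.
# Device names and widths are hypothetical examples, not a real BOM.

CPU_LANES = 20

allocations = {
    "NVMe SSD":  4,   # boot/storage, almost always CPU-attached
    "GPU card":  8,   # accelerator
    "Digitizer": 8,   # data ingest
}

used = sum(allocations.values())
assert used <= CPU_LANES, "over-subscribed CPU-attached lanes"
print(f"{used}/{CPU_LANES} CPU-attached lanes used, {CPU_LANES - used} spare")
```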
And that's where higher-speed PCIe comes into play: on previous-generation platforms, the 16 lanes may have been eaten by the GPU card, pushing the digitizer down to the less favorable PCIe lanes from the chipset. Today, the 16 high-speed / low-latency lanes can be partitioned equally between the digitizer and an accelerator like a GPU. This results in higher throughput and more determinism and avoids the cost and power penalties inherent to server platforms.
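If you want to verify how your cards actually enumerated, Linux exposes the negotiated link parameters in sysfs. A minimal sketch, assuming a Linux system; it reads the standard PCI sysfs attributes (current_link_speed, current_link_width and their max counterparts):

```python
#!/usr/bin/env python3
# Print negotiated vs. maximum PCIe link speed/width for every device
# that exposes them. Linux-only: reads the standard PCI sysfs attributes.
from pathlib import Path

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    try:
        cur_speed = (dev / "current_link_speed").read_text().strip()
        cur_width = (dev / "current_link_width").read_text().strip()
        max_speed = (dev / "max_link_speed").read_text().strip()
        max_width = (dev / "max_link_width").read_text().strip()
    except OSError:
        continue  # not every PCI function reports link attributes
    print(f"{dev.name}: x{cur_width} @ {cur_speed} "
          f"(max x{max_width} @ {max_speed})")
```

On the kind of platforms discussed here, a card that landed on chipset lanes will typically report a lower maximum link speed than one in a CPU-attached slot.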
Yet, many accelerator cards still come in x16 form factors and cannot be installed in x8 PCIe slots. That's why on industrial computer boards, including those from Advantech, of course, you'll find mechanical PCIe x16 card slots that carry only an electrical x8 connection. But you'll find two of these slots: one for the accelerator and one for the digitizer. Smart. Gives you more bandwidth for the buck at system level.
#Advantech #PCIe #PCIExpress #GPU #EmbeddedSystems #PCIe7 #HighSpeedIO