High-Performance Ethernet Using Multiple Cores

A few years ago I had to solve a very challenging problem:


FreeRTOS + LWIP running on ARM, custom DSP algorithm processing packets on SHARC2, and a custom packet router running on SHARC1

I needed to receive and transmit 1000 bytes of data (each way) every 10 us for a DSP algorithm over UDP, which works out to 100 MB/s (800 Mbit/s) of payload in each direction, while at the same time running a full TCP/IP stack (for ssh, DHCP, and various low-bandwidth control TCP sockets). The DSP took 10 us to process each packet, so the work was pipelined: while it was processing packet 2, packet 1 was being transmitted and packet 3 was being received.

T0 = rx packet 0

T1 = process packet 0, rx packet 1

T2 = tx packet 0,   process packet 1, rx packet 2

T3 = tx packet 1,   process packet 2, rx packet 3

T4 = tx packet 2,   process packet 3, rx packet 4

........

where each time step is 10 us.
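The schedule above can be sketched as a rotation over three buffers, one per pipeline stage. This is only an illustration of the timing table, not the original firmware: the buffer count, the struct, and the function name `pipeline_step` are assumptions made for the example.

```c
#include <stdint.h>

/* Assumption: three buffers, one each for the rx, process, and tx stages. */
#define NUM_BUFFERS 3u

/* Buffer slot used by each stage in one 10 us time step (-1 means idle). */
typedef struct {
    int rx_slot;       /* slot being filled by the receiver      */
    int process_slot;  /* slot the DSP algorithm is working on   */
    int tx_slot;       /* slot being drained by the transmitter  */
} pipeline_step_t;

/*
 * Return the slot assignment for time step t (T0, T1, T2, ...).
 * Packet t is received at step t, processed at step t + 1, and
 * transmitted at step t + 2, so the three stages always touch
 * three different slots.
 */
pipeline_step_t pipeline_step(uint32_t t)
{
    pipeline_step_t s;
    s.rx_slot      = (int)(t % NUM_BUFFERS);
    s.process_slot = (t >= 1u) ? (int)((t - 1u) % NUM_BUFFERS) : -1;
    s.tx_slot      = (t >= 2u) ? (int)((t - 2u) % NUM_BUFFERS) : -1;
    return s;
}
```

For example, `pipeline_step(2)` assigns slot 2 to rx, slot 1 to processing, and slot 0 to tx, matching the T2 row. A slot is reused for reception only after its previous packet has been transmitted, which is why three buffers are the minimum for this schedule.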

This all had to run on an ADI SC589 SoC (one ARM core plus two SHARC cores).


https://www.analog.com/media/en/dsp-documentation/processor-manuals/SC58x-2158x-hrm.pdf

For the DSP to process the data in 10 us, the data had to be in on-chip L2 SRAM (running at the same clock as the SHARC), and the DSP had to have all its cycles dedicated to the algorithm. High-speed on-chip RAM was the key to the success of this project: it allowed the use of DMA without the DSP paying the penalty of non-cached memory accesses. On the high-performance path, the non-cached DDR was accessed only by PDMA or MDMA. Only the slow path, LWIP running on the ARM, took a performance hit, because its buffers were in the non-cached DDR regions.

The Ethernet driver ran on SHARC1 and DMA'd packets to and from DDR (in a region with the cache disabled). The packet router (also running on SHARC1) peeked at the port field of each packet; if the packet was destined for a specific port, it was sent to L2 SRAM using memory-to-memory DMA (MDMA). All other packets were processed directly in DDR by LWIP running on the ARM under FreeRTOS. Only the peek into DDR incurred a non-cached access hit.
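The "peek" in the router can be sketched as a small classifier that reads only the handful of header bytes needed to find the UDP destination port. This is a minimal sketch, not the original router: it assumes untagged Ethernet II frames carrying IPv4, and the names `classify_frame`, `route_t`, and the `fast_port` parameter are hypothetical. Anything it does not recognize (VLAN tags, IPv6, fragments, non-UDP traffic) falls through to the slow path.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN  14u      /* dst MAC + src MAC + EtherType */
#define ETHERTYPE_IPV4  0x0800u
#define IPV4_MIN_HEADER 20u
#define IP_PROTO_UDP    17u
#define UDP_HEADER_LEN  8u

typedef enum { ROUTE_SLOW_PATH = 0, ROUTE_FAST_PATH = 1 } route_t;

/* Read a big-endian 16-bit field one byte at a time (no alignment needed). */
static uint16_t read_be16(const volatile uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | (uint16_t)p[1]);
}

/*
 * Decide where a received frame goes. `frame` points at the Ethernet
 * header; it is volatile because the buffer is written by DMA.
 */
route_t classify_frame(const volatile uint8_t *frame, size_t len,
                       uint16_t fast_port)
{
    if (len < ETH_HEADER_LEN + IPV4_MIN_HEADER + UDP_HEADER_LEN)
        return ROUTE_SLOW_PATH;
    if (read_be16(frame + 12) != ETHERTYPE_IPV4)
        return ROUTE_SLOW_PATH;

    const volatile uint8_t *ip = frame + ETH_HEADER_LEN;
    if ((ip[0] >> 4) != 4u)
        return ROUTE_SLOW_PATH;

    /* IPv4 header length is given in 32-bit words. */
    size_t ihl = (size_t)(ip[0] & 0x0Fu) * 4u;
    if (ihl < IPV4_MIN_HEADER || len < ETH_HEADER_LEN + ihl + UDP_HEADER_LEN)
        return ROUTE_SLOW_PATH;
    if (ip[9] != IP_PROTO_UDP)
        return ROUTE_SLOW_PATH;

    /* More-fragments flag set or non-zero offset: let the full stack handle it. */
    if ((read_be16(ip + 6) & 0x3FFFu) != 0u)
        return ROUTE_SLOW_PATH;

    /* The UDP destination port is the second 16-bit field of the UDP header. */
    const volatile uint8_t *udp = ip + ihl;
    return (read_be16(udp + 2) == fast_port) ? ROUTE_FAST_PATH : ROUTE_SLOW_PATH;
}
```

A router loop would call this once per received frame and start an MDMA transfer to L2 SRAM only when it returns `ROUTE_FAST_PATH`. Keeping the check to a few byte reads matters here because every read from the non-cached DDR region is slow.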

The most challenging aspects of this project included tuning the MDMA configuration for maximum performance, modifying the LWIP Ethernet driver layer, and signaling between cores (not shown in the above image). All the signaling was done with the SC589 trigger unit, a hardware signaling module that makes it easy to send signals between cores.

  1. https://www.freertos.org/
  2. https://savannah.nongnu.org/projects/lwip/
  3. https://www.analog.com/media/en/technical-documentation/application-notes/EE377v01.pdf
  4. https://www.analog.com/media/en/technical-documentation/application-notes/EE383v01.pdf



