Accelerated Computing Series Part 4: Smart Camera with NLP on FPGA & Linux
Neeraj Kumar, PhD
AI Scientist | ex-CTO@iRFMW, Huawei, BertLabs | IISc | Reconf. Edge Compute (FPGAs/GPUs), RTOS/Linux | Sensor Fusion | Radar, SDR | Cognitive Systems
Building on the last blog, here’s a quick demonstration of an application that leverages the DPU accelerator IP we implemented last time. The smartcam application takes in a video feed from a USB camera, applies computer vision algorithms running on the PL for face/object detection, and draws bounding boxes around any detections. The stream with overlaid boxes is then sent to a monitor via DisplayPort. Meanwhile, a keyword spotting (KWS) algorithm running on the PS parses voice commands to dynamically switch between multiple tasks and update the display.
In Part 3, we implemented different configurations of the DPUCZDX8G DPU in the PL and built Petalinux along with several supporting Vitis-AI packages and an app in the rootfs to test them out. But Vitis is much more than that. It provides a plethora of open-source accelerated libraries for math, statistics, linear algebra, and DSP. Among the many domain-specific libraries, we’re going to focus on the ones for vision and image processing here.
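If you want to poke around those libraries, they’re open source; the vision library (the xf::cv kernels for image processing) is the one most relevant to this post. A quick way to browse it is shown below; note that the branch name is my assumption, chosen to match the 2022.1 tools used in this series:
$ git clone https://github.com/Xilinx/Vitis_Libraries.git -b 2022.1
$ # The L1 vision kernels (resize, color conversion, filters, etc.) live here:
$ ls Vitis_Libraries/vision/L1/include/imgproc/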
The development flow in the Vitis ecosystem is depicted below (see Vitis Unified Software Platform Documentation):
Up until Part 3, we were using the embedded software development flow, where a ‘fixed’ XSA (hardware handoff) file was generated by Vivado and packaged by Petalinux. The application acceleration development flow in Vitis, on the other hand, is based on an ‘extensible’ XSA generated by Vivado, which allows acceleration kernels to be added. We’ve already seen one example of an acceleration kernel, the DPUCZDX8G. Others can be written in Vitis HLS (C/C++), compiled into .xo kernel objects, and linked by v++ together with other kernels; many more ship with the accelerated libraries mentioned above. The following picture depicts the key differences between the embedded and acceleration development flows.
The v++ linker prepares a .xclbin file from the .xo kernels, bitstream and other metadata, which the host application uses to load the FPGA and interface with the kernel functions.
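As a rough sketch of what that looks like on the command line (kernel, source, and platform names here are placeholders, not the ones used in this design):
$ # Compile an HLS C/C++ kernel into a .xo kernel object (names are placeholders)
$ v++ -c -t hw --platform my_platform.xpfm -k my_kernel -o my_kernel.xo my_kernel.cpp
$ # Link the .xo kernel(s) against the extensible platform into an .xclbin
$ v++ -l -t hw --platform my_platform.xpfm my_kernel.xo -o binary_container_1.xclbin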
We’ll see how our DPU fits into this flow for this design: Camera → Accelerator → Display. Our design is based on a reference design for the Kria KV260 platform, ported to the ZCU106. The KV260 reference design uses the DPU with the B3136 architecture, so we’ll use it as is (you are free to modify it to your liking, though). It also includes two camera channels, one for an RPi camera with the IMX219 sensor and another for the AR1335 camera sensor with the AP1302 ISP, both on the carrier board. These two channels pass through two MIPI CSI-2 IP instances along with ISP pipeline acceleration kernels written in HLS C++ and compiled into .xo kernels before being integrated with the rest of the design.
Following is the KV260 hardware and processing flow architecture.
The blue and red blocks represent IP implemented on the PL, and the black blocks represent functions implemented on the PS. Pre-processing involves resizing and quantizing images before they can be passed to the DPU, while post-processing involves merging the metadata, bounding boxes, and KWS inference results onto the image and passing it to the DP/HDMI splitter (on the ZCU106 there’s just DP, no splitter).
Since we don’t have these sensors/ISP on ZCU106, I won’t be implementing them in my design. However, the design does support USB webcam and microphone (for NLP commands), so that’ll serve our purpose. Therefore, the only accelerator IP in our design is the DPU.
You can get the reference design here:
$ git clone --recursive https://github.com/Xilinx/kria-vitis-platforms.git -b xlnx_rel_v2022.1
We’ll be porting the following platform:
kria-vitis-platforms/kv260/platforms/vivado/kv260_ispMipiRx_rpiMipiRx_DP/
I’ll just give general guidelines on the porting, as it should be pretty straightforward:
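As a rough, hypothetical sketch of the kind of restructuring involved (the zcu106-smartcam/zcu106 and zcu106_usbCamRx_DP names below simply mirror the directory names that show up later in this post; the actual edits to the Vivado block design and Petalinux BSP are up to you):
$ # Hypothetical: copy the KV260 platform tree into a ZCU106-named tree and work from there
$ cp -r kria-vitis-platforms/kv260 ~/zcu106-smartcam/zcu106
$ # Then, in the copied tree: retarget the Vivado project to the ZCU106 board/part,
$ # drop the MIPI/ISP capture blocks that have no matching hardware on the ZCU106,
$ # and rename the platform to something like zcu106_usbCamRx_DP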
Once all this is done, we’re ready for the build.
Building the project
$ source <path_to_petalinux>/2022.1/settings.sh
$ source <path_to_vivado>/2022.1/settings64.sh
Go to your board directory (e.g., kria-vitis-platforms/kv260/ or zcu106-smartcam/zcu106 if you renamed the directory structure like this) and run make:
$ make petalinux OVERLAY=nlp-smartvision
The process will build a Vivado project at the following location:
~/zcu106-smartcam/zcu106/overlays/examples/nlp-smartvision/binary_container_1/link/vivado/vpl/prj
Here’s what my project block design looks like (the DPU is highlighted):
The design is very similar to our DPU reference design in Part 3, except that there the clocking, interconnects, and the DPU were grouped into a hierarchical block.
You’ll find the Linux images in the ‘./petalinux/zcu106_usbCamRx_DP/images/linux’ directory. Prepare the sdcard as usual.
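If you need a reminder of what “as usual” means here, a minimal manual layout looks something like the following (assuming the card already has a FAT32 boot partition and an ext4 rootfs partition, mounted under /media/<user>/; adjust the mount points to your setup). Note that this design also expects dpu.xclbin on the boot partition (see the run step below):
$ # Copy the boot artifacts to the FAT32 boot partition (mount points are assumptions)
$ cp BOOT.BIN boot.scr image.ub /media/<user>/BOOT/
$ # The app expects the overlay's dpu.xclbin on the same boot partition
$ cp dpu.xclbin /media/<user>/BOOT/
$ # Extract the root filesystem onto the ext4 partition
$ sudo tar xf rootfs.tar.gz -C /media/<user>/rootfs/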
Connect a monitor to the on-board DisplayPort connector, and connect a USB hub to the sole USB connector so you can attach both the camera and the mic (plus a keyboard and mouse if you wish). Luckily for me, my camera has a mic built in. Boot up the board; after a while you should see the following GUI on the monitor:
In your UART console, you can type v4l2-ctl --list-devices to find your camera listed:
zcu106usbCamRxDP20221:~$ v4l2-ctl --list-devices
C922 Pro Stream Webcam (usb-xhci-hcd.1.auto-1):
/dev/video0
/dev/video1
/dev/media0
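You can also check which resolutions and pixel formats the webcam exposes (the device node may differ on your setup):
zcu106usbCamRxDP20221:~$ v4l2-ctl -d /dev/video0 --list-formats-ext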
Build the App
zcu106usbCamRxDP20221:~$ git clone https://github.com/Xilinx/nlp-smartvision -b xlnx_rel_v2022.1
zcu106usbCamRxDP20221:~$ cd nlp-smartvision
zcu106usbCamRxDP20221:~$ mkdir -p build/install && cd build && cmake ../ && make && make DESTDIR=./install install
Install & Run the App
zcu106usbCamRxDP20221:~$ sudo cp -r install/* /
zcu106usbCamRxDP20221:~$ export PATH="/opt/xilinx/kv260-nlp-smartvision/bin:$PATH"
Before running, we need to switch from the GUI displayed above to a CLI:
zcu106usbCamRxDP20221:~$ sudo systemctl isolate multi-user.target
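If you want the desktop GUI back once you’re done testing, switching back should just be (assuming the image boots to the standard graphical target):
zcu106usbCamRxDP20221:~$ sudo systemctl isolate graphical.target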
zcu106usbCamRxDP20221:~$ nlp-smartvision -u
If you see the following error:
... Check failed: fd_ > 0 (-1 vs. 0) , open(/run/media/mmcblk0p1/dpu.xclbin) failed.
you need to remount the sdcard boot partition:
zcu106usbCamRxDP20221:~$ sudo mount /dev/mmcblk0p1 /run/media/mmcblk0p1
The app looks for the dpu.xclbin file at the /run/media/mmcblk0p1 mount point.
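A quick sanity check that the file is actually visible at the expected path:
zcu106usbCamRxDP20221:~$ ls -l /run/media/mmcblk0p1/dpu.xclbin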
You should now start seeing detections on the screen. Here’s what I got:
1. The model detects a person, chair and a potted plant.
2. For a typical ADAS application, the model correctly identifies all the cars and a pedestrian.
But it’s not without pitfalls: in this case it identified the rear-view mirror as a dog.
3. The following is a demonstration of NLP in action for the face detection task. Three keywords are used: LEFT (bounding boxes on the left side of the screen only), RIGHT (bounding boxes on the right side only), and STOP (reset display settings to default).
A full list of supported commands is given below:
While you watch the video feed being updated, the serial terminal displays the detected keywords:
Keyword Detected : "Down" Task : Switched to Object Detect
Keyword Detected : "Left" Task : Display Processed Result for only left side of screen
Keyword Detected : "Down" Task : Switched to Face Detect
Keyword Detected : "Up" Task : Switched to Object Detect
Keyword Detected : "Down" Task : Switched to Face Detect
Keyword Detected : "Left" Task : Display Processed Result for only left side of screen
Keyword Detected : "Right" Task : Display Processed Result for only Right side of screen
Keyword Detected : "Left" Task : Display Processed Result for only left side of screen
Keyword Detected : "Stop" Task : Stop Current Settings And reset to Default
Keyword Detected : "Left" Task : Display Processed Result for only left side of screen
4. Finally, face detection and tracking are demonstrated in the following videos.
This application uses the following models on the DPU, dynamically switched based on the UP/DOWN keywords:
Face Detection - Network model: cf_densebox_wider_360_640_1.11G_1.2
Object Detection - Network model: dk_yolov2_voc_448_448_0.77_7.82G_1.2
Number Plate Detection - Network model: cf_plate-detection_320_320_0.49G_1
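I haven’t verified the exact install prefix, but Vitis-AI library apps conventionally keep their compiled models under a share/vitis_ai_library/models directory; something like the following should show what got installed (both paths below are assumptions):
zcu106usbCamRxDP20221:~$ # Paths are assumptions; adjust to where the app actually installed its models
zcu106usbCamRxDP20221:~$ ls /opt/xilinx/kv260-nlp-smartvision/share/vitis_ai_library/models/ /usr/share/vitis_ai_library/models/ 2>/dev/null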
In summary, the reference design has the following architecture:
It involves multiple pipeline stages:
1) Capture pipeline
2) Audio processing pipeline
3) Video processing pipeline
4) Output pipeline
In line with the theme of this series, you can see the AXIS, AXI-MM, and AXI-Lite interfaces all being used according to their roles.
The KWS model for the NLP task is based on Hello Edge, trained on one-second audio clips of 10 keywords (Yes, No, Off, On, Up, Down, Left, Right, Stop, Go) from the open-source Google Speech Commands dataset.
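The dataset clips are one-second, 16 kHz mono recordings, so a quick way to sanity-check that your USB microphone is being captured in a compatible format (device selection left to the ALSA defaults) is:
zcu106usbCamRxDP20221:~$ # Record a one-second, 16 kHz, mono, 16-bit test clip from the default capture device
zcu106usbCamRxDP20221:~$ arecord -d 1 -r 16000 -c 1 -f S16_LE test.wav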
I chose to demonstrate this particular application to highlight the remarkable possibilities FPGAs bring to the edge. You can find more details about it on the Kria NLP-SmartVision webpage.
That’s it for now, hope you found this application fascinating. Next, I plan to get into another beautiful topic for this series: DSP with FPGAs. The applications are endless, but I’ll try to keep it aligned with the theme of this series (data plane, control plane and Linux drivers).
Until then, bye bye!
Other parts in this series:
My previous related series: