Accelerated Computing Series Part 4: Smart Camera with NLP on FPGA & Linux

Following on from the last blog, here’s a quick demonstration of an application that leverages the DPU accelerator IP we implemented last time. The smartcam application takes in a video feed from a USB camera, applies computer vision algorithms running on the PL for face/object detection, and draws bounding boxes around any detections. The stream with overlaid boxes is then sent to a monitor via DisplayPort. Meanwhile, a keyword spotting (KWS) algorithm runs on the PS, parsing voice commands to dynamically switch between tasks and update the display.

In Part 3, we implemented different configurations of the DPUCZDX8G DPU in the PL and built PetaLinux along with several supporting Vitis-AI packages and an app in the rootfs to test them out. But Vitis is much more than that. It provides a plethora of open-source accelerated libraries for math, statistics, linear algebra, and DSP. Among the many domain-specific libraries, we’re going to focus on the ones for vision and image processing here.

The development flow in the Vitis ecosystem is depicted below (see Vitis Unified Software Platform Documentation):


Source: Vitis Application Acceleration Development (UG1393)

Up until Part 3, we were using the embedded development flow, where a ‘fixed’ XSA (hardware handoff) file is generated by Vivado and packaged by PetaLinux. The application acceleration development flow in Vitis, on the other hand, is based on an ‘extensible’ XSA generated by Vivado, onto which acceleration kernels can be added. We already saw one example of an acceleration kernel, the DPUCZDX8G. Others can be written in Vitis HLS (C/C++), compiled into .xo kernels, and linked together by v++; the accelerated libraries mentioned above provide many ready-made ones. The following picture depicts the key differences between the embedded and acceleration development flows.


Source: Vitis Application Acceleration Development (UG1393)

The v++ linker prepares a .xclbin file from the .xo kernels, the bitstream, and other metadata; the host application uses this file to load the FPGA and interface with the kernel functions.
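
For reference, a bare-bones v++ invocation looks roughly like this (the kernel, source, and platform names below are placeholders, not taken from this design):

$ v++ --compile -t hw --platform my_platform.xpfm -k my_kernel -o my_kernel.xo my_kernel.cpp
$ v++ --link -t hw --platform my_platform.xpfm -o binary_container_1.xclbin my_kernel.xo

The first step compiles an HLS C/C++ kernel into a .xo object; the second links one or more .xo kernels against the extensible platform into the .xclbin that the host loads.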

We’ll see how our DPU fits into this flow for this design: Camera → Accelerator → Display. Our design is based on a reference design for the Kria KV260 platform, ported to the ZCU106. The KV260 reference design uses the DPU with the B3136 architecture, so we’ll use it as is (you’re free to modify it to your liking, though). It also includes two camera channels: one for the RPi camera with the IMX219 sensor, and another for the AR1335 camera sensor with the AP1302 ISP, both on the carrier board. These two channels pass through two MIPI CSI-2 IP instances, followed by ISP pipeline acceleration kernels written in HLS C++ and compiled into .xo kernels before being integrated with the rest of the design.

Following is the KV260 hardware and processing flow architecture.


https://github.com/Xilinx/nlp-smartvision

The blue and red blocks represent IP implemented in the PL, while the black blocks represent functions implemented on the PS. Pre-processing involves resizing and quantizing images before they are passed to the DPU, while post-processing involves merging the metadata, bounding boxes, and KWS inference results onto the image and passing it to the DP/HDMI splitter (on the ZCU106 there’s just DP, no splitter).

Since we don’t have these sensors/ISP on the ZCU106, I won’t be implementing them in my design. However, the design does support a USB webcam and microphone (for NLP commands), so that’ll serve our purpose. Therefore, the only accelerator IP in our design is the DPU.

You can get the reference design here:

$ git clone --recursive https://github.com/Xilinx/kria-vitis-platforms.git -b xlnx_rel_v2022.1        

We’ll be porting the following platform:

kria-vitis-platforms/kv260/platforms/vivado/kv260_ispMipiRx_rpiMipiRx_DP/

I’ll just be giving general guidelines on the porting as it should be pretty straightforward:

  1. Delete the contents of ./xdc/pin.xdc (except the last line), since these constraints apply to the KV260, not the ZCU106. You may want to edit this file with the correct pins if you plan to add these cameras via FMC.
  2. Replace all references to the kv260 board with zcu106 in ./scripts/main.tcl (see the sketch after this list).
  3. ./scripts/config_bd.tcl is where the base design is created; the DPU will be overlaid onto it at a later stage. All the PS initialization, clock settings, interconnects, interface and port connections, address segment assignments, and MIPI IP setup happen here. Most of it, except the PS and clock configuration, can be removed if you only intend to connect a USB camera.
  4. If you rename your directory structure to match zcu106 instead of kv260, then make sure the nomenclature is reflected at the appropriate places in all the Makefiles.
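
A quick way to hunt down the board references from step 2 is a grep followed by a sed. This is just my own shortcut (the file list here is an assumption); review each match before replacing, since some strings may belong to paths or IP names you want to keep:

$ grep -rn "kv260" scripts/ Makefile
$ sed -i 's/kv260/zcu106/g' scripts/main.tcl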

Once all this is done, we’re ready for the build.

Building the project

$ source <path_to_petalinux>/2022.1/settings.sh
$ source <path_to_vivado>/2022.1/settings64.sh        

Go to your board directory (e.g., kria-vitis-platforms/kv260/ or zcu106-smartcam/zcu106 if you renamed the directory structure like this) and run make:

$ make petalinux OVERLAY=nlp-smartvision        

The process will build a Vivado project at the following location:

~/zcu106-smartcam/zcu106/overlays/examples/nlp-smartvision/binary_container_1/link/vivado/vpl/prj

Here’s what my project block design looks like (the DPU is highlighted):


The design is very similar to our DPU reference design from Part 3, except that there the clocking, interconnects, and DPU were grouped into a hierarchical block.

You’ll find the Linux images in the ./petalinux/zcu106_usbCamRx_DP/images/linux directory. Prepare the SD card as usual.
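
If you need a refresher, a typical two-partition SD card prep looks something like this (the mount points and partition labels below are assumptions, adjust for your host and card layout):

$ sudo cp images/linux/BOOT.BIN images/linux/boot.scr images/linux/image.ub /media/$USER/BOOT/
$ sudo tar -xzf images/linux/rootfs.tar.gz -C /media/$USER/rootfs/
$ sync

Also make sure the generated dpu.xclbin ends up on the FAT boot partition (copy it from wherever your overlay build placed it); as we’ll see below, the app expects to find it there at runtime.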

Connect a monitor to the onboard DisplayPort connector, and connect a USB hub to the sole USB connector so you can attach both the camera and the mic (plus a keyboard and mouse if you wish). Luckily for me, my camera has a mic built in. Boot up the board; after a while you should see the following GUI on the monitor:


In your UART console, you can type v4l2-ctl --list-devices to find your camera listed:

zcu106usbCamRxDP20221:~$ v4l2-ctl --list-devices
C922 Pro Stream Webcam (usb-xhci-hcd.1.auto-1):
	/dev/video0
	/dev/video1
	/dev/media0        
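
If you want to double-check which resolutions and pixel formats the webcam exposes before running the app, v4l2-ctl can list those too (the device node may differ on your setup):

zcu106usbCamRxDP20221:~$ v4l2-ctl -d /dev/video0 --list-formats-ext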

Build the App

zcu106usbCamRxDP20221:~$ git clone https://github.com/Xilinx/nlp-smartvision -b xlnx_rel_v2022.1
zcu106usbCamRxDP20221:~$ cd nlp-smartvision 
zcu106usbCamRxDP20221:~$ mkdir -p build/install && cd build && cmake ../ && make && make DESTDIR=./install install        

Install & Run the App

zcu106usbCamRxDP20221:~$ sudo cp -r install/* /
zcu106usbCamRxDP20221:~$ export PATH="/opt/xilinx/kv260-nlp-smartvision/bin:$PATH"        

Before running, we need to switch from the GUI displayed above to a CLI:

zcu106usbCamRxDP20221:~$ sudo systemctl isolate multi-user.target
zcu106usbCamRxDP20221:~$ nlp-smartvision -u        

If you see the following error:

... Check failed: fd_ > 0 (-1 vs. 0) , open(/run/media/mmcblk0p1/dpu.xclbin) failed.        

you need to remount the SD card boot partition:

zcu106usbCamRxDP20221:~$ sudo mount /dev/mmcblk0p1 /run/media/mmcblk0p1        

The app looks for the dpu.xclbin file at the /run/media/mmcblk0p1 mount point.
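
You can quickly confirm the file is visible at that path before re-running the app:

zcu106usbCamRxDP20221:~$ ls -l /run/media/mmcblk0p1/dpu.xclbin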

You should now start seeing detections on the screen. Here’s what I got:

1. The model detects a person, a chair, and a potted plant.

Multiple targets detected & identified.

2. For a typical ADAS application, the model correctly identifies all the cars and a pedestrian.

Image captured by camera is processed by DPU and displayed on screen.
All cars and the pedestrian detected correctly.

But it’s not without pitfalls: in this case it identified the rear-view mirror as a dog.

Rear view mirror identified as dog at certain positions.

3. The following is a demonstration of NLP in action for the face detection task. Three keywords are used: LEFT (bounding boxes on the left side only), RIGHT (bounding boxes on the right side only), and STOP (reset display settings to default).

A full list of supported commands is given below:

While you watch the video feed update, the serial terminal displays the detected keywords:

Keyword Detected : "Down" 	 Task : Switched to Object Detect 
Keyword Detected : "Left" 	 Task : Display Processed Result for only left side of screen
Keyword Detected : "Down" 	 Task : Switched to Face Detect 
Keyword Detected : "Up" 	 Task : Switched to Object Detect 
Keyword Detected : "Down" 	 Task : Switched to Face Detect 
Keyword Detected : "Left" 	 Task : Display Processed Result for only left side of screen
Keyword Detected : "Right" 	 Task : Display Processed Result for only Right side of screen
Keyword Detected : "Left" 	 Task : Display Processed Result for only left side of screen
Keyword Detected : "Stop" 	 Task : Stop Current Settings And reset to Default
Keyword Detected : "Left" 	 Task : Display Processed Result for only left side of screen        

4. Finally, face detection and tracking are demonstrated in the following videos.


LEFT: bounding boxes on left side only, RIGHT: bounding boxes on right side only, STOP: reset display to default


Face tracking keeps pace even when the subject moves in and out of the frame.

This application uses the following models on the DPU, dynamically switched based on the UP/DOWN keywords:

Face Detection - Network model: cf_densebox_wider_360_640_1.11G_1.2

Object Detection - Network model: dk_yolov2_voc_448_448_0.77_7.82G_1.2

Number Plate Detection - Network model: cf_plate-detection_320_320_0.49G_1
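
On a Vitis-AI enabled rootfs, the compiled .xmodel files for networks like these typically live under /usr/share/vitis_ai_library/models/ (the exact path depends on your BSP and Vitis-AI version), which is a handy place to check if a model fails to load:

zcu106usbCamRxDP20221:~$ ls /usr/share/vitis_ai_library/models/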

In summary, the reference design has the following architecture:


Four pipeline stages used in the reference design.

It involves multiple pipeline stages:

1) Capture pipeline

2) Audio processing pipeline

3) Video processing pipeline

4) Output pipeline

In line with the theme of this series, you can see all the interface types (AXIS, AXI-MM, and AXI-Lite) being used according to their roles.

The KWS model for the NLP task is based on Hello Edge, trained on one-second audio clips of 10 keywords (Yes, No, Off, On, Up, Down, Left, Right, Stop, Go) from the open-source Google Speech Commands dataset.

I chose to demonstrate this particular application to highlight the remarkable possibilities FPGAs bring to the edge. You can find more details about it on the Kria NLP-SmartVision webpage.

That’s it for now, hope you found this application fascinating. Next, I plan to get into another beautiful topic for this series: DSP with FPGAs. The applications are endless, but I’ll try to keep it aligned with the theme of this series (data plane, control plane and Linux drivers).

Until then, bye bye!


Other parts in this series:

Part 0: Linux, FPGAs, GPUs, and some coffee!

Part 1: Custom IP & Control Plane

Part 2: Streaming Dataplane & Linux Drivers

Part 3: Deep Learning Accelerator on FPGA & Linux Drivers


My previous related series:

Embedded Linux Weekend Hacking: Linux Device Drivers
