Arduino PRO + FPGA, a perfect marriage for extreme tasks: let's discover together how to do it easily.

Not long ago I wrote an article on TM FAST, a technological module from Siemens, equipped with FPGA, which allows the execution of very fast tasks in parallel to the PLC to which it is connected.

The idea is really interesting: it lets you exploit the flexibility of the PLC in machine control while offloading heavy tasks, such as the management of extremely fast signals, which no PLC is able to deal with.

To paraphrase a friend of mine, who says that the best way to learn something is to build it from scratch, I thought I'd bring the same concept to the Arduino PRO ecosystem. The idea is not brand new: Arduino already produces the MKR Vidor 4000, a board equipped with a SAMD21 and a Cyclone 10 FPGA, which coincidentally is the same one that TM FAST uses.

Vidor has the enormous advantage of being compact; if size is an important constraint for you, but you still want to exploit all the power of an FPGA, this is the solution for you.

However, if you have a few more square centimeters available and want to use what Arduino PRO offers, or simply want to experiment with more flexible setups, follow me and let's see together what is possible.

Having two independent systems, Controller and FPGA, gives us the flexibility to adopt ad hoc solutions for our applications, appropriately balancing performance and cost. The SPI communication drivers I wrote are portable both across the Arduino family and across various FPGA brands, so you can have fun experimenting with any combination.

I will tell you about the generic architecture, then we will see the software drivers and how to use them. After that we will go into the construction details of the various prototypes I built for this experience, and finally I will give you some useful advice for creating other similar solutions.

Covering all the topics in detail, especially the one related to FPGAs, would turn this article into a book. Luckily, I recently wrote a very basic introductory article on this technology; those who have read it will find its natural continuation here.

You'll be amazed at how much easier it is to work with FPGAs than you've been led to believe.

Let's start.


Hardware architecture

The architecture of our system is quite simple: on one side there is Arduino and on the other the FPGA. They communicate via SPI, where Arduino is the Controller (Master) and the FPGA the Peripheral (Slave).

The terms Master/Slave have been abandoned by the international community in favor of Controller/Peripheral (or Device). I have put them in brackets for continuity with the commonly used nomenclature, but I will no longer use them.

System architecture

The two subsystems have well-defined characteristics.

Arduino is flexible, simple to program, very powerful in managing algorithms and with a marked IoT vocation. Using a Portenta X8 you can also run a Linux distribution and program it in Python.

With it we can manage Ethernet, CAN and an infinite number of SPI or I2C peripherals.

The FPGA is less flexible, in the sense that writing an FPGA program is practically equivalent to designing an electronic circuit, which must be transferred again after each modification. Fortunately, almost all FPGAs allow a transfer to SRAM (therefore volatile), which makes the process faster. However, development systems can take several minutes to "synthesize" a circuit, so if you are in the habit of recompiling your project with every small change, you will have to change your mentality and plan before programming.

The huge advantage of the FPGA, however, is its speed: response times, in the top models, can be as low as a nanosecond. There is no sequential machine code to interpret and everything happens in parallel: each "block" of code is an electronic circuit and, as such, works simultaneously with all the others. When a phenomenon is too fast for a processor to handle, the FPGA just starts to heat up.

The Artix-7 35T FPGA used in the first prototype, for example, contains SerDes (Serializer/Deserializer) units that enable synchronous communication at up to 6.6 Gb/s.

At first glance an FPGA seems much more complicated to program than Arduino, but in reality this is not the case.

The first advantage of Arduino is the sequential programming paradigm: the instructions we write are executed, in an "intuitive" way, from top to bottom. The second advantage is the enormous effort that the Arduino team and the contributors put into writing the device libraries; the underlying complexity is hidden to make everything simpler and more pleasant.

Out of curiosity, open the sources of the SPI libraries of the mbed devices (Portenta, GIGA R1, etc.), there are hundreds of lines of code, many of these at low level.

The FPGA SPI Driver that I wrote (but also others that you can find online), including protocol and Watchdog management, net of comments, is less than 100 lines long and is quite simple to understand.

I won't go into further details on FPGAs, I invite you to read my introductory article to get an overview of their potential and how they are programmed.


Control architecture

Such a heterogeneous system requires a functional approach: each subsystem must do what it was born to do.

So, to put it simply, there are two ways to proceed, one right and one wrong.

The wrong approach is the one in which we consider the FPGA system as simple fast acquisition hardware and therefore delegate the control of the process to Arduino.

This is fine for a traditional system, but, if our process is critical, we will negate all the advantages of using an FPGA. In fact, in addition to the "slowness" of the Arduino program, which is a slave to the scanning time, we must add the transfer time to and from the module, which could be unacceptable even if we used Ethernet communication.

The correct way to design a control system with this module is to write the management logic directly in the FPGA. By doing so we will be able to benefit from high speed and parallel processing of signals.

If our logic involves a set of quick operations to be performed on command, Arduino can exchange synchronization signals with the program in the FPGA, pass working parameters or collect results at the end of the task using the data structures that we will see, but it must never become part of the control logic.

Data exchange

If you're expecting intricate transfers between complex structures, you'll be disappointed; conceptually everything is much simpler: there are four mirrored areas, two in Arduino and two in the FPGA.

COPI_DATA in Arduino contains the data to be sent to COPI_DATA of the FPGA; conversely, CIPO_DATA of the FPGA contains the data to be sent to CIPO_DATA of Arduino.

Therefore, COPI_DATA is a write area for the Arduino and a read area for the FPGA, while CIPO_DATA is a read area for the Arduino and a write area for the FPGA.

Data exchange

The names derive from the SPI nomenclature:

  • COPI = Controller Output Peripheral Input
  • CIPO = Controller Input Peripheral Output

I used these names because Read and Write are ambiguous: their meaning depends on which side you are looking from.

Ultimately, to exchange data, the two systems will simply have to read and write data in the defined areas. The actual exchange operation is carried out by Arduino, which is the Controller, using a simple function that we will see later.

The drivers, internally, for both devices, work on groups of bytes, it is up to you to map structures within them that have a specific meaning in your application.
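
As a minimal sketch of what "mapping structures" onto the two areas can look like on the Arduino side, here is one possible layout. The field names and the 32-byte size are purely illustrative assumptions of mine, not something taken from the drivers; the only real constraints are that both areas have the same size and that the FPGA side uses the same byte layout.

```cpp
#include <cstddef>
#include <cstdint>

// Size shared by both mirrored areas; 32 is an arbitrary illustrative value.
constexpr std::size_t BUFFER_SIZE = 32;

#pragma pack(push, 1)   // no padding: byte positions must match the FPGA side

// Written by Arduino, read by the FPGA (hypothetical fields).
struct CopiData {
    std::uint8_t  command;                     // byte 0
    std::uint16_t pulse_count;                 // bytes 1..2
    std::uint32_t period_ns;                   // bytes 3..6
    std::uint8_t  reserved[BUFFER_SIZE - 7];   // filler up to BUFFER_SIZE
};

// Written by the FPGA, read by Arduino (hypothetical fields).
struct CipoData {
    std::uint8_t  status;                      // byte 0
    std::uint32_t encoder;                     // bytes 1..4
    std::uint8_t  reserved[BUFFER_SIZE - 5];   // filler up to BUFFER_SIZE
};

#pragma pack(pop)

// Compile-time checks: both areas must have exactly the declared size.
static_assert(sizeof(CopiData) == BUFFER_SIZE, "COPI area has the wrong size");
static_assert(sizeof(CipoData) == BUFFER_SIZE, "CIPO area has the wrong size");

CopiData COPI_DATA{};   // write area for Arduino
CipoData CIPO_DATA{};   // read area for Arduino
```

The static_assert lines make the compiler refuse a sketch whose areas no longer match the agreed size, which is cheaper than discovering it on the bench. Multi-byte fields also need the same byte order on both sides.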

Data protocol

The implemented protocol is completely managed by the drivers. In the Arduino library, a class exports a single function that lets you exchange two data blocks, of arbitrary position and length, within the CIPO_DATA and COPI_DATA areas.

Thanks to the full-duplex nature of SPI, the data exchange occurs simultaneously in both directions. This, together with the fact that SPI is a synchronous bus, allows us to reach a data rate of almost 20 Mb/s.

SPI Data exchange

In other words, synchronous to the same clock, the FPGA acquires the data sent by Arduino and Arduino acquires the data sent by FPGA.

The low-level data format is as follows:

Protocol

The SPI telegram is composed of a 16-bit word containing the address of the first bit to be transmitted, followed by the group of bytes we want to transfer.

The transfer occurs in reverse, starting from the most significant bit of the last byte and arriving at the least significant bit of the first byte. This allows us to avoid rotation operations; the address calculation is carried out internally by the Arduino class.
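
To make the telegram layout more tangible, here is a small host-side sketch that builds the byte sequence of one telegram and estimates its time on the wire. It is not the code of the library. In particular, I am assuming that bit 0 is the least significant bit of byte 0 (so the first transmitted bit has address (start + size) * 8 - 1) and that the 16-bit address goes out most significant byte first; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: bit 0 is the LSB of byte 0, so the first bit on the wire
// (the MSB of the last byte) has address (start + size) * 8 - 1.
std::uint16_t firstBitAddress(std::uint16_t start, std::uint16_t size) {
    return static_cast<std::uint16_t>((start + size) * 8u - 1u);
}

// Builds one telegram: the 16-bit address (ASSUMED high byte first),
// followed by the payload in reverse order, last byte first.
std::vector<std::uint8_t> buildTelegram(const std::uint8_t* area,
                                        std::uint16_t start,
                                        std::uint16_t size) {
    std::vector<std::uint8_t> frame;
    frame.reserve(2u + size);
    const std::uint16_t addr = firstBitAddress(start, size);
    frame.push_back(static_cast<std::uint8_t>(addr >> 8));
    frame.push_back(static_cast<std::uint8_t>(addr & 0xFFu));
    for (std::uint16_t i = size; i > 0; --i) {
        frame.push_back(area[start + i - 1]);   // last byte goes out first
    }
    return frame;
}

// Time on the wire in microseconds: (16 + 8 * size) bits at the SPI clock.
double transferTimeUs(std::uint16_t size, double spiClockHz = 20e6) {
    return (16.0 + 8.0 * size) / spiClockHz * 1e6;
}
```

Under these assumptions, a 16-byte exchange at 20 MHz needs 16 + 128 = 144 clock cycles, that is about 7.2 µs, which is where the "almost" in "almost 20 Mb/s" comes from: the address word is overhead.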

Arduino communication driver

To manage communication, as already mentioned, in the Arduino library you will find a class: FPGAClass.

To use it you need to define the two data areas I talked about before. They do not need to be byte arrays, they can also be arbitrary structures, the only important thing is that they are the same size as the twin ones inside the FPGA.

Using FPGAClass is very simple, in fact it contains only two methods: begin() and Exchange(). Let's see them in detail.

FPGAClass::begin() is used to initialize/parameterize the class; typically you call it in the setup() function of your sketch. It takes five parameters:

  • pin_size_t SS_PIN is the pin used for selecting the Peripheral, generally D10 for the Arduino Nano and GIGA R1 WIFI, while for Portenta you can use the constant PIN_SPI_SS. However you can use any available GPIO pin.
  • void* data_out is the address of your COPI_DATA area.
  • void* data_in is the address of your CIPO_DATA area.
  • int BufferSize is the size of the buffer areas in bytes.
  • uint32_t Clock is an optional parameter and indicates the SPI frequency, if not used, by default it is 20000000 i.e. 20 MHz.

int FPGAClass::Exchange() performs the data exchange; you can call it wherever you want within your code. It takes only two parameters:

  • uint16_t Start indicates the start byte inside your buffers, if you use an array of uint8_t it is simply the index of the starting element, if it is a substructure inside a struct, you have to calculate this offset (perhaps obtaining it as the difference between the memory address of the substructure and that of the parent structure).
  • uint16_t Size indicates the number of bytes you want to transfer. The function checks that you are not going beyond the allowed limits based on what was declared in the BufferSize parameter passed to begin().

Be careful: the transfer occurs in reverse, but this is handled inside the class, so you keep reasoning in the intuitive direction. If you want to transfer 16 bytes starting from byte 3, you simply call Exchange(3, 16);

Return

  • FPGA_SUCCESS (0): The function was successful.
  • FPGA_COMERROR (1): Communication problems occurred, data was not exchanged.
  • FPGA_PARERROR (2): parameter verification failed.
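
To show how the two calls fit together without needing the hardware, here is a host-side stand-in that mimics the interface described above. MockFPGA, Area, Setpoints and exchangeSetpoints() are hypothetical names of mine, and the loopback behavior imitates the demo rather than the real SPI transfer; only the parameter lists and return codes come from the description. Note the use of offsetof to obtain the Start value of a substructure, instead of subtracting addresses by hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using pin_size_t = std::uint8_t;   // stand-in for the Arduino core type

constexpr int FPGA_SUCCESS  = 0;
constexpr int FPGA_COMERROR = 1;   // never returned here: the mock has no bus
constexpr int FPGA_PARERROR = 2;

// Host-side stand-in (NOT the real FPGAClass): same documented signatures,
// but Exchange() just loops data_out back into data_in.
class MockFPGA {
public:
    void begin(pin_size_t SS_PIN, void* data_out, void* data_in,
               int BufferSize, std::uint32_t Clock = 20000000) {
        (void)SS_PIN;   // no real pin to drive on the host
        (void)Clock;    // no real bus to clock
        out_  = static_cast<std::uint8_t*>(data_out);
        in_   = static_cast<std::uint8_t*>(data_in);
        size_ = BufferSize;
    }

    int Exchange(std::uint16_t Start, std::uint16_t Size) {
        // Parameter check against the BufferSize passed to begin().
        if (out_ == nullptr || in_ == nullptr ||
            static_cast<int>(Start) + static_cast<int>(Size) > size_) {
            return FPGA_PARERROR;
        }
        std::memcpy(in_ + Start, out_ + Start, Size);   // loopback
        return FPGA_SUCCESS;
    }

private:
    std::uint8_t* out_ = nullptr;
    std::uint8_t* in_  = nullptr;
    int size_ = 0;
};

// Hypothetical application layout with a substructure inside the area.
struct Setpoints {
    std::uint16_t speed;
    std::uint16_t ramp;
};

struct Area {
    std::uint8_t flags;
    std::uint8_t pad;
    Setpoints    setpoints;   // starts at byte 2
    std::uint8_t spare[26];
};

// Exchanges only the Setpoints block: offsetof gives the Start parameter.
int exchangeSetpoints(MockFPGA& fpga) {
    return fpga.Exchange(static_cast<std::uint16_t>(offsetof(Area, setpoints)),
                         static_cast<std::uint16_t>(sizeof(Setpoints)));
}
```

On real hardware you would use the library's FPGAClass in place of the mock, call begin() once in setup(), and check the return value of Exchange() before trusting the content of the CIPO_DATA area.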

With SPI alone it is not possible to know whether the partner is working or not; for this reason the FPGA returns an echo of the address it was sent.

In reality this value is not complete: only the central 14 bits can be considered valid. The FPGA prepares its response on the falling edge of the clock by sampling the input bit (which obviously it cannot know in advance), so the value is shifted one bit to the right; in addition, the last bit is lost because the clock pulses have run out.

The power-up phase of the two systems will certainly have different times that depend on many factors, so the communication driver, on the Arduino side, will consider the communication valid only when it receives the correct echo in response.
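
The echo check can be pictured with a few lines of code. This is only my reading of the description above, not the code of the driver: I am assuming that the echo is the sent address shifted right by one bit and that bits 14..1 of the received word are the 14 valid ones. The mask and the function name are hypothetical.

```cpp
#include <cstdint>

// ASSUMPTION: only the central 14 bits (14..1) of the echo are trustworthy;
// bit 15 and bit 0 are ignored.
constexpr std::uint16_t ECHO_MASK = 0x7FFE;

// Returns true when the received word is a plausible echo of the address.
bool echoMatches(std::uint16_t sentAddress, std::uint16_t receivedEcho) {
    // ASSUMPTION: the echo comes back shifted one bit to the right.
    const std::uint16_t expected = static_cast<std::uint16_t>(sentAddress >> 1);
    return ((expected ^ receivedEcho) & ECHO_MASK) == 0;
}
```

With a check of this kind, a bus stuck low (0x0000) or floating high (0xFFFF) is rejected for almost every address, which is what lets the Arduino side wait until the FPGA has finished its power-up.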

FPGA communication driver

It is the spi_driver module that you find in the \fpga\common\spi_driver.v file. It is common to all FPGAs used.

Let's start by saying that I made some particular architectural choices.

A "canonical" SPI-Peripheral driver requires the clock line (spi_sck) to be sampled by the system clock, or by a derivative of it produced by a PLL, at four times the frequency of the SPI clock, so that it is synchronized into the system clock domain.

The current SPI frequency is 20 MHz. If it were to increase, perhaps because you intend to use other controllers (or another FPGA as Controller), there could be problems with the master clock, which already today, on both the Gowin and the Xilinx boards, must be raised using a PLL since the base frequencies (27 MHz and 12 MHz) are insufficient.

For this reason, I directly used spi_sck as the driver clock, connected to some generic I/O pins. This sacrilegious approach triggers the reaction of all development tools, which issue a series of warnings.

If you continue to use Arduino as a controller, leave everything as it is: 20 MHz is perfectly manageable by conventional logic; the clock pins are limited in number and could be used for more important uses, such as reading encoders.

If, however, you intend to increase the SPI frequency, then move (at least) spi_sck onto a clock line, taking care, in the case of differential inputs, to use the "P" line. You can continue to use the "N" line, but only as generic I/O.

In the connection diagram of the first prototype, for convenience, I have highlighted them in green. In any case, always refer to the official datasheet.

Cmod A7 clock lines
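
To put numbers on the "canonical" approach described above, this small sketch computes the system clock required by 4x oversampling and models the rising-edge detection that such a driver performs on the sampled spi_sck. The names are hypothetical and the code is a behavioral model in C++, not the Verilog of the driver (which, as explained, clocks itself directly from spi_sck).

```cpp
#include <cstdint>

constexpr std::uint32_t OVERSAMPLING = 4;   // ratio quoted in the text

// System clock needed to sample spi_sck in the canonical way.
constexpr std::uint64_t requiredSystemClockHz(std::uint32_t spiHz) {
    return static_cast<std::uint64_t>(spiHz) * OVERSAMPLING;
}

// True when the board oscillator alone is too slow and a PLL is required.
constexpr bool needsPll(std::uint32_t baseHz, std::uint32_t spiHz) {
    return baseHz < requiredSystemClockHz(spiHz);
}

// Behavioral model of a rising-edge detector working on samples of spi_sck
// taken at the system clock rate.
class EdgeDetector {
public:
    // Feed one sample; returns true when a 0 -> 1 transition is seen.
    bool sample(bool sck) {
        const bool rising = sck && !prev_;
        prev_ = sck;
        return rising;
    }

private:
    bool prev_ = false;
};
```

At 20 MHz the canonical approach asks for 80 MHz, so both the 27 MHz and the 12 MHz oscillators need a PLL; doubling the SPI clock would push the requirement to 160 MHz.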

Watchdog

For greater safety, I implemented a watchdog mechanism within the FPGA driver. You can pass a timeout parameter in milliseconds: if no data exchange takes place within that time, the FPGA raises a bit that can be used as an "emergency" line for any connected hardware. If you don't want to use this option, pass 0 as the value.
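
The behavior of the watchdog can be summarized with a small software model. It is a C++ sketch of the logic, not the Verilog implementation: the class name is hypothetical, and I am assuming that the alarm bit goes up when the elapsed time reaches the timeout and drops again as soon as a new exchange arrives.

```cpp
#include <cstdint>

// Behavioral model of the watchdog (hypothetical, for illustration only).
class WatchdogModel {
public:
    // timeoutMs == 0 disables the watchdog, as in the driver description.
    explicit WatchdogModel(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call on every completed data exchange: restarts the count.
    void onExchange() { elapsedMs_ = 0; }

    // Advances time by ms milliseconds, saturating instead of wrapping.
    void tickMs(std::uint32_t ms = 1) {
        elapsedMs_ = (elapsedMs_ > UINT32_MAX - ms) ? UINT32_MAX
                                                    : elapsedMs_ + ms;
    }

    // ASSUMPTION: non-latching alarm, raised once the timeout is reached.
    bool alarm() const {
        return timeoutMs_ != 0 && elapsedMs_ >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t elapsedMs_ = 0;
};
```

In a real machine you would probably prefer a latching alarm that requires an explicit reset; whichever you choose, the important point is that the decision is taken inside the FPGA, independently of what the Arduino is doing.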

Extra Clock

Finally, given that the default frequencies of FPGAs are always low to reduce interference, to make it easier for you I have programmed a PLL (for each FPGA) to produce an internal clock at 100 MHz to use as you wish. It is an operation that can be annoying if you are a beginner. In the demos, as an example, the 100 MHz are divided by 100,000,000 and simply make an LED flash to verify correct operation.
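
The division mentioned above is just a counter. The following C++ model (hypothetical names, not the Verilog of the demos) shows how many bits that counter needs and how a once-per-period pulse is produced; what the LED does with the pulse, toggling or flashing, is left out on purpose.

```cpp
#include <cstdint>

// Number of bits a counter needs to count from 0 to divisor - 1.
unsigned counterBits(std::uint64_t divisor) {
    unsigned bits = 0;
    while ((std::uint64_t{1} << bits) < divisor) {
        ++bits;
    }
    return bits;
}

// Behavioral model of a clock divider: tick() is one edge of the fast clock.
class ClockDivider {
public:
    explicit ClockDivider(std::uint32_t divisor) : divisor_(divisor) {}

    // Returns true once every `divisor` ticks, then restarts the count.
    bool tick() {
        if (++count_ == divisor_) {
            count_ = 0;
            return true;
        }
        return false;
    }

private:
    std::uint32_t divisor_;
    std::uint32_t count_ = 0;
};
```

Dividing 100 MHz by 100,000,000 gives one pulse per second, and since 2^26 is smaller than 100,000,000 while 2^27 is larger, the counter needs 27 bits.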

Demo

The demos you find in the repository will need to be modified for your real applications. In particular, the main FPGA program, top.v, performs a loopback between copi_data and cipo_data and displays the number of telegrams exchanged. Obviously this part of the code will have to be eliminated/modified, in the sources you will find instructions on how to do it.

Keyboard with display

Since we're talking about drivers, I'll tell you something about the hardware we're going to see.

Economical FPGA development boards are divided into two large families: educational systems and breadboard friendly.

The former, in addition to the FPGA, contain buttons, switches, displays and various connectors. They are compact and self-contained: you can power them up and experiment with them in minutes. The disadvantage is that they cost more and cannot be used for prototypes or small production runs, because they are obviously larger and because buttons and displays are not of much use in production.

Breadboard friendly boards have only the FPGA on board, a minimum of circuitry for programming, and in the more advanced models it is possible to find SRAM and/or DRAM.

The advantage is that they cost little and have very compact dimensions, so it is possible to use them both for teaching and, easily, for small production runs, given that they have 2.54 mm pitch pins.

In the photo I have arranged a Terasic DE10-Lite and a Digilent Cmod A7 to show you the difference.

Terasic DE10-Lite vs Digilent Cmod-A7

The disadvantage of breadboard-friendly boards, however, as you can see from the photo, is that they are "mute and deaf": you have to equip them with buttons, displays and LEDs to gain experience, and this can be a bit annoying because after a while the wires make bad contact and you risk chasing non-existent bugs.

The biggest penalty, finally, is not the effort of connecting the external circuitry, but the fact that it takes up a lot of precious I/O. A board with eight multiplexed 7-segment displays (common cathode or common anode), eight buttons and eight LEDs occupies a total of 32 I/O pins, and this is unacceptable.

But I solved this problem.

I wrote the driver for a display keyboard based on the Titan Micro TM1638 chip. You can find it for €4 in any online store. It does not have a specific name, although it is often sold as "LED&KEY"; it is produced by dozens of manufacturers and they are all the same. It has eight 7-segment displays, eight LEDs and eight buttons.

LED&KEY

To avoid the hassle of disconnecting and connecting the terminals for the different prototypes, given the cost, I purchased four of them, and they all work very well.

Simply instantiate the driver and read/write the exchange variables, you can write a 32-bit hexadecimal number or drive individual display segments independently. The state of the buttons is always available in an 8-bit variable, and in the same way we will drive the LEDs or the decimal points of the displays.

It's all very simple to use, the electrical advantage is that only three wires are needed to communicate, so there is no big impact on the number of I/Os taken away from the FPGA.

As for the SPI driver, here too I used generic I/O pins; it would have been a real shame to sacrifice a clock line for just 500 kHz. The only parameter required, in addition obviously to the list of physical pins, is the frequency in MHz of our master clock, which the driver scales automatically; that's all.

This driver is "cross-platform" too, it works without modifications with Xilinx, Altera, Lattice and Gowin.

Finally, if you want to use this board directly with Arduino, there are at least three fully functional libraries: search for TM1638 in the Library Manager.
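
For the curious, this is roughly what "writing a 32-bit hexadecimal number" on eight 7-segment digits involves. The sketch below is in C++ for readability and is not the driver code: the function name, the digit order (digit 0 is the leftmost) and the way decimal points are passed are my assumptions. The segment table is the usual one, with bit 0 = segment a up to bit 6 = segment g and bit 7 = decimal point.

```cpp
#include <array>
#include <cstdint>

// Usual 7-segment patterns for 0..F (bit 0 = a ... bit 6 = g).
constexpr std::uint8_t HEX_SEGMENTS[16] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71};

// Converts a 32-bit value into eight segment patterns.
// ASSUMPTIONS: digit 0 is the leftmost one; bit 7 of decimalPoints lights
// the point of digit 0, bit 0 the point of digit 7.
std::array<std::uint8_t, 8> hexToSegments(std::uint32_t value,
                                          std::uint8_t decimalPoints = 0) {
    std::array<std::uint8_t, 8> digits{};
    for (int i = 0; i < 8; ++i) {
        const unsigned nibble = (value >> (28 - 4 * i)) & 0xFu;
        std::uint8_t pattern = HEX_SEGMENTS[nibble];
        if (decimalPoints & (0x80u >> i)) {
            pattern = static_cast<std::uint8_t>(pattern | 0x80u);
        }
        digits[static_cast<std::size_t>(i)] = pattern;
    }
    return digits;
}
```

For example, hexToSegments(0x1234ABCD) starts with 0x06 (the digit 1) and ends with 0x5E (the letter d).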


Prototypes

I tested the following combinations by crossing all the Arduino boards with the FPGA ones and created some prototypes.

In order not to waste your attention, the prototypes that I am going to describe to you are the most representative ones and represent four of the fifteen combinations, starting from the PRO one, which is more powerful, up to the cheapest one, a mini-lab that allows you to gain experience with an Arduino-FPGA system while staying under €40.

To house the FPGA boards, and so as not to disfigure His Majesty's Portenta, I used EPLZON black multi-hole boards, they cost a little but they are truly of superior quality.

First prototype

The architecture is based on the new Portenta HAT Carrier, which can accommodate all variants (C33, H7, X8), and the Digilent Cmod A7, a breadboard friendly FPGA board equipped with an Artix-7 35T.

As a controller I used a Portenta H7.

FPGA side

Artix-7 is the last Xilinx family before the flagship ones; it is very powerful and contains 6.6 Gb/s transceivers. The model mounted on this board, the 35T, has these characteristics.

The characteristics of an FPGA must however be combined with those of the host board, which, for obvious reasons, does not expose all the available pins, some are used internally for RAM, FLASH and other peripherals, the remainder are not connected (the educational boards use them for buttons and displays).

In our case, Digilent's Cmod A7 exposes 52 digital I/O pins and 2 analog pins; it is equipped with a 512KB SRAM and an 8MB Quad-SPI Flash. It has 2 buttons and 3 LEDs, one of which is RGB.

It does not require an external programming interface: on board we find both the JTAG circuitry and a USB bridge, which gives us an RS232 port in case we want to instantiate a MicroBlaze softcore and communicate with other equipment without too much effort.

To maximize the number of I/Os, this card, unfortunately, does not have the service voltage of 3.3V at its output, useful for powering external peripherals.

To overcome this problem I mounted a 5V->3.3V 800 mA DC-DC step-down converter.

Step-down 5V->3.3V

Arduino side

As already mentioned, I used Portenta H7 and Hat Carrier; I wanted to say a few words on the latter.

Portenta Hat Carrier is the card I've been waiting for for a long time. Unlike Max Carrier and Portenta Breakout it is more compact. This means that it can be used in production for small series, as long as the I/Os are equipped with the necessary electronics: for field applications, level shifters are not enough, we need optoisolators and fast protection circuits.

However, it has three well-defined access points (connectors), so it is possible to create fairly tidy daughter boards. Or take advantage of what already exists for Raspberry, with which it shares the pinout of the main connector and the operating 3.3V.

I'll show it to you in comparison with Portenta Breakout.

You may well find this beside the point, but aesthetically I really like it.

The driver I wrote should also work well with Portenta C33; I have no idea if the same can be said for the new Portenta X8, since I don't have one.

For this prototype, on the FPGA side, you must use the fpga\cmod-a7 folder in the repository (find the link at the end of the article)

These are the connections to be made.

Finally, the cost of this prototype is around €280, Cmod A7 is in the catalog of all the best microelectronics stores (Farnell, Digikey, Mouser etc.).

Given the cost, I recommend this implementation for particular applications and if, after careful analysis, you are unable to use the following components.

Second prototype

For the second prototype the Arduino part does not change. On the FPGA side I used a Sipeed Tang Nano 20K, it is still a breadboard friendly FPGA board, but it features a Gowin GW2AR-18, a very interesting chip, quite high-performance but economical.

Tang Nano 20K has lower features than Cmod A7, but proportionately costs much less. Gowin, the FPGA manufacturer and Sipeed have a fairly aggressive pricing policy, their products are highly appreciated, especially by the maker community. Emulators of old (not very powerful) consoles have also been written for this FPGA board.

In those few square centimeters, beyond the FPGA, Sipeed has inserted six LEDs, two buttons, a micro-SD reader, an LCD display connector, a PCM audio amplifier and even an HDMI port. Truly remarkable. The I/O groups opposite and close to the HDMI connector are equipped with two 3.3V outputs to power SPI Joypads.

I mounted this board on a multi-hole board identical to the one used for Cmod A7 so that I can easily swap them.

You can purchase this board from the official Sipeed store on Aliexpress, it arrives in about ten days well packaged in a rigid plastic box and costs around €38.

These are the characteristics of the FPGA.

Using this card is simpler, as already mentioned Tang Nano 20K also has 3.3V outputs, so it is not necessary to use a step down to power the keyboard or other small peripherals.

For this prototype, on the FPGA side, you must use the fpga\tang-nano20k folder in the repository (find the link at the end of the article).

These are the connections if you want to make this prototype.

I really love Sipeed boards because they are cheap and powerful. Furthermore, the development system, Gowin IDE, is very fast. I often use it to write and debug my basic Verilog code which I then port to other platforms, whose development systems are more precise in circuit synthesis, but much slower.

The cost of this prototype is around €190.

Third prototype

In this prototype the FPGA board is again the Cmod A7, but on the Arduino side we find GIGA R1 WiFi.

This board is one of my favorites because it is very powerful and has tons of peripherals and I/O. The price is very fair considering what it offers, and by pairing it with the new dedicated display it is possible to create many interesting things. One of these days I will decide to buy the display and maybe I will tell you something...

Returning to our business, in the figure you see the connections to be made for this prototype.

The cost of this prototype is around €120.

Fourth prototype

This prototype is designed to provide experience with Arduino and FPGA at a cost accessible to all.

The controller used is an Arduino Nano ESP32, while the FPGA board is a Sipeed Tang Nano 9k.

By reducing the components to the bare minimum the cost is less than €40. In the photo you see it mounted on a breadboard.

Be careful, in this case cheap is not synonymous with low quality.

Arduino Nano ESP32 is a board with an excellent price/performance ratio. In this regard, I show you a benchmark that I ran not long ago on some of my 32-bit DIY boards, in which it ranks third in the Arduino family.

Benchmarks must always be considered carefully, there are many factors to consider when choosing a card, however, in terms of performance, this board is truly spectacular.

Even on the FPGA side this system is not to be denigrated, these are in fact the declared characteristics of Tang Nano 9k.

9K LUTs are not few, trust me; it is even possible to instantiate a small softcore, furthermore, for this card, there are many examples online.

For an application, unless you need very fast peripherals, such as high-speed transceivers or a large number of encoders to manage, the Tang Nano 9k + Nano ESP32 combination can most likely already solve a good part of your problems or be useful for advanced training.

Below are the connections for this prototype.

Finally, remember that an educational system has the advantage of costing little, but, by following good programming rules, we can bring much of our code to more advanced platforms. As already mentioned, thanks to the great work done by Arduino programmers, it is possible to use standard libraries for the whole family.

Further combinations

As you have seen, it is possible to create different Arduino/FPGA combinations, however there is a rule to be respected: the Arduinos must work at 3.3V, and this is a fundamental condition; FPGAs cannot withstand higher voltages.

If you want to use a 5V board, for example the UNO family or Arduino MEGA, you must insert level-shifters. These allow bidirectional level translation 3.3V<->5V, the only precaution is to use fast components, otherwise you will be forced to lower the SPI frequency.

Level-shifters typically handle 4 + 4 I/O, so one will be sufficient.

Making a small brand deviation, it goes without saying that you can use all the ESP32 or STM32-based boards, today it is possible to program them all with Arduino IDE.

Electrical connections

If you want to use SPI at 20 MHz and above, it is important that you take care of the wiring and electrical potentials.

During the debugging phase you will certainly have both systems connected to the PC via USB cable, in this situation you must do two things:

  1. (CRUCIAL) Disconnect the positive 5V power supply from the FPGA to the Arduino, leaving only GND.
  2. (Recommended) Lower the SPI frequency to 8 or 10 MHz.

Both boards take power from USB when connected, and the USB voltage is never exactly the same for both, especially if you use different PCs. A potential difference between the two positive rails could damage one of the two boards.

We find the same potential difference on the SPI bus, certainly to a lesser extent because it derives from a step down, so electrically we will not have problems, but at high frequencies communication errors can occur.

Arduino OPTA

Let's get to the sore points.

I would have really liked to try this architecture with OPTA, also because the module that inspired me, TM FAST, works precisely in this way, that is, coupled to a PLC.

Unfortunately, OPTA does not have SPI or similar ports; there is a lateral expansion connector called AUX, but it is not known which lines it contains. Expansion modules have been announced, but even in this case it is not clear whether useful information will be provided at the same time about the protocols that can be used.

It would be interesting to have a "Proto Shield" for OPTA, unfortunately the PRO line is semi-armoured.

To be honest, I don't know of any commercial PLC for which the wiring diagram has been "officially" published; we are makers and we love details, but from a strategic/commercial point of view this choice is unfortunately coherent.

However, hoping costs nothing.

Conclusions

We have reached the end and I hope I have aroused some interest in you. The key word is to always experiment, at any age, and always push yourself a little further.

Sometimes it can be difficult, but we can't spend our lives flashing LEDs or creating digital thermometers; the Arduino PRO line, as its name states, is dedicated to professional applications, but is well documented and, with its enormous potential, is certainly a great source of fun.

Finally, when you make something interesting, share it with the community; helping others grow is the true spirit of makers.

Have fun!

References

Drivers download (github)

FPGA article

TM FAST article

Arduino

Portenta H7 / Portenta HAT Carrier

GIGA R1 WIFI

Nano ESP32

FPGA

Digilent Cmod A7

Sipeed Tang Nano 20K

Sipeed Tang Nano 9K

