Extending the Power of Logic Simulations using the Programming Interfaces (part 1)

Introduction

Some years ago I was working as a software engineer, implementing cycle-accurate models of SoC systems in C++ for a company making (amongst many other things) microcontrollers using their own proprietary RISC CPU core design. The CPU core and the microcontroller peripherals were developed in parallel and, of course, all needed verifying. The verification team was using third-party tools to drive the peripheral testing and, since the CPU was not available, reading and writing control and status registers on the system bus (or busses) was done with a proprietary tool which could emulate a variety of available microprocessors, such as MIPS, ARM etc. This meant test code for driving the peripheral blocks had to be cross-compiled to a target CPU and then run on the simulation of that CPU that the third-party tool provided. One day the verification manager told me that all he wanted for running the peripheral tests was to be able to write a program, compiled natively on the simulation host machine (a Windows PC or Linux server), that could read and write memory locations on the bus, without the need for the extra third-party tools and cross-compiling for a given processor. Knowing that, at the time, I was hooking up the C++ models to the HDL simulator we were using, he asked me if this was possible. Circumstances then overtook things, with some company downsizing before this could be followed up at that particular company.

However, it got me thinking, as this was a good idea that was applicable to many situations where testing logic required control and status registers to be accessed, at various points, with reads and writes over the system's bus. In my own time I started playing with the Verilog PLI v1.0 supplied by the open-source Icarus Verilog simulation tool to achieve this. Since that time, to my surprise, most logic design engineers I've worked with have not used the programming interfaces provided by their simulation tools, and many were not even aware of them. More shocking still, some did not know how to write basic software in C or C++. I have spent most of my career as a logic designer, though I have been a software engineer as well, but I knew how to write C and C++ before that time. If you are sitting every working day in front of a computer, how can you not want to know how to program it? If you are a logic designer reading this and are not yet able to program in C or C++ (or even Python), get your company to send you on a course. Not only will it open up your abilities in logic development and test and make you more aware of embedded software needs but, more importantly, it will allow you to follow the rest of this article, as you will need some C knowledge.

In Verilog these APIs are standard, with PLI 1.0 (Programming Language Interface) and VPI (Verilog Procedural Interface, also known as PLI 2.0). VHDL has a standardised equivalent in VHPI (VHDL Procedural Interface), and Mentor Graphics' ModelSim FLI (Foreign Language Interface) has been available for VHDL for many years. With the advent of SystemVerilog, interfacing to a program from the HDL world is even easier using the DPI (Direct Programming Interface). These all do pretty much the same thing, with the differences just in the details. They allow a program function (written in C, say) to be called from logic and (if required) the other way around as well. In addition, a whole host of other things can be done, but essentially there is a means to exchange information across a logic-software boundary and process it in the program code.

To get to the point where a program, in C or C++, can be run that can read and write registers on a simulated system bus, a few things must be solved. In these articles we will first review some programming interfaces and how they can be used, with the absolute bare minimum of features, to call a function and return data, with the goal of demonstrating that it doesn't really matter which flavour of PLI is used once one knows how to set it up. This means it is then straightforward to use the appropriate tools for the environment you are working in. From there we will explore how to cross the logic-software boundary and make simple function calls to use software for various tasks. From this point we will be able to discuss how to solve the problem of running a program to access a system bus on the HDL simulator for register reads and writes. This will not require any new PLI features and, once we have crossed the software-logic chasm, the sky's the limit. Once we have the ability to access the system bus from software, just like a CPU, we will then look at how we might go even further and compile some or all of the target embedded software to be the 'software' accessing the system bus, and thus have co-simulation capability. All this uses the simplest part of the PLI features to cross the divide, with layers then built on top. The power of this is not in the PLIs themselves (good though they are) but in what becomes possible once the boundary has been crossed.

For the rest of this first article we will just review the programming interfaces mentioned above, except for VHPI, as I don't have any real experience with it and it does not appear to be as widely supported in the simulators I've used. Hopefully, the comparison of the other interfaces will show that they all pretty much do the same thing, so if you need to use VHPI you will be able to adapt what is here.

Motivation

It is true that none of the things I have outlined are strictly needed to perform the testing required, or even the co-simulation. So why bother? Well, simulation cycles are costly: they consume processing power and CPU time and are thus relatively slow. It would be better to use those simulation cycles, as much as possible, to exercise the unit under test and not waste them on test bench processing. Once we have crossed over to a software program, the speed of processing data will increase by orders of magnitude. Indeed, in my deployment of these methods in logic test benches, I have always tried to get data into software with as little processing in the logic simulation as possible. Once there, one has access to the vast number of software libraries available to help with processing data, cross-checking for validity, generating statistics, logging data etc. Use the simulator processing cycles wisely.

All these methods have been tried and tested in real commercial product development, from the development of supercomputers (co-simulating the fabric kernel driver code, running in QEMU, with ASIC logic) to testing the embedded logic for industrial inkjet printer control systems. In these articles, references will be made to example open-source code that does all these things and can thus act as reference designs for those who would dive deeper into this subject, or as useful tools for your own development projects.

SystemVerilog's DPI

The easiest of the programming interfaces to use is the SystemVerilog Direct Programming Interface (DPI). More strictly, we will use the DPI-C interface, as we are interfacing to C code. We need to tell the test bench logic about the C function we want to use and be able to call it from the SystemVerilog code. First, we need some C functions to call. Simple memory read and write functions are what we need.

void Write (const int addr, const int  data, const int be)
{
    /* do some processing */
}

void Read  (const int addr,       int* data, const int be)
{
    /* do some processing */
}

These C functions are just what you might have in a normal C program, as part of an instruction set simulator model, say. They can even be compiled as C++, by adding extern "C" before them, as the interface links with C functions, and we then have access to a whole C++ environment. What code goes in these functions isn't defined here, but it could, say, read and write a large array for a simple memory model. (We can do better than that, and more on this later.) These functions need to be linked to the SystemVerilog somehow.
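As a minimal sketch of the simple memory model idea, here is one way the two functions might be filled in. Everything beyond the prototypes is an assumption for illustration: a hypothetical 64 Kbyte memory held as 32-bit words, byte addresses that wrap within it, and `be` treated as one enable bit per byte lane (the text does not define its meaning). The `extern "C"` guard means the same file could also be compiled as C++ without the names being mangled.

```c
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical size: 64 Kbytes of memory, stored as 32-bit words */
#define MEM_WORDS (64 * 1024 / 4)

static uint32_t mem[MEM_WORDS];

/* Byte address -> word index (assumed to wrap within the array) */
static unsigned word_idx(const int addr)
{
    return ((unsigned)addr >> 2) % MEM_WORDS;
}

/* Assumed byte-enable meaning: bit n of be enables byte n of the word */
static uint32_t be_mask(const int be)
{
    uint32_t mask = 0;
    for (int b = 0; b < 4; b++)
        if (be & (1 << b))
            mask |= 0xFFu << (8 * b);
    return mask;
}

void Write(const int addr, const int data, const int be)
{
    uint32_t mask = be_mask(be);
    unsigned idx  = word_idx(addr);

    /* Replace only the enabled bytes, keeping the others */
    mem[idx] = (mem[idx] & ~mask) | ((uint32_t)data & mask);
}

void Read(const int addr, int* data, const int be)
{
    /* Disabled byte lanes read back as zero */
    *data = (int)(mem[word_idx(addr)] & be_mask(be));
}

#ifdef __cplusplus
}
#endif
```

With this, a call such as Write(0x10, 0x0000AB00, 0x2) changes only byte 1 of the word at address 0x10, leaving the other three bytes as they were.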

In the SystemVerilog code from which we will call the C functions we can import them, within the module, as shown below:

import "DPI-C" function void Write(input int addr, input int  data, input int be);

import "DPI-C" function void Read (input int addr, output int data, input int be);

You'll notice that the declaration looks quite similar to the C prototypes, but a direction (input or output) has been added to each argument, with output matching the pointer argument of the C Read() function. There are many other SystemVerilog types that can be matched to C types, but we will stick to integers, keeping the minimalist philosophy intact. One can also go in the other direction and export SystemVerilog tasks and functions so that they are callable from C, but we'll leave that for now as well.

Now, from within the SystemVerilog module the functions can be called, much the same as a SystemVerilog task:

always @(posedge clk)
begin
  // some code...

  // Do a write
  if (write)
    Write(waddr, wr_data, wr_byte_en);

  // Do a read
  if (read)
    Read(raddr, rd_data, 'h15);

  // ...some more code
end
        

The above code implies that the arguments are all SystemVerilog int types, but they could be 32-bit reg or input vector types etc., and SystemVerilog will cast them to integers, as that is how the function is declared. One thing to note, though, is that int is a 2-state type, so any 'Z' or 'X' bits in the vector are converted to 0 in the value passed to C.

Compiling the code

We have C functions and we have calls from a SystemVerilog module, but we need to compile all this into the simulation. This can vary from simulator to simulator. I have been using ModelSim, as it covers all the languages and PLIs I want to discuss, but whatever simulator you have available will not be too different. ModelSim allows the C source code to be compiled as part of the file list and will recognise the .c suffix. For example:

vlib work
vlog my_sv_code.sv my_c_funcs.c        

This is fine for small file lists, but a more generic method is to compile all the code into a shared object (or dynamic link library on Windows) with whatever C compiler is suitable for the host machine. For ModelSim, the shared object or DLL must be compiled as 32-bit or 64-bit to match the version of ModelSim being used. For example:

gcc -shared -Bsymbolic -m32 -I $MODEL_TECH/../include -I. my_c_funcs.c \
    -o my_c_funcs.so        

Note that, on Windows, ModelSim supports MinGW and (I believe) Visual Studio. It doesn't support Cygwin, though, so one must be careful not to use Cygwin-compiled code. Under Linux none of this is an issue. Now, with the shared object (or DLL) built, we can load it as a SystemVerilog library.

Running the Simulation

To run the simulation with the shared object loaded (assuming some test environment with top_module as the top-level module), something like the following is run on ModelSim:

vsim -sv_lib my_c_funcs top_module        

The vsim program knows to look for a .so (or .dll) file with the name given as the argument of the -sv_lib command line option. By going down the shared object route, the library can be compiled with all the C or C++ functionality required, including both the DPI functions and any other code they use. More than one shared object can be specified, so if it makes sense to compile libraries and other programs separately, this is supported, so long as they all appear on the vsim command line.

And that's all there is to it. Having done this for SystemVerilog, let's now look at Verilog's PLI to do the same thing. We'll be able to skip some of the detail covered in the previous section, such as compiling the C, as it is basically the same.

Verilog PLI 1.0

The PLI task/function interface (tf) is probably the most widely supported programming interface, and so is useful to know. However, it is more awkward to use than the DPI interface and is now deprecated (but not yet obsoleted)! It is replaced by the VPI interface (PLI 2.0), but it is so ubiquitous that it may be all that is available to you, particularly in open-source simulators such as Icarus and Verilator (on both of which I have used this interface). Let's define the C function prototypes for the same Read and Write as before.

int Write (void);

int Read (void);        

The main thing to note here is that the C functions do not have any parameters, so we have to use the PLI library functions provided to access the arguments passed in by the call, and to pass back values. To read input values from the task call in Verilog, the tf_getp(<argument index>) function is used. To put a value on an output of the calling task, tf_putp(<argument index>, <value>) is used. Both functions take an integer index which selects which of the arguments is being got or put, with the first argument at 1 (argument 0 is the function/task itself). So the functions can now look like the following:

#include "veriuser.h"

// Argument positions in the Verilog task call (argument 1 is the first)
#define TF_ADDR_IDX 1
#define TF_DATA_IDX 2
#define TF_BE_IDX   3

int Write (void)
{
  int addr, data, be;

  // get input values
  addr = tf_getp(TF_ADDR_IDX); // first argument
  data = tf_getp(TF_DATA_IDX); // second argument
  be   = tf_getp(TF_BE_IDX);   // third argument

  /* do some processing */

  // Return OK status
  return 0;
}

int Read (void)
{
  int addr, data = 0, be;

  // get input values
  addr = tf_getp(TF_ADDR_IDX);
  be   = tf_getp(TF_BE_IDX);

  /* do some processing... (sets data) */

  // Set output value(s)
  tf_putp(TF_DATA_IDX, data);

  // Return OK status
  return 0;
}

So, from the C point of view, we just have to make a few extra PLI calls at the beginning and, where an output is to be set, calls at the end. Other than that, the code can be the same as that for SystemVerilog and DPI. However, to let the simulator know about the functions, there is a little more work to do than for DPI.

For PLI 1.0 a table must be provided mapping the C functions to Verilog tasks. This varies from simulator to simulator as well and (if memory serves me correctly) it is done differently in, say, VCS than for ModelSim, and Icarus needed some other function to register the table. In ModelSim the table, and a function to return a pointer to it, looks like the following:

#include <veriuser.h>
#include <vpi_user.h>

s_tfcell veriusertfs[] = 
{
    {usertask, 0, NULL, 0, Read,  NULL, "$tfread",  1},
    {usertask, 0, NULL, 0, Write, NULL, "$tfwrite", 1},
    {0}
};

p_tfcell bootstrap ()
{
    return veriusertfs;
}        

Without going into too much detail, the table is an array of structures, terminated by a null entry (the {0}). There is one entry for each C function we wish to use, with the first field indicating that this is a user-defined task, which will thus be called like any other Verilog built-in task (such as $display, $readmemh etc.). The C function to call is given in the fifth field, and the name of the task, as called from Verilog, is defined in the string of the seventh field. I changed the names slightly from the C names, to avoid clashes with existing system task names. The other fields don't matter for what we are doing and can just be set as shown. Refer to your documentation for more details. The bootstrap() function needs to be defined, with that name, and must return the pointer to the table.

This code can be part of the C function source code or, perhaps more properly, in a separate source file. With all this compiled into a shared object, just as for SystemVerilog, the C functions can be called from Verilog, via the defined task names:

always @(posedge clk)
begin
  // some code...

  // Do a write
  if (write)
    $tfwrite(waddr, wr_data, wr_byte_en);

  // Do a read
  if (read)
    $tfread(raddr, rd_data, 'h15);

  // ...some more code
end        

This all now acts like the SystemVerilog, with integers being input and output, and the same vector to integer casting rules.
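The null terminator in the task table matters because the simulator is only handed a pointer to the table, so it has no other way to know where the table ends. As a rough, self-contained illustration only (no simulator headers; `task_entry`, `dispatch` and the call counters are hypothetical names, and a real s_tfcell has many more fields), this is the sort of lookup a simulator might perform to connect `$tfwrite` in the Verilog to the C function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for s_tfcell: only the C function
   and the task name are kept. */
typedef int (*calltf_fn)(void);

typedef struct {
    const char *tfname;   /* name used in Verilog, e.g. "$tfread" */
    calltf_fn   calltf;   /* C function to call                   */
} task_entry;

static int calls_read  = 0;
static int calls_write = 0;

/* Stub C functions that just count how often they are called */
static int Read(void)  { calls_read++;  return 0; }
static int Write(void) { calls_write++; return 0; }

/* Terminated by an all-zero entry, like the {0} in veriusertfs */
static task_entry task_table[] = {
    {"$tfread",  Read},
    {"$tfwrite", Write},
    {NULL,       NULL}
};

/* Walk the table up to the terminator and call the function whose
   name matches. Returns -1 if no such task is registered. */
int dispatch(const char *name)
{
    for (const task_entry *e = task_table; e->tfname != NULL; e++)
        if (strcmp(e->tfname, name) == 0)
            return e->calltf();
    return -1;
}
```

Without the terminating entry, the loop would run off the end of the array, which is why the real table must end with {0}.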

Compiling the code

With the additional C code for the task table, the C code is compiled into a shared object (or DLL) in just the same way as for SystemVerilog. Let's assume that the additional PLI C source is in our my_c_funcs.c source file, then it would be almost the same as SystemVerilog, but an extra library (mtipli) is required:

gcc -shared -Bsymbolic -m32 -I $MODEL_TECH/../include -I. my_c_funcs.c \
    -L$MODEL_TECH -lmtipli                                             \
    -o my_c_funcs.so        

Running the Simulation

Running the simulation with PLI code is similar to SystemVerilog, but the shared object is referenced slightly differently:

vsim -pli my_c_funcs.so top_module        

Here, the -pli command line argument replaces -sv_lib and needs the whole name of the shared object file (and doesn't imply the suffix). Other than that it is the same.

As I said before, though not yet obsoleted, the PLI 1.0 interface is being deprecated in favour of VPI (i.e. PLI 2.0). So let's have a look at this.

Verilog VPI

This is a much more flexible and consistent interface than PLI 1.0 but, for our purposes, it is slightly different and a little more complicated. For the C function prototypes we now have:

int Write (char *userdata);
int Read  (char *userdata);        

The introduction of the char pointer argument, over PLI 1.0, is of no consequence, and we won't be using it. To get access to the arguments of the calling task, we need a 'handle' to the task call, then an 'iterator' to go through each argument in turn. As we are likely to do this a lot, let's define a function to do this:

#include "veriuser.h"
#include "vpi_user.h"

int getArgs (vpiHandle taskHdl, int value[])
{
  int                 idx = 0;
  struct t_vpi_value  argval;
  vpiHandle           argh;

  // Iterator over the calling task's arguments (NULL if it has none)
  vpiHandle           args_iter = vpi_iterate(vpiArgument, taskHdl);

  if (args_iter == NULL)
    return 0;

  // vpi_scan() returns NULL (and frees the iterator) after the last argument
  while ((argh = vpi_scan(args_iter)) != NULL)
  {
    argval.format = vpiIntVal;          // request the value as an integer

    vpi_get_value(argh, &argval);
    value[idx++]  = argval.value.integer;
  }

  return idx;
}

Our function takes, as input, a handle (of type vpiHandle) to the calling task and a pointer to an array of integers in which to put the retrieved argument values. Within the function we need a variable of type t_vpi_value to hold each value and another vpiHandle for each argument. Finally, we need our iterator object (type vpiHandle again), initialised with a call to vpi_iterate(), passing in vpiArgument to say we want an argument iterator, and the handle of the calling task. Using this iterator, we call vpi_scan() in a loop to get a handle to each argument in turn, until it returns NULL. For each argument, the format field of the argval structure is set to vpiIntVal, to request the value as an integer, and then vpi_get_value() is called, passing in the argument handle and the argval structure. The actual argument value is returned in argval.value.integer, which we place into the passed-in array.

Updating arguments is not dissimilar, so let's define a function to do this:

int updateArgs (vpiHandle taskHdl, int value[])
{
  int                 idx = 0;
  struct t_vpi_value  argval;
  vpiHandle           argh;

  vpiHandle           args_iter = vpi_iterate(vpiArgument, taskHdl);

  if (args_iter == NULL)
    return 0;

  while ((argh = vpi_scan(args_iter)) != NULL)
  {
    argval.format        = vpiIntVal;
    argval.value.integer = value[idx++];

    // Constant arguments (such as 'h15) cannot be written, so skip them
    if (vpi_get(vpiType, argh) != vpiConstant)
      vpi_put_value(argh, &argval, NULL, vpiNoDelay);
  }

  return idx;
}

The general form is the same, but now the value array contains the update data, and argval.value.integer is set with the value before calling vpi_put_value(). This has a couple of extra parameters, which can be set as shown to emulate the previous code we have looked at. With these two functions defined, our Read and Write templates become:

int Write (char *userdata)
{
  int addr, data, be;
  vpiHandle taskHdl;
  int argVals[4];

  taskHdl = vpi_handle(vpiSysTfCall, NULL);
  getArgs(taskHdl, &argVals[1]);

  // get input values
  addr = argVals[TF_ADDR_IDX];
  data = argVals[TF_DATA_IDX];
  be   = argVals[TF_BE_IDX];

  /* do some processing... */

  // Return OK status
  return 0;
}

int Read (char *userdata)
{
  int addr, data = 0, be;
  vpiHandle taskHdl;
  int argVals[4];

  taskHdl = vpi_handle(vpiSysTfCall, NULL);
  getArgs(taskHdl, &argVals[1]);

  // get input values
  addr = argVals[TF_ADDR_IDX];
  be   = argVals[TF_BE_IDX];

  /* do some processing... (sets data) */

  // Set output value(s)
  argVals[TF_DATA_IDX] = data;
  updateArgs(taskHdl, &argVals[1]);

  // Return OK status
  return 0;
}

So, a couple of things to note here. Firstly, the taskHdl handle is obtained in Read() and Write(), outside the two helper functions, mainly so that Read() can reuse the same handle for both input and output rather than calling vpi_handle() twice. Secondly, the array to receive the values has a dimension of four, and a pointer to index 1 of the array is passed to the functions. This is to make it look more like the PLI 1.0 code, so that the defined argument indexes then work for both the PLI and VPI code. This allows code to be more easily written and compiled for either version of the Verilog PLI.
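The index trick can be checked in isolation. The sketch below replaces getArgs() with a hypothetical fetch_args() that copies from an ordinary array (so it compiles without vpi_user.h), but passes &argVals[1] in exactly the same way, so that argVals[TF_ADDR_IDX] and friends line up with the 1-based PLI 1.0 numbering:

```c
/* Argument positions as seen from the Verilog call: 1 is the first
   argument, matching the PLI 1.0 tf_getp()/tf_putp() numbering. */
#define TF_ADDR_IDX 1
#define TF_DATA_IDX 2
#define TF_BE_IDX   3
#define NUM_ARGS    3

/* Hypothetical stand-in for getArgs(): where getArgs() would read the
   values from the simulator, this copies them from a plain array into
   value[0..n-1] and returns the count. */
static int fetch_args(const int *call_args, int n, int value[])
{
    for (int i = 0; i < n; i++)
        value[i] = call_args[i];
    return n;
}

/* Fill argVals so that argVals[TF_xxx_IDX] holds the argument that
   tf_getp(TF_xxx_IDX) would return. Element 0 is left unused. */
int load_arg_vals(const int *call_args, int argVals[NUM_ARGS + 1])
{
    argVals[0] = 0;
    return fetch_args(call_args, NUM_ARGS, &argVals[1]);
}
```

The helper only ever sees a zero-based array; the one-element offset is applied once, at the call site.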

The VPI tasks need to be registered, as for PLI 1.0, but in a different way.

void register_vpi_tasks()
{
    s_vpi_systf_data data[] =
      {{vpiSysTask, 0, "$tfread",  Read,  0, 0, 0},
       {vpiSysTask, 0, "$tfwrite", Write, 0, 0, 0},
      };

    for (int idx = 0; idx < 2; idx++)
    {
        vpi_register_systf(&data[idx]);
    }
}

void (*vlog_startup_routines[])() =
{
    register_vpi_tasks,
    0
};

We create a function (register_vpi_tasks() in this example), and in it we create an array of s_vpi_systf_data structures, initialised as shown: the first field indicates that these are tasks, the third field is the name of the user task in Verilog, and the next field is the C function to call. All other fields are set to zero. We register these entries by looping across the array and calling vpi_register_systf() for each one. Now we must add a pointer to the register_vpi_tasks() function to an array of pointers-to-functions, terminated with 0. Each entry has the same type as a pointer to register_vpi_tasks(), and the array, vlog_startup_routines, is declared in vpi_user.h; the simulator looks for it by that name and calls each function in it at start-up.
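The start-up mechanism is just a null-terminated array of function pointers. Below is a self-contained mock of it; mock_register_systf(), startup_routines and run_startup_routines() are hypothetical stand-ins for vpi_register_systf(), vlog_startup_routines and the simulator's own loader, which will do more (such as validating the s_vpi_systf_data contents).

```c
#include <string.h>

/* Hypothetical mock of vpi_register_systf(): just records task names */
#define MAX_TASKS 8

static const char *registered[MAX_TASKS];
static int         num_registered = 0;

static void mock_register_systf(const char *tfname)
{
    if (num_registered < MAX_TASKS)
        registered[num_registered++] = tfname;
}

/* Returns 1 if a task of this name has been registered */
int is_registered(const char *tfname)
{
    for (int i = 0; i < num_registered; i++)
        if (strcmp(registered[i], tfname) == 0)
            return 1;
    return 0;
}

/* Plays the role of register_vpi_tasks() in the text */
static void register_tasks(void)
{
    mock_register_systf("$tfread");
    mock_register_systf("$tfwrite");
}

/* Null-terminated start-up list, shaped like vlog_startup_routines but
   named differently so it cannot clash with the real declaration */
static void (*startup_routines[])(void) = {
    register_tasks,
    0
};

/* Roughly what the simulator does on loading the shared object: call
   each routine in turn until it reaches the terminating null entry */
void run_startup_routines(void)
{
    for (int i = 0; startup_routines[i] != 0; i++)
        startup_routines[i]();
}
```

Because the list is walked until the null entry, several independent libraries' registration routines can simply be added to it.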

Compiling the VPI C code, calling the C functions from Verilog, and running the simulation are identical to the PLI 1.0 case, so I won't repeat it here (refer to that section above).

Now, you might be thinking that this is all a bit of a faff to do the same things that we've already done in a much simpler way. Well, you'll get no argument from me, but it may be that, once PLI 1.0 is obsoleted and if you don't have access to SystemVerilog's DPI, it's the only choice available to you. But now you know how to do it, and the esoteric details are wrapped up in the two functions we've written, so we don't have to worry about them too much again. That wraps things up for the Verilog-type languages, but what about those developing in VHDL? Let's look at the ModelSim FLI as the last programming interface.

VHDL FLI

The foreign language interface from ModelSim basically does for VHDL what PLI does for Verilog. The C functions are defined exactly as for SystemVerilog, with a parameter for each of the inputs and outputs of the calling procedure, the outputs being pointers. To gain access to them from VHDL, some procedures need to be declared in a package, just as one would for normal procedures, but with dummy bodies and a special attribute.

package my_pkg is

  procedure Write (
    addr          : in  integer;
    data          : in  integer;
    be            : in  integer
  );
  attribute foreign of Write : procedure is "Write my_c_funcs.so";

  procedure Read (
    addr          : in  integer;
    data          : out integer;
    be            : in  integer
  );
  attribute foreign of Read : procedure is "Read my_c_funcs.so";
end;

package body my_pkg is

  procedure Write (
    addr          : in  integer;
    data          : in  integer;
    be            : in  integer
  ) is
  begin
    report "ERROR: foreign subprogram Write not called";
  end;

  procedure Read (
    addr          : in  integer;
    data          : out integer;
    be            : in  integer
  ) is
  begin
    report "ERROR: foreign subprogram Read not called";
  end;
end;
        

Note that the procedures in the package still need to have bodies, but only dummy code is put there. If the foreign code is missing, the error messages will be displayed instead. The attributes map the C functions to the VHDL procedures. Note that the attribute string also names the shared object file in which the C functions are located. So long as this package is compiled into the work library, the C functions can be called from VHDL code:

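use work.my_pkg.all;

entity my_fli_block is
  port (
         -- port list --
       );
end entity my_fli_block;

architecture behaviour of my_fli_block is
begin

  process(clk)
    -- The out parameter of Read is of class variable, so it needs a
    -- variable actual; the value is then copied to the signal
    variable rd_val : integer;
  begin
    if clk'event and clk = '1' then
      -- some code...

      -- Do a write
      if write = '1' then
        Write(waddr, wr_data, wr_byte_en);
      end if;

      -- Do a read
      if read = '1' then
        Read(raddr, rd_val, 16#15#);
        rd_data <= rd_val;
      end if;

      -- ...some more code
    end if;
  end process;

end architecture behaviour;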

Compiling the code

The C is compiled for ModelSim into a shared object file, exactly as for the SystemVerilog DPI interface. The VHDL is compiled just as for normal VHDL simulations, but also including the package source file. When running the simulation, the -pli command line argument is used once more to load the shared object.

Real World Use Case

The above discussion has shown how to call simple functions in C from SystemVerilog, Verilog and VHDL. A couple of outline functions were called, with integer arguments, to do a read and a write to some as-yet-undefined code, to illustrate the principles of bridging the chasm. So what can really be done with this?

Well, as a real-world example, I want to outline an open-source memory model that I wrote using these techniques. The problem I was trying to solve was that the simulation components being tested had access to a very large address space (gigabytes, in fact) and might make accesses at very disparate places in that space. Obviously, I couldn't create an array of logic bit vectors gigabytes in size. The simulation, though accessing a very wide space, won't actually access the entire space (that would take forever). It happened that I'd already written a C++ model of a memory that could address an entire 64-bit address space by allocating memory blocks as and when they were accessed. The general principle, illustrated by a diagram in the model's documentation, uses a 3-stage approach and 4 Kbyte blocks.


The API for this C++ model has read and write functions for various word sizes, from bytes to DWORDs, and so it was easy to interface to the simple read and write functions of the PLI C interface of the type we have been discussing (the names are slightly different). By adding some simple logic around calls to the read and write PLI tasks in the Verilog, with some burst and SRAM-type interfaces, I now have a Verilog memory test component with a very large address space, which runs orders of magnitude faster than a Verilog model and uses little memory. More details of this model can be found in the documentation in the GitHub repository.
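The actual model is C++, and its real table layout is described in its documentation. Purely to illustrate the on-demand allocation idea in C, here is a minimal sketch under stated assumptions: a hypothetical 48-bit byte address split into three 12-bit table indices plus a 12-bit offset into a 4 Kbyte block, little-endian word access, byte enables with one bit per byte lane, and no freeing of memory. Never-written locations read as zero without allocating anything, so a test can scatter accesses across the address space and only pay for the blocks it touches.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: 48-bit byte address = three 12-bit table
   indices + a 12-bit offset into a 4 Kbyte data block.
   Address bits above bit 47 are ignored. */
#define IDX_BITS   12
#define TABLE_SIZE (1u << IDX_BITS)   /* pointers per lookup table */
#define BLOCK_SIZE (1u << 12)         /* 4 Kbyte data blocks       */

typedef struct { void *entry[TABLE_SIZE]; } table_t;

static table_t *top_table = NULL;     /* first lookup stage */

/* Index into the lookup table for a given stage (0, 1 or 2) */
static unsigned stage_idx(uint64_t addr, int stage)
{
    return (unsigned)(addr >> (12 + IDX_BITS * (2 - stage))) & (TABLE_SIZE - 1);
}

/* Walk the three stages to the 4 Kbyte block holding addr. When alloc
   is non-zero, missing tables/blocks are created zero-filled;
   otherwise NULL is returned for never-written regions. */
static uint8_t *find_block(uint64_t addr, int alloc)
{
    if (top_table == NULL) {
        if (!alloc || (top_table = calloc(1, sizeof(table_t))) == NULL)
            return NULL;
    }
    void *node = top_table;
    for (int stage = 0; stage < 3; stage++) {
        void **slot = &((table_t *)node)->entry[stage_idx(addr, stage)];
        if (*slot == NULL) {
            /* Stages 0 and 1 point to tables; stage 2 points to a block */
            size_t size = (stage < 2) ? sizeof(table_t) : BLOCK_SIZE;
            if (!alloc || (*slot = calloc(1, size)) == NULL)
                return NULL;
        }
        node = *slot;
    }
    return (uint8_t *)node;
}

void sparse_write_byte(uint64_t addr, uint8_t val)
{
    uint8_t *blk = find_block(addr, 1);
    if (blk != NULL)                  /* silently dropped if out of memory */
        blk[addr & (BLOCK_SIZE - 1)] = val;
}

uint8_t sparse_read_byte(uint64_t addr)
{
    uint8_t *blk = find_block(addr, 0);
    return (blk != NULL) ? blk[addr & (BLOCK_SIZE - 1)] : 0;
}

/* 32-bit little-endian word access with byte enables (bit n = byte n) */
void sparse_write_word(uint64_t addr, uint32_t data, unsigned be)
{
    for (int b = 0; b < 4; b++)
        if (be & (1u << b))
            sparse_write_byte(addr + b, (uint8_t)(data >> (8 * b)));
}

uint32_t sparse_read_word(uint64_t addr)
{
    uint32_t word = 0;
    for (int b = 0; b < 4; b++)
        word |= (uint32_t)sparse_read_byte(addr + b) << (8 * b);
    return word;
}
```

A PLI Write(addr, data, be) function of the kind shown earlier could simply forward its arguments to sparse_write_word(), and Read() could return the result of sparse_read_word().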

Conclusions

We've crossed the logic/software divide and compared techniques for doing so. The article has deliberately been restricted to the bare essentials of what is actually available in these interfaces but, having made the leap, we have seen that this opens up the opportunity to add large, possibly complex, functionality in the C/C++ domain using all that's available there, just as in the memory model example.

Which interface to use? I don't think it really matters: use whichever is available. Though some are more complex to use than others, once the initial setup is done they are largely the same. With many simulators being mixed-language, you may have access to SystemVerilog, VHDL and Verilog all in the same environment, so you can use whichever programming interface is appropriate. Just bear in mind portability. I have usually avoided mixing languages where possible, to maximise reuse.

So, have we met the brief regarding the requirement to read and write to memory addresses from a natively compiled program, as requested by my verification manager all those years ago? Well, no. The examples we have looked at are all what I call 'passive' calls. That is, the C is called from the HDL and then returns. This is still extremely useful, but not the full Monty. We can't yet write a standalone program where the C/C++ is running, making read and write calls to the simulation. We need a means to make the C/C++ program look like it is running the show when the simulator is, in fact, the main program. This is not easy, and there are a couple of approaches I've used in my professional career: one is easier to understand and keeps the passive call approach; the other I have constructed in such a way as to allow co-simulation features, and it truly has a single C/C++ program running, which unlocks more potential from bridging over a PLI. But this will have to wait for the next article.
