Finite Impulse Response Filters Part 1: Convolution and HDL
Simon Southwell
Semi-retired logic, software and systems designer. Technical writer, mentor, educator and presenter.
Introduction
In these articles on convolution and finite impulse response filters I want to move into the world of digital signal processing. Strictly speaking, my articles on data compression, including image compression with JPEG, fall under this category but, I think, this article on finite impulse response filters is what most people would consider to be squarely in the realm of DSP.
Finite impulse response filters (or FIRs) are based around the process of ‘convolution’, and we will have to look at some equations to explain what’s going on, but I promise that there is nothing particularly taxing about this. When we get to a practical solution, we will be down to multiply-and-accumulate and nothing more complicated than that. We will also look at impulse responses—that is, the output you get from a system when an impulse (or delta) is input—which tend to be infinite (assuming unlimited precision). It turns out that if you convolve the impulse response of the desired filter with your signal, it will filter that signal as per the filter characteristics. In fact, convolving in the time domain is the same as multiplying in the frequency domain (and vice versa). So, if we wanted, say, a low pass filter, in the frequency domain this means multiplying all frequencies from 0 to fc by 1, and all frequencies above fc by 0. This is the ideal frequency response, and the diagram below shows this for a low pass filter.
Even more handy, multiplying everything by 1 in the frequency domain is equivalent to convolving with a ‘delta function’ in the time domain—i.e., an impulse. Thus, if we know the impulse response of the filter, we can convolve this with our signal and it will filter it, just as if we had taken the signal into the frequency domain, multiplied it by our frequency response (by 1 for low frequencies, and by 0 for high frequencies), and put it back in the time domain again. In fact, this is another way of filtering, which is outside the scope of this article, but we might visit it again when looking at discrete Fourier transforms and the fast Fourier transform algorithm in the future. So, we ‘simply’ need to work out the impulse response values of our desired filter and convolve these with our signal. We will look at this in this article, and a handy open-source program is provided to help us do just this.
In practical solutions, dealing with infinity is somewhat of a problem, but we can truncate things as values tend towards zero, making things finite—and this is where the ‘finite’ part of FIR comes from, with the ‘IR’ from the impulse response. The truncation of impulse response values effectively gives a ‘window’ on that response, with everything outside of that window set to zero. Now, simply truncating the values causes unwanted artifacts, and things can be improved if the window on the impulse response is shaped with a gentler roll-off. Many types of window function have been discovered and developed over time to reduce the artifacts of truncating an infinite impulse response, and we shall be looking at some of these and at their trade-offs. The program I mentioned implements many of them (though we shall restrict ourselves to just a few) and so the different performances of the window functions can be explored.
Another source of undesired artifacts comes from the fact that we can’t have infinite precision in the values we use for our impulse response but must ‘quantise’ these to finite precision, with a set number of bits, such as 8- or 16-bit values. As we shall see, this tends to limit the amount of attenuation that can be achieved in the ‘stop-band’, and the number of bits chosen needs to match the requirements of the filter being implemented.
Finally, the impulse responses of ideal filters are made up of ‘sinc’ functions, or combinations thereof. A sinc function has the general form sin(x)/x, which is not too complicated. The diagram below shows a plot of a generic sinc function.
As can be seen, the sine component provides the oscillations and the 1/x component gives the decay, with the plot centred on 0, so negative x is on the left and positive x on the right. The mathematicians amongst you may have spotted that if x is 0 the function is undefined (a division by zero), but that is not what’s shown. As x tends to 0 the function approaches 1, and so this limit is taken as the value at 0. (A bit of a mathematical cheat, I know, but I’m not mathematician enough to know why this is okay.)
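To see that limit handled explicitly, here is a tiny Python sketch (my own illustrative code, not part of the winfilter program):

```python
import math

def sinc(x):
    """Generic sinc: sin(x)/x, with the limit value 1.0 substituted at x == 0."""
    if x == 0.0:
        return 1.0  # lim of sin(x)/x as x -> 0 is 1
    return math.sin(x) / x

# The sine part oscillates; the 1/x part decays away from the centre.
print(sinc(0.0))      # 1.0, by the limit
print(sinc(math.pi))  # sin(pi)/pi, effectively 0
```

Library implementations do the same substitution; they just hide it from you.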
Using these functions, as we shall see in the second article, gives the infinite impulse responses of our ideal filters, and multiplying these by a window function to make them finite gives us our finite impulse response. Thus, these are sometimes called windowed-sinc filters.
Now, don’t worry if this introduction has introduced a lot of concepts in a short space of time, as we will revisit these in more detail in the rest of the articles. And, as mentioned before, when we get to an implementation in HDL, we will only be doing multiplication and addition. All the messing about with trigonometric functions can be done off-line to calculate the values we will need in the logic, and the provided program will do this for us. We could use this ‘turn-key’, though I’d very much like you to understand what’s going on.
The article is divided into two parts. In this first article we will introduce convolution for digital signals and move straight to discussing an HDL implementation. By the end we will have a solution that can be used to construct an FIR but won’t know what values to configure it with to do the filtering that we desire. That will have to wait for the next article, where we will look at sinc functions and their use in constructing impulse responses for our filters and at window functions to modify the infinite impulse responses to a finite set of values.
Before diving straight into convolution, I want to talk briefly about the general characteristics of FIRs and their advantages and disadvantages.
FIR Characteristics
There is not space in this article to give a full treatment to the alternative to FIRs, which is the infinite impulse response (IIR) filter. Without going into too much detail, IIRs are characterised by the current output being a function of both the current input and previous output values (and inputs). This gives them a potentially infinite impulse response, which has many advantages. Compared to FIRs, fewer coefficients are required, and thus smaller memory, to get similar characteristics in terms of cut-off frequency transition and stop-band attenuation. Because there are fewer coefficients, the latency is smaller than for FIRs. Also, IIRs can be constructed in such a way as to be equivalent to analogue filters, in terms of mapping between the s (analogue) and z (digital) planes (for those that know what this means—which we can’t cover here). This is handy if digitising a previous analogue system. There is no such equivalent mapping for FIRs.
So, if I were a salesman selling FIRs, I’ve just convinced you to go down the IIR route, right?
Well, unlike FIRs, IIRs have non-linear phase characteristics, and in some applications, such as processing audio or biometric data, this can be quite undesirable. Because IIRs have feedback paths, using previous output values, they can become unstable, whereas FIRs can’t become unstable for any input signal, since they are a function of input values and impulse response only. Although these articles will limit which filter responses we investigate, FIRs can nonetheless have an arbitrary frequency response. FIRs must handle numerical overflow to some extent, but this is easier than with IIRs, where overflow makes the design more complex and the effects of quantisation more severe.
So which one is used for any given application will depend on the requirements, resources, complexity, and all the normal engineering trade-offs in choosing between competing solutions. For these articles, we will explore FIRs.
Convolution
I mentioned in the introduction that convolution is at the heart of FIRs, and so we need to understand this. However, since we will be implementing a logic-based convolution solution, I want to move straight on to that implementation afterwards, in a break from the traditional text ordering, because, once we have this solution, making a desired filter from it is simply a matter of setting the ‘right numbers’, and the next article will deal with this. Thus, if you are only interested in a filter implementation, and are happy to get the right numbers from the provided winfilter program, then you can skip the next article—though I genuinely hope you don’t.
What is Convolution?
The general form of the convolution of two signals (e.g., an input signal x and an impulse response signal h) is given as:

o[i] = Σ h[j] × x[i−j], summing over j from 0 to M−1
If the input signal has N points and the impulse response has M points, then the resultant output (o) will be an N+M−1 point signal. What does this actually mean? Well, each output o[i] is calculated by multiplying the impulse response values from 0 to M−1 with the signal values from i down to i−(M−1) and adding all these up. This is summarised in the equation below.
In other words, if we think of our impulse response as an M point array of values, and our signal as an N point array of values, then for each output we multiply the M array elements of the impulse response with the last M elements of the signal, starting from i and running backwards, and add all the M multiplication results together. If the signal is finite in length (N), then any signal value indexed outside its valid range is assumed to be 0, so the multiplication is 0 and has no effect on the output value. Let’s try and picture this. The diagram below has an 8 point signal to be convolved with a 4 point impulse response. This should yield an 11 point output.
Each box is an element in an array for the signal (green), impulse response (blue) and output (orange). Each output element is calculated as the sum of the overlapping signal and impulse response elements multiplied together. I’ve tried to relate this to the previous equation, showing the specific summation of multiplications on the right by each output element. So, we’ve ‘swept’ the impulse response across the signal for each output calculation, multiplying and accumulating the overlapping elements. Note that the indexes for the response and the signal run in opposite directions, and this is important for the general case of convolution. It is likely that a signal will be asymmetric, and the secondary signal (an impulse response for our purposes, but maybe not for other uses of convolution) may also be asymmetric. Thus, to convolve, one or the other signal must be reversed. It doesn’t matter which one, as we could treat the impulse response as our signal, and the signal as the values to be convolved. As I said, the two sets of values need not be a signal and an impulse response for purposes other than FIRs. As we shall see later, the filter impulse responses will be symmetrical, so it won’t matter, but an implementation should do this reversal so that it can be used for any convolution purpose.
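To make the sweep concrete in code, here is a direct Python sketch of the double loop (illustrative only, not the HDL we will build later), showing both the N+M−1 output length and the reversed indexing of the signal:

```python
def convolve(x, h):
    """Direct convolution: o[i] = sum over j of h[j] * x[i - j].

    x is treated as zero outside its valid range, so the output
    has len(x) + len(h) - 1 points.
    """
    n, m = len(x), len(h)
    out = []
    for i in range(n + m - 1):
        acc = 0
        for j in range(m):
            k = i - j              # signal index runs backwards as j runs forwards
            if 0 <= k < n:         # out-of-range signal values are taken as 0
                acc += h[j] * x[k]
        out.append(acc)
    return out

# An 8 point signal convolved with a 4 point response gives 11 points:
sig = [1, 2, 3, 4, 4, 3, 2, 1]
imp = [1, 0, 0, 0]               # a delta: the output reproduces the signal
print(len(convolve(sig, imp)))   # 11
```

The inner `if` is where the “values outside the valid range are 0” rule from the text lands in code.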
What About Arbitrarily Long Signals?
The example in the diagram assumed a finite set of signal values, but in real DSP usage the signal will probably be an unending set of samples to be processed. Notice from the diagram that the impulse response only ever overlaps the signal values over a limited range—basically, the length of the impulse response. So, as long as the values to be convolved with the signal are of finite length, then, if we remember the last M signal values, we can keep producing output values indefinitely.
Now this is beginning to suggest a solution in logic. A multi-bit shift register is good for remembering the last M values of an input. For our purposes, the impulse response we want to convolve with our signal is fixed, and so we just need a look-up-table of values. Then, to convolve the impulse response with the signal for a new output value, we simply multiply and accumulate the elements of the LUT and the shift register. When a new input arrives, and the shift register is shifted, with the oldest value disposed of and the new value added, the system is ready to generate a new output. It’s that simple!
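That shift-register-plus-LUT arrangement can be sketched behaviourally in Python (names like StreamingFIR are my own, not from the firfilter repository):

```python
from collections import deque

class StreamingFIR:
    """Behavioural model: an M-deep shift register of samples and a
    fixed LUT of tap values, multiply-accumulated for every new input."""

    def __init__(self, taps):
        self.taps = list(taps)  # the 'LUT' of impulse response values
        # Shift register of the last M samples, newest first, initially zero.
        self.smpls = deque([0] * len(taps), maxlen=len(taps))

    def step(self, sample):
        # Shift in the new sample; the oldest drops off the far end.
        self.smpls.appendleft(sample)
        # One multiply-accumulate per tap. smpls[0] is the newest sample,
        # which lines up with taps[0]: this is the reversed indexing of
        # convolution falling out of the shift-register structure.
        return sum(t * s for t, s in zip(self.taps, self.smpls))

fir = StreamingFIR([1, 0, 0, 0])         # delta response: an all-pass
print([fir.step(v) for v in [5, 7, 9]])  # [5, 7, 9]
```

Each call to `step` is one ‘new input arrives’ event from the text: shift, then multiply and accumulate across all taps.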
There are some practical considerations to take into account. The above diagram suggests that there are as many multipliers as there are entries in the LUT (known as ‘taps’). A practical impulse response might need to have as many as 256 taps or more, and the accumulator would have to be able to add that many multiplication results at once. If the rate of new input samples is low relative to the maximum clock frequency that, say, the multiplier logic can run at, then the implementation can do a multiply and accumulate once per cycle for M cycles and, if that period is less than the input sample period, only one multiplier and one adder is required. In between, if M cycles is too long, two circuits running in parallel, each doing half the taps (bottom/top or odd/even, for example), can be employed. This takes two multipliers and two adders, plus an additional adder to add the two final results. Hopefully you can see that this bifurcation can be extended for quarter, eighth, sixteenth etc. parallel calculations, trading off speed against logic resources, depending on the requirements. In the example implementation we will discuss later, the clock rate is sufficiently fast that we can use a single multiply-accumulate logic block to process all the taps, with time to spare for the target functionality.
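The odd/even split is easy to convince yourself of numerically. The sketch below (made-up values) shows two half-length multiply-accumulates plus one final add giving the same answer as the single full-length one:

```python
# One output of the 'convolution sweep': full MAC versus two parallel half-MACs.
taps  = [3, 1, 4, 1, 5, 9, 2, 6]
smpls = [2, 7, 1, 8, 2, 8, 1, 8]   # the last M samples, newest first

full = sum(t * s for t, s in zip(taps, smpls))

# Split the taps between two MAC units, even indices and odd indices,
# each taking roughly half the cycles, then a final adder combines them.
even = sum(taps[i] * smpls[i] for i in range(0, len(taps), 2))
odd  = sum(taps[i] * smpls[i] for i in range(1, len(taps), 2))

assert even + odd == full          # half the cycle count, same answer
print(full)
```

The same regrouping argument extends to four, eight or more lanes: addition is associative, so how the partial sums are grouped doesn’t change the result.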
Another consideration is the quantisation Q of the signal and ‘tap’ values. The choice here is dependent on the requirements and could range from double precision floating point (64 bits) to 8-bit integers or fixed-point values. Having chosen the number of Q bits, it must also be noted that multiplying two numbers Q bits wide gives a result 2Q bits wide, and the adder must be able to take this size of input. Also, when adding numbers, the result can be one bit larger than the inputs (let’s assume they’re both the same width). Thus, potentially, to avoid overflow (or underflow), the accumulator should allow an additional bit for each addition—i.e., an additional M bits in total. In practice, this becomes impractical, particularly if M is configurable (as our example will be, via a parameter). Various ways of dealing with this are available. Firstly, do nothing and let the accumulator roll over. This may seem ‘lazy’, but it might be that upstream processing makes certain range guarantees which mean the accumulation can’t over- or under-flow, and additional logic to deal with it would be redundant. The next level is to detect under- or overflow and flag an error. This might come in the form of detecting that an addition would change the sign of the result when the input signs match. I.e., adding two positive numbers should not result in a negative number and, similarly, adding two negative numbers should not result in a positive number. Having detected over/underflow, one could then take a further step and ‘saturate’ the result, so that the overflowed or underflowed result is discarded and replaced with the maximum positive or negative number as appropriate.
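The sign-based detection and saturation described above can be modelled like this (an illustrative Python model; the widths and names are my own):

```python
def sat_add(a, b, bits):
    """Signed add with saturation: if a and b have the same sign and the
    raw sum falls outside the two's-complement range for 'bits' bits,
    clamp to the maximum positive or negative value as appropriate."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    s = a + b
    if a >= 0 and b >= 0 and s > hi:   # two positives must not go negative
        return hi
    if a < 0 and b < 0 and s < lo:     # two negatives must not go positive
        return lo
    return s

print(sat_add(100, 50, 8))    # 127: clamped, since 150 overflows 8 bits
print(sat_add(-100, -50, 8))  # -128: clamped on the negative side
print(sat_add(100, -50, 8))   # 50: mixed signs can never overflow
```

In hardware the same check falls out of comparing the operand sign bits with the raw result’s sign bit.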
Having dealt with these problems, we have a result that is 2Q bits wide but, for many applications, we want the result to be Q bits wide to match the input. We could just take the top Q bits of the result, but we can do better than this, and introduce less quantisation noise, if we add the most significant bit of the truncated bits to the rescaled value to round up. This then picks the nearest rescaled value to the original.
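A quick Python model of this round-to-nearest rescaling (mirroring the idea rather than the exact Verilog bit-slicing):

```python
def rescale(acc, q):
    """Rescale a 2Q-bit accumulator result back to Q bits by dropping the
    bottom Q bits, adding in the most significant dropped bit so that we
    round to the nearest representable value instead of truncating down."""
    round_bit = (acc >> (q - 1)) & 1   # MSB of the bits about to be discarded
    return (acc >> q) + round_bit

Q = 8
print(rescale(0x1280, Q))   # 0x1280 >> 8 is 0x12; dropped MSB is 1, so 0x13
print(rescale(0x1220, Q))   # dropped MSB is 0, so this truncates to 0x12
```

The dropped MSB is worth exactly half of one output LSB, so adding it in implements conventional round-half-up.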
Earlier in the article I made the bold claim that convolving a signal with the delta function (an impulse) in the time domain results in multiplying all frequencies by 1 in the frequency domain—in other words, an all-pass filter, which doesn’t alter the signal. Our solution gives us an insight into why that might be. Imagine our impulse response is a delta function, with just one of the taps set to 1.0 (or the equivalent in binary). Now, as we sweep this over our signal, the single tap with the 1 in it will pick out only one value from the signal, and will pick each one in turn as it sweeps through. All other values will be masked to 0, being multiplied by 0 from the impulse response. The output, then, being the sum of all the results, all but one of which are zero, will simply reproduce the input value unaltered. This is true for whatever input signal we choose to put into the convolution. Thus, this acts as an all-pass filter.
Conversely, if we make our signal a delta function (impulse), when we convolve it with our impulse response tap values it will reproduce the impulse response at the output, as it picks one value at a time as it sweeps through. So here we see, in our solution, why convolving in the time domain is the same as multiplying in the frequency domain, and why an impulse in the time domain is the same as an all-pass filter.
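Both identities are easy to check numerically. This Python sketch (illustrative only) uses a direct double-loop convolution as described earlier in the article:

```python
def convolve(x, h):
    # Direct convolution as in the text: o[i] = sum over j of h[j] * x[i-j]
    n, m = len(x), len(h)
    return [sum(h[j] * x[i - j] for j in range(m) if 0 <= i - j < n)
            for i in range(n + m - 1)]

sig   = [4, 8, 15, 16, 23, 42]
resp  = [3, 2, 1]
delta = [1, 0, 0]

# Convolving the signal with a delta reproduces the signal (an all-pass),
# padded with zeros where the response has swept past the end:
assert convolve(sig, delta)[:len(sig)] == sig

# Convolving a delta with the response reproduces the impulse response:
assert convolve(delta, resp)[:len(resp)] == resp
print("identities hold")
```

Either way round, the single 1 sweeps through and picks out one value at a time, exactly as described above.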
HDL Implementation
I have put together a configurable Verilog solution, firfilter, on GitHub. Firstly, let’s have a look at its ports. The diagram below shows the fir module block diagram with its signals and parameters.
The module has a clock (clk) and power-on reset (nreset, active low). It also has a synchronous external reset (reset, active high) so that external logic can reset the module outside of POR. There is also a simple subordinate memory mapped bus port which allows the tap values to be programmed from an external source.
The other two ports consist of the input samples (sample) with an accompanying valid (smplvalid), and the output values (out) with its own valid signal (opvalid). The number of bits for the input and output values is determined by the SMPL_BITS parameter (with a default value of 12). Needless to say, these values are going to be treated as signed numbers. The TAPS parameter (with a default of 127) determines the number of logical taps the impulse response LUT will have. It will also determine the width of the address (addr) on the memory mapped port to be able to index all the tap entries.
Actually, as this implementation is targeting an FIR, and because the impulse responses will be symmetrical about the centre value, an optimisation is made to store only half the data, plus the centre value. Thus, the actual storage defined is TAPS/2+1 tap entries, and logic is used to recreate the full impulse response. If the size of the LUT is not a limiting factor then this need not be done, saving a little bit of logic and making the implementation a more generic convolution solution. However, I wanted to illustrate some optimisation considerations that might be required when looking at the logic.
One other deviation in this implementation from the generic diagram seen earlier is that the taps are actually SMPL_BITS+1 wide. This allows the equivalent of 1.0 to be stored; otherwise the maximum value is equivalent to 0.111111111111… (in binary), which gives a slight scaling of the output (an attenuation), the degree of which is dependent on the magnitude of SMPL_BITS (the larger, the smaller the error). This may not be an issue for some applications, but it makes it easier to correlate results against what we’ve discussed, without introducing any scaling errors.
Indexing the Tap Values
Before jumping into the convolution logic, we need some logic to extract values from the tap LUT (taps—an array of signed values). These will be programmed with the impulse response values from index 0 to TAPS/2. So, for the default TAPS value of 127, this is 0 to 63. When a new input value arrives (smplvalid is 1 for a cycle), a register count is set to TAPS and counts down to zero, decrementing each clock cycle. For the first half we want to index the LUT up from 0 to TAPS/2, and then count back down from TAPS/2−1 to 0. A combinatorial bit of logic does this calculation, assigning a wire tapidx:
wire [$clog2(TAPS)-1:0] tapidx = (countdlyd > (TAPS+1)/2-1) ?
                                     ~countdlyd :
                                     count[$clog2(TAPS)-2:0];
Note that countdlyd is the count value delayed by one cycle, which is equivalent to count+1 when the count is active, saving an adder, aligning the value, and easing timing.
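The up-then-down indexing over a half-sized LUT can be modelled in Python (an illustrative model of the idea, not a cycle-accurate copy of the RTL):

```python
TAPS = 127
HALF = (TAPS + 1) // 2          # 64 stored entries, indices 0..63

def tapidx(i):
    """Map a full-response index 0..TAPS-1 onto the half-sized LUT by
    mirroring about the centre tap: the stored half is reused in reverse
    for the second half of the symmetric impulse response."""
    return i if i < HALF else (TAPS - 1) - i

# Sweeping the full response indexes the LUT up and then back down:
seq = [tapidx(i) for i in range(TAPS)]
print(seq[:4], seq[62:66], seq[-4:])   # rises from 0, peaks at 63, falls to 0
```

Because the sweep itself is a palindrome, the reconstructed response is symmetric by construction.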
Multiply and Accumulate
The heart of the convolution is a multiply and accumulate function. Many FPGAs include ‘DSP’ blocks; the Intel Cyclone V or the AMD Zynq 7000 series, for example, both fairly low cost parts. These DSP blocks are basically multiply and accumulate blocks (though they can be used as just multipliers)—just what we need. One could instantiate these blocks as modules, but the synthesis tools can infer the functionality and construct the blocks we need for us. It does help, though, if we construct the logic to look like a multiply and accumulate equation. In the fir module, the synchronous process has the code:
if (countdlyd)
begin
  accum <= (tapval * validsample) + accum;
end
The accum register is just what you’d expect, accumulating the values over the convolution. The tapval is the registered extracted LUT value (taps[tapidx]). The validsample signal is a registered version of the shift register (smpls[]) values, as indexed by count, but also validated as containing a value. After POR, the shift register has no valid values and could contain anything. POR could reset them all, but this could be a lot of bits (TAPS × SMPL_BITS), so, in this implementation, a register, smplsvalid, TAPS bits wide, is cleared on reset and used as a single-bit shift register, filling with 1s as input samples are shifted in. Thus, validsample is the indexed shift register value if the equivalent bit of smplsvalid is set, otherwise it’s 0. The update of accum happens whilst the (delayed) count value is non-zero. It gets zeroed each time a new input sample arrives.
Rescaling
Lastly, in the last cycle, the output is updated with the rescaled accum value. The code fragment below shows this:
if (lastcycle)
begin
  out     <= accum[SMPL_BITS*2-1:SMPL_BITS-1] + accum[SMPL_BITS-2];
  opvalid <= 1'b1;
end
Note that the opvalid is set to a default value of 1'b0 at the beginning of the synchronous process so that, when set here, it pulses for just one cycle, being cleared on the next. And that’s all there is to it. We now have the basis of a finite impulse response filter, if only we knew what to put in the tap values. That will be the subject of the next article.
With this design, having a single multiply and accumulate function, a new output takes TAPS + 3 clock cycles to calculate. The extra three cycles allow for registering at both the input and output, and around the multiply-and-accumulate function. Thus, for a 100MHz clock this would be, for the default TAPS size of 127, 1.3μs or a rate of 769KHz. This 1.3μs is also the latency through the filter.
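That arithmetic can be sanity-checked in a few lines of Python:

```python
TAPS   = 127
F_CLK  = 100e6                   # 100MHz clock
CYCLES = TAPS + 3                # taps plus input/output/MAC registering

period_us = CYCLES / F_CLK * 1e6
rate_khz  = F_CLK / CYCLES / 1e3
print(round(period_us, 1), "us per output")        # 1.3us, also the latency
print(round(rate_khz), "KHz maximum sample rate")  # about 769KHz
```

The same two lines make it easy to explore other TAPS or clock values when sizing a design.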
Conclusions
In this first of two articles, digital convolution was introduced showing how two signals can be convolved in the time domain and what this means in the frequency domain. By making one of the signals a finite impulse response we can use convolution to filter a signal.
A generic solution was explored, where there were no restrictions on resources such as memory, multipliers, or the number of terms in an addition. This was used to develop an intuitive understanding of why convolution in the time domain is equivalent to multiplication in the frequency domain, and why a delta function (or impulse) in the time domain is an ‘all-pass’ filter—i.e., it multiplies everything in the frequency domain by 1.
From this idealised solution some practical constraints were formalised, and a Verilog-based implementation was discussed. The solution requires only multiply and accumulate functionality over and above normal logic and is, I hope, simple and easy to understand. We left the HDL implemented but with no knowledge of how to configure the logic to act as a filter. We still need the numbers.
In the next article, I want to concentrate on generating the tap values required to turn our HDL implementation into a filter. It will cover generating impulse responses using sinc functions, ‘windowing’ the sinc function to make it finite, designing these to meet our requirements, and analysing them to validate the responses. To do this we will take a specification from digital audio processing as an example: filtering 48KHz samples with a 4× oversampled low-pass filter, a cut-off at 20KHz and a stop-band attenuation target of −60dB. We will use my winfilter program to help us do this but will discuss the calculations behind it all for a fuller understanding.