Using incomplete sub-multipliers for building multipliers on FPGAs

Using incomplete sub-multipliers for building multipliers on FPGAs

This article describes the core idea of our recent paper "Small Logic-based Multipliers with Incomplete Sub-Multipliers for FPGAs" from Andreas B?ttcher and me. It was accepted for presentation at ARITH 2024 (https://lnkd.in/eMxQHvCZ), a preprint is available on arXiv (https://lnkd.in/eKk8tbpX)

Performing multiplications in a computer follows similar ideas like performing multiplications by hand using paper & pencil. We all learned in school how to perform large multiplications by only knowing the multiplications of a single digit (which we all had to memorize). These results are properly shifted and added to compute the final product.

Machines multiply numbers in a very similar way; partial products are computed which are shifted and added. In the simplest form, a partial product can be a 1x1 bit operation that can be computed using a simple AND gate. The look-up tables of FPGAs can realize Boolean functions of typically 5 to 6 inputs, so a single AND gate is not efficient to implement; the LUT can do much more than that. One efficient way used before is to perform a 3x2 multiplication. It requires 5 inputs and the result ranges from 0 to 7x3=21 (using the maximum input each). Hence, the result requires 5 bits in total, so we need five 5-input LUTs to compute the product. As each of the 6-input LUTs of current FPGAs can realize two 5-input LUTs, we need three 6-input LUTs in total. Note that one of the six possible 5-input LUTs is not usable anymore.

From the view of previous work in multiplier tiling, this multiplier can be seen as a 3x2 tile as on the left of the figure above. From that point of view, the maximum result of the multiplication can be alternatively seen as the sum of the corresponding bit weights in each tile position: 2^0+2^1+2^1+2^2+2^2+2^3=21.

Now, if we remove the position with weight 2^3, the maximum result is 2^0+2^1+2^1+2^2+2^2=13, which can be represented using 4 bits. Hence, only four 5-input LUTs are required that perfectly map to two 6-input LUTs in total. No waste of another 5-input LUT and more efficient.

Now you should argue that this is not a valid multiplication anymore. It is incomplete as it misses some values!

Right! But the missing position can be covered by another tile to build a complete multiplier! This is what we explore in the paper.

If this was interesting to you, check out the paper or visit Andreas' talk at ARITH 2024!


要查看或添加评论,请登录

社区洞察

其他会员也浏览了