Playing with the Data Order in FFTs

Playing with the Data Order in FFTs

An interesting topic when designing fast Fourier transform (FFT) architectures is the data order. When we want to achieve an optimize FFT architectures, a different input or output order may influence the resulting architecture. Likewise, whether input data arrive in continuous flow or in bursts may end up in a different architecture. In essence, adapting the design of the architecture to the data flow is what allows us to achieve an efficient result.

Traditionally, it was believed for many years that the FFT algorithm was related to the input and output data order. It was common to find papers that claimed that radix-2 DIF FFT architectures received inputs in natural order and provided outputs in the so-called bit-reversed order, whereas radix-2 DIT FFTs received inputs in bit-reversed order and provided outputs in natural order.

However, this belief was refuted in the following paper, where it was demonstrated that the data order is completely independent of the FFT algorithm:

Mario Garrido, “A new Representation of FFT Algorithms using Triangular Matrices”, IEEE Trans. Circuits Systems I, Vol. 63, No. 10, pp. 1737-1745, Oct. 2016. (IEEE Xplore, Open Access).

Essentially, any input and output data order and any data order in the intermediate stages of the FFT architecture can be combined with any FFT algorithm. This is what has made it possible to play with the data order and the algorithms in the followign paper in order to obtain favourable mappings that reduce the number of rotators and their complexity:

Mario Garrido, Shen-Jui Huang and Sau-Gee Chen, “Feedforward FFT Hardware Architectures based on Rotator Allocation”, IEEE Trans. Circuits Systems I, Vol. 65, No. 2, pp. 581-592, Feb. 2018. (IEEE Xplore, Open Access).

This idea has been continued by my PhD student Pedro Paz Mongil , who is currently developing and advanced algorithm to reduce the computations in FFT architectures. In his research, Pedro Paz Mongil is generalizing the idea of rotator allocation to carry out a full exploration of the design space of FFT architectures.

Another approach that explores the possibilities of the data order in FFTs has been presented this year in the following paper:

Zeynep Kaya and Mario Garrido, “Novel Access Patterns Based on Overlapping Loading and Processing Times to Reduce Latency and Increase Throughput in Memory-based FFTs”, IEEE International Symposium on Computer Arithmetic, Málaga (Spain), Jun. 2024. (IEEE Xplore).

This work by Zeynep Kaya deals with memory-based FFTs . In it, we studied the relation between the place in the architecture where data are input and output and the order of inputs and outputs. As a result of this analysis, we observed that the loading time and the time to collect the output data can be eliminated in some configurations, which results in higher throughput and lower latency.

Nowadays, Zeynep Kaya continues this research line by exploring a new memory-based architectures that removes the loading time and the time for collecting the outputs while supporting natural input and output order.

In 2025, we expect to have the final results of the excellent research works conducted by Pedro Paz Mongil and Zeynep Kaya , both of them motivated by the explorations of the possibilities that the data order in FFTs offers.

This work has been carried out in the frame of the project "RAFFTING: Realizing Advanced FFT Implementations for 6G", PID2021-126991NA-I00, funded by the Spanish Ministry of Science and Innovation, the AEI, and the European Regional Development Fund through the call "Proyectos de Generación de Conocimiento, 2021".





要查看或添加评论,请登录

Mario Garrido的更多文章

  • Good News from DCIS 2024!

    Good News from DCIS 2024!

    Yesterday we were informed that the three papers that we submitted to the conference DCIS 2024 have been accepted to be…

    1 条评论
  • Further Improvements in MSC FFTs

    Further Improvements in MSC FFTs

    At the beginning of this year I wrote a post on the multi-path serial commutator (MSC) FFT, explaining the first…

  • Even More Efficient FFT Cores

    Even More Efficient FFT Cores

    We continue developing highly optimized FFT cores that beat the state-of-the-art again and again. With this aim, we…

    1 条评论
  • What is the limit in pleasing reviewers to get our papers accepted?

    What is the limit in pleasing reviewers to get our papers accepted?

    This week we have had a very interesting anecdote that has made me reflect on the paper review process. For those who…

    5 条评论
  • Towards Solving the Non-Power-of-Two FFT Challenge

    Towards Solving the Non-Power-of-Two FFT Challenge

    A few months ago, I wrote a post about the challenge of designing non-power-of-two (NP2) fast Fourier transform (FFT)…

  • The MSC FFT

    The MSC FFT

    Today I want to introduce the multi-path serial commutator (MSC) FFT. This type of architecture is the parallel version…

  • Is it Possible to Design Efficient Non-Power-of-Two FFT Hardware Architectures?

    Is it Possible to Design Efficient Non-Power-of-Two FFT Hardware Architectures?

    The challenge of designing non-power-of-two (NP2) FFT architectures has existed for decades. However, nowadays there is…

  • Towards Low-Latency FFT Hardware Architectures

    Towards Low-Latency FFT Hardware Architectures

    One of the key topics in which we are working nowadays is the design of low-latency FFTs. This is an area that has…

    2 条评论
  • FFT Cores

    FFT Cores

    We have started a new Open Science initiative to make our research on fast Fourier transform (FFT) algorithms and…

    2 条评论
  • Sometimes, the best solutions are, indeed, simple

    Sometimes, the best solutions are, indeed, simple

    My PhD student Pedro Paz Mongil came one day to my office asking me about the calculation of the arcsine and arccosine…

    3 条评论

社区洞察

其他会员也浏览了