from FFT to SVD

Look into what SVD can do for us through the lens of FFT.

FFT, the Fast Fourier Transform, is so famous that it needs no further introduction here.

SVD, Singular Value Decomposition. In short, it is a matrix factorization technique from linear algebra, used extensively in data science.

We are not going to dive into the mathematics of SVD in this post; there are many good materials around detailing that. Given my DSP roots, I will look into what SVD can do for us through the lens of FFT, which will hopefully be more fun and intuitive.

First, we need a real example, and we will use something we are all very familiar with: speech. Pack lots of speech data into 64-sample-long frames and feed them into an SVD calculation tool, and we get three matrices back: [U, S, V]. U is the one we want to focus on here; it contains fundamental components that, when combined, can reconstruct any of the input speech frames. Sounds familiar? The following diagram illustrates this example application of SVD, in comparison with FFT.
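The frame-packing and decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's actual tooling: the speech array here is random noise standing in for real recordings, and the 64-sample framing follows the description above.

```python
import numpy as np

# Stand-in "speech": random noise shaped like 1000 frames of audio.
# In practice you would load recorded speech samples instead.
rng = np.random.default_rng(0)
speech = rng.standard_normal(64 * 1000)

# Pack the signal into 64-sample frames, one frame per column.
frames = speech.reshape(-1, 64).T            # shape (64, n_frames)

# SVD factorizes the frame matrix as U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(frames, full_matrices=False)

print(U.shape)    # (64, 64): 64 basis vectors, each 64 samples long
print(S[:3])      # singular values, sorted largest first
```

Note that `np.linalg.svd` returns the singular values already sorted in descending order, which is exactly the "ranked by importance" property discussed below.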

[Figure: transforming speech frames with the SVD basis U, in comparison with FFT]

As illustrated above, the transform is done by convolution between the input signal and the vectors in U. This convolution plays the same role as the butterfly algorithm does for calculating FFT.

Now let’s zoom into U. It is a 64x64 matrix consisting of 64 column vectors (convolution filters), each 64 samples long. Each of these 64 vectors represents a fundamental component in the new domain, just like a frequency bin in FFT. More interestingly, these vectors come ranked by importance, with the most important one at the front. How handy is that!
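That ranking has a concrete payoff: projecting a frame onto the columns of U gives one weight per component (the SVD counterpart of FFT bins), and truncating to the first k components gives a reconstruction whose error can only shrink as k grows. A small sketch, again with random stand-in frames:

```python
import numpy as np

rng = np.random.default_rng(1)
frames = rng.standard_normal((64, 500))   # stand-in speech frames
U, S, Vt = np.linalg.svd(frames, full_matrices=False)

frame = frames[:, 0]
coeffs = U.T @ frame      # one weight per U vector, like FFT bins

# Truncated reconstruction from only the first k (most important)
# components; using all 64 reconstructs the frame exactly.
k = 16
approx = U[:, :k] @ coeffs[:k]
full = U @ coeffs
print(np.allclose(full, frame))   # True
```

Because the columns of U are orthonormal, the squared truncation error is just the energy of the dropped coefficients, so adding components never makes it worse.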

We are all familiar with frequency bins: the energy and phase of a single frequency. But what do those U vectors look like? Here are the first 5 of them.

It turns out they are not that different from frequency bins; they are all very tidy sine waves. This makes things a lot more interesting, because we can use FFT to make sense of the U vectors. Here is the spectrum of those first 5 U vectors.

All of them are basically a DC offset plus one strong frequency. Some of you might already have noticed the phase information. Take the yellow and purple lines for example: both sit at pretty much the same 500Hz frequency; the difference is in the amount of DC offset and the phase angle. This is very different from FFT, which dedicates the 0Hz bin to representing DC offset and gives each frequency bin complete freedom in phase angle.
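The inspection itself is easy to reproduce: take the FFT of each U vector and read off the DC level, the strongest bin, and its phase. With random stand-in frames the vectors will not be the tidy sine waves seen with real speech, but the recipe is the same (mapping a bin index to Hz would additionally need the sample rate, which the post does not state):

```python
import numpy as np

rng = np.random.default_rng(2)
frames = rng.standard_normal((64, 500))   # stand-in speech frames
U, _, _ = np.linalg.svd(frames, full_matrices=False)

for i in range(5):
    spectrum = np.fft.rfft(U[:, i])       # 33 bins for a 64-sample vector
    mag = np.abs(spectrum)
    dc = mag[0]                           # bin 0 is the DC offset
    peak = int(np.argmax(mag[1:])) + 1    # strongest non-DC bin
    phase = np.angle(spectrum[peak])      # phase at that bin
    # frequency in Hz would be peak * fs / 64 for sample rate fs
    print(f"vector {i}: DC={dc:.3f}, peak bin={peak}, phase={phase:.2f} rad")
```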

Side topic: FFT phase information is often left untouched in practice because it is too difficult to use, being easily corrupted by noise; phase wrapping doesn't help either. SVD, on the other hand, treats the phase angle (and DC offset) just like frequency, fixing a value for each component (U vector). It has to, since it uses a single value to represent the weight of each component (a convolution filter's output is a single value). My intuition tells me this way of encoding phase might make it more usable.

Back to dissecting U vectors: are all of them like the first 5 we just saw? Not exactly. Here are the 34th and 35th vectors, which clearly consist of more than one frequency: a higher-frequency signal modulated by a lower-frequency one.

Although these U vectors look very free-style, there is one important characteristic shared by frequency bins and U vectors alike: they are orthogonal. For instance, the dot product of the 34th and 35th vectors plotted above is -2.8e-16, close enough to zero. Even if I pick two vectors with strong DC offsets (the green and purple lines in the earlier plot), their dot product is 1.47e-16. Don't ask me how; it's just magic.
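The magic is actually guaranteed: SVD always returns an orthonormal U, so every pair of distinct columns has a near-zero dot product and U.T @ U is the identity. A quick check on stand-in data:

```python
import numpy as np

rng = np.random.default_rng(3)
frames = rng.standard_normal((64, 500))   # stand-in speech frames
U, _, _ = np.linalg.svd(frames, full_matrices=False)

# Distinct columns are orthogonal, each column has unit length,
# so U.T @ U is (numerically) the 64x64 identity matrix.
print(abs(U[:, 33] @ U[:, 34]))           # on the order of 1e-16
print(np.allclose(U.T @ U, np.eye(64)))   # True
```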

Ok, enough of numbers and magic, back to plain English. What this example tells us about what SVD does:

SVD decomposes signals into uncorrelated components and ranks these components in order of importance.

A tool that can tell us the important fundamental components hiding in a sea of data; no wonder it is used extensively in data science.

This powerful tool is not without its unique cost, though. We know how to read the output of a 128-point FFT; each data point carries a fixed meaning. In contrast, the output of a transform using a U matrix means very little without knowledge of that particular U matrix. As we did earlier, we first have to understand the DC offset, frequencies and phase angles represented by each U vector; that knowledge then lets us map the transform output to more meaningful elements. All this is because SVD is a data-driven technique: it calculates the transform matrix (U) from the input dataset.
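To make the data-driven point concrete, here is a small sketch (again with random stand-in data): coefficients computed against one dataset's U decode correctly only with that same U; a U learned from a different dataset gives back something else entirely.

```python
import numpy as np

rng = np.random.default_rng(4)
data_a = rng.standard_normal((64, 500))   # one stand-in dataset
data_b = rng.standard_normal((64, 500))   # a different dataset
U_a, _, _ = np.linalg.svd(data_a, full_matrices=False)
U_b, _, _ = np.linalg.svd(data_b, full_matrices=False)

frame = data_a[:, 0]
coeffs = U_a.T @ frame     # transform output under U_a

# Decoding with the matching U recovers the frame; decoding the same
# coefficients with a different dataset's U does not.
right = U_a @ coeffs
wrong = U_b @ coeffs
print(np.allclose(right, frame))   # True
print(np.allclose(wrong, frame))   # False
```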


So is that the conclusion: a powerful tool, but not easy to use? Well, what if we (humans) are not the ones reading the output…
