Audio Features for ML
Dhanushkumar R
Microsoft Learn Student Ambassador - BETA|Data Scientist-Intern @BigTapp Analytics|Ex-Intern @IIT Kharagpur| Azurex2 |Machine Learning|Deep Learning|Data Science|Gen AI|Azure AI&Data |Technical Blogger
Why audio?
Audio feature categorisation
● Level of abstraction
● Temporal scope
● Music aspect
● Signal domain
● ML approach
There are several strategies for categorizing the audio features used in machine learning. Audio features are attributes extracted from sound, and they are fundamental to training intelligent audio systems. The main categorization strategies are described below.
Level of Abstraction
This strategy classifies audio features into low-level, mid-level, and high-level categories based on their complexity and abstraction. Low-level features include statistical attributes extracted directly from audio, while mid-level features start to make sense from a perceptual perspective. High-level features are more abstract and map to musical constructs.
Temporal Scope
This strategy categorizes features based on the temporal duration they cover. Instantaneous features provide information about short audio chunks (around 50-100 milliseconds), segment-level features analyze seconds-long segments, and aggregate features describe the entire sound by combining lower-level temporal information.
Music Aspects
This strategy focuses on music-related aspects of features. It categorizes features based on their relevance to musical elements such as note onsets, key, melody, rhythm, and tempo.
Signal Domain
This strategy categorizes features based on the signal domain they belong to. Time-domain features, derived from waveform data, provide information about events in a sound over time. Frequency-domain features analyze the frequency components of sound using techniques like the Fourier transform. Time-frequency domain features, such as spectrograms, provide combined information about both time and frequency.
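To make the three signal domains concrete, here is a minimal sketch using NumPy and librosa. The synthetic two-tone signal, frame size, and hop length are illustrative assumptions, not values from the article.

```python
import numpy as np
import librosa

# Synthetic stand-in for a real recording: 440 Hz + 880 Hz tones, 2 seconds
sr = 22050
t = np.arange(0, 2.0, 1 / sr)
y = 0.6 * np.sin(2 * np.pi * 440 * t) + 0.4 * np.sin(2 * np.pi * 880 * t)

# Time domain: amplitude as a function of time
print("waveform shape:", y.shape)           # 44100 samples

# Frequency domain: magnitude spectrum of the whole signal
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / sr)
print("dominant frequency:", freqs[np.argmax(spectrum)], "Hz")

# Time-frequency domain: spectrogram via the Short-Time Fourier Transform
stft = librosa.stft(y, n_fft=2048, hop_length=512)
spectrogram = np.abs(stft)                  # shape: (freq bins, frames)
print("spectrogram shape:", spectrogram.shape)
```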
Machine learning approach
● Traditional ML approach
● Deep learning approach
Traditional ML
In the traditional pipeline, the audio is first processed with digital signal processing; rule-based systems and then classical ML algorithms are applied on top of hand-crafted features obtained through feature engineering. In the deep learning approach, by contrast, feature extraction is performed automatically by the network itself.
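As an illustrative sketch of the traditional route (hand-crafted features feeding a classical model), the snippet below synthesizes two toy classes of sounds, extracts aggregated MFCC features with librosa, and trains an SVM with scikit-learn. The toy data and the MFCC+SVM choice are assumptions for demonstration, not a prescribed setup.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

sr = 22050
rng = np.random.default_rng(0)

def make_tone(freq, duration=1.0):
    """Toy 'recording': a sine tone plus a little noise."""
    t = np.arange(0, duration, 1 / sr)
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(t.size)

# Two toy classes: low-pitched (label 0) vs high-pitched (label 1) sounds
signals = [make_tone(rng.uniform(200, 400)) for _ in range(20)] + \
          [make_tone(rng.uniform(1000, 2000)) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

# Feature engineering: per-frame MFCCs aggregated (mean) into one vector per clip
features = np.array([
    librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1) for y in signals
])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```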
Feature Extraction:
Feature extraction is the process of deriving time-domain and frequency-domain features from audio signals. Frames are used so that each chunk of audio analysed is long enough to be perceptible.
FRAME:
A single audio sample contains amplitude (loudness) information at one particular point in time; tens of thousands of samples are played in sequence every second to produce audible sound. A frame is a group of consecutive samples that is long enough to be perceptible.
The frame size K is usually a power-of-2 number of samples (e.g., 256, 512, 1024).
The duration of a frame is
`d_f = (1 / s_r) * K`
where s_r is the sampling rate and K is the frame size, i.e., the number of samples in the frame.
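As a quick worked example of the formula above (the 44.1 kHz sampling rate and the frame sizes are just illustrative values):

```python
# Frame duration d_f = (1 / s_r) * K, for a few common power-of-2 frame sizes
s_r = 44100  # sampling rate in Hz (illustrative)
for K in (256, 512, 1024, 2048):
    d_f = (1 / s_r) * K
    print(f"K = {K:5d} samples  ->  d_f = {d_f * 1000:.1f} ms")
# e.g. K = 512 at 44.1 kHz gives roughly 11.6 ms per frame
```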
Time Domain Feature Pipeline
In the time-domain pipeline, the signal is first framed; features are then computed per frame and aggregated (for example with the mean or median) so that they can feed a statistical model such as a GMM. The output is a feature value, vector, or matrix.
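A minimal sketch of that pipeline, assuming a synthetic signal and two common time-domain features (RMS energy and zero-crossing rate) computed frame by frame with librosa and then aggregated:

```python
import numpy as np
import librosa

sr = 22050
t = np.arange(0, 2.0, 1 / sr)
y = np.sin(2 * np.pi * 440 * t) * np.linspace(1.0, 0.2, t.size)  # decaying tone

FRAME_LENGTH, HOP_LENGTH = 1024, 512

# Per-frame (instantaneous) time-domain features
rms = librosa.feature.rms(y=y, frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH)[0]
zcr = librosa.feature.zero_crossing_rate(y, frame_length=FRAME_LENGTH,
                                         hop_length=HOP_LENGTH)[0]

# Aggregation: summarise the whole clip as one feature vector for a model (e.g. a GMM)
feature_vector = np.array([rms.mean(), rms.std(), zcr.mean(), zcr.std()])
print("per-frame RMS shape:", rms.shape)
print("aggregated feature vector:", feature_vector)
```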
Frequency Domain Feature Pipeline
The pipeline starts from an analog sound, which is digitised through sampling and quantisation, after which the signal is framed. A Fourier transform is then applied to convert the time-domain representation into a frequency-domain representation.
In the time domain, the signal is represented as amplitude as a function of time, so we see events as they unfold over time. The frequency domain instead looks at the frequency components of a sound and how much each contributes to the overall signal, i.e., how different frequency bands contribute to the overall sound.
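As a small sketch of that idea (the two-tone test signal is an assumption for illustration), the magnitude spectrum below shows how much each frequency component contributes to the overall sound:

```python
import numpy as np

sr = 22050
t = np.arange(0, 1.0, 1 / sr)
# Two components with different contributions: a strong 440 Hz and a weaker 1320 Hz
y = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1320 * t)

magnitude = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(y.size, d=1 / sr)

# Report the two strongest frequency components and their relative contributions
top = np.argsort(magnitude)[-2:][::-1]
for idx in top:
    print(f"{freqs[idx]:7.1f} Hz  ->  relative magnitude {magnitude[idx] / magnitude.max():.2f}")
```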
Spectral Leakage:
Spectral leakage occurs when we take the Fourier transform of a framed signal that does not contain an integer number of periods, which is the case almost all the time.
Because the frame does not span an integer number of periods, its endpoints are discontinuous, and these discontinuities appear as high-frequency components that are not present in the original signal.
Energy from these spurious components "leaks" into neighbouring, higher-frequency bins; this smearing of energy is what we call spectral leakage.
In the accompanying figure, the red box highlights higher-frequency components, an example of spectral leakage, that make a substantial contribution to the spectrum even though they are not part of the original signal. Windowing is used to mitigate this, as described next.
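A small numerical sketch of leakage (the frame length and test frequencies are illustrative): a sine that completes an integer number of periods within the frame concentrates its energy in one bin, while a sine that does not spreads energy across many bins.

```python
import numpy as np

sr, N = 22050, 1024                       # illustrative sampling rate and frame size
n = np.arange(N)
bin_width = sr / N                        # ~21.5 Hz per FFT bin

def energy_spread(freq):
    """Fraction of spectral energy that falls outside the strongest bin."""
    frame = np.sin(2 * np.pi * freq * n / sr)
    mag2 = np.abs(np.fft.rfft(frame)) ** 2
    return 1.0 - mag2.max() / mag2.sum()

print("integer number of periods:", energy_spread(10 * bin_width))    # ~0: no leakage
print("non-integer periods:      ", energy_spread(10.5 * bin_width))  # energy leaks out
```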
Windowing
To mitigate spectral leakage, a window function is applied to each frame. The window tapers the samples at both ends of the frame towards zero, removing the endpoint discontinuities so that the framed signal behaves like a periodic one.
Frequency-domain feature pipeline
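Putting the pieces together, here is a minimal end-to-end sketch of the frequency-domain pipeline described above (framing, Hann windowing, FFT per frame, and aggregation); the signal, frame size, and hop size are illustrative choices.

```python
import numpy as np
import librosa

sr = 22050
t = np.arange(0, 2.0, 1 / sr)
y = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 2000 * t)  # toy signal

FRAME_LENGTH, HOP_LENGTH = 2048, 512

# 1) Framing: split the signal into overlapping frames
frames = librosa.util.frame(y, frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH)

# 2) Windowing: taper each frame with a Hann window to reduce spectral leakage
window = np.hanning(FRAME_LENGTH)
windowed = frames * window[:, np.newaxis]

# 3) FFT per frame: magnitude spectrogram (freq bins x frames)
spectrogram = np.abs(np.fft.rfft(windowed, axis=0))

# 4) Aggregation: e.g. average spectrum over time as a simple clip-level feature
mean_spectrum = spectrogram.mean(axis=1)
print("spectrogram shape:", spectrogram.shape)
print("strongest frequency bin:",
      np.fft.rfftfreq(FRAME_LENGTH, d=1 / sr)[np.argmax(mean_spectrum)], "Hz")
```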