Audio Features of ML


Why audio?

  1. Description of sound
  2. Different features capture different aspects of sound
  3. Build intelligent audio systems

Audio feature categorisation

● Level of abstraction

● Temporal scope

● Music aspect

● Signal domain

● ML approach?

There are various strategies for categorizing audio features used in machine learning. The presenter introduces the concept of audio features, which are attributes extracted from sound, and explains their importance in training intelligent audio systems. The different categorization strategies are highlighted below.

Level of Abstraction This strategy classifies audio features into low-level, mid-level, and high-level categories based on their complexity and abstraction. Low-level features include statistical attributes extracted directly from audio, while mid-level features start to make sense from a perceptual perspective. High-level features are more abstract and map to musical constructs.

Temporal Scope This strategy categorizes features based on the temporal duration they cover. Instantaneous features provide information about short audio chunks (around 50-100 milliseconds), segment-level features analyze seconds-long segments, and aggregate features describe the entire sound by combining lower-level temporal information.

Music Aspects This strategy focuses on music-related aspects of features. It categorizes features based on their relevance to musical elements such as note onsets, key, melody, rhythm, and tempo.

Signal Domain This strategy categorizes features based on the signal domain they belong to. Time domain features, derived from waveform data, provide information about events in a sound over time. Frequency domain features analyze the frequency components of sound using techniques like the Fourier transform. Time-frequency domain features, such as spectrograms, provide combined information about both time and frequency.

Machine learning approach

Traditional ML approach

Deep Learning approach

Traditional ML

  1. Amplitude envelope
  2. Root-mean square energy
  3. Zero crossing rate
  4. Band energy ratio
  5. Spectral centroid
  6. Spectral flux
  7. Spectral spread
  8. Spectral roll-off
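
The first three features in this list are time-domain features that can be computed directly from the samples of one frame. The following is a minimal sketch using NumPy, not a reference implementation; the function names and the 440 Hz test frame are assumptions made for illustration.

```python
import numpy as np

def amplitude_envelope(frame):
    """Largest absolute amplitude found in the frame."""
    return float(np.max(np.abs(frame)))

def rms_energy(frame):
    """Square root of the mean of the squared samples."""
    return float(np.sqrt(np.mean(frame ** 2)))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

# Hypothetical test frame: 1024 samples of a 440 Hz sine sampled at 22050 Hz.
sample_rate = 22050
t = np.arange(1024) / sample_rate
frame = 0.5 * np.sin(2 * np.pi * 440 * t)

ae = amplitude_envelope(frame)
rms = rms_energy(frame)
zcr = zero_crossing_rate(frame)
print(ae, rms, zcr)
```

For a pure sine, the amplitude envelope is close to the peak amplitude, the RMS energy is close to the peak divided by the square root of 2, and the zero crossing rate grows with the frequency of the tone.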

Figure: ML approach


Figure: Deep Learning approach

Initially, digital signal processing was combined with rule-based approaches. Traditional ML algorithms were then applied to features extracted by hand through feature engineering. Finally, the deep learning approach extracts features automatically.


Feature Extraction:

Feature extraction is the process of extracting time-domain and frequency-domain features from audio signals. The signal is first split into frames, which help capture perceptible chunks of audio.

FRAME:

An audio sample contains amplitude (loudness) information at one particular point in time. To produce sound, tens of thousands of samples are played in sequence every second. A single sample is too short to be perceived, so consecutive samples are grouped into a frame, and features are computed per frame.

The frame size is usually a power of 2 in number of samples.

The duration of a frame is

`d_f = (1 / s_r) * k`

where s_r is the sampling rate (in samples per second) and k is the frame size (the number of samples in the frame).
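
As a quick worked example of the frame-duration formula, the sketch below evaluates it for two settings. The function name and the chosen sampling rates and frame sizes are assumptions for illustration only.

```python
def frame_duration(sample_rate, frame_size):
    """Duration of one frame in seconds: d_f = (1 / s_r) * k."""
    return frame_size / sample_rate

# Hypothetical settings: a 512-sample frame at 44100 Hz
# and a 2048-sample frame at 22050 Hz.
d_short = frame_duration(44100, 512)
d_long = frame_duration(22050, 2048)
print(f"{d_short * 1000:.1f} ms, {d_long * 1000:.1f} ms")
```

The first setting gives a frame of roughly 11.6 ms and the second roughly 92.9 ms.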


Time Domain Feature Pipeline

In time-domain feature extraction, we apply framing, compute the feature on each frame, and then aggregate the per-frame results, for example with summary statistics or a statistical model such as a GMM. The output is a single feature value, vector, or matrix.
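
The framing and aggregation steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names, the hop size (the step between consecutive frames, which the text does not specify), the synthetic input signal, and the choice of mean and standard deviation as the aggregation are all assumptions. It also assumes the signal is at least one frame long.

```python
import numpy as np

def frame_signal(signal, frame_size, hop_size):
    """Split a 1-D signal into overlapping frames, dropping the incomplete tail."""
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    return np.stack([
        signal[i * hop_size : i * hop_size + frame_size]
        for i in range(n_frames)
    ])

def rms_per_frame(frames):
    """RMS energy of every frame (one value per row)."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Hypothetical input: one second of noise whose loudness ramps up.
rng = np.random.default_rng(0)
sample_rate = 22050
signal = rng.standard_normal(sample_rate) * np.linspace(0.0, 1.0, sample_rate)

frames = frame_signal(signal, frame_size=1024, hop_size=512)  # framing
rms = rms_per_frame(frames)                                   # per-frame feature
summary = np.array([rms.mean(), rms.std()])                   # aggregation
print(frames.shape, summary)
```

Here the per-frame RMS values form a feature vector over time, and the aggregated mean and standard deviation form a compact description of the whole sound.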

Frequency Domain Feature Pipeline

The pipeline starts from an analog sound, which is sampled and quantised into a digital signal, and the signal is then split into frames. A Fourier transform is applied to each frame to move from the time-domain representation to the frequency-domain representation.

In the time domain, the signal is represented as amplitude as a function of time, so we see events as they unfold over time. In the frequency domain, we look at the frequency components of a sound and how much each contributes to the overall sound, i.e., how different frequency bands contribute to it.
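
To make the move from the time domain to the frequency domain concrete, the sketch below applies NumPy's real FFT to one frame and reads off how much each frequency contributes. The test signal (two sinusoids at 500 Hz and 1500 Hz, chosen so that each fits a whole number of periods in the frame) and the variable names are assumptions for illustration.

```python
import numpy as np

sample_rate = 8000
frame_size = 1024
t = np.arange(frame_size) / sample_rate

# Hypothetical frame: a 500 Hz sinusoid plus a quieter 1500 Hz sinusoid.
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

# Magnitude spectrum and the frequency (in Hz) of each bin.
magnitude = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(frame_size, d=1 / sample_rate)

# Frequency with the largest contribution.
peak_freq = freqs[np.argmax(magnitude)]

# Spectral centroid: magnitude-weighted mean frequency.
spectral_centroid = np.sum(freqs * magnitude) / np.sum(magnitude)
print(peak_freq, spectral_centroid)
```

The peak sits at the louder 500 Hz component, while the spectral centroid lies between the two components, pulled towards the louder one.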

Spectral Leakage:

Spectral leakage happens when we take the Fourier transform of a signal that does not contain an integer number of periods, which in practice is almost always the case.

The endpoints of the frame are discontinuous because the frame does not hold an integer number of periods. These discontinuities appear as high-frequency components that are not present in the original signal.

Some of the energy from these discontinuities leaks into other, higher frequencies; this is called spectral leakage.
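
Spectral leakage can be observed numerically by comparing a sinusoid that fits a whole number of periods in the frame with one that does not. This is a rough sketch: the helper names, the frame settings, and the measure used (the share of spectral energy lying outside the peak bin and its immediate neighbours) are assumptions chosen for illustration.

```python
import numpy as np

sample_rate = 8000
frame_size = 1024
t = np.arange(frame_size) / sample_rate
bin_width = sample_rate / frame_size  # spacing between FFT bins in Hz

def magnitude_spectrum(freq_hz):
    """Magnitude spectrum of one frame of a sinusoid at freq_hz."""
    return np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * t)))

def energy_outside_peak(magnitude, half_width=1):
    """Fraction of spectral energy outside the peak bin and its neighbours."""
    power = magnitude ** 2
    peak = int(np.argmax(power))
    lo, hi = max(peak - half_width, 0), peak + half_width + 1
    return 1.0 - power[lo:hi].sum() / power.sum()

whole = magnitude_spectrum(64 * bin_width)      # exactly 64 periods in the frame
partial = magnitude_spectrum(64.5 * bin_width)  # 64.5 periods in the frame

leak_whole = energy_outside_peak(whole)
leak_partial = energy_outside_peak(partial)
print(leak_whole, leak_partial)
```

With a whole number of periods, practically all the energy stays in one bin. With 64.5 periods, a noticeable share of the energy spreads into bins that do not correspond to any frequency in the original sinusoid.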

Figure: Spectral leakage

The red box marks the higher frequencies that are an example of spectral leakage; they make a substantial contribution to the spectrum of the sound. Windowing is used to overcome spectral leakage.

Windowing

To overcome spectral leakage, a windowing function is applied to each frame before the Fourier transform. It attenuates the samples at both ends of the frame towards zero, which removes the endpoint discontinuities and generates a signal that behaves as if it were periodic.
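
The effect of windowing can be sketched with NumPy's Hann window (`np.hanning`). The frame settings, the helper name, and the leakage measure (the share of energy more than a few bins away from the peak) are assumptions for illustration, and the test signal deliberately holds 64.5 periods so that its endpoints are discontinuous.

```python
import numpy as np

sample_rate = 8000
frame_size = 1024
t = np.arange(frame_size) / sample_rate
bin_width = sample_rate / frame_size

# Hypothetical frame with a non-integer number of periods (64.5).
frame = np.sin(2 * np.pi * 64.5 * bin_width * t)

# Hann window: tapers smoothly to zero at both ends of the frame.
window = np.hanning(frame_size)
windowed = frame * window

def energy_far_from_peak(x, guard=4):
    """Fraction of spectral energy more than `guard` bins away from the peak."""
    power = np.abs(np.fft.rfft(x)) ** 2
    peak = int(np.argmax(power))
    near = slice(max(peak - guard, 0), peak + guard + 1)
    return 1.0 - power[near].sum() / power.sum()

leak_plain = energy_far_from_peak(frame)
leak_hann = energy_far_from_peak(windowed)
print(leak_plain, leak_hann)
```

After windowing, the first and last samples of the frame are zero, and far less energy ends up in bins far from the true frequency. The trade-off is that the main peak becomes slightly wider.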


Figure: Hann window


Figure: Windowing

Figure: Frequency-domain feature pipeline





