
Audio Features of ML

Dhanushkumar R

Microsoft Learn Student Ambassador - BETA|Data Scientist-Intern @BigTapp Analytics|Ex-Intern @IIT Kharagpur| Azurex2 |Machine Learning|Deep Learning|Data Science|Gen AI|Azure AI&Data |Technical Blogger

Published: August 6, 2023

Why audio?

Description of sound
Different features capture different aspects of sound
Build intelligent audio systems

Audio feature categorisation

● Level of abstraction

● Temporal scope

● Music aspect

● Signal domain

● ML approach?

There are various strategies for categorizing audio features used in machine learning. The presenter introduces the concept of audio features, which are attributes extracted from sound, and explains their importance in training intelligent audio systems. The different categorization strategies are highlighted below.

Level of Abstraction: This strategy classifies audio features into low-level, mid-level, and high-level categories based on their complexity and abstraction. Low-level features include statistical attributes extracted directly from audio, while mid-level features start to make sense from a perceptual perspective. High-level features are more abstract and map to musical constructs.

Temporal Scope: This strategy categorizes features based on the temporal duration they cover. Instantaneous features provide information about short audio chunks (around 50-100 milliseconds), segment-level features analyze seconds-long segments, and aggregate features describe the entire sound by combining lower-level temporal information.

Music Aspects: This strategy focuses on music-related aspects of features. It categorizes features based on their relevance to musical elements such as note onsets, key, melody, rhythm, and tempo.

Signal Domain: This strategy categorizes features based on the signal domain they belong to. Time-domain features, derived from waveform data, provide information about events in a sound over time. Frequency-domain features analyze the frequency components of sound using techniques like the Fourier transform. Time-frequency domain features, such as spectrograms, provide combined information about both time and frequency.

Machine learning approach

Traditional ML approach

Deep Learning approach

Traditional ML

Amplitude envelope
Root-mean-square energy
Zero crossing rate
Band energy ratio
Spectral centroid
Spectral flux
Spectral spread
Spectral roll-off
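
As a rough illustration of the first three features in this list, the sketch below computes amplitude envelope, RMS energy and zero crossing rate per frame with NumPy. The helper names, the frame size of 1024, the hop size of 512 and the synthetic 440 Hz test tone are all assumptions made for this example, not values taken from the article.

```python
import numpy as np

def frame_signal(signal, frame_size=1024, hop_size=512):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    return np.stack([signal[i * hop_size : i * hop_size + frame_size]
                     for i in range(n_frames)])

def amplitude_envelope(frames):
    """Maximum absolute amplitude in each frame."""
    return np.max(np.abs(frames), axis=1)

def rms_energy(frames):
    """Root-mean-square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Synthetic test signal (assumption): one second of a 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

frames = frame_signal(tone)
ae = amplitude_envelope(frames)
rms = rms_energy(frames)
zcr = zero_crossing_rate(frames)
print(frames.shape, ae[:3], rms[:3], zcr[:3])
```

For a steady tone, all three features stay nearly constant from frame to frame; on real recordings they vary with loudness and noisiness.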

[Image: ML approach]

Initially, digital signal processing was combined with rule-based approaches. Traditional ML algorithms were then applied to features obtained through manual feature engineering. Finally, deep learning approaches are used to extract features automatically.

Feature Extraction:

Feature extraction is the process of extracting time-domain and frequency-domain features from audio signals. The signal is split into frames, which help capture perceptible chunks of audio.

FRAME:

A sample contains the amplitude (loudness) of the signal at one particular point in time. To produce sound, tens of thousands of samples are played in sequence every second. For feature extraction, consecutive samples are grouped into frames.

The frame size is usually a power of 2, expressed as a number of samples.

The duration of a frame is

`d_f = (1 / s_r) * k`

where s_r is the sampling rate (so 1/s_r is the duration of a single sample) and k is the frame size in samples. For example, at s_r = 44,100 Hz and k = 512, the frame lasts about 11.6 ms.
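
The frame duration formula can be checked numerically. The short sketch below assumes a sampling rate of 44,100 Hz and a few power-of-2 frame sizes; the function name `frame_duration` is made up for this example.

```python
def frame_duration(sample_rate, frame_size):
    """Duration in seconds of a frame: d_f = (1 / s_r) * k."""
    return frame_size / sample_rate

# Illustrative frame sizes (assumption): common powers of 2
for k in (256, 512, 1024, 2048):
    print(f"k = {k:4d} samples -> {frame_duration(44100, k) * 1000:.1f} ms")
```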

Time Domain Feature Pipeline

In time-domain feature extraction, we first apply framing, compute the feature on each frame, and then aggregate the per-frame results, for example with simple statistics or with a statistical model such as a GMM. The output is a feature value, vector or matrix.
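
A minimal sketch of this pipeline, assuming RMS energy as the per-frame feature and simple statistics (mean, standard deviation, median, maximum) as the aggregation step. The function name, frame parameters and the noise test signal are illustrative choices; a GMM or another statistical model could take the place of the statistics used here.

```python
import numpy as np

def time_domain_feature_vector(signal, frame_size=1024, hop_size=512):
    # 1. Framing: cut the signal into overlapping frames
    starts = range(0, len(signal) - frame_size + 1, hop_size)
    frames = np.stack([signal[s:s + frame_size] for s in starts])
    # 2. Per-frame feature: RMS energy (any time-domain feature would do)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # 3. Aggregation: summarise the per-frame values into one vector
    return np.array([rms.mean(), rms.std(), np.median(rms), rms.max()])

# Synthetic input (assumption): Gaussian noise with standard deviation 0.1
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.1, 22050)
vec = time_domain_feature_vector(noise)
print(vec)
```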

Frequency Domain Feature Pipeline

The pipeline starts from an analog sound, which is digitised through sampling and quantisation, and then the signal is framed. A Fourier transform is then applied to convert each frame from its time-domain representation to a frequency-domain representation.
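
The frequency-domain steps can be sketched as follows, using spectral centroid as the example feature. This is only one possible arrangement: the Hann window applied before the FFT, the frame and hop sizes, and the 1 kHz test tone are assumptions for the example.

```python
import numpy as np

def spectral_centroid(signal, sr, frame_size=1024, hop_size=512):
    """Magnitude-weighted mean frequency of each frame, in Hz."""
    window = np.hanning(frame_size)               # taper to limit leakage
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sr)
    centroids = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size] * window   # framing + window
        mag = np.abs(np.fft.rfft(frame))                    # Fourier transform
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return np.array(centroids)

# Synthetic input (assumption): one second of a 1 kHz tone
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
sc = spectral_centroid(tone, sr)
print(sc[:3])
```

For a pure tone the centroid sits close to the tone's frequency; for richer sounds it moves toward wherever most of the spectral magnitude lies.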

In the time domain, the signal is basically amplitude as a function of time, so we see events as they unfold over time. In the frequency domain, we instead look at the frequency components of a sound and how much each contributes to the overall sample, i.e., how different frequency bands contribute to the overall sound.
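
One way to quantify how much different frequency bands contribute is the band energy ratio listed earlier among the traditional features. The sketch below compares the energy below and above a split frequency; the 2000 Hz split, the Hann window and the two synthetic signals are assumptions chosen for illustration.

```python
import numpy as np

def band_energy_ratio(frame, sr, split_hz=2000.0):
    """Energy below split_hz divided by energy at or above split_hz."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    return low / (high + 1e-12)

sr = 22050
t = np.arange(2048) / sr
# Two synthetic frames (assumption): a 300 Hz and a 5000 Hz component
low_heavy = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 5000 * t)
high_heavy = 0.1 * np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 5000 * t)

ber_low = band_energy_ratio(low_heavy, sr)
ber_high = band_energy_ratio(high_heavy, sr)
print(ber_low, ber_high)
```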

Spectral Leakage:

Spectral leakage happens when we take the Fourier transform of a signal segment that does not contain an integer number of periods. In practice, this happens almost all the time.

The end points of the segment are discontinuous because the segment does not span an integer number of periods. These discontinuities appear as high-frequency components that are not present in the original signal.

Energy from these discontinuities leaks into other, higher frequencies. This effect is called spectral leakage.

In the original figure, the red box marks the higher frequencies produced by spectral leakage, which make a substantial contribution to the spectrum. Windowing is used to overcome spectral leakage.
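
Spectral leakage can be reproduced with a small experiment. The sketch below compares a frame holding exactly 32 periods of a sine wave with one holding 32.5 periods and measures how much spectral energy falls outside the strongest bin. The frame length and the period counts are assumptions for the example.

```python
import numpy as np

n = 1024
idx = np.arange(n)

def off_peak_energy_fraction(cycles):
    """Fraction of spectral energy outside the strongest FFT bin."""
    x = np.sin(2 * np.pi * cycles * idx / n)
    power = np.abs(np.fft.rfft(x)) ** 2
    return 1.0 - power.max() / power.sum()

whole = off_peak_energy_fraction(32.0)    # integer number of periods
partial = off_peak_energy_fraction(32.5)  # frame cuts the last period in half
print(whole, partial)
```

With an integer number of periods, essentially all energy lands in one bin. With the extra half period, a large share spreads into neighbouring and more distant bins.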

Windowing

To overcome spectral leakage, a window function is applied to each frame before the Fourier transform. It tapers the samples toward zero at both ends of the frame, which removes the end-point discontinuities and makes the frame behave like one period of a periodic signal.
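
The effect of windowing can be checked on a frame that does not contain an integer number of periods. This sketch assumes a Hann window (`np.hanning`), which is one common choice among many, and measures the energy found more than a few bins away from the peak, with and without the window. The guard width of 4 bins is an arbitrary choice for the example.

```python
import numpy as np

n = 1024
idx = np.arange(n)
x = np.sin(2 * np.pi * 32.5 * idx / n)   # non-integer number of periods

def far_leakage(frame, peak_bin=32, guard=4):
    """Fraction of energy more than `guard` bins away from `peak_bin`."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    bins = np.arange(len(power))
    far = np.abs(bins - peak_bin) > guard
    return power[far].sum() / power.sum()

raw = far_leakage(x)                      # no window (rectangular)
windowed = far_leakage(x * np.hanning(n)) # Hann window applied first
print(raw, windowed)
```

The window widens the main peak slightly but sharply reduces the energy that reaches distant bins.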

Frequency-domain feature pipeline
