How to predict Healthy and Faulty sounds with MFCC (Mel-frequency Cepstral Coefficients), SVM and deploy them to Streamlit
Audio Prediction - Credit to Paul Mermelstein

Feature extraction with resampling, data augmentation and conversion to Mel spectrograms can sometimes seem overwhelming. We found a simpler algorithm to extract features: MFCC (Mel-frequency Cepstral Coefficients). Paul Mermelstein is typically credited with the development of MFCC.
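
Under the hood, an MFCC is essentially a discrete cosine transform of the log-power mel spectrogram, which is why a single librosa call replaces the manual spectrogram pipeline. A minimal sketch of that equivalence (assuming librosa ≥ 0.8; librosa.example downloads a short demo clip, any local .wav works too):

import librosa
import numpy as np

# load a short demo clip
audio, sr = librosa.load(librosa.example('trumpet'))

# manual pipeline: mel spectrogram -> log power -> cosine transform
mel = librosa.feature.melspectrogram(y=audio, sr=sr)
mfcc_manual = librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=40)

# the one-call version used throughout this article
mfcc_direct = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)

print(np.allclose(mfcc_manual, mfcc_direct))  # True with matching defaults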

To meet this need, we can implement logic that extracts the feature vectors from the audio files and feeds them to the model, without straining the computer or requiring too many lines of code.

Let’s dive into it and explore how we can implement this …

Disclaimer: I will share only the important code snippets in order to keep the reading quick. You can find the link to the dedicated GitHub repo for this article at the end.

Importing libraries

import glob
import pandas as pd 
import math
import secrets
from sklearn.model_selection import train_test_split
import numpy as np
import librosa
from matplotlib import pyplot as plt
from tqdm import tqdm
import os
from sklearn.svm import SVC
import pickle
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns        

Creating an ID generator for the test set

def random_alphanum(length: int) -> str:
  # token_hex(n) returns 2*n hex characters, so round up for odd lengths
  text = secrets.token_hex(nbytes=math.ceil(length / 2))
  is_even = length % 2 == 0
  # for odd lengths, drop one character to match the requested size
  return text if is_even else text[1:]
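
For example (illustrative output; the actual IDs are random):

print(random_alphanum(6))  # e.g. 'a3f9c1' (even length: full token_hex output)
print(random_alphanum(5))  # e.g. '3f9c1' (odd length: one leading character dropped)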

Creating a training set

# assign directory
directory = r'.\Data'

# iterate over the .wav files in that directory
relative_path = []
classID = []
id_file = []
for filename in glob.iglob(f'{directory}/*.wav'):
    relative_path.append(filename)
    # label 0 = Faulty, 1 = Healthy, inferred from the file name
    classID.append(0 if "Faulty" in filename else 1)
    # random 6-character hex ID per file
    id_file.append(random_alphanum(6))

dataset_dict = {
    'relative_path': relative_path,
    'classID': classID,
    'id_file': id_file
}
df_dataset = pd.DataFrame(dataset_dict, columns=['relative_path', 'classID', 'id_file'])
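
A quick sanity check on the result (illustrative output; the file names and IDs below are hypothetical):

print(df_dataset.head(3))
#             relative_path  classID id_file
# 0   .\Data\Faulty_001.wav        0  3f9c1a
# 1   .\Data\Faulty_002.wav        0  b82d4e
# 2  .\Data\Healthy_001.wav        1  0c7a95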

Showing the Distribution of Healthy and Faulty Sounds

plt.figure(figsize=(10, 6))
sns.countplot(x='classID', data=df_dataset)
plt.title("Count of records in each class")
plt.xticks(rotation="vertical")
plt.show()

Class 1 stands for Healthy Sounds and class 0 stands for Faulty Sounds

Features Extraction

We load the audio files (res_type='kaiser_fast' selects a faster, slightly lower-quality resampler):

audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')        

We generate the MFCC vectors with the mfcc method of the librosa library:

mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)        

We collapse the MFCC matrix into a fixed-length vector by averaging over time with NumPy:

mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)        
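
The averaging is what makes the features fixed-length. The MFCC matrix has shape (n_mfcc, n_frames), where the frame count depends on the clip duration, while the averaged vector always has 40 entries (shapes shown are illustrative, assuming librosa's default hop length):

print(mfccs_features.shape)         # e.g. (40, 173) for a ~4 s clip
print(mfccs_scaled_features.shape)  # (40,) regardless of duration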

We define the features extraction function with:

def features_extractor(file):
  # load the audio file
  audio, sample_rate = librosa.load(file, res_type='kaiser_fast')
  # extract the MFCC matrix (40 coefficients per frame)
  mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
  # average over time to obtain a fixed-length feature vector
  mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
  return mfccs_scaled_features

We extract the features with:

extracted_features = []
for index_num, row in tqdm(df_dataset.iterrows()):
  file_name = os.path.join(os.path.abspath(""), str(row["relative_path"]))
  final_class_labels = row["classID"]
  data = features_extractor(file_name)
  extracted_features.append([data, final_class_labels])

We generate the labeled features dataset:

extracted_features_df = pd.DataFrame(extracted_features, columns=['feature', 'class'])

Generating the training and in-sample test sets

We convert the features and labels to NumPy arrays, then split the dataset into training and in-sample test sets with Sklearn train_test_split:

X = np.array(extracted_features_df['feature'].tolist())
y = np.array(extracted_features_df['class'].tolist())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
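
With only a handful of files per class, it can be worth stratifying the split so both classes stay balanced in the test set (a hypothetical variant, not used here):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)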

We save the training and in-sample test sets:

np.save(r'.\Data\Dataset\X_train', X_train, allow_pickle=False)
np.save(r'.\Data\Dataset\X_test', X_test, allow_pickle=False)
np.save(r'.\Data\Dataset\y_train', y_train, allow_pickle=False)
np.save(r'.\Data\Dataset\y_test', y_test, allow_pickle=False)

Training the SVM (Support Vector Machines) model

We generate the SVM model with Sklearn SVC:

svclassifier = SVC(kernel='linear')        
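
A linear kernel suffices here because the averaged MFCC vectors separate the two classes cleanly; if they did not, an RBF kernel would be a natural fallback (a hypothetical variant, not used in this article):

svclassifier_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')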

We train the model:

svclassifier.fit(X_train, y_train)        

We save the model:

filename = r'.\Pickle\SVM-model-Healthy-Faulty-Audios.pkl'
pickle.dump(svclassifier, open(filename, 'wb'))

Showing scores

We predict the in-sample test set and show the confusion matrix:

y_pred = svclassifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
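
Since seaborn is already imported, the confusion matrix can also be rendered as a heatmap (a sketch using the variables defined above):

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.show()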

We show the accuracy with the classification report:

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         3

    accuracy                           1.00         6
   macro avg       1.00      1.00      1.00         6
weighted avg       1.00      1.00      1.00         6

So, we have 100% accuracy because the features are well extracted and the model does not overfit. The confusion matrix is also good: the maximum values lie on the diagonal, and both classes are predicted correctly.

Generating the results

We had set aside 10 random sounds from the original dataset before generating the training set, and we now predict them. Out of 5 Healthy sounds, 5 were predicted Healthy, and out of 5 Faulty sounds, 5 were predicted Faulty. The model is accurate.

We vectorized the prediction set:

# df_test_dataset holds the 10 held-out files, built the same way as df_dataset
extracted_features_pred = []
for index_num, row in tqdm(df_test_dataset.iterrows()):
    file_name = os.path.join(os.path.abspath(""), str(row["relative_path"]))
    final_class_labels = row["classID"]
    relative_path = row["relative_path"]
    id_file = row["id_file"]
    data = features_extractor(file_name)
    extracted_features_pred.append([data, final_class_labels, relative_path, id_file])
pred_extracted_features_df = pd.DataFrame(extracted_features_pred, columns=['feature', 'class', 'relative_path', 'id_file'])

We generated the predictions:

X_pred = np.array(pred_extracted_features_df['feature'].tolist())
y_real_pred = np.array(pred_extracted_features_df['class'].tolist())
y_pred_test = svclassifier.predict(X_pred)
pred_extracted_features_df["Predicted_Class"] = y_pred_test
pred_extracted_features_df = pred_extracted_features_df[['id_file', 'relative_path', 'Predicted_Class', 'class']]
pred_extracted_features_df

Now we will deploy the model to Streamlit, one of the simplest front-end frameworks. It's pure Python without the headache of HTML: Streamlit turns data scripts into shareable web apps in minutes, and no front-end experience is required.

Importing libraries

import streamlit as st
import pandas as pd
import io
import base64
import librosa
import librosa.display
import numpy as np
import pickle

Defining a title, an icon and a sidebar to the app

st.set_page_config(
    page_title="Healthy and faulty Audios Prediction",
    page_icon="chart_with_upwards_trend",
    layout="wide",
    initial_sidebar_state="auto",
)
st.sidebar.markdown('# About Audios Prediction :')

st.sidebar.markdown("""<div style="text-align: justify;"><p>For Audio Vectorization, we used <strong>MFCC</strong> (<strong>Mel-Frequency Cepstral Coefficients</strong>) :</p><p>It's a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.</p></div>""", unsafe_allow_html=True)

st.sidebar.markdown("""<div style="text-align: justify;"><p>For Audio Prediction, we used <strong>SVM</strong> (<strong>Support-Vector Machines</strong>) :</p><p>They are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories, SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory. VC theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.</p></div>""", unsafe_allow_html=True)

st.title('Healthy and faulty Audios Prediction')

Defining the feature extractor

def features_extractor(file):
  # load the audio file
  audio, sample_rate = librosa.load(file, res_type='kaiser_fast')
  # extract the MFCC matrix (40 coefficients per frame)
  mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
  # average over time to obtain a fixed-length feature vector
  mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
  return mfccs_scaled_features

Defining the audio file upload button

file = st.file_uploader("Enter the Audio files : ", key='info_form3', accept_multiple_files=True)        
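
With accept_multiple_files=True, file is a list of UploadedFile objects (empty until the user uploads something), each exposing a name and its binary content. A minimal illustration (a debugging snippet, not part of the app):

for fl in file:
    st.write(fl.name, len(fl.getvalue()), 'bytes')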

Defining the results form and its submit button

info_form = st.form(key='info_form')
is_clk_pred = info_form.form_submit_button('Predict Audio')

Defining the processing of the submit button

Reading the files

# executed when "Predict Audio" is clicked
extracted_features_pred = []
for fl in file:
  # persist the uploaded bytes so librosa can read them from disk
  relative_path = '..\\Data\\Test\\' + fl.name
  with open(relative_path, mode='wb') as f:
    f.write(fl.getvalue())
  data = features_extractor(relative_path)
  file_name = fl.name
  extracted_features_pred.append([data, relative_path, file_name])

Loading the files in a dataset

pred_extracted_features_df = pd.DataFrame(extracted_features_pred, columns=['feature', 'relative_path', 'File_name'])        

Converting the prediction set to a NumPy Array

X_pred = np.array(pred_extracted_features_df['feature'].tolist())        

Loading the model

# path to the model pickled during training (assumed location; adjust to your layout)
model_path = '..\\Pickle\\SVM-model-Healthy-Faulty-Audios.pkl'
svm_classifier = pickle.load(open(model_path, 'rb'))

Predicting the audios

y_pred_test = svm_classifier.predict(X_pred)        

Showing the results Dataframe

pred_extracted_features_df["Predicted_Class"] = y_pred_tes
info_form.dataframe(pred_extracted_features_df[['File_name', "Predicted_Class"]])t        

Defining the button to download the results Excel File

towrite = io.BytesIO()
# writing .xlsx to a buffer requires openpyxl (or xlsxwriter) to be installed
pred_extracted_features_df[['File_name', "Predicted_Class"]].to_excel(towrite, index=False, header=True)
towrite.seek(0)  # rewind the buffer before reading
b64 = base64.b64encode(towrite.read()).decode()  # base64-encode the Excel bytes
linko = f'<a href="data:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;base64,{b64}" download="Wav-Audio-Files-prediction.xlsx">Download excel file</a>'
st.markdown(linko, unsafe_allow_html=True)
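
On recent Streamlit versions, st.download_button is a simpler alternative to the hand-built base64 link (a sketch, assuming Streamlit ≥ 0.88):

buf = io.BytesIO()
pred_extracted_features_df[['File_name', 'Predicted_Class']].to_excel(buf, index=False)
st.download_button(
    label='Download excel file',
    data=buf.getvalue(),
    file_name='Wav-Audio-Files-prediction.xlsx',
    mime='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
)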

Testing the application

You can run the application locally (see the README file for more info); a Streamlit app is typically launched with streamlit run followed by the script name.

I uploaded 10 sounds from outside the training set using the browse files button.

Uploading of files in the App

I clicked on “Predict Audio”, and we can see the results table and the Excel file download link.

Prediction of Healthy and Faulty audios with the App

Here is the Excel file

Excel file downloading on the App

App Excel file downloaded

That’s all folks!

Learn more by:

  • Visiting the GitHub repo of the model notebook to see the full code of the data processing and model training.
  • Visiting the GitHub repo of the application to see the full code of the project.
  • Visiting the librosa and SVM documentation for more complex audio processing and modeling algorithms.
  • For Medium fans, you can read the article there.
