How to predict Healthy and Faulty sounds with MFCC (Mel-frequency Cepstral Coefficients), SVM and deploy them to Streamlit
Abdelmalik Berrada
Quant & Machine Learning Scientist | Algorithmic Trading, Data Analytics & Financial Modeling Expert
Sometimes feature extraction with resampling, data augmentation and conversion to Mel spectrograms can seem overwhelming. We found a simpler algorithm to extract features: MFCC (Mel-frequency cepstral coefficients). Paul Mermelstein is typically credited with the development of MFCC.
To meet this need, we can implement logic that extracts the feature vectors from the audio files and feeds them to the model without taxing the computer or requiring too many lines of code.
Let’s dive into it and explore how we can implement this …
Disclaimer: I will share only the important code snippets to keep the reading quick; you can find the link to the dedicated GitHub repo for this article at the end.
Importing libraries
import glob
import pandas as pd
import math
import secrets
from sklearn.model_selection import train_test_split
import numpy as np
import librosa
from matplotlib import pyplot as plt
from tqdm import tqdm
import os
from sklearn.svm import SVC
import pickle
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
Creating an ID generator for the test set
def random_alphanum(length: int) -> str:
    # token_hex returns 2 hex characters per byte, so request ceil(length / 2) bytes
    text = secrets.token_hex(nbytes=math.ceil(length / 2))
    isEven = length % 2 == 0
    # for odd lengths, drop one character to hit the requested length exactly
    return text if isEven else text[1:]
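For example, generating IDs of even and odd length (output is random on every call):
print(random_alphanum(6))  # e.g. 'a3f09b'
print(random_alphanum(5))  # odd lengths drop one leading hex character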
Creating a training set
# assign directory
directory = r'.\Data'
# iterate over the .wav files in that directory
dataset_dict = {
    'relative_path': [],
    'classID': [],
    'id_file': []
}
relative_path = []
classID = []
id_file = []
for filename in glob.iglob(f'{directory}/*.wav'):
    relative_path.append(filename)
    # label from the file name: 0 = Faulty, 1 = Healthy
    if "Faulty" in filename:
        classID.append(0)
    else:
        classID.append(1)
    # random 6-character ID for each file
    id_file.append(random_alphanum(6))
dataset_dict['relative_path'] = relative_path
dataset_dict['classID'] = classID
dataset_dict['id_file'] = id_file
df_dataset = pd.DataFrame(dataset_dict, columns=['relative_path', 'classID', 'id_file'])
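Before plotting, a quick sanity check on the assembled table (output depends on your own files):
print(df_dataset.head())
print(df_dataset['classID'].value_counts())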
Showing the Distribution of Healthy and Faulty Sounds
plt.figure(figsize=(10, 6))
sns.countplot(x='classID', data=df_dataset)
plt.title("Count of records in each class")
plt.xticks(rotation="vertical")
plt.show()
Class 1 stands for Healthy Sounds and class 0 stands for Faulty Sounds
Feature Extraction
We load the audio files with:
audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
We generate the MFCC vectors with the mfcc method of the librosa library:
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
We collapse the MFCC matrix into a single fixed-length vector by averaging over time with NumPy:
mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
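To see why the mean is taken over the transpose: librosa.feature.mfcc returns an array of shape (n_mfcc, n_frames), so averaging over the frame axis collapses each clip to one 40-dimensional vector regardless of its duration. A minimal sketch, assuming a hypothetical example.wav:
audio, sample_rate = librosa.load('example.wav', res_type='kaiser_fast')
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
print(mfccs_features.shape)                      # (40, n_frames), n_frames varies with clip length
print(np.mean(mfccs_features.T, axis=0).shape)   # (40,), one averaged value per coefficient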
We define the features extraction function with:
def features_extractor(file):
    # load the audio file
    audio, sample_rate = librosa.load(file, res_type='kaiser_fast')
    # extract the MFCC matrix (40 coefficients per frame)
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    # average over time to get one fixed-length feature vector
    mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
    return mfccs_scaled_features
We extract the features with:
extracted_features = []
for index_num, row in tqdm(df_dataset.iterrows()):
    file_name = os.path.join(os.path.abspath(""), str(row["relative_path"]))
    final_class_labels = row["classID"]
    data = features_extractor(file_name)
    extracted_features.append([data, final_class_labels])
We generate the labeled Features Dataset:
extracted_features_df = pd.DataFrame(extracted_features, columns=['feature', 'class'])
Generating the training and in-sample test sets
We split the dataset into training and in-sample test sets with scikit-learn's train_test_split:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
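Here X and y are the feature matrix and label vector. The article does not show this step, but mirroring the prediction code further below, they can be built from extracted_features_df like so:
X = np.array(extracted_features_df['feature'].tolist())
y = np.array(extracted_features_df['class'].tolist())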
We save the training and in-sample test sets:
np.save(r'.\Data\Dataset\X_train', X_train, allow_pickle=False)
np.save(r'.\Data\Dataset\X_test', X_test, allow_pickle=False)
np.save(r'.\Data\Dataset\y_train', y_train, allow_pickle=False)
np.save(r'.\Data\Dataset\y_test', y_test, allow_pickle=False)
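The arrays can be reloaded later with np.load (NumPy appends a .npy extension on save):
X_train = np.load(r'.\Data\Dataset\X_train.npy')
y_train = np.load(r'.\Data\Dataset\y_train.npy')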
Training the SVM (Support Vector Machines) model
We create the SVM model with scikit-learn's SVC:
svclassifier = SVC(kernel='linear')
We train the model:
svclassifier.fit(X_train, y_train)
We save the model:
filename = r'.\Pickle\SVM-model-Healthy-Faulty-Audios.pkl'
pickle.dump(svclassifier, open(filename, 'wb'))
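To verify the round trip, the pickled model can be reloaded the same way the Streamlit app does later:
loaded_model = pickle.load(open(filename, 'rb'))
print(loaded_model.score(X_test, y_test))  # mean accuracy on the in-sample test set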
Showing scores
We predict the in-sample test set and show the confusion matrix:
y_pred = svclassifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
We show the classification report:
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         3

    accuracy                           1.00         6
   macro avg       1.00      1.00      1.00         6
weighted avg       1.00      1.00      1.00         6
So we reach 100% accuracy, and the confusion matrix looks good: the maximum values sit on the diagonal and both classes are predicted correctly. Keep in mind, however, that the in-sample test set holds only 6 samples, so the perfect score should be read with caution.
Generating the results
We extracted 10 random sounds from the original dataset before generating the training set, and predicted them. Out of 5 Healthy sounds, 5 were predicted Healthy; out of 5 Faulty sounds, 5 were predicted Faulty. The model is accurate on this hold-out.
We vectorized the prediction set:
extracted_features_pred = []
for index_num, row in tqdm(df_test_dataset.iterrows()):
    file_name = os.path.join(os.path.abspath(""), str(row["relative_path"]))
    final_class_labels = row["classID"]
    relative_path = row["relative_path"]
    id_file = row["id_file"]
    data = features_extractor(file_name)
    extracted_features_pred.append([data, final_class_labels, relative_path, id_file])
pred_extracted_features_df = pd.DataFrame(extracted_features_pred, columns=['feature', 'class', 'relative_path', 'id_file'])
We generated the predictions:
X_pred = np.array(pred_extracted_features_df['feature'].tolist())
y_real_pred = np.array(pred_extracted_features_df['class'].tolist())
y_pred_test = svclassifier.predict(X_pred)
pred_extracted_features_df["Predicted_Class"] = y_pred_test
pred_extracted_features_df = pred_extracted_features_df[['id_file', 'relative_path', 'Predicted_Class', 'class']]
pred_extracted_features_df
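As a quick sanity check (not shown in the article), the out-of-sample accuracy can be computed against the true labels collected above:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_real_pred, y_pred_test))  # 1.0 here: all 10 clips were classified correctly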
Now we will deploy the model to Streamlit, one of the simplest front-end frameworks: it's pure Python, without the headache of HTML. Streamlit turns data scripts into shareable web apps in minutes. No front-end experience required.
Importing libraries
import streamlit as st
import pandas as pd
import io
import base64
import librosa
import librosa.display
import numpy as np
import pickle
Defining a title, an icon and a sidebar for the app
st.set_page_config(
    page_title="Healthy and faulty Audios Prediction",
    page_icon="chart_with_upwards_trend",
    layout="wide",
    initial_sidebar_state="auto",
)
st.sidebar.markdown('# About Audios Prediction :')
st.sidebar.markdown("""<div style="text-align: justify;"><p>For Audio Vectorization, we used <strong>MFCC</strong> (<strong>Mel-Frequency Cepstral Coefficients</strong>) :</p><p>It's a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.</p></div>""", unsafe_allow_html=True)
st.sidebar.markdown("""<div style="text-align: justify;"><p>For Audio Prediction, we used <strong>SVM</strong> (<strong>Support-Vector Machines</strong>) :</p><p>They are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories, SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory. VC theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.</p></div>""", unsafe_allow_html=True)
st.title('Healthy and faulty Audios Prediction')
Defining the feature extractor
def features_extractor(file):
    # load the audio file
    audio, sample_rate = librosa.load(file, res_type='kaiser_fast')
    # extract the MFCC matrix (40 coefficients per frame)
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    # average over time to get one fixed-length feature vector
    mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)
    return mfccs_scaled_features
Defining the audio file upload button
file = st.file_uploader("Enter the Audio files : ", key='info_form3', accept_multiple_files=True)
Defining the results form and its submit button
info_form = st.form(key='info_form')
is_clk_pred = info_form.form_submit_button('Predict Audio')
Defining the processing of the submit button
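The snippets in the following subsections presumably run inside the submit-button guard; a minimal sketch of the surrounding structure (the guard and the list initialization are assumptions, as the original elides them):
if is_clk_pred:
    extracted_features_pred = []
    # read the uploaded files, extract features, predict, and show the results
    # (the snippets below go here)
    ...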
Reading the files
for fl in file:
    # persist the uploaded bytes to disk so librosa can read the file
    relative_path = '..\\Data\\Test\\' + fl.name
    with open(relative_path, mode='wb') as f:
        f.write(fl.getvalue())
    data = features_extractor(relative_path)
    file_name = fl.name
    extracted_features_pred.append([data, relative_path, file_name])
Loading the files in a dataset
pred_extracted_features_df = pd.DataFrame(extracted_features_pred, columns=['feature', 'relative_path', 'File_name'])
Converting the prediction set to a NumPy Array
X_pred = np.array(pred_extracted_features_df['feature'].tolist())
Loading the model
model_path = r'.\Pickle\SVM-model-Healthy-Faulty-Audios.pkl'  # path to the model pickled during training
svm_classifier = pickle.load(open(model_path, 'rb'))
Predicting the audios
y_pred_test = svm_classifier.predict(X_pred)
Showing the results Dataframe
pred_extracted_features_df["Predicted_Class"] = y_pred_test
info_form.dataframe(pred_extracted_features_df[['File_name', "Predicted_Class"]])
Defining the button to download the results Excel File
towrite = io.BytesIO()
pred_extracted_features_df[['File_name', "Predicted_Class"]].to_excel(towrite, index=False, header=True)
towrite.seek(0)  # reset pointer to the start of the buffer
b64 = base64.b64encode(towrite.read()).decode()  # base64-encode the Excel bytes
linko = f'<a href="data:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;base64,{b64}" download="Wav-Audio-Files-prediction.xlsx">Download excel file</a>'
st.markdown(linko, unsafe_allow_html=True)
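Note that recent Streamlit versions also provide st.download_button, which avoids hand-rolling a base64 link; a minimal sketch under that assumption:
st.download_button(
    label="Download excel file",
    data=towrite.getvalue(),
    file_name="Wav-Audio-Files-prediction.xlsx",
    mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
)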
Testing the application
You can run the application locally, typically with streamlit run on the app script (visit the README file for more info).
I uploaded 10 sounds from outside the training set using the Browse files button.
Uploading of files in the App
I clicked on “Predict Audio” to display the results table and the Excel file download link.
Prediction of Healthy and Faulty audios with the App
Here is the Excel file
Excel file downloading on the App
App Excel file downloaded
That’s all folks!
Learn more by: