Background
I was pleasantly surprised by the response to the early adopter version of my forthcoming book, Mathematical Foundations of Data Science. If you have messaged me, I will come back to you soon.
To put the idea of the book in context:
I am creating a small community for my book, Mathematical Foundations of Data Science. You get the PDF when it is released, but you also get chapters as they are released, and you get to engage and ask questions. The price is a one-off 40 USD. If you are interested, please DM me. I am keeping spaces limited because I want to learn from feedback, so that is an important criterion as well.
The idea of the book is simple: in an age when a majority of code could be LLM-generated, it is very useful to approach AI from first principles, i.e. from the maths. The good news is that there are only four things to know: linear algebra, statistics, optimization and probability theory. The bad news is that it is not easy to tie these four ideas to every machine learning and deep learning algorithm, considering that the field itself is rapidly evolving. In this sense, the book helps by creating a concise structure. Since these ideas are known to many people at A levels (around age 18), the book creates a foundation for understanding AI based on ideas you already know, even if you studied them years ago!
In this post, I explain the idea of the model evaluation metric as the 'north star': something that you aspire to and that guides you through the machine learning process. Specifically, I elaborate on the idea that every step of the ML/DL pipeline can be tied to this north star, i.e. to the model evaluation metric.
So, if you don't know about model evaluation metrics, please first see how to choose model evaluation metrics.
It is easy to understand machine learning and deep learning when you can see how each step in the ML/DL pipeline ties back to the model evaluation metric.
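As a minimal sketch of what that north star looks like in code (using scikit-learn, with made-up predictions purely for illustration), the metric you optimize depends on the problem type:

```python
# Minimal sketch: the evaluation metric depends on the problem type.
# The values below are made-up predictions, purely for illustration.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

# Classification: accuracy and F1-score are typical north stars
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
print("Accuracy:", accuracy_score(y_true_cls, y_pred_cls))
print("F1-score:", f1_score(y_true_cls, y_pred_cls))

# Regression: RMSE is a common north star
y_true_reg = [2.5, 0.0, 2.1, 7.8]
y_pred_reg = [3.0, -0.1, 2.0, 7.2]
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print("RMSE:", rmse)
```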
Let us first look at the stages in the ML/DL pipeline, and then explore how these stages can be understood in the context of the model evaluation metric.
The ML/DL pipeline
The machine learning (ML) and deep learning (DL) pipeline consists of several stages that transform raw data into a functional model capable of making predictions or decisions.
Problem Definition: Define the problem you are trying to solve, including objectives, metrics, and constraints. This includes deciding whether the problem is classification, regression, clustering, etc.
Data Collection: Gather relevant data from various sources. In the case of deep learning, you typically need large amounts of labeled data.
Data Preprocessing: Data preprocessing comprises several steps. Data Cleaning: Handle missing values, outliers, and inconsistencies. Data Transformation: Normalise, scale, or encode categorical features. Data Splitting: Split the dataset into training, validation, and test sets.
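A minimal sketch of the splitting and scaling steps, assuming scikit-learn and synthetic data for illustration:

```python
# Minimal sketch: train/validation/test splitting and feature scaling.
# Synthetic data for illustration only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 5)          # 1000 samples, 5 features
y = np.random.randint(0, 2, 1000)    # binary labels

# First carve out the test set, then split the remainder into train/validation
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=42)

# Fit the scaler on the training set only, then apply it to validation and test
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
```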
Feature Engineering (for traditional ML): Choose the most relevant features to improve model performance. Feature engineering is a critical step in machine learning, where raw data is transformed into features that better represent the underlying problem to the predictive models, enhancing their performance. The main components of feature engineering include:
- Feature Creation: includes Domain-Specific Knowledge: Creating new features based on understanding of the domain, e.g., converting timestamps into features like 'day of the week' or 'hour of the day' in time-series data; Interaction Features: Creating new features by combining two or more existing features, such as multiplying, adding, or concatenating them; Aggregating Features: Summarizing data at different levels (e.g., mean, sum, or count) to create new features.
- Feature Transformation: includes Normalization/Standardization: Scaling features so that they have a standard range, such as normalizing values between 0 and 1 or standardizing them to have zero mean and unit variance; Log Transformation: Used for reducing skewness in the data, typically applied to data that spans many orders of magnitude; Binning: Converting continuous variables into categorical features by dividing the range into intervals or "bins".
- Handling Missing Data: handling missing data includes Imputation: Filling in missing values using strategies such as mean, median, mode, or advanced techniques like KNN imputation; Removing Missing Data: If too many values are missing, dropping columns or rows may be an option.
- Encoding Categorical Variables: includes One-Hot Encoding: Converting categorical variables into a series of binary features (e.g., using dummy variables); Label Encoding: Assigning a unique integer to each category (used when the categorical variable has a natural order).
- Dimensionality Reduction: Principal Component Analysis (PCA): Reducing the number of features while preserving the variance in the data.
- Feature Selection Techniques: Methods like correlation analysis, L1 regularization (Lasso), or Recursive Feature Elimination (RFE) to select the most important features.
- Feature Extraction: includes Text Features: Extracting features from textual data, such as TF-IDF or word embeddings; Image Features: Extracting features from images using techniques like convolution in Convolutional Neural Networks (CNNs) or edge detection methods; Temporal Features: For time-series data, extracting features like trends, seasonality, lags, or moving averages.
- Feature Scaling: includes Min-Max Scaling: Scaling features between a specific range, often [0, 1]; Robust Scaling: Handling outliers by scaling based on percentiles.
- Handling Outliers: includes Capping or Clipping: Replacing extreme values beyond a certain threshold with a specific value; Transformations: Applying transformations like log or square root to mitigate the effect of outliers.
By applying these techniques effectively, you can enhance the predictive power of machine learning models and improve their performance. Note that in deep learning, feature engineering is often minimal because the model itself learns features through its layers.
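To make a few of these steps concrete, here is a minimal sketch using pandas; the column names (timestamp, price, quantity, city) are hypothetical and chosen only for illustration:

```python
# Minimal sketch of a few feature engineering steps with pandas.
# The columns are hypothetical, for illustration only.
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 08:00", "2024-01-02 17:30"]),
    "price": [10.0, 12.5],
    "quantity": [3, 5],
    "city": ["London", "Paris"],
})

# Feature creation: domain-specific date/time features
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["hour_of_day"] = df["timestamp"].dt.hour

# Interaction feature: combine two existing features
df["revenue"] = df["price"] * df["quantity"]

# Encoding categorical variables: one-hot encoding via dummy variables
df = pd.get_dummies(df, columns=["city"])

print(df.head())
```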
Model Selection: Choose the appropriate algorithm(s) based on the nature of the problem, data size, and complexity (e.g., decision trees, support vector machines, neural networks). For deep learning, decide on the architecture (e.g., Convolutional Neural Networks for images, Recurrent Neural Networks for time series, etc.).
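As a rough illustration of model selection, the sketch below compares two candidate algorithms on a toy dataset using cross-validated F1-score; the specific candidates and metric are assumptions for the example, not a recommendation:

```python
# Minimal sketch: compare candidate algorithms using the chosen evaluation metric.
# Toy dataset; illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
}

# Score each candidate with 5-fold cross-validation on the F1 metric
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(name, scores.mean())
```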
Model Training: The steps in model training are listed below; a minimal training sketch follows this section.
- Train the Model: Feed the training data to the algorithm and optimize the parameters using optimization techniques (e.g., gradient descent).
- Hyperparameter Tuning: Adjust hyperparameters like learning rate, batch size, number of layers, and regularization methods.
- Loss Function: Define and minimize the loss function, which measures the difference between predicted and actual values.
Model Validation: The steps in model validation are:
- Validate the model's performance on a separate validation set to ensure it generalizes well and doesn't overfit the training data.
- Apply techniques like cross-validation to fine-tune the model.
Model Evaluation: The steps in model evaluation are:
- Evaluate the model on the test set using performance metrics like accuracy, precision, recall, F1-score, and AUC-ROC for classification, or RMSE for regression.
- For deep learning, track metrics like training/validation loss and accuracy during training.
Model Optimization: The steps in model optimization are:
- Apply techniques like regularization (L1/L2), early stopping, or ensembling to improve generalization.
- For deep learning, consider techniques like batch normalization or dropout to improve performance.
Model Deployment: Convert the trained model into a format suitable for serving (e.g., saving the model as a serialized file).
Serving the Model: Deploy the model in production via APIs, web services, or embedded systems.
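To make the training, loss-minimization, and early-stopping ideas above concrete, here is a minimal sketch of gradient descent on a mean-squared-error loss with synthetic data; the learning rate and patience values are arbitrary choices for illustration:

```python
# Minimal sketch: gradient descent minimizing a mean-squared-error loss,
# with validation-based early stopping. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X_train, X_val = rng.normal(size=(200, 3)), rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_train = X_train @ true_w + rng.normal(scale=0.1, size=200)
y_val = X_val @ true_w + rng.normal(scale=0.1, size=50)

w = np.zeros(3)
lr = 0.1                                   # learning rate (a hyperparameter)
best_val_loss, patience, bad_epochs = np.inf, 5, 0

for epoch in range(500):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)  # gradient of the MSE loss
    w -= lr * grad                                                  # gradient descent step
    val_loss = np.mean((X_val @ w - y_val) ** 2)                    # loss on held-out data
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping: validation loss stopped improving
            break

print("epochs run:", epoch + 1, "validation MSE:", best_val_loss)
```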
Monitoring and Maintenance
- Continuously monitor the model in production for drift (changes in data distributions), accuracy degradation, or performance bottlenecks.
- Retrain the model if necessary, especially in cases of concept drift.
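One simple way to monitor for drift is to compare a feature's training distribution against its recent production distribution. The sketch below uses a Kolmogorov-Smirnov test on synthetic data; the 0.05 threshold is an assumption for illustration:

```python
# Minimal sketch: flag possible data drift by comparing a feature's training
# distribution with its recent production distribution (Kolmogorov-Smirnov test).
# Synthetic data; the 0.05 threshold is an illustrative assumption.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
production_feature = rng.normal(loc=0.4, scale=1.0, size=1000)  # shifted distribution

stat, p_value = ks_2samp(training_feature, production_feature)
if p_value < 0.05:
    print("Possible drift detected: consider investigating the data or retraining.")
else:
    print("No significant drift detected.")
```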
Model Interpretability: For certain domains (like healthcare or finance), it's important to make the model interpretable using techniques like SHAP, LIME, or feature importance methods.
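SHAP and LIME are separate libraries; as a lighter-weight sketch of the same idea, scikit-learn's permutation importance measures how much the evaluation metric drops when each feature is shuffled (toy data, illustrative only):

```python
# Minimal sketch: model-agnostic feature importance via permutation importance.
# Toy data; illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature and measure how much the evaluation metric drops
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```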
Retraining and Feedback Loop
Use feedback from real-world performance to update the model. In cases where new data becomes available, retrain the model to ensure it remains relevant and accurate.
This pipeline is iterative, where steps like data preprocessing, model selection, and training may need to be revisited multiple times to optimize the final model.
Relating the steps of the machine learning pipeline to the model evaluation metric
Now, let's relate each of these steps to the model evaluation metric. Each step in the machine learning and deep learning pipeline directly or indirectly impacts the model evaluation stage (a short sketch of how overfitting shows up in the metric follows this list):
- Problem Definition: The problem definition determines the evaluation metrics used. For example: Classification Problems: Metrics like accuracy, precision, recall, F1-score, and AUC-ROC are crucial. Regression Problems: Metrics like RMSE, MAE, R2 are relevant. Without a clear problem definition, it's challenging to select appropriate metrics for evaluating the model's performance.
- Data Collection: The quality and quantity of the data directly affect model evaluation. High-Quality Data: Well-curated data ensures that the model has the potential to perform well on unseen data, leading to better evaluation scores. Biased or Noisy Data: If the collected data is biased or noisy, it will likely degrade performance, and evaluation metrics will reflect poor generalization. Insufficient Data: Inadequate data may lead to underfitting, impacting metrics like accuracy or precision during evaluation.
- Data Preprocessing: Proper data preprocessing is essential for model performance, and it ensures that evaluation results are meaningful. Data Cleaning: Reduces noise, improving accuracy and other metrics. Normalization/Scaling: Without proper scaling, certain algorithms (like SVMs, neural networks) perform poorly, resulting in suboptimal evaluation scores. Data Splitting: Proper train-validation-test splitting ensures that evaluation metrics reflect generalization to unseen data. Poor splitting can lead to overfitting, skewing evaluation results.
- Feature Engineering: Impact on Evaluation: Feature selection and transformation affect the model’s ability to learn patterns in the data. Good Feature Engineering: Helps the model capture useful information, improving performance metrics like accuracy or R2. Poor Feature Selection: Using irrelevant or redundant features can lead to overfitting or underfitting, which negatively impacts evaluation metrics such as precision, recall, or RMSE. (For deep learning, feature engineering is minimal, but ensuring high-quality input features is still critical for proper model evaluation.)
- Model Selection: Impact on Evaluation: The choice of algorithm affects the evaluation metrics directly. Algorithm Appropriateness: Selecting the right algorithm for the task leads to better evaluation metrics. For example, using decision trees for structured data or convolutional neural networks (CNNs) for image data improves accuracy, precision, or F1-score. Algorithm Mismatch: Choosing an algorithm that’s not well-suited for the problem will likely result in poor performance during evaluation (e.g., low accuracy, high error).
- Model Training: The training process determines how well the model fits the data, which is reflected in evaluation metrics. Good Training: Optimized hyperparameters, learning rate, and proper training result in a model that generalizes well, improving metrics like accuracy or R2. Overfitting/Underfitting: If the model overfits (memorizes training data) or underfits (fails to capture patterns), evaluation metrics will show poor generalization, with validation accuracy much lower than training accuracy or high validation error. Hyperparameter Tuning: The right combination of hyperparameters leads to better performance during evaluation by balancing bias and variance.
- Model Validation: Model validation helps ensure that the model will generalize to unseen data, making evaluation metrics more reliable. Cross-Validation: Techniques like k-fold cross-validation provide more robust performance estimates, which will be more accurate during model evaluation. Avoiding Overfitting: Regular validation helps to stop training at the right time, improving metrics like F1-score or RMSE during final evaluation.
- Model Evaluation This is the stage where the model's performance is quantitatively assessed using predefined metrics. The results depend on how well each of the previous steps was executed. Evaluation on the test set ensures that the model's performance (accuracy, precision, recall, etc.) is unbiased and reflects real-world scenarios.
- Model Optimization Optimization techniques improve the model's ability to generalize, which is reflected in the evaluation metrics. Regularization (L1/L2): Reduces overfitting, leading to better generalization and higher evaluation scores (accuracy, precision, recall). Early Stopping: Prevents overfitting, ensuring that evaluation metrics reflect true model performance on unseen data. Ensembling: Improves evaluation metrics like accuracy, precision, and recall by combining predictions from multiple models.
- Deployment: While deployment is typically after model evaluation, the real-world performance of the deployed model (as monitored by online evaluation) may influence future offline evaluations. If the model performs poorly in production (i.e., real-world predictions deviate from offline evaluations), it might signal issues with the initial evaluation metrics or the data distribution mismatch.
- Monitoring and Maintenance: Continuous monitoring helps track model drift, which indicates a change in evaluation metrics over time. Model Drift: Degradation in evaluation metrics like accuracy or F1-score in production can signal the need to retrain or update the model to align it with current data distributions.
- Model Interpretability: Interpretability allows for understanding which features contribute most to model performance, helping to refine the model and improve evaluation scores. Techniques like SHAP and LIME help ensure that high evaluation scores are meaningful and that the model is not learning spurious correlations.
- Retraining and Feedback Loop: As more data becomes available, retraining the model improves evaluation metrics by incorporating updated patterns in the data. Continuous Improvement: Regular retraining and fine-tuning ensure the model’s evaluation metrics (e.g., accuracy, precision) remain high as the data evolves.
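As a small illustration of how overfitting shows up in the evaluation metric (the point made under Model Training above), the sketch below compares training and validation accuracy for an unconstrained and a regularized decision tree on toy data:

```python
# Minimal sketch: the gap between training and validation scores is how
# overfitting shows up in the evaluation metric. Toy data, illustrative only.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree tends to memorize the training data
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train:", accuracy_score(y_train, deep_tree.predict(X_train)))
print("val:  ", accuracy_score(y_val, deep_tree.predict(X_val)))

# Constraining the hyperparameter (max_depth) usually narrows the gap
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("train:", accuracy_score(y_train, shallow_tree.predict(X_train)))
print("val:  ", accuracy_score(y_val, shallow_tree.predict(X_val)))
```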
Conclusion
Thus, each stage of the pipeline, from data collection to deployment, directly influences how well the model performs when evaluated. Evaluation metrics serve as the key feedback mechanism that signals the effectiveness of these earlier steps.
While the post is a bit long, I find these ideas easier to understand when you relate all the elements of the ML/DL pipeline to the model evaluation metric.
Re the book: please message me if you would like to be part of the community described at the start of this post.