Build a Deep Learning Model for the Image Classification Task (Part 2): Project Structure and Monitoring for Training
Saigon Technology - Accelerate Software Development
Welcome back to Part 2 of my tutorial on building a deep learning model for image classification. In Part 1, I covered the basics of deep learning and walked through preparing the data, a basic model, and the environment for training. Now we will focus on the structure of a deep learning project and how to monitor the model during training. In the previous post, I followed an "all-in-one" style that put all the code in one file. In this post, I will introduce a basic deep learning project structure and some monitoring tools that help you track model performance during training.
1. Project structure
When starting a new deep learning project, it's essential to plan ahead and think about the overall structure of the project. This will make it easier to maintain your code and data as your project grows, and to collaborate with other developers and researchers.
1.1. Key components
Here are the key components of a deep learning project:
- Data preparation: Once you have your dataset, you will need to preprocess it by resizing the images, normalizing the pixel values, and splitting it into training, validation, and testing sets. It's essential to ensure that your data is well-organized and that the directory structure is consistent across all subsets.
- Model architecture: It's crucial to define the model architecture clearly and to document any hyperparameters and design decisions. This will make it easier to reproduce your results and to make changes to the model in the future.
- Training: It's important to monitor the training process by logging the loss and accuracy metrics, and to use techniques such as early stopping and learning rate scheduling to prevent overfitting and speed up convergence.
- Evaluation: It's important to report the evaluation results clearly and to compare them with the state-of-the-art models in the literature. This will help you determine whether your model is good enough for the intended application.
- Deployment: It's important to document the deployment process clearly and to test the model thoroughly on new data to ensure that it behaves as expected.
Now that we have covered the key components of a deep learning project, let's discuss how to organize your code and data to ensure efficiency and reproducibility.
1.2. Code structure
1.2.1. Data Organization
In this structure, we have separate directories for the training, validation, and testing data, with subdirectories for each class. This makes it easy to load the data using data generators or PyTorch datasets.
```
project/
  data/
    train/
      class1/
        image1.jpg
        image2.jpg
        ...
      class2/
        image1.jpg
        image2.jpg
        ...
      classN/
        image1.jpg
        image2.jpg
        ...
    val/
      class1/
        image1.jpg
        image2.jpg
        ...
      class2/
        image1.jpg
        image2.jpg
        ...
      classN/
        image1.jpg
        image2.jpg
        ...
    test/
      class1/
        image1.jpg
        image2.jpg
        ...
      class2/
        image1.jpg
        image2.jpg
        ...
      classN/
        image1.jpg
        image2.jpg
        ...
```
Fig1: Dataset structure
It's essential to organize the data in a consistent and reproducible manner. Here are some tips for organizing the data:
- Use a consistent directory structure across all subsets
- Name the images using a clear and descriptive convention, such as {class}-{index}.jpg
- Use a separate CSV file or JSON file to store the class labels and image paths
- Use data augmentation techniques such as flipping and rotation to increase the size of the dataset
- Normalize the pixel values using a standard technique, such as dividing by 255 or subtracting the mean and dividing by the standard deviation
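The tip about keeping labels and image paths in a separate CSV file can be sketched with the standard library alone. This is a minimal illustration, not code from the demo repository: the function name `build_manifest`, the column names `path` and `label`, and the accepted file extensions are all assumptions made for the example.

```python
import csv
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_manifest(split_dir, csv_path):
    """Write one CSV row (path, label) per image found under split_dir/<class>/."""
    split_dir = Path(split_dir)
    rows = []
    # Each immediate subdirectory of the split is treated as one class.
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                relative = image_path.relative_to(split_dir).as_posix()
                rows.append((relative, class_dir.name))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])  # header row
        writer.writerows(rows)
    return rows
```

Calling `build_manifest("data/train", "data/train.csv")` would then produce one manifest per subset, which a dataset class can read instead of walking the folders on every run.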
1.2.2. Code Organization
For code organization, you could start from a basic structure like the one below:
project/
  data/
    train/
      ...
    val/
      ...
    test/
      ...
  config/
    config.yml
  src/
    model/
      __init__.py
      cnn1_model.py
      cnn2_model.py
      vgg16_model.py
      layers.py
    config.py
    dataset.py
    loss.py
    optim.py
    util.py
  train.py
  eval.py
  log/
  mlruns/
  model/
  requirements.txt
Fig2: Project structure
The data/ directory contains the dataset for our project. See the previous section for how to organize data for the image classification task; I will not repeat it here.
The src/ directory contains all the modules for building models: the model architectures, loss modules, optimizer modules, and supporting modules such as utilities and the config reader. It's important to modularize the code and to use clear naming conventions so that it is easy to understand and modify.
The train.py, eval.py, and predict.py scripts are responsible for training, evaluating, and deploying the model, respectively. These scripts should use the modules in the src/ directory and should take command-line arguments for flexibility.
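As a rough illustration of a script that takes command-line arguments, here is a minimal `argparse` sketch for something like train.py. The flag names (`--config`, `--data-dir`, `--epochs`, `--lr`) and their defaults are hypothetical choices for this example and are not taken from the demo repository.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a training run (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Train an image classifier")
    parser.add_argument("--config", default="config/config.yml",
                        help="path to the config file")
    parser.add_argument("--data-dir", default="data",
                        help="root folder containing train/, val/, and test/")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--lr", type=float, default=1e-3,
                        help="initial learning rate")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real script would build the dataset, model, loss, and optimizer
    # from the modules in src/ here, then start the training loop.
    print(f"config={args.config} data_dir={args.data_dir} "
          f"epochs={args.epochs} lr={args.lr}")
    return args
```

A real train.py would call `main()` under an `if __name__ == "__main__":` guard, so that a run such as `python train.py --epochs 20 --lr 0.0005` can override the defaults without editing the code.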
Other folders store outputs: log/ holds the training results written for TensorBoard, mlruns/ holds the same for MLflow, and model/ holds the trained model once training finishes.
Finally, the requirements.txt file contains a list of dependencies required to run the code, and the README.md file provides documentation and instructions for running the project.
For more detail, see the full demo source code for this project structure: https://github.com/phonglesaigontechnology/Image-Classification-Model-2
2. Monitoring
When training a deep learning model for image classification, it's essential to monitor its performance closely to ensure that it's learning effectively and not overfitting to the training data. Overfitting occurs when the model performs well on the training data but poorly on the test data, which indicates that it has memorized the training data instead of learning the underlying patterns.
2.1. Monitor Values
There are several ways to monitor the performance of your deep learning model during training.
- Loss and accuracy: It's important to monitor the loss and accuracy metrics during training and to plot them over time using a visualization tool such as TensorBoard. This will allow you to see how the model is improving over time and to identify any potential issues, such as overfitting or underfitting.
- Learning rate: Monitoring the learning rate during training can help you to tune it for better performance. If the learning rate is too high, the model may overshoot the optimal weights and converge to a suboptimal solution. If the learning rate is too low, the model may take too long to converge or get stuck in a local minimum.
- Early stopping: To implement early stopping, you can use a callback function in your training script that monitors the performance on the validation set and stops the training process when the performance stops improving for a certain number of epochs.
- Regularization: Monitoring the impact of regularization techniques such as dropout on the training performance can help you to tune them. For example, you can experiment with different dropout rates or regularization strengths to see how they affect the model's performance.
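The early-stopping idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the class name `EarlyStopping`, the `patience` and `min_delta` parameters, and the choice of validation loss as the monitored value are made up for the example, and the loss values are not real results.

```python
class EarlyStopping:
    """Signal a stop when the monitored validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience    # epochs to wait after the last improvement
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up validation losses, one per epoch.
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([0.90, 0.70, 0.72, 0.71, 0.69]):
    if stopper.step(val_loss):
        print(f"stopping after epoch {epoch}")
        break
```

With `patience=2`, the loop stops at epoch 3, because epochs 2 and 3 both fail to improve on the best loss of 0.70. A larger patience would tolerate longer plateaus at the cost of extra training time.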
In conclusion, monitoring the performance of your deep learning model during training is essential for ensuring that it's learning effectively and not overfitting to the training data. By monitoring metrics such as loss, accuracy, learning rate, and regularization strength, you can optimize your model for better performance and prevent overfitting.
2.2. Tools
There are several monitoring tools available for deep learning models, for example MLflow, TensorBoard, Neptune.ai, Weights & Biases, and Comet.ml.
I will only introduce MLflow and TensorBoard here because they do not depend on a cloud service. Both are open-source, free to use, and can be run locally or on a cluster. MLflow also integrates with many popular machine learning libraries such as TensorFlow, PyTorch, and scikit-learn.

import mlflow

# Setup MLflow: store runs in the local ./mlruns folder
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("MyModel CNN")

# Log params (config is assumed to be the object built by the config reader)
for key, value in config.__dict__.items():
    mlflow.log_param(key, value)

# Log training loss (inside the training loop, once per step)
mlflow.log_metric('train_loss', loss.item(), step=global_step)

# Log validation accuracy (once per epoch)
mlflow.log_metric('val_accuracy', accuracy, step=epoch)

To view the logs in MLflow, run the command below in the terminal from the project/ folder and navigate to http://localhost:5000 in your browser.

(venv)$ mlflow ui

Fig3: MLflow Dashboard UI

import os
import time

from tensorboardX import SummaryWriter

# Setup TensorBoard: one timestamped subfolder per run
writer = SummaryWriter(os.path.join('log/tensorboard_cnn', time.strftime("%Y-%m-%d-%H-%M-%S")))

# Log training loss
writer.add_scalar('train/loss', loss.item(), global_step)

# Log validation accuracy
writer.add_scalar('val/accuracy', accuracy, epoch)

To view the logs in TensorBoard, run the command below in the terminal from the project/ folder and navigate to http://localhost:6006 in your browser.

(venv)$ tensorboard --logdir=./log/tensorboard_cnn

Fig4: TensorBoard Dashboard UI
In these snippets, we use SummaryWriter from tensorboardX and the mlflow library to log the training loss and validation accuracy to TensorBoard and MLflow.
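The snippets log the training loss against `global_step` and the accuracy against `epoch`. The toy loop below sketches where those two counters could come from. It is only an illustration: `run_training`, the placeholder loss and accuracy values, and the `record` stand-in logger are all hypothetical, and no real model is trained.

```python
def run_training(num_epochs, batches_per_epoch, log_metric):
    """Toy loop showing which step counter accompanies each logged metric."""
    global_step = 0
    for epoch in range(num_epochs):
        for _ in range(batches_per_epoch):
            # Placeholder for the forward/backward pass; a real loop
            # would log loss.item() here instead.
            loss_value = 1.0 / (global_step + 1)
            log_metric("train_loss", loss_value, step=global_step)
            global_step += 1
        # Placeholder for a validation pass that runs once per epoch.
        accuracy = 0.5 + 0.1 * epoch
        log_metric("val_accuracy", accuracy, step=epoch)
    return global_step


# Stand-in logger that records its calls instead of writing to a tracker.
history = []


def record(key, value, step):
    history.append((key, value, step))


total_steps = run_training(num_epochs=2, batches_per_epoch=3, log_metric=record)
```

Because `mlflow.log_metric` takes a key, a value, and a `step` keyword, it could be passed in place of `record`. For TensorBoard, a small adapter such as `lambda key, value, step: writer.add_scalar(key, value, step)` would serve the same purpose.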