Computer Vision: A PyTorch Model Trained on the Stanford Cars Dataset
Ankush Arya
Data Scientist Partnering with Experts to Enhance Public Health Safety and Innovation!
Introduction
In the rapidly advancing field of computer vision, new breakthroughs constantly reshape the boundaries of what's possible. In this blog, I delve into computer vision with PyTorch, detailing the development of the TinyVGG architecture and its training on the well-known Stanford Cars dataset. Join me as we walk through this project and its implications for vehicle recognition.
Data
The Stanford Cars dataset provides a rich collection of car images for machine learning tasks. It contains 16,185 images across 196 different car classes. Each class typically represents a specific make, model, and year of a car, ensuring detailed classification.
The dataset is split almost evenly for training and testing, with 8,144 and 8,041 images respectively. This split allows researchers to train their models effectively and then evaluate their performance on unseen data.
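For reference, torchvision also ships a built-in loader for this dataset. Below is a minimal sketch, assuming the dataset files are already extracted locally (the original download links have been unreliable); the root path is a placeholder:
from torchvision.datasets import StanfordCars

# Assumes the dataset archives are already extracted under ./data,
# since the original download URL has been unreliable
train_data = StanfordCars(root="data", split="train")
test_data = StanfordCars(root="data", split="test")

print(len(train_data), len(test_data))  # expected: 8144 8041
print(len(train_data.classes))          # expected: 196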
Dataset Class and Transforms
The StanfordCarsCustom class is a versatile tool for managing the image dataset. By reading the annotation CSV once at initialisation and exposing the transform and class list as attributes, it provides a structured way to access and handle image data, while its error handling keeps data loading robust when individual files are missing or corrupted.
Custom torchvision.transforms pipelines handle image preprocessing, such as resizing, conversion to RGB, and data augmentation, tailoring the data to the needs of the model. A DataLoader then takes care of batching and shuffling (a sketch follows the transforms code below), laying a solid foundation for model training and evaluation.
Training
Two components are essential for training and evaluating the model: the TinyVGG architecture and the training/testing steps. The TinyVGG class defines a compact yet capable neural network inspired by the VGG family, comprising convolutional and pooling layers followed by a classifier, and offers a good balance between efficiency and performance for image classification.
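The exact configuration used in the project may differ, but a minimal sketch of a TinyVGG-style model, assuming two convolutional blocks, a hidden_units hyperparameter, and the 64x64 inputs produced by the transforms below, looks like this:
import torch
from torch import nn

class TinyVGG(nn.Module):
    """TinyVGG-style architecture: two conv blocks followed by a linear classifier."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(input_shape, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 64x64 inputs halved by two max-pools -> 16x16 feature maps
            nn.Linear(hidden_units * 16 * 16, output_shape)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))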
Complementing the architecture are the train_step and test_step functions, responsible for training and evaluating the model respectively. They orchestrate the forward pass, loss calculation, and optimisation steps for each epoch.
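Signatures vary from project to project; a minimal sketch of such steps, assuming batches of image tensors paired with integer labels (for example, the class_index field from the dataset's class_dict), might look like this:
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_step(model: nn.Module, dataloader: DataLoader, loss_fn: nn.Module,
               optimizer: torch.optim.Optimizer, device: torch.device):
    """Runs one training epoch and returns the average loss and accuracy."""
    model.train()
    train_loss, train_acc = 0.0, 0.0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        y_pred = model(X)            # forward pass
        loss = loss_fn(y_pred, y)    # loss calculation
        optimizer.zero_grad()
        loss.backward()              # backpropagation
        optimizer.step()             # parameter update
        train_loss += loss.item()
        train_acc += (y_pred.argmax(dim=1) == y).float().mean().item()
    return train_loss / len(dataloader), train_acc / len(dataloader)

def test_step(model: nn.Module, dataloader: DataLoader, loss_fn: nn.Module,
              device: torch.device):
    """Evaluates the model on a dataloader and returns average loss and accuracy."""
    model.eval()
    test_loss, test_acc = 0.0, 0.0
    with torch.inference_mode():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            test_loss += loss_fn(y_pred, y).item()
            test_acc += (y_pred.argmax(dim=1) == y).float().mean().item()
    return test_loss / len(dataloader), test_acc / len(dataloader)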
import logging

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

# 1. Subclass torch.utils.data.Dataset
class StanfordCarsCustom(Dataset):
    """Custom image dataset for PyTorch.

    Loads images and class information from a CSV file of annotations.
    Inherits from `torch.utils.data.Dataset` to provide a structured way
    to access and manage image data.

    Args:
        annotations_file (str): Path to the CSV file containing image annotations.
        transform (torchvision.transforms, optional): Transformations to apply
            to images. Defaults to None.
    """

    # 2. Initialise with the annotations file and an optional transform
    def __init__(self, annotations_file: str, transform=None) -> None:
        # 3. Create class attributes
        self.annotation_df = pd.read_csv(annotations_file)
        self.transform = transform
        self.classes = self.annotation_df['class_name'].unique()
        # Logging initialisation
        self.logger = logging.getLogger(__name__)

    # 4. Overwrite __len__() (optional but recommended)
    def __len__(self) -> int:
        """Returns the total number of images in the dataset."""
        return len(self.annotation_df)

    # 5. Overwrite __getitem__() (required)
    def __getitem__(self, idx: int):
        """Retrieves one sample (image and its class label) from the dataset.

        Args:
            idx (int): Index of the sample to retrieve.

        Returns:
            Tuple[torch.Tensor, dict]: The image as a PyTorch tensor and its
            corresponding class information as a dictionary.
        """
        img_path = self.annotation_df.loc[idx, 'image_path']  # use column names for readability
        try:
            # Image.open raises an OSError for missing or corrupted files
            image = Image.open(img_path)
            # self.logger.info(f"Loading image: {img_path}, size {image.size}, mode {image.mode}")
            class_name = self.annotation_df.loc[idx, 'class_name']
            class_index = self.annotation_df.loc[idx, 'class_index'] - 1  # shift labels to start at 0
            class_dict = {
                'class_name': class_name,
                'class_index': class_index
            }
            if self.transform:
                image = self.transform(image)
            return image, class_dict
        except Exception as e:
            self.logger.error(f"Error loading image: {img_path} - {e}")
            # Handle the error (re-raise, skip the sample, etc.);
            # here we skip the sample for simplicity
            return None, None
Transforms
from torchvision import transforms

# Define the training transform using torchvision.transforms
train_transform = transforms.Compose([
    # Resize images to 64x64
    transforms.Resize((64, 64)),
    # Convert any non-RGB image (e.g. grayscale) to RGB so all samples have 3 channels
    transforms.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
    # Apply TrivialAugmentWide for data augmentation
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    # Randomly flip images horizontally with a 50% chance
    transforms.RandomHorizontalFlip(p=0.5),
    # Convert PIL images to PyTorch tensors and scale pixel values to [0.0, 1.0]
    transforms.ToTensor()
])

# Create the testing transform (no data augmentation)
test_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
    transforms.ToTensor()
])
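With the dataset class and transforms in place, wiring them into DataLoaders (as mentioned earlier) takes only a few lines. The annotation file names and batch size below are placeholders, not the project's actual values:
from torch.utils.data import DataLoader

# Hypothetical annotation CSVs; adjust the paths to your setup
train_dataset = StanfordCarsCustom(annotations_file="train_annotations.csv",
                                   transform=train_transform)
test_dataset = StanfordCarsCustom(annotations_file="test_annotations.csv",
                                  transform=test_transform)

# Shuffle the training data each epoch; keep the test order fixed
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)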
Model Evaluation
The two training curves track the loss and accuracy of the model during training. The left curve, labelled "loss", trends downwards, which is a good sign: the model is improving its fit to the training data as the number of epochs increases. The right curve, labelled "accuracy", trends upwards, showing that the model is classifying more training examples correctly.
It is generally desirable for the loss curve to trend downwards and the accuracy curve to trend upwards during training, as this indicates the model is learning the task it was designed for. However, performance on the training data does not guarantee good performance on unseen data; to gauge how well the model generalises, it must be evaluated on a separate test dataset.
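Curves like these can be generated from per-epoch metrics. Here is a minimal matplotlib sketch, assuming a results dictionary with "train_loss" and "train_acc" lists (the key names are my assumption):
import matplotlib.pyplot as plt

def plot_curves(results: dict) -> None:
    """Plots training loss and accuracy side by side.

    Assumes `results` maps 'train_loss' and 'train_acc' to per-epoch lists.
    """
    epochs = range(len(results["train_loss"]))
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(epochs, results["train_loss"])
    ax_loss.set_title("loss")
    ax_loss.set_xlabel("epochs")
    ax_acc.plot(epochs, results["train_acc"])
    ax_acc.set_title("accuracy")
    ax_acc.set_xlabel("epochs")
    plt.show()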
Next Steps for the Project...
Stay tuned for updates as we continue to push the boundaries of what's possible in the exciting world of computer vision!
By modularising the code, experimenting with diverse models, incorporating transfer learning, and leveraging PyTorch Lightning, we can enhance the project's efficiency, robustness, and scalability, paving the way for further advances in vehicle recognition technology. A rough sketch of the transfer-learning direction follows.
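As a sketch only (the choice of EfficientNet-B0 and the frozen backbone are my assumptions, not the project's current implementation), transfer learning with torchvision could look like this:
import torch
from torch import nn
from torchvision import models

# Load a backbone pretrained on ImageNet
weights = models.EfficientNet_B0_Weights.DEFAULT
model = models.efficientnet_b0(weights=weights)

# Freeze the feature extractor so only the new head is trained
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classifier head with one sized for the 196 car classes
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=196)
)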
#ComputerVision #PyTorch #StanfordCarsDataset #MachineLearning #AI #ML #DeepLearning