Computer Vision: A PyTorch Model Trained on the Stanford Cars Dataset
Ankush Arya
Data Scientist Partnering with Experts to Enhance Public Health Safety and Innovation!
Introduction
In the rapidly advancing field of computer vision, new breakthroughs constantly reshape the boundaries of what's possible. In this blog, I delve into computer vision with PyTorch, detailing the development of the TinyVGG architecture and its training on the well-known Stanford Cars dataset. Join me as we walk through this project and its implications for vehicle recognition.
Data
The Stanford Cars dataset provides a rich collection of car images for machine learning tasks. It contains 16,185 images across 196 different car classes. Each class typically represents a specific make, model, and year of a car, ensuring detailed classification.
The dataset is split almost evenly for training and testing, with 8,144 and 8,041 images respectively. This split allows researchers to train their models effectively and then evaluate their performance on unseen data.
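For reference, torchvision also ships a built-in loader for this dataset. Below is a minimal sketch, assuming the dataset files are already extracted locally (the original download links have been unreliable); the root path is a placeholder:
from torchvision.datasets import StanfordCars

# Assumes the dataset archives are already extracted under ./data,
# since the original download URL has been unreliable
train_data = StanfordCars(root="data", split="train")
test_data = StanfordCars(root="data", split="test")

print(len(train_data), len(test_data))  # expected: 8144 8041
print(len(train_data.classes))          # expected: 196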
Dataset Class and Transforms
The StanfordCarsCustom class is a versatile tool for managing the image dataset. By reading the annotation CSV once at initialisation and exposing the transform and class list as attributes, it provides a structured way to access and handle image data, while its error handling keeps data loading robust when individual files are missing or corrupted.
Custom torchvision.transforms pipelines handle image preprocessing, such as resizing, conversion to RGB, and data augmentation, tailoring the data to the needs of the model. A DataLoader then takes care of batching and shuffling (a sketch follows the transforms code below), laying a solid foundation for model training and evaluation.
Training
Two components are essential for training and evaluating the model: the TinyVGG architecture and the training/testing steps. The TinyVGG class defines a compact yet capable neural network inspired by the VGG family, comprising convolutional and pooling layers followed by a classifier, and offers a good balance between efficiency and performance for image classification.
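The exact configuration used in the project may differ, but a minimal sketch of a TinyVGG-style model, assuming two convolutional blocks, a hidden_units hyperparameter, and the 64x64 inputs produced by the transforms below, looks like this:
import torch
from torch import nn

class TinyVGG(nn.Module):
    """TinyVGG-style architecture: two conv blocks followed by a linear classifier."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(input_shape, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 64x64 inputs halved by two max-pools -> 16x16 feature maps
            nn.Linear(hidden_units * 16 * 16, output_shape)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))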
Complementing the architecture are the train_step and test_step functions, responsible for training and evaluating the model respectively. They orchestrate the forward pass, loss calculation, and optimisation steps for each epoch.
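Signatures vary from project to project; a minimal sketch of such steps, assuming batches of image tensors paired with integer labels (for example, the class_index field from the dataset's class_dict), might look like this:
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_step(model: nn.Module, dataloader: DataLoader, loss_fn: nn.Module,
               optimizer: torch.optim.Optimizer, device: torch.device):
    """Runs one training epoch and returns the average loss and accuracy."""
    model.train()
    train_loss, train_acc = 0.0, 0.0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        y_pred = model(X)            # forward pass
        loss = loss_fn(y_pred, y)    # loss calculation
        optimizer.zero_grad()
        loss.backward()              # backpropagation
        optimizer.step()             # parameter update
        train_loss += loss.item()
        train_acc += (y_pred.argmax(dim=1) == y).float().mean().item()
    return train_loss / len(dataloader), train_acc / len(dataloader)

def test_step(model: nn.Module, dataloader: DataLoader, loss_fn: nn.Module,
              device: torch.device):
    """Evaluates the model on a dataloader and returns average loss and accuracy."""
    model.eval()
    test_loss, test_acc = 0.0, 0.0
    with torch.inference_mode():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            test_loss += loss_fn(y_pred, y).item()
            test_acc += (y_pred.argmax(dim=1) == y).float().mean().item()
    return test_loss / len(dataloader), test_acc / len(dataloader)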
import logging

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

# 1. Subclass torch.utils.data.Dataset
class StanfordCarsCustom(Dataset):
    """Custom image dataset for PyTorch.

    Loads images and class information from a CSV file of annotations.
    Inherits from `torch.utils.data.Dataset` to provide a structured way
    to access and manage image data.

    Args:
        annotations_file (str): Path to the CSV file containing image annotations.
        transform (torchvision.transforms, optional): Transformations to apply
            to images. Defaults to None.
    """

    # 2. Initialise with the annotations file and an optional transform
    def __init__(self, annotations_file: str, transform=None) -> None:
        # 3. Create class attributes
        self.annotation_df = pd.read_csv(annotations_file)
        self.transform = transform
        self.classes = self.annotation_df['class_name'].unique()
        # Logging initialisation
        self.logger = logging.getLogger(__name__)

    # 4. Overwrite __len__() (optional but recommended)
    def __len__(self) -> int:
        """Returns the total number of images in the dataset."""
        return len(self.annotation_df)

    # 5. Overwrite __getitem__() (required)
    def __getitem__(self, idx: int):
        """Retrieves one sample (image and its class label) from the dataset.

        Args:
            idx (int): Index of the sample to retrieve.

        Returns:
            Tuple[torch.Tensor, dict]: The image as a PyTorch tensor and its
            corresponding class information as a dictionary.
        """
        img_path = self.annotation_df.loc[idx, 'image_path']  # use column names for readability
        try:
            # Image.open raises an OSError for missing or corrupted files
            image = Image.open(img_path)
            # self.logger.info(f"Loading image: {img_path}, size {image.size}, mode {image.mode}")
            class_name = self.annotation_df.loc[idx, 'class_name']
            class_index = self.annotation_df.loc[idx, 'class_index'] - 1  # shift labels to start at 0
            class_dict = {
                'class_name': class_name,
                'class_index': class_index
            }
            if self.transform:
                image = self.transform(image)
            return image, class_dict
        except Exception as e:
            self.logger.error(f"Error loading image: {img_path} - {e}")
            # Handle the error (re-raise, skip the sample, etc.);
            # here we skip the sample for simplicity
            return None, None
Transforms
from torchvision import transforms

# Define the training transform using torchvision.transforms
train_transform = transforms.Compose([
    # Resize images to 64x64
    transforms.Resize((64, 64)),
    # Convert any non-RGB image (e.g. grayscale) to RGB so all samples have 3 channels
    transforms.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
    # Apply TrivialAugmentWide for data augmentation
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    # Randomly flip images horizontally with a 50% chance
    transforms.RandomHorizontalFlip(p=0.5),
    # Convert PIL images to PyTorch tensors and scale pixel values to [0.0, 1.0]
    transforms.ToTensor()
])

# Create the testing transform (no data augmentation)
test_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
    transforms.ToTensor()
])
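With the dataset class and transforms in place, wiring them into DataLoaders (as mentioned earlier) takes only a few lines. The annotation file names and batch size below are placeholders, not the project's actual values:
from torch.utils.data import DataLoader

# Hypothetical annotation CSVs; adjust the paths to your setup
train_dataset = StanfordCarsCustom(annotations_file="train_annotations.csv",
                                   transform=train_transform)
test_dataset = StanfordCarsCustom(annotations_file="test_annotations.csv",
                                  transform=test_transform)

# Shuffle the training data each epoch; keep the test order fixed
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)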
Model Evaluation
The two training curves track the loss and accuracy of the model during training. The left curve, labelled "loss", trends downwards, which is a good sign: the model is improving its fit to the training data as the number of epochs increases. The right curve, labelled "accuracy", trends upwards, showing that the model is classifying more training examples correctly.
It is generally desirable for the loss curve to trend downwards and the accuracy curve to trend upwards during training, as this indicates the model is learning the task it was designed for. However, performance on the training data does not guarantee good performance on unseen data; to gauge how well the model generalises, it must be evaluated on a separate test dataset.
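Curves like these can be generated from per-epoch metrics. Here is a minimal matplotlib sketch, assuming a results dictionary with "train_loss" and "train_acc" lists (the key names are my assumption):
import matplotlib.pyplot as plt

def plot_curves(results: dict) -> None:
    """Plots training loss and accuracy side by side.

    Assumes `results` maps 'train_loss' and 'train_acc' to per-epoch lists.
    """
    epochs = range(len(results["train_loss"]))
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(epochs, results["train_loss"])
    ax_loss.set_title("loss")
    ax_loss.set_xlabel("epochs")
    ax_acc.plot(epochs, results["train_acc"])
    ax_acc.set_title("accuracy")
    ax_acc.set_xlabel("epochs")
    plt.show()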
Next Steps for the Project...
Stay tuned for updates as we continue to push the boundaries of what's possible in the exciting world of computer vision!
By modularising the code, experimenting with diverse models, incorporating transfer learning, and leveraging PyTorch Lightning, we can enhance the project's efficiency, robustness, and scalability, paving the way for further advances in vehicle recognition technology. A rough sketch of the transfer-learning direction follows.
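As a sketch only (the choice of EfficientNet-B0 and the frozen backbone are my assumptions, not the project's current implementation), transfer learning with torchvision could look like this:
import torch
from torch import nn
from torchvision import models

# Load a backbone pretrained on ImageNet
weights = models.EfficientNet_B0_Weights.DEFAULT
model = models.efficientnet_b0(weights=weights)

# Freeze the feature extractor so only the new head is trained
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classifier head with one sized for the 196 car classes
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280, out_features=196)
)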
#ComputerVision #PyTorch #StanfordCarsDataset #MachineLearning #AI #ML #DeepLearning