DeepSORT Algorithm For Object Tracking

DeepSORT (Deep Simple Online and Realtime Tracking) is an advanced object tracking algorithm that builds upon the original SORT (Simple Online and Realtime Tracking) by incorporating deep learning for more robust performance, especially in complex environments with occlusions and similar-looking objects.

Background: SORT Recap

SORT uses:

  • Kalman Filter: For predicting the next position of objects based on motion (position and velocity).
  • Hungarian Algorithm: For solving the assignment problem, matching detected objects between consecutive frames based on the Intersection over Union (IoU) of their bounding boxes (a minimal IoU sketch follows below).
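
For reference, the IoU computation itself is only a few lines (a minimal sketch; boxes are assumed to be in [x1, y1, x2, y2] format):

def iou(box_a, box_b):
    # Intersection rectangle corners
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Overlap divided by union; the small epsilon avoids division by zero
    return inter / (area_a + area_b - inter + 1e-9)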

Limitations of SORT:

  • Fails in scenarios with occlusion (when objects are temporarily hidden).
  • Struggles with appearance similarity (e.g., two people wearing similar clothes).

Key Components of DeepSORT:

Detection: Requires external object detectors like YOLO, Faster R-CNN, or SSD to provide bounding boxes for objects in each frame.

Motion Model (Kalman Filter): Predicts the object's position in the next frame based on its current motion (velocity, position, etc.).

Data Association (Hungarian Algorithm): Matches current detections with existing tracked objects using a cost matrix.
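
A minimal sketch of this assignment step using SciPy's Hungarian solver (the cost values here are illustrative):

import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows = existing tracks, columns = current detections; lower cost = better match
cost = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.6]])
track_idx, det_idx = linear_sum_assignment(cost)
matches = [(int(t), int(d)) for t, d in zip(track_idx, det_idx) if cost[t, d] < 0.5]
print(matches)  # [(0, 0), (1, 1)] -- detection 2 stays unmatched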

Appearance Descriptor (Deep Learning):

  • Uses a Convolutional Neural Network (CNN) to extract feature embeddings from detected objects (a stand-in extractor is sketched after this list).
  • This helps maintain object identity over time, even during partial occlusions or abrupt motion changes.
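
DeepSORT's original descriptor is a small re-identification CNN trained on person crops that outputs 128-dimensional embeddings. As a rough stand-in to illustrate the mechanics (not DeepSORT's actual network, and yielding 512-d vectors), a truncated torchvision ResNet-18 can serve as an embedding extractor, assuming a recent torchvision:

import torch
import torch.nn as nn
from torchvision import models, transforms

# Stand-in descriptor: ResNet-18 trunk with the classifier removed (512-d output)
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((128, 64)),  # typical person-crop aspect ratio
    transforms.ToTensor(),
])

def embed(crop_rgb):
    # crop_rgb: HxWx3 uint8 image of one detection
    x = preprocess(crop_rgb).unsqueeze(0)
    with torch.no_grad():
        f = backbone(x).squeeze(0)
    return f / f.norm()  # L2-normalize so cosine distance is well-behaved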

Combined Cost Metric:

  • Motion Cost (IoU): Measures overlap between predicted and detected bounding boxes.
  • Appearance Cost (Cosine Distance): Compares feature embeddings of detections and tracked objects.
  • A weighted combination of both metrics improves tracking robustness (see the sketch after this list).
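
In code, this blend can be as simple as the function below (lam is a tunable weight; note that the original DeepSORT paper gates matches with a Mahalanobis motion distance rather than IoU, so treat this IoU-based version as a simplification):

def combined_cost(iou_score, cosine_dist, lam=0.5):
    # IoU is a similarity (higher = better), so convert it to a cost first
    motion_cost = 1.0 - iou_score
    return lam * motion_cost + (1.0 - lam) * cosine_dist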

How DeepSORT Works

Detection (Input): An external detector such as YOLO, Faster R-CNN, or SSD supplies bounding boxes for each frame; DeepSORT itself performs no detection.

Motion Model (Prediction): Kalman Filter predicts the next position of each tracked object based on its previous state (position, velocity, etc.).
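
A minimal constant-velocity illustration with filterpy (the same library used in the full implementation below), tracking only the box center for clarity:

import numpy as np
from filterpy.kalman import KalmanFilter

# State = [cx, cy, vx, vy]; we only measure the center position
kf = KalmanFilter(dim_x=4, dim_z=2)
kf.F = np.array([[1., 0., 1., 0.],   # cx' = cx + vx
                 [0., 1., 0., 1.],   # cy' = cy + vy
                 [0., 0., 1., 0.],
                 [0., 0., 0., 1.]])
kf.H = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.]])
kf.x[:2] = np.array([[100.], [200.]])  # initial center

kf.predict()                        # estimate where the object moved
kf.update(np.array([103., 204.]))   # correct with the matched detection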

Appearance Descriptor (Deep Learning):

  • DeepSORT introduces a Convolutional Neural Network (CNN) to extract a feature vector (appearance embedding) for each detected object. This helps distinguish between visually similar objects.
  • These embeddings are typically 128-dimensional vectors that capture unique visual characteristics (a minimal cosine-distance comparison is sketched below).
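
Comparing two embeddings is then a one-liner (a minimal sketch):

import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity: small values mean the crops look alike
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)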

Data Association (Matching):

  • Combines IoU (spatial information) and cosine distance (appearance similarity) to associate current detections with existing tracks.
  • Hungarian Algorithm is then used to optimally match detections to tracks based on this combined metric.

Track Management:

  • Confirmed Tracks: Objects that have been consistently detected over multiple frames.
  • Tentative Tracks: Newly detected objects awaiting confirmation.
  • Deleted Tracks: Tracks removed after missing detections for a threshold number of frames (a bare-bones version of this lifecycle follows below).
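
A bare-bones sketch of this lifecycle (the n_init and max_age thresholds are illustrative; the reference DeepSORT implementation defaults to 3 and 30):

class TrackLifecycle:
    TENTATIVE, CONFIRMED, DELETED = range(3)

    def __init__(self, n_init=3, max_age=30):
        self.state = self.TENTATIVE
        self.hits = 1           # consecutive successful matches
        self.misses = 0         # frames since the last match
        self.n_init = n_init    # hits required to confirm a track
        self.max_age = max_age  # misses tolerated before deletion

    def mark_hit(self):
        self.hits += 1
        self.misses = 0
        if self.state == self.TENTATIVE and self.hits >= self.n_init:
            self.state = self.CONFIRMED

    def mark_miss(self):
        self.misses += 1
        # Tentative tracks die on their first miss; confirmed ones get max_age
        if self.state == self.TENTATIVE or self.misses > self.max_age:
            self.state = self.DELETED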

Why DeepSORT is Effective

  • Robust to Occlusions: Appearance embeddings help maintain identity even when objects are temporarily hidden.
  • Reduced Identity Switches: Combines spatial and visual data, making it less likely to confuse similar-looking objects.
  • Scalable: Can handle multiple objects in real-time applications like pedestrian tracking, vehicle tracking, etc.

Below is a simplified implementation of DeepSORT-style object tracking using YOLOv5 as the detector. It detects and tracks multiple objects in a video stream; note that the appearance descriptor here is a color-histogram placeholder rather than DeepSORT's CNN re-ID network.

pip install torch torchvision torchaudio
pip install opencv-python
pip install numpy
pip install filterpy
pip install scipy
pip install yolov5

import cv2
import torch
import numpy as np
from filterpy.kalman import KalmanFilter
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance
from yolov5 import YOLOv5

# Load YOLOv5 model
model = YOLOv5("yolov5s.pt")  # Use 'yolov5s' for speed, 'yolov5m' or 'yolov5l' for better accuracy

# Kalman Filter Tracker class
class Tracker:
    def __init__(self, bbox, feature, tracker_id):
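        # 7-d state: the bbox corners [x1, y1, x2, y2] plus three velocity terms;
        # the measurement is the 4-d bbox itself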
        self.kalman = KalmanFilter(dim_x=7, dim_z=4)
        self.kalman.F = np.array([
            [1, 0, 0, 0, 1, 0, 0],
            [0, 1, 0, 0, 0, 1, 0],
            [0, 0, 1, 0, 0, 0, 1],
            [0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1]
        ])
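        # Measurement function: we observe the first four state entries (the bbox)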
        self.kalman.H = np.array([
            [1, 0, 0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 0, 0],
            [0, 0, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0, 0]
        ])
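        # Covariance/noise tuning (values mirror the original SORT implementation)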
        self.kalman.R[2:, 2:] *= 10.
        self.kalman.P[4:, 4:] *= 1000.
        self.kalman.P *= 10.
        self.kalman.Q[-1, -1] *= 0.01
        self.kalman.Q[4:, 4:] *= 0.01
        
        self.kalman.x[:4] = np.asarray(bbox, dtype=float).reshape((4, 1))
        self.feature = feature
        self.tracker_id = tracker_id
        self.hits = 1
        self.no_losses = 0

    def predict(self):
        self.kalman.predict()
        self.no_losses += 1  # age the track; reset to 0 when a detection matches it

    def update(self, bbox, feature):
        self.kalman.update(bbox)
        self.feature = feature
        self.hits += 1
        self.no_losses = 0

# Feature extraction using a simple color histogram (placeholder for a CNN-based descriptor)
def get_features(image, bbox):
    x1, y1, x2, y2 = map(int, bbox)
    # Clamp to frame bounds so partially off-screen boxes don't yield an empty crop
    h, w = image.shape[:2]
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    crop = image[y1:y2, x1:x2]
    if crop.size == 0:
        return np.zeros(8 * 8 * 8, dtype=np.float32)  # matches the histogram size below
    hist = cv2.calcHist([crop], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

# Data association: cosine-distance cost matrix solved with the Hungarian algorithm
def associate_detections(tracks, detections, features, max_cosine_distance=0.5):
    if len(tracks) == 0:
        # No existing tracks: every detection is unmatched and will seed a new track
        return [], [], list(range(len(detections)))

    # Rows = tracks, columns = detections; lower cost = more similar appearance
    cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float32)
    for i, track in enumerate(tracks):
        for j, feature in enumerate(features):
            d = distance.cosine(track.feature, feature)
            cost_matrix[i, j] = d if np.isfinite(d) else 1.0  # guard degenerate features

    # Optimal one-to-one assignment, then reject matches above the distance threshold
    row_ind, col_ind = linear_sum_assignment(cost_matrix)
    matched_indices = [(i, j) for i, j in zip(row_ind, col_ind)
                       if cost_matrix[i, j] < max_cosine_distance]

    unmatched_tracks = list(set(range(len(tracks))) - {i for i, _ in matched_indices})
    unmatched_detections = list(set(range(len(detections))) - {j for _, j in matched_indices})

    return matched_indices, unmatched_tracks, unmatched_detections

# Video capture
cap = cv2.VideoCapture("input_video.mp4")
trackers = []
tracker_id = 0

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # YOLOv5 detection; columns are x1, y1, x2, y2, confidence, class
    results = model.predict(frame)
    detections = results.xyxy[0].cpu().numpy()
    detections = detections[detections[:, 4] > 0.4]  # drop low-confidence boxes

    # Extract features for detections
    features = [get_features(frame, det[:4]) for det in detections]

    # Predict new locations for all tracks
    for tracker in trackers:
        tracker.predict()

    # Associate detections to existing tracks
    matches, unmatched_tracks, unmatched_detections = associate_detections(trackers, detections, features)

    # Update matched trackers
    for track_idx, det_idx in matches:
        bbox = detections[det_idx][:4]
        feature = features[det_idx]
        trackers[track_idx].update(bbox, feature)

    # Create new trackers for unmatched detections
    for det_idx in unmatched_detections:
        bbox = detections[det_idx][:4]
        feature = features[det_idx]
        trackers.append(Tracker(bbox, feature, tracker_id))
        tracker_id += 1

    # Remove lost trackers
    trackers = [t for t in trackers if t.no_losses < 5]

    # Draw bounding boxes
    for tracker in trackers:
        x1, y1, x2, y2 = map(int, tracker.kalman.x[:4].flatten())
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f'ID: {tracker.tracker_id}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    cv2.imshow("DeepSORT Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()        

How This Works:

  1. Object Detection: Uses YOLOv5 to detect objects in each frame.
  2. Feature Extraction: Extracts simple color histograms (this can be replaced with a CNN for better accuracy).
  3. Kalman Filter: Predicts the next position of tracked objects.
  4. Data Association: Matches detections to tracks with the Hungarian algorithm on a cosine-distance cost matrix.
  5. Track Management: Updates matched tracks, creates new ones, and removes lost tracks.
