Custom Object Detector

Recently I had a chance to try the TensorFlow Object Detection API to develop a custom object detector: an object detection model for a new category, fine-tuned from a model pre-trained on other categories and different data. The team that did this work at Google has done an impressive job of benchmarking all the state-of-the-art detectors in TF! They have made it very easy for anyone to pick up an existing pre-trained model and fine-tune it for a custom use case. I thought this post might help someone trying to do the same; it should give a good initial baseline.

Prerequisite: Follow the TensorFlow Object Detection API installation instructions.

1. Data Preparation:

We need: (1) a folder with images, (2) a list of train images with bounding boxes, and (3) a list of test/eval images with bounding boxes.

If you do not have your own data and just want to try the fine-tuning on existing data, you can use any publicly available dataset. For example, the raccoon dataset at https://github.com/datitran/raccoon_dataset provides both the data format and the scripts for TFRecord preparation.

# train and test lists as CSV, one row per bounding box:
# image_name.jpg, width, height, class, xmin, ymin, xmax, ymax

# Convert the data to TFRecord format
python generate_tfrecord.py --csv_input=data/train_labels.csv \
                            --output_path=train.record
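
If you are curious what the conversion script does: each CSV row becomes a tf.train.Example with the feature keys the Object Detection API expects. Below is a minimal sketch, assuming a datitran-style CSV, a single box per row, and the dataset_util helpers that ship with the API (TF 1.x style; create_tf_example and the hard-coded class id are illustrative):

# minimal sketch: one CSV row -> tf.train.Example (illustrative, not the full script)
import os

import tensorflow as tf
from object_detection.utils import dataset_util


def create_tf_example(row, image_dir):
    # row: one CSV entry with filename, width, height, class, xmin, ymin, xmax, ymax
    with tf.gfile.GFile(os.path.join(image_dir, row['filename']), 'rb') as fid:
        encoded_jpg = fid.read()
    width, height = int(row['width']), int(row['height'])
    filename = row['filename'].encode('utf8')
    return tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(b'jpg'),
        # box coordinates are normalized to [0, 1]
        'image/object/bbox/xmin': dataset_util.float_list_feature([float(row['xmin']) / width]),
        'image/object/bbox/xmax': dataset_util.float_list_feature([float(row['xmax']) / width]),
        'image/object/bbox/ymin': dataset_util.float_list_feature([float(row['ymin']) / height]),
        'image/object/bbox/ymax': dataset_util.float_list_feature([float(row['ymax']) / height]),
        'image/object/class/text': dataset_util.bytes_list_feature([row['class'].encode('utf8')]),
        'image/object/class/label': dataset_util.int64_list_feature([1]),  # id from the label map
    }))

The examples are then written out with a tf.python_io.TFRecordWriter to produce train.record and test.record.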

2. Training Preparation:

(a) Check out the TF Object Detection model zoo.

(b) Download a pre-trained model from the model zoo, for example faster_rcnn_inception_resnet_v2_atrous_coco.
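
To grab and unpack it, something like the following works (the exact archive name includes a release date tag; the 2018_01_28 one below is just an example, so check the model zoo page for the current link):

wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz
tar -xzf faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz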

(c) Locate the corresponding sample config file (under object_detection/samples/configs).

(d) Create this recommended folder structure for your experiments:

#folder structure
+data
  -label_map file
  -train TFRecord file
  -eval TFRecord file
+models
  + model
    -pipeline config file
    +train
    +eval


(+ is a directory, - is a file)
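
This layout can be created with a single command from your experiment root, for example:

mkdir -p data models/model/train models/model/eval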

(e) Copy the sample config file into models/model and set these variables:

PIPELINE_CONFIG_PATH=/home/ubuntu/TF/models/research/object_detection/models/model/faster_rcnn_inception_resnet_v2_atrous_coco.config
MODEL_DIR=/home/ubuntu/TF/models/research/object_detection/models/model/
NUM_TRAIN_STEPS=20000  # based on your data
SAMPLE_1_OF_N_EVAL_EXAMPLES=1

(f) Create a label map file following the sample label map files; for example, for two objects (the name field should match the class strings used in your CSV annotations):

item {
  id: 1
  name: 'obj1'
  display_name: 'custom-object-1'
}

item {
  id: 2
  name: 'obj2'
  display_name: 'custom-object-2'
}


(g) Modify the config file:

Change the batch size if necessary.
Change the data paths and the label map path to point at your data folder (see the excerpt below).
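
For reference, the fields you typically edit look roughly like the excerpt below (illustrative, not the full config; the paths and file names such as label_map.pbtxt, train.record, and eval.record follow the folder structure above, and fine_tune_checkpoint should point at the model.ckpt files inside the downloaded archive):

model {
  faster_rcnn {
    num_classes: 2  # number of items in your label map
    ...
  }
}
train_config {
  batch_size: 1
  fine_tune_checkpoint: "/path/to/faster_rcnn_inception_resnet_v2_atrous_coco/model.ckpt"
  ...
}
train_input_reader {
  label_map_path: "data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/train.record"
  }
}
eval_input_reader {
  label_map_path: "data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/eval.record"
  }
}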

3. Train the Object Detector:

python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=${SAMPLE_1_OF_N_EVAL_EXAMPLES} \
    --alsologtostderr

Run TensorBoard:

# run tensorboard 
tensorboard --logdir=./research/object_detection/models/model/

You should be able to see the training progress and evaluation performance on TensorBoard.

Other similar tutorials on the web on the same topic:

https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-1-selecting-a-model-a02b6aabe39e

https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-2-converting-dataset-to-tfrecord-47f24be9248d

https://towardsdatascience.com/building-a-toy-detector-with-tensorflow-object-detection-api-63c0fdf2ac95

https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
