Custom Object Detector

Recently I had a chance to try the TensorFlow Object Detection API to develop a custom object detector: an object detection model for a new category, fine-tuned from a model pre-trained on other categories and different data. The team that did this work at Google has done an impressive job of benchmarking all the state-of-the-art detectors in TF! They have made it very easy for anyone to pick up an existing pre-trained model and fine-tune it for a custom use case. I thought this post might help someone trying to do the same; it should give a good initial baseline.

Prerequisite: Follow the TensorFlow Object Detection API installation instructions.

1. Data Preparation:

We need: (1) a folder with images, (2) a list of train images with bounding boxes, and (3) a list of test/eval images with bounding boxes.

If you do not have your own data and just want to try the fine-tuning on existing data, you can use any publicly available dataset. For example, the raccoon dataset at https://github.com/datitran/raccoon_dataset provides both the data format and the scripts for TFRecord preparation.

# train and test lists as CSV, one row per bounding box:
# image_name.jpg, width, height, class, xmin, ymin, xmax, ymax

# Convert the data to TFRecord format
python generate_tfrecord.py --csv_input=data/train_labels.csv \
                            --output_path=train.record
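
If you are curious what the conversion script does: each CSV row becomes a tf.train.Example with the feature keys the Object Detection API expects. Below is a minimal sketch, assuming a datitran-style CSV, a single box per row, and the dataset_util helpers that ship with the API (TF 1.x style; create_tf_example and the hard-coded class id are illustrative):

# minimal sketch: one CSV row -> tf.train.Example (illustrative, not the full script)
import os

import tensorflow as tf
from object_detection.utils import dataset_util


def create_tf_example(row, image_dir):
    # row: one CSV entry with filename, width, height, class, xmin, ymin, xmax, ymax
    with tf.gfile.GFile(os.path.join(image_dir, row['filename']), 'rb') as fid:
        encoded_jpg = fid.read()
    width, height = int(row['width']), int(row['height'])
    filename = row['filename'].encode('utf8')
    return tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(b'jpg'),
        # box coordinates are normalized to [0, 1]
        'image/object/bbox/xmin': dataset_util.float_list_feature([float(row['xmin']) / width]),
        'image/object/bbox/xmax': dataset_util.float_list_feature([float(row['xmax']) / width]),
        'image/object/bbox/ymin': dataset_util.float_list_feature([float(row['ymin']) / height]),
        'image/object/bbox/ymax': dataset_util.float_list_feature([float(row['ymax']) / height]),
        'image/object/class/text': dataset_util.bytes_list_feature([row['class'].encode('utf8')]),
        'image/object/class/label': dataset_util.int64_list_feature([1]),  # id from the label map
    }))

The examples are then written out with a tf.python_io.TFRecordWriter to produce train.record and test.record.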

2. Training Preparation:

(a) Check out the TF Object Detection model zoo.

(b) Download a pre-trained model from the model zoo, for example faster_rcnn_inception_resnet_v2_atrous_coco.
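
To grab and unpack it, something like the following works (the exact archive name includes a release date tag; the 2018_01_28 one below is just an example, so check the model zoo page for the current link):

wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz
tar -xzf faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz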

(c) Locate the corresponding sample config file (under object_detection/samples/configs).

(d) Create this recommended folder structure for your experiments:

#folder structure
+data
  -label_map file
  -train TFRecord file
  -eval TFRecord file
+models
  + model
    -pipeline config file
    +train
    +eval


(+ is a directory, - is a file)
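
This layout can be created with a single command from your experiment root, for example:

mkdir -p data models/model/train models/model/eval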

(e) Copy the sample config file into models/model and set these variables:

PIPELINE_CONFIG_PATH=/home/ubuntu/TF/models/research/object_detection/models/model/faster_rcnn_inception_resnet_v2_atrous_coco.config
MODEL_DIR=/home/ubuntu/TF/models/research/object_detection/models/model/
NUM_TRAIN_STEPS=20000  # based on your data
SAMPLE_1_OF_N_EVAL_EXAMPLES=1

(f) Create a label map file following the sample label map files; for example, for two objects (the name field should match the class strings used in your CSV annotations):

item {
  id: 1
  name: 'obj1'
  display_name: 'custom-object-1'
}

item {
  id: 2
  name: 'obj2'
  display_name: 'custom-object-2'
}


(g) Modify the config file:

Change the batch size if necessary.
Change the data paths and the label map path to point at your data folder (see the excerpt below).
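
For reference, the fields you typically edit look roughly like the excerpt below (illustrative, not the full config; the paths and file names such as label_map.pbtxt, train.record, and eval.record follow the folder structure above, and fine_tune_checkpoint should point at the model.ckpt files inside the downloaded archive):

model {
  faster_rcnn {
    num_classes: 2  # number of items in your label map
    ...
  }
}
train_config {
  batch_size: 1
  fine_tune_checkpoint: "/path/to/faster_rcnn_inception_resnet_v2_atrous_coco/model.ckpt"
  ...
}
train_input_reader {
  label_map_path: "data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/train.record"
  }
}
eval_input_reader {
  label_map_path: "data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/eval.record"
  }
}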

3. Train the Object Detector:

python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=${SAMPLE_1_OF_N_EVAL_EXAMPLES} \
    --alsologtostderr

Run TensorBoard:

# run tensorboard 
tensorboard --logdir=./research/object_detection/models/model/

You should be able to see the training progress and evaluation performance on TensorBoard.

Other similar tutorials on the web on the same topic:

https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-1-selecting-a-model-a02b6aabe39e

https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-2-converting-dataset-to-tfrecord-47f24be9248d

https://towardsdatascience.com/building-a-toy-detector-with-tensorflow-object-detection-api-63c0fdf2ac95

https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
