When AI meets Ornithology
"Dad, I believe Cobalt sleeps 10 hours a day," my daughter said.
Cobalt is the name of her pet bird, a beautiful cobalt-colored parakeet. Honestly, I did not think this could be true, because I kept hearing her on a mission to destroy her bird toys at night, and she chirps all through the day.
"No, I don't think so. Coby is so active during the night, and she's super cranky during the day."
"How do you know? She might be a natural grump." Since when can my girl talk back like this? But it is a valid question. Yes or no?
To settle the difference, we placed a video camera next to Cobalt's cage. Nothing fancy, just something like this, on sale at the local Fry's:
It comes with 16GB of onboard storage, enough for 4-5 days of continuous H.264 video capture. It also has 10 infrared LEDs for night vision. We placed the camera behind the bird cage, and it started recording around the clock.
Since the camera is designed as an IoT device, it is essentially an embedded Linux box with a mountable storage volume, so I was confident my DL workstation (https://www.dhirubhai.net/pulse/how-build-deep-learning-workstation-450-min-yi-shen/) would be able to import all the video files. For offline video analysis, this would be good enough. We then transferred 32GB of video files to the workstation. Things were looking good. Data collection was done -- even though we did not yet know what we were aiming for, or whether the camera could see the bird at all.
With the raw data files in hand, we needed software -- in fact, a suite of software -- to read them. There are three major components: an object recognition framework, a computer vision library, and a video decoder library. This part is not conceptually difficult. Setting up the system is like cooking a dish: you need to know what to compile before what, using which version of gcc, and so on. In short, OpenCV (the computer vision library) needs to be configured and built with ffmpeg (the decoder) support, and darknet/YOLO (the object recognition framework) needs to be built with OpenCV and CUDA support, all using a specific version of gcc. (gcc 4.9 for nvcc and CUDA 8.0, if you have to ask)
The central piece of this study is the darknet/YOLO neural net framework. As a backup, I also prepared a Keras version of YOLO with a TensorFlow backend, in case I could not make the GPU work with the original darknet/YOLO. YOLO (You Only Look Once) is an extremely fast object recognition model developed by Joseph Redmon (https://pjreddie.com/darknet/yolo/). He even has a TED talk; you should check it out. (https://www.ted.com/talks/joseph_redmon_how_a_computer_learns_to_recognize_objects_instantly) What does object recognition do? Technically, it is a convolutional neural network (CNN) that tells you WHAT objects it sees in an image, and WHERE they are.
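To make the "WHAT and WHERE" concrete: for each frame, a detector like YOLO returns a list of candidate detections, each with a class label, a confidence score, and a bounding box, and you keep only the confident ones. Here is a minimal sketch with a made-up detection list -- this illustrates the idea, not darknet's actual API:

```python
# Hypothetical YOLO-style output for one frame: (class, confidence, box),
# where box is (x, y, width, height) in pixels. Illustration only;
# this is not darknet's real data structure.
detections = [
    ("bird",    0.86, (120, 80, 60, 45)),
    ("giraffe", 0.22, (118, 82, 64, 50)),  # a low-confidence mistake
]

def confident(dets, threshold=0.5):
    """Keep only detections above the confidence threshold."""
    return [d for d in dets if d[1] > threshold]

for name, score, box in confident(detections):
    print(name, score, box)  # only the "bird" detection survives
```

In practice, raising the threshold trades recall for precision; the giraffe-grade mistakes tend to sit at low confidence.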
We were so excited when OpenCV finally worked. Then we pushed in the first image. Here is what we got.
I was glad I had set up the camera properly. But I was also pretty sure I did not have a sports ball or a giraffe in the cage. And I don't wear ties. My daughter went through a bunch of PNGs (at that time the GPU version was not ready yet, so there were not many images) and told me the parakeet was classified as a bird, a giraffe, and a zebra most of the time.
I also noticed that darknet/YOLO was not running fast on an i5 6400 CPU. The typical performance was 10-12 seconds per frame -- a dismal ~0.1 FPS (frames per second). At this point I was about to abort the effort, as the out-of-the-box classification performance and throughput were both disappointing.
However, I know from my own experience as a data scientist that complex models rarely work as expected off the shelf. You need to know how they work and what they assume, and do a lot of manual inspection to make sure a model behaves as it is supposed to. Honestly, I was forced to make the computing machinery work because my daughter told me she had already signed up for her science fair. Thanks for the vote of confidence.
We figured out the "giraffe problem" first. The default YOLO model (yolo.weights) is trained on the MS COCO (Common Objects in Context) dataset, which has 80 classes, and the dataset probably does not include birds in cages. We then experimented with the VOC-trained YOLO (which has only 20 classes) -- and voilà, it's a bird all the time!
The GPU speedup took another few days. The compiler failed every time I set GPU=1. It turns out CUDA does not like newer gcc. I installed gcc 4.9 and pointed cc to that version; it still failed. Finally I realized I had to point nvcc (the NVIDIA CUDA compiler) to that gcc as well. After a few other hacks... it finally worked. On my GTX 1060 GPU, the throughput improved from 0.1 to 25-30 FPS. That is immense: at that rate, a full day of footage can be processed in roughly half a day. Suddenly the whole project became possible, since we had several million images to work with. (Even at a reduced 12 FPS, one day of video is about 1.037 million frames.)
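The back-of-the-envelope throughput math, using the frame counts above (the exact processing time depends on where in the 25-30 FPS range the GPU lands):

```python
SECONDS_PER_DAY = 24 * 60 * 60

video_fps = 12                               # reduced capture frame rate
frames_per_day = video_fps * SECONDS_PER_DAY
print(frames_per_day)                        # 1036800 -- about 1.037 million

gpu_fps = 25                                 # conservative GTX 1060 throughput
hours_to_process = frames_per_day / gpu_fps / 3600
print(round(hours_to_process, 1))            # ~11.5 hours per day of footage
```

At the CPU's 0.1 FPS, the same day of footage would have taken about 120 days -- hence the project was hopeless without the GPU.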
After setting up the computer, my daughter went ahead and inspected all the video footage with object recognition overlays, segment by segment. At this stage, we were still not sure whether we had captured Coby sleeping in front of the camera. Then we sat down together... not really. In fact, my daughter did most of the manual inspection work while I did indoor cycling on Zwift.
During the manual process, she found the raw video from some days unusable because of one of the following problems: wrong angle, accidentally obscured by the curtain, or camera placed too far away. She had started doing what all data scientists do instinctively -- cleaning up the data. We finally agreed that one 36-hour continuous video capture was good enough for our analysis. The segment contains about 3 million images. We also noted several long stretches where Coby is clearly sleeping, and others where she is very active. She had just, unknowingly, labelled the dataset.
What metric or feature is most suitable for identifying Coby's current state? We can review these two videos. First, sleepy Coby:
And active Coby:
She had just learned in middle school how to compute the area of shapes in Cartesian coordinates. After watching the green rectangles dance for hours, she suggested that a good metric would be the area of the "bird box". We then found a region with a clear sleep-awake transition and plotted the "bird box" area (in red) over time:
Clearly, sleepy Coby exhibits much lower variance in the "bird box" area. When Coby is awake, the box area becomes jumpier -- higher variance. The sleep-awake boundary happened at ~6:40 AM, about 13 minutes after sunrise that day. With this simple feature, we came up with a very simple "Cobalt sleep-o-meter", a binary classifier based on one feature: the variance of the "bird box" area over 240 frames. It successfully captured some of Coby's long naps we had not noticed before, validated against the raw video.
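The sleep-o-meter itself fits in a few lines. The 240-frame window comes from our setup; the variance threshold and the synthetic area series below are hypothetical, chosen only to show the mechanics:

```python
import statistics

WINDOW = 240  # frames per classification window

def is_asleep(areas, threshold=50.0):
    """Binary classifier: low variance of bird-box area => asleep.
    The threshold is hypothetical; in practice it would be tuned
    against segments the inspector has already labelled."""
    return statistics.variance(areas) < threshold

# Synthetic windows: a nearly still (sleepy) bird vs. a jumpy (awake) bird.
sleepy = [2700 + (i % 3) for i in range(WINDOW)]       # area barely moves
awake  = [2700 + (i * 37) % 500 for i in range(WINDOW)]  # area jumps around

print(is_asleep(sleepy), is_asleep(awake))  # True False
```

Sliding this window across all 36 hours of detections yields a per-window asleep/awake label, which is all the final tally needs.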
So back to the original question: does Cobalt sleep at least 10 hours a day? The sleep-o-meter indicated that Cobalt sleeps about 80% of the time during the night hours (from 9pm to 6am, 9 hours in darkness), and about 22% of the time during the daytime (15 hours). Therefore, on average, Coby sleeps about 10.5 hours a day -- very close to her original guess. I guess we all have to believe the ornithologists sometimes. They know what they are doing. And AI can provide a little help.
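The arithmetic behind the 10.5-hour figure:

```python
night_hours, night_asleep = 9, 0.80   # 9 pm - 6 am, lights out
day_hours, day_asleep = 15, 0.22      # the remaining daytime hours

total_sleep = night_hours * night_asleep + day_hours * day_asleep
print(round(total_sleep, 1))          # 10.5 hours per day
```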