Activity Recognition Through Temporal Templates

I really enjoyed my computer vision final project, so I wanted to share:

At each frame, a random forest, used as a discriminative classifier, identifies the current action (top left) by transforming the motion history image (MHI, top right) and the motion energy image (MEI, bottom left) into Hu moments, which are invariant to location, rotation, and, most importantly, scale.

The math is a bit verbose, but pretty straightforward.

The "motion" between images is their subtracted value subject to some threshold θ set to a value τ.

D(x, y, t) = 1 if |I(x, y, t) − I(x, y, t−1)| ≥ θ, and 0 otherwise

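As a rough sketch of that step (assuming grayscale uint8 frames and NumPy; the function and variable names are mine, not from the project):

import numpy as np

def motion_mask(prev_frame, frame, theta=30):
    # Binary motion image D(x, y, t): 1 where the absolute frame
    # difference meets the threshold theta, 0 elsewhere.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff >= theta).astype(np.uint8)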

Then the MEI & MHI can be calculated respectively:

E_τ(x, y, t) = ⋃_{i=0}^{τ−1} D(x, y, t−i)


H_τ(x, y, t) = τ if D(x, y, t) = 1
H_τ(x, y, t) = max(0, H_τ(x, y, t−1) − 1) otherwise

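A minimal sketch of those update rules (assuming the binary masks come from a differencing step like the one above; the names and the decay-by-one step are illustrative):

import numpy as np

def update_templates(mhi, mask, tau):
    # MHI update: pixels with motion are set to tau, everything else
    # decays by one frame, floored at zero.
    mhi = np.where(mask == 1, float(tau), np.maximum(mhi - 1.0, 0.0))
    # MEI: any pixel that has moved within the last tau frames,
    # i.e. the MHI thresholded above zero.
    mei = (mhi > 0).astype(np.uint8)
    return mhi, mei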

Now that the action is captured, the invariants are calculated:

The raw and central moments of the template image I(x, y) (the MEI or the MHI) are:

M_pq = Σ_x Σ_y x^p y^q I(x, y)

μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q I(x, y),  where x̄ = M_10/M_00 and ȳ = M_01/M_00

Normalizing by the zeroth moment gives scale invariance:

η_pq = μ_pq / μ_00^(1 + (p+q)/2)

And the seven Hu invariants add rotation invariance:

h_1 = η_20 + η_02
h_2 = (η_20 − η_02)² + 4η_11²
h_3 = (η_30 − 3η_12)² + (3η_21 − η_03)²
h_4 = (η_30 + η_12)² + (η_21 + η_03)²
h_5 = (η_30 − 3η_12)(η_30 + η_12)[(η_30 + η_12)² − 3(η_21 + η_03)²] + (3η_21 − η_03)(η_21 + η_03)[3(η_30 + η_12)² − (η_21 + η_03)²]
h_6 = (η_20 − η_02)[(η_30 + η_12)² − (η_21 + η_03)²] + 4η_11(η_30 + η_12)(η_21 + η_03)
h_7 = (3η_21 − η_03)(η_30 + η_12)[(η_30 + η_12)² − 3(η_21 + η_03)²] − (η_30 − 3η_12)(η_21 + η_03)[3(η_30 + η_12)² − (η_21 + η_03)²]

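If you don't want to hand-code the moment sums, OpenCV bundles the same computation; a small sketch (assuming the input is one of the templates above, plus an optional log-scaling step that the write-up itself doesn't use but that helps with the invariants' huge dynamic range):

import cv2
import numpy as np

def hu_features(template):
    # Seven Hu invariants of a template image (MEI or MHI).
    moments = cv2.moments(template.astype(np.float32))
    hu = cv2.HuMoments(moments).flatten()
    # Optional: signed log scaling to compress the dynamic range.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)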
I know... those invariants are kind of ugly, but they work well! That last vector is fed into a model, in this case a random forest, which recognizes the action over many frames. I ended up with 99% accuracy on the training set and 71% under cross-validation. Clearly the model overfit, but the main objective was achieved: showing that an action can be represented by the MHI, MEI, and Hu moments.
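A sketch of that last step with scikit-learn, assuming X holds one row of Hu features per frame (e.g., the MEI and MHI invariants concatenated) and y holds the action labels; the parameters are illustrative, not the project's actual settings:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_action_classifier(X, y, n_trees=200):
    # Random forest over per-frame Hu feature vectors.
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    # Cross-validated accuracy is a fairer estimate than training accuracy.
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    clf.fit(X, y)
    return clf, cv_acc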

This work followed Bobick and Davis's 2001 paper; however, newer methods involving HMMs (source) and 3D CNNs (source) with deep learning are where things seem to be today.
