Computer Vision Playground
Introduction
I have built a computer vision playground on Hugging Face so that I, and anybody else, can play with different computer vision models as easily as possible.
Further down I explain how you can make your own playground and how what I have done will help you do it faster. At the end I show you a new space I built that does object detection: just give it a photo, a movie, or a YouTube URL, or even use your webcam.
To start
You can start by simply trying it out; click on the link below.
I have created code to take input from uploaded photos, movie files, YouTube URLs, and a live webcam stream.
These sources automatically pass the images, or the frames extracted from them, to an analysis function that you provide and can play with. The example I have included performs face detection with sentiment analysis. Below is an example output from an uploaded image of the show Friends.
Here is the generated output
Here is a short scrolling video of the full interface and output.
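To give a sense of the plumbing the playground handles for you, here is a minimal sketch of the kind of loop that feeds video frames to an analysis function. The names here are illustrative, not the actual identifiers in app.py, and the real code provides equivalent handling for photos, webcam streams, and YouTube URLs:

```python
import cv2

def run_on_video(path, analyze_frame):
    """Feed every frame of a video file to a user-supplied analysis
    callback and display the annotated result. Illustrative only."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:  # end of file or read error
            break
        annotated = analyze_frame(frame)  # your model runs here
        cv2.imshow("analyzed", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit early
            break
    cap.release()
    cv2.destroyAllWindows()
```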
Making your own playground
Some beginner-level programming is required, but Gen AI can help with that. You will be amazed.
I have put all the instructions in the README file; see here.
All the source files can be found under the Files tab at the top of the Hugging Face interface.
The README.md file contains step-by-step instructions for cloning the code to your local machine and getting the existing code running locally.
Quickly make your own playgrounds - trying YOLOv8!
I have tried to make it easy to copy the code, swap in a new model and detection use case, and not worry about how to feed it media, be it photos, a live webcam stream, or a movie. You can then upload the result to Hugging Face under your own account and share your playground and insights.
I quickly tried the latest YOLOv8 object detection model from Ultralytics.
Update the analyze_frame function in app.py
This is the function to replace if you want to change what is detected...
I have truncated the sentiment analysis on line 67 to fit the code into the image. The comments above the function describe the variables that must be set, and the code shows how they are set:
img_container["input"] - holds the input frame contents
img_container["analyzed"] - holds the analyzed frame with any added annotations
img_container["analysis_time"] - holds how long the analysis has taken (in ms)
img_container["detections"] - holds the analysis metadata results
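As a rough sketch, a version of this function might look like the following. It assumes OpenCV's bundled Haar cascade for the face detection (the actual detector in app.py may differ) and reduces the sentiment step to a placeholder comment:

```python
import time
import cv2

img_container = {}  # defined in app.py; redeclared so this sketch is standalone

# Assumption: OpenCV's bundled Haar cascade stands in for whichever
# face detector the playground actually uses.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def analyze_frame(frame):
    start = time.time()
    img_container["input"] = frame
    annotated = frame.copy()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    detections = []
    for (x, y, w, h) in faces:
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # The real function runs sentiment analysis on each face crop here.
        detections.append({"box": [int(x), int(y), int(w), int(h)]})
    img_container["analyzed"] = annotated
    img_container["analysis_time"] = (time.time() - start) * 1000  # ms
    img_container["detections"] = detections
```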
I replaced the face detection function above with a YOLOv8 version.
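As a rough sketch, the YOLOv8 version might look like the following under the same img_container contract, using the public Ultralytics API. The yolov8n.pt nano checkpoint is my choice for illustration; any YOLOv8 checkpoint would do:

```python
import time
from ultralytics import YOLO

img_container = {}  # defined in app.py; redeclared so this sketch is standalone

# Load the model once at import time so it is not reloaded per frame.
model = YOLO("yolov8n.pt")  # nano weights; any YOLOv8 checkpoint works

def analyze_frame(frame):
    start = time.time()
    img_container["input"] = frame
    results = model(frame, verbose=False)  # run detection on a single frame
    img_container["analyzed"] = results[0].plot()  # frame with boxes and labels drawn
    img_container["analysis_time"] = (time.time() - start) * 1000  # ms
    img_container["detections"] = [
        {"label": model.names[int(box.cls)], "confidence": float(box.conf)}
        for box in results[0].boxes
    ]
```

Swapping detectors comes down to changing the model load and how the detections list is filled; the surrounding media plumbing stays untouched.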
It was relatively easy, and ChatGPT/Copilot are amazing helpers. Try it!
Here is an example of the output when streaming a YouTube video.
To try it yourself, I have also uploaded this new playground to Hugging Face. Try it with your webcam while holding up different objects; YOLO detects all kinds of things.
In closing
It has never been easier to explore and discover the latest possibilities with all kinds of AI and models, including the latest in computer vision. Start learning today!