Web AI Monthly #21: How to make videos watch themselves to do work for you, new browser APIs / Web AI library updates, and local text to speech
Jason Mayes
Web AI Lead @Google 13+yrs. Agent / LLM whisperer. On-device Artificial Intelligence / Machine Learning using Chrome | TensorFlow.js | MediaPipe. Web Engineering + innovation
An ask to readers: Enjoy the content? Help me help you, by giving us a share. Let's centralize the community with all the amazing work being produced in this space and shine light on your most awesome creations so people who care can see them.
Tag me (Jason Mayes) if you make something noteworthy for future editions so I can help get eyes on your Web AI work - many of my readers work for top global tech companies or top startups. We have subscribers ranging from decision makers (think C-level, VPs, and Directors) to folk on the frontlines using this stuff day to day (SWEs, web engineers and researchers). You never know who may see your creations. Alright, let's go!
What if videos could watch themselves?
So I recently had one of those moments where I was frustrated with the inefficiencies of a process so I decided to solve the problem. How? With Web AI of course - how else was I going to do it?!
It turns out the problem I wanted to solve is actually pretty important for the whole creative industry, and for folk who just want to perform smarter search across multimedia.
Introducing doesVideoContain(), a little experimental library currently powered by Transformers.js models. It allows you to specify something to look for, like "a cat wearing a hat", and then select a video from your local hard drive (no upload to any cloud). As the video plays in the webpage, it will automatically extract screenshots of any frames that match the target phrase! Let's see it in action, where I am searching for the words "jason mayes" or "a person standing on a plane". Notice how images containing those matches appear at the bottom:
Pretty cool, right? I made this pretty fast, in just a few hours on a weekend, so there is lots of room to expand it, as currently I just wrote a simple cosine similarity check to find matches.
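The cosine similarity check mentioned above could look roughly like this. It is a minimal sketch, not the library's actual code: the function names, the record shape and the 0.25 threshold are assumptions for illustration, and it presumes you already have a text embedding plus one embedding per video frame from a model that places both in the same vector space.

```javascript
// Cosine similarity between two equal-length embedding vectors.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embedding lengths differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against all-zero vectors to avoid dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: keep only the frames whose embedding is close
// enough to the text embedding. The threshold is an assumed value
// that would need tuning per model.
function matchingFrames(textEmbedding, frames, threshold = 0.25) {
  return frames.filter(
    (frame) => cosineSimilarity(textEmbedding, frame.embedding) >= threshold
  );
}
```

Each entry in frames is assumed to be an object such as { timeSec: 12.5, embedding: [...] }, so the matches tell you which timestamps to screenshot.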
Imagine you use this with the File System Access API in Chrome to index all the videos on your hard drive and store the embeddings in local storage or IndexedDB. You could then perform very fast search and retrieval, all locally, without any cloud processing - on-prem AI thanks to Web AI in the browser!
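To make the indexing idea concrete, here is a small sketch of what searching such a local index might look like. Everything here is an assumption for illustration: the record shape, the searchIndex name, and the plain array standing in for whatever you would actually read back from IndexedDB. It also assumes the embeddings were L2-normalized before being stored, so a dot product gives the cosine similarity directly.

```javascript
// Dot product of two equal-length vectors. For L2-normalized
// embeddings this equals their cosine similarity.
function dotProduct(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Hypothetical search over locally stored records.
// `index` is an array of { file, timeSec, embedding } objects, a
// stand-in for rows loaded from IndexedDB.
// Returns the topK best-scoring records, highest score first.
function searchIndex(index, queryEmbedding, topK = 3) {
  return index
    .map((record) => ({
      file: record.file,
      timeSec: record.timeSec,
      score: dotProduct(record.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this is usually fine for a personal video collection. A much larger index would likely want an approximate nearest neighbor structure instead.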
I would love to hear what you think, so try it for yourself live here on CodePen (warning: 1GB download, but it will be cached so future page loads will be faster) and let me know what features you would like to see in the future.
Check out and star the GitHub source code here: https://github.com/jasonmayes/doesVideoContain
Transformers.js releases v3 and Chrome wants your feedback on new browser APIs!
Juicy updates this way come in the world of Web AI libraries and APIs.
Transformers.js V3
The popular Web AI library by Joshua Lochner at Hugging Face just got upgraded to v3! What does that mean? Well, even more state-of-the-art Machine Learning models for the web are now available, and even better they now have WebGPU support to run faster than ever. This is a must try this month. See what Joshua has to say over on Twitter:
To run a model on WebGPU, all you need to do is specify { device: 'webgpu' } when loading the model/pipeline.
Check out his Scrimba example here to see how you can calculate embeddings for a sentence in v3: https://v2.scrimba.com/s0lmm0qh1q
Learn more on the project page: https://huggingface.co/docs/transformers.js/en/index
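Since not every browser exposes WebGPU yet, one way to use the WebGPU option is to pick the device at runtime. The helper below is a hypothetical sketch (pickDeviceOptions is my own name, not part of Transformers.js), and it assumes 'wasm' is an acceptable fallback device for your model. It takes the navigator object as a parameter so it is easy to try outside a browser.

```javascript
// Hypothetical helper: build the options object to pass when loading
// a Transformers.js v3 model/pipeline.
// `nav` is expected to be the browser's `navigator` object. Browsers
// with WebGPU support expose it as `navigator.gpu`.
function pickDeviceOptions(nav) {
  const hasWebGPU = Boolean(nav && nav.gpu);
  // Assumption: fall back to WebAssembly when WebGPU is unavailable.
  return { device: hasWebGPU ? 'webgpu' : 'wasm' };
}
```

In a page, the result would be passed as the options argument when loading, for example pipeline('feature-extraction', modelId, pickDeviceOptions(navigator)), where modelId is whichever embedding model you choose.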
Chrome / W3C / WICG want your feedback on proposed APIs of the future
This month we are also seeking feedback on new API proposals over on GitHub for some pretty interesting browser level APIs:
Built-in AI in Google Chrome is certainly turning heads. Even the CEO of Shopify, Tobias Lütke, stated "this is going to change things" in his public tweet. An interesting future ahead for #WebAI.
And fear not, Tobias, we hear you on the setup pains - this is something we have been looking into as our experimental APIs mature and become less experimental, which should bring with it a more stable and solid UX.
Amazing new demos - including text to speech!
A Web AI newsletter would not be complete without at least a few demos to show off since last time so let's take a look at a few awesome ones this month.
True TTS in your browser - no cloud
The awesome Laurent Denoue posted a link to a new Hugging Face demo that achieves decent quality text to speech right in your browser entirely client side:
This is achieved using Microsoft's ONNX Runtime Web to accelerate the model inference on the browser side.
Try the demo for yourself right now: https://huggingface.co/spaces/diffusionstudio/vits-web
Candle Moondream 2: Generate descriptions for images
Radamés Ajna recently shared the fact that a quantized version of Moondream2 is now running in the browser with Candle for Rust. It is a work in progress so a bit on the slower side right now via WebAssembly, but it works!
Incredible times we are living in. Try it yourself, though the weights are 1.5GB so have good WiFi: https://huggingface.co/spaces/radames/Candle-Moondream-2
Source code here: https://github.com/huggingface/candle/pull/1999
Robotics + Web AI
Many JS developers overlook the superpowers they have at their disposal when it comes to hardware control. With APIs like Web Bluetooth, Web Serial, and WebUSB you have many ways to control 3rd party hardware wirelessly or via USB, or even via the network with WebRTC.
Well, Vladimir Glukhov has a bunch of great demos he has been working on over on his IG, where he gets some seriously cool superpowers by combining his JavaScript and robotics knowledge. With 22,000 likes on this post, it seems others are excited by this combination too:
Connect with Vladimir on Instagram to see his work in action and to stay up to date with his latest creations.
See you next time!
If you're new to this space and want to learn Web AI, you can get started fast with my free Google Developers course here (no background in AI needed, just a love for JavaScript and curiosity for AI - I will teach you from zero). Or get inspired via my Show & Tell - I got you either way!
See you next time with even more great content, and please do tag me (Jason Mayes) if you make or find something for future editions - I need your help, lovely #WebAI community, as things are moving so fast!
Cheers!
Jason Mayes (that Web AI guy).