Web AI Monthly #21: How to make videos watch themselves to do work for you, new browser APIs / Web AI library updates, and local text to speech
Web AI / Web ML Monthly Newsletter - September 2024 Edition, by Jason Mayes, Web AI Lead at Google

An ask to readers: Enjoy the content? Help me help you by giving this a share. Let's centralize the community around all the amazing work being produced in this space and shine a light on your most awesome creations so people who care can see them.

Tag me (Jason Mayes) if you make something noteworthy for future editions so I can help get eyes on your Web AI work - many of my readers work for top global tech companies or top startups. We have subscribers ranging from decision makers (think C-level, VPs, and Directors) to folks on the front lines using this stuff day to day (SWEs, web engineers, and researchers). You never know who may see your creations. Alright, let's go!

What if videos could watch themselves?

So I recently had one of those moments where I was frustrated with the inefficiencies of a process, so I decided to solve the problem myself. How? With Web AI of course - how else was I going to do it?!

Turns out the problem I wanted to solve is actually pretty important for the whole creative industry, and for anyone who just wants to perform smarter search across multimedia.

Introducing doesVideoContain(), a little experimental library currently powered by Transformers.js models. You specify something to look for, like "a cat wearing a hat", then select a video from your local hard drive (no upload to any cloud), and as the video plays in the web page it automatically extracts screenshots of any frames that match the target phrase! Let's see it in action, where I am searching for "jason mayes" or "a person standing on a plane". Notice how images containing those matches appear at the bottom:

doesVideoContain() library in action, extracting the frames of a video that match some search text so you don't have to watch it yourself!

Pretty cool, right? I built this in just a few hours on a weekend, so there is lots of room to expand it; currently it just uses a simple cosine similarity check to find matches.
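To make the matching step concrete, here is a minimal TypeScript sketch of a cosine similarity check. It assumes the search phrase and each sampled video frame have already been turned into embedding vectors in a shared text/image space by some model; it is not the doesVideoContain() source, and the `FrameEmbedding` shape, the `findMatchingFrames` helper, and the default threshold are hypothetical.

```typescript
// Minimal sketch, not the library's actual code.
// Assumption: query and frame embeddings come from a model that maps text and
// images into the same vector space; the 0.25 default threshold is hypothetical.

/** Cosine similarity between two equal-length vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface FrameEmbedding {
  timeSeconds: number; // playback time the frame was sampled at
  vector: number[]; // hypothetical per-frame image embedding
}

/** Keep the frames whose similarity to the query meets the threshold. */
function findMatchingFrames(
  query: number[],
  frames: FrameEmbedding[],
  threshold = 0.25, // hypothetical value; would need tuning per model
): { timeSeconds: number; score: number }[] {
  return frames
    .map((f) => ({ timeSeconds: f.timeSeconds, score: cosineSimilarity(query, f.vector) }))
    .filter((m) => m.score >= threshold);
}

// Toy 2-D vectors stand in for real embeddings.
const matches = findMatchingFrames(
  [1, 0],
  [
    { timeSeconds: 0, vector: [1, 0] }, // similarity 1
    { timeSeconds: 1, vector: [0, 1] }, // similarity 0
    { timeSeconds: 2, vector: [1, 1] }, // similarity ~0.707
  ],
  0.5,
);
console.log(matches.map((m) => m.timeSeconds)); // [ 0, 2 ]
```

Each matching timestamp could then be used to grab a screenshot of that frame; a fixed threshold is the simplest choice, and ranking by score is an obvious next step.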

Imagine using this with Chrome's File System Access API to index all the videos on your hard drive, storing the embeddings in local storage or IndexedDB. You could then perform very fast search and retrieval entirely locally without any cloud processing - on-prem AI thanks to Web AI in the browser!
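A local index like that could be searched with a simple brute-force top-k scan. This is a minimal sketch under a few assumptions: embeddings are L2-normalized so a dot product equals cosine similarity, and the index is held in memory (persisting the vectors, for example as `Float32Array`s in IndexedDB, is left out). `IndexedFrame`, `normalize`, and `searchTopK` are hypothetical names for illustration.

```typescript
// Minimal sketch of local, in-memory search over stored frame embeddings.
// Assumption: stored vectors are L2-normalized, so dot product == cosine similarity.

interface IndexedFrame {
  videoName: string; // hypothetical identifier for the source video file
  timeSeconds: number; // where in the video the frame was sampled
  vector: number[]; // assumed L2-normalized embedding
}

/** Scale a vector to unit length (zero vectors are returned unchanged). */
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

/** Return the k frames most similar to the query, best match first. */
function searchTopK(index: IndexedFrame[], query: number[], k: number) {
  const q = normalize(query);
  return index
    .map((f) => ({ videoName: f.videoName, timeSeconds: f.timeSeconds, score: dot(q, f.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy 2-D vectors stand in for real embeddings.
const index: IndexedFrame[] = [
  { videoName: "holiday.mp4", timeSeconds: 12, vector: normalize([1, 0]) },
  { videoName: "talk.mp4", timeSeconds: 30, vector: normalize([0, 1]) },
  { videoName: "talk.mp4", timeSeconds: 45, vector: normalize([1, 1]) },
];
const results = searchTopK(index, [1, 0.1], 2);
console.log(results.map((r) => `${r.videoName}@${r.timeSeconds}s`)); // [ 'holiday.mp4@12s', 'talk.mp4@45s' ]
```

A linear scan like this is easy to reason about and is usually fine for thousands of frames; a much larger video library would likely call for an approximate nearest-neighbour index instead.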

I would love to hear what you think, so try it for yourself live here on CodePen (warning: a 1GB download, but it is cached for future use so later page loads will be faster) and let me know what features you would like to see in the future.

Check out and star the GitHub source code here: https://github.com/jasonmayes/doesVideoContain

Transformers.js releases v3 and Chrome wants your feedback on new browser APIs!

Juicy updates this way come in the world of Web AI libraries and APIs.

Transformers.js V3

The popular Web AI library by Joshua Lochner at Hugging Face just got upgraded to v3! What does that mean? Well, even more state-of-the-art machine learning models are now available for the web, and even better, they now have WebGPU support to run faster than ever. It's a must-try this month. See what Joshua has to say over on Twitter:

To run a model on WebGPU, all you need to do is specify { device: 'webgpu' } when loading the model/pipeline.
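Since not every browser or device exposes WebGPU yet, one option is to choose the `device` value at runtime. This minimal sketch assumes that getting a non-null adapter from `navigator.gpu.requestAdapter()` is a good enough signal that WebGPU is usable, and falls back to `'wasm'` otherwise; `pickDevice` and `GpuLike` are hypothetical names, not part of Transformers.js, and the model id in the comment is left as a placeholder.

```typescript
// Minimal sketch: pick the Transformers.js v3 `device` option at runtime.
// Assumption: a non-null adapter means WebGPU is usable; otherwise use "wasm".

type Device = "webgpu" | "wasm";

// Hypothetical minimal shape of the WebGPU entry point we rely on.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

async function pickDevice(
  // Defaults to navigator.gpu when present (undefined in Node or browsers without WebGPU).
  gpu: GpuLike | undefined = (globalThis as unknown as { navigator?: { gpu?: GpuLike } })
    .navigator?.gpu,
): Promise<Device> {
  if (!gpu) return "wasm";
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // treat adapter errors as "no WebGPU"
  }
}

// In a browser, the result would be passed when creating a pipeline, e.g.:
//   import { pipeline } from "@huggingface/transformers";
//   const extractor = await pipeline("feature-extraction", "<model-id>", {
//     device: await pickDevice(),
//   });
pickDevice().then((device) => console.log({ device }));
```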

Check out his Scrimba example here to see how you can calculate embeddings for a sentence in v3: https://v2.scrimba.com/s0lmm0qh1q

Learn more on the project page: https://huggingface.co/docs/transformers.js/en/index


Chrome / W3C / WICG want your feedback on proposed APIs of the future

This month we are also seeking feedback on new API proposals over on GitHub for some pretty interesting browser level APIs:

  1. Writing Assistance APIs - browsers and operating systems are increasingly expected to gain access to a language model. Web applications can benefit from using language models for a variety of use cases. There is now a proposal for a group of APIs that use language models to give web developers high-level assistance with writing. Add your feedback and thoughts here: https://github.com/WICG/proposals/issues/163
  2. Web Translation API - browsers are increasingly offering language translation to their users. Such translation capabilities can also be useful to web developers. To perform translation in such cases, websites currently have to either call out to cloud APIs or bring their own translation models and run them using technologies like WebAssembly and WebGPU. This proposal introduces a new JavaScript API for exposing a browser's existing language translation abilities to web pages so that, if present, they can serve as a simpler and less resource-intensive alternative. Add your feedback here: https://github.com/w3ctag/design-reviews/issues/948

Built-in AI in Google Chrome is certainly turning heads. Even Shopify CEO Tobias Lütke said "this is going to change things" in a public tweet. An interesting future lies ahead for #WebAI.

And fear not, Tobias, we hear you on the setup pains. We have been looking into this, and as our experimental APIs mature and become less experimental, they should bring a more stable and solid UX with them.

Amazing new demos - including text to speech!

A Web AI newsletter would not be complete without at least a few demos to show off since last time so let's take a look at a few awesome ones this month.

True TTS in your browser - no cloud

The awesome Laurent Denoue posted a link to a new Hugging Face demo that achieves decent-quality text-to-speech right in your browser, entirely client side:

This is achieved using Microsoft's ONNX Runtime Web to accelerate model inference in the browser.

Try the demo for yourself right now: https://huggingface.co/spaces/diffusionstudio/vits-web


Candle Moondream 2: Generate descriptions for images

Radamés Ajna recently shared that a quantized version of Moondream2 now runs in the browser using Candle, Hugging Face's machine learning framework for Rust. It is a work in progress, so it's a bit on the slower side right now via WebAssembly, but it works!

Generate descriptions for your images with quantized Moondream2 running in the browser - no cloud.

Incredible times we are living in. Try it yourself, though the weights are 1.5GB so have good WiFi: https://huggingface.co/spaces/radames/Candle-Moondream-2

Source code here: https://github.com/huggingface/candle/pull/1999


Robotics + Web AI

Many JS developers overlook the superpowers they have at their disposal when it comes to hardware control. With APIs like Web Bluetooth, Web Serial, and WebUSB, you have many ways to control third-party hardware wirelessly or via USB, or even over the network with WebRTC.

Well, Vladimir Glukhov has a bunch of great demos he has been working on over on his Instagram, where he gets some seriously cool superpowers by combining his JavaScript and robotics knowledge. With 22,000 likes on this post, it seems others are excited by this combination too:

Fun fact: you can use JavaScript and Web AI to control robots!

Connect with Vladimir on Instagram to see his work in action and to stay up to date with his latest creations.


See you next time!

If you're new to this space and want to learn Web AI, you can get started fast with my free Google Developers course here (no background in AI needed, just a love for JavaScript and curiosity for AI - I will teach you from zero). Or get inspired via my Show & Tell - I got you either way!

See you next time with even more great content, and please do tag me (Jason Mayes) if you make or find something for future editions - I need your help, lovely #WebAI community, as things are moving so fast!

Cheers!

Jason Mayes (that Web AI guy).
