
How to Convert Audio and Video to Text Locally for Free Using Whisper WebGPU

Vladislav G.

Marketing Director & Lead Web Developer | Driving Data-Driven Marketing Strategies with Technical Expertise

Published: October 13, 2024

In this article, I’m going to show you how to easily transcribe audio and video files on your own computer using Whisper WebGPU, without uploading them anywhere. After the project and the model have been downloaded once, no internet connection is needed.

Initial Requirements to Run Whisper Model Locally

We are going to run a web application locally, so you will need:

Terminal
Git
Node.js & npm
Browser: Chrome or Firefox
Windows 10+, Linux, or macOS (note: Safari on macOS does not fully support WebGPU)

The basic hardware requirements:

A multi-core processor (Intel Core i5 or equivalent, or better).
16 GB of RAM is preferable for large audio or video files.
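If you are on Linux and want a quick sanity check against these guidelines, the small Bash sketch below reads the core count with nproc and the total memory from /proc/meminfo. The function names (total_ram_gb, check_hardware) and the 4-core threshold are my own assumptions; the list above does not name a core count, and "i5 or higher" cannot be checked reliably from the shell, so only cores and RAM are reported.

```bash
#!/usr/bin/env bash
# Quick Linux check against the hardware guidelines (a sketch).
# Assumptions: MIN_CORES=4 is a hypothetical threshold; 16 GB matches the
# article's RAM suggestion. The CPU model ("i5 or higher") is not checked.

MIN_CORES=4
RECOMMENDED_RAM_GB=16

# Print total RAM in GB (GiB), rounded to the nearest whole number.
# /proc/meminfo reports MemTotal in kB (KiB), e.g. "MemTotal: 16314228 kB".
total_ram_gb() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1048576 }' "$meminfo"
}

# Report cores and RAM, printing a note when either is below the guideline.
check_hardware() {
  local cores ram
  cores=$(nproc)
  ram=$(total_ram_gb)
  echo "CPU cores: ${cores}"
  echo "Total RAM: ${ram} GB"
  if (( cores < MIN_CORES )); then
    echo "Note: fewer than ${MIN_CORES} cores; transcription may be slow."
  fi
  if (( ram < RECOMMENDED_RAM_GB )); then
    echo "Note: less than ${RECOMMENDED_RAM_GB} GB of RAM; large files may be slow to process."
  fi
}
```

Save it as check-hardware.sh, then run: source check-hardware.sh && check_hardware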

What is Whisper from OpenAI

Whisper is an advanced speech recognition system developed by OpenAI. It transcribes spoken language into written text and can also translate speech from other languages into English. Whisper is known for its accuracy and its robustness to different accents, languages, and background noise, which makes it one of the most reliable tools for converting audio to text.

One of the best things about Whisper is that it’s open-source, meaning anyone can access it and use it for free. It can be run on cloud servers or even on your local computer, depending on your needs.

Official website: https://openai.com/index/whisper/

Hugging Face’s Transformers.js and ONNX Runtime Web

We are going to use the Whisper WebGPU project, which lives in the experimental-webgpu branch of whisper-web: https://github.com/xenova/whisper-web/tree/experimental-webgpu

This project utilizes OpenAI’s Whisper model and runs entirely on your device using WebGPU. It also leverages Hugging Face’s Transformers.js and ONNX Runtime Web, allowing all computations to be performed locally on your device without the need for server-side processing. This means that once the model is loaded, you won’t need an internet connection.

Key Features of Whisper WebGPU:

Real-Time In-Browser Processing: The technology allows for real-time speech recognition within the browser, enhancing user privacy by eliminating the need to send data to external servers.
Multilingual Support: It supports transcription and translation in about 100 languages, making it a versatile tool for global applications.
Local Computation: By leveraging WebGPU technology, the model runs entirely on the user’s device, which not only enhances privacy but also allows for offline functionality once the model is initially loaded.
Model Characteristics: The core model used is Whisper-base, which is optimized for web inference with a size of approximately 200 MB. This makes it lightweight yet powerful enough for real-time applications.

How to Run Whisper Model Locally (Ubuntu, Linux)

I will show you how to run it on Ubuntu (Linux). If you use Windows or macOS, the steps are the same: run the commands in your terminal, but install Git and Node.js with your platform's installers instead of apt.

Step 1. Install Git, Node.js, and npm

On Ubuntu, Git is usually already installed. If it’s not, use these commands:

sudo apt update
sudo apt install git

Install Node.js:

sudo apt install nodejs

Install NPM:

sudo apt install npm
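After installing, it is worth confirming that all three tools are on your PATH. Keep in mind that the nodejs package in Ubuntu's own repositories can be several major versions behind current Node.js releases, and modern front-end tooling may refuse to run on old versions. The Bash sketch below prints each tool's version and warns if Node.js is older than MIN_NODE_MAJOR=18; that threshold and the function names are assumptions, not documented requirements of whisper-web, so adjust them if npm install complains.

```bash
#!/usr/bin/env bash
# Verify that the tools from Step 1 are installed and report their versions.
# MIN_NODE_MAJOR=18 is an assumed minimum, not one stated by whisper-web.

MIN_NODE_MAJOR=18

# Extract the major version from a string like "v18.19.1" -> "18".
node_major() {
  local v="${1#v}"   # strip a leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

# Print each tool's version; return 1 if a tool is missing or Node is too old.
check_tools() {
  local missing=0 tool major
  for tool in git node npm; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $("$tool" --version | head -n 1)"
    else
      echo "$tool: NOT FOUND"
      missing=1
    fi
  done
  if command -v node >/dev/null 2>&1; then
    major=$(node_major "$(node --version)")
    if (( major < MIN_NODE_MAJOR )); then
      echo "Node.js $major is older than $MIN_NODE_MAJOR; consider a newer release."
      missing=1
    fi
  fi
  return "$missing"
}
```

Source the file and run check_tools; a non-zero exit status means something still needs attention.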

Step 2. Turn on WebGPU in the Browser

Make sure your browser supports WebGPU. In Chrome, type chrome://flags into the address bar, find “Unsafe WebGPU Support”, enable it, and relaunch the browser.

This is still an experimental feature in some browsers, so you may need to enable it in browser settings.

You can check the WebGPU status by opening chrome://gpu/ in your browser.

On Ubuntu, WebGPU may still be disabled even after relaunching. In that case, try starting the browser with the following command:

/opt/google/chrome/chrome --enable-unsafe-webgpu
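If you end up launching Chrome this way often, a small wrapper saves typing. This Bash sketch looks for a Chrome binary in a few places and starts it with --enable-unsafe-webgpu (the switch from the command above), passing any extra arguments through. The candidates other than /opt/google/chrome/chrome, and the function names, are assumptions for a typical Ubuntu install; adjust them for your system.

```bash
#!/usr/bin/env bash
# Launch Chrome with --enable-unsafe-webgpu (a sketch).
# Candidate names/paths besides /opt/google/chrome/chrome are assumptions.

CHROME_CANDIDATES=(
  /opt/google/chrome/chrome
  google-chrome
  google-chrome-stable
)

# Print the first candidate that is an executable absolute path or is on PATH.
find_chrome() {
  local c
  for c in "$@"; do
    if [[ "$c" == /* ]]; then
      [[ -x "$c" ]] && { echo "$c"; return 0; }
    elif command -v "$c" >/dev/null 2>&1; then
      command -v "$c"
      return 0
    fi
  done
  return 1
}

# Start Chrome in the background with the WebGPU switch plus any extra args.
launch_chrome_webgpu() {
  local chrome
  if ! chrome=$(find_chrome "${CHROME_CANDIDATES[@]}"); then
    echo "Chrome not found; edit CHROME_CANDIDATES." >&2
    return 1
  fi
  "$chrome" --enable-unsafe-webgpu "$@" &
}
```

For example, after sourcing the script, launch_chrome_webgpu chrome://gpu/ should open the GPU status page so you can confirm WebGPU is enabled. Close all Chrome windows first: if Chrome is already running, the new window is typically handed to the existing process and the switch is ignored.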

Step 3. Clone Repository and Install Dependencies

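Clone the project’s experimental-webgpu branch (the one linked above; the default branch may not include the WebGPU version) with the following command:

git clone -b experimental-webgpu https://github.com/xenova/whisper-web.git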

Once cloning is finished, change into the whisper-web folder:

cd whisper-web

Then install the dependencies:

npm install

After that, start the development web server:

npm run dev

The URL of your web application will be printed in the terminal window, e.g. https://localhost:5174/
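Steps 3 and 4 can be wrapped into one reusable function. The Bash sketch below clones the repository on the experimental-webgpu branch (the branch name comes from the project URL earlier in this article), reuses an existing checkout if one is present, installs dependencies, and starts the dev server. TARGET_DIR and the function names are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Clone (or reuse) whisper-web, install dependencies, start the dev server.
# REPO_URL and BRANCH come from the article; TARGET_DIR is an assumption.

REPO_URL="https://github.com/xenova/whisper-web.git"
BRANCH="experimental-webgpu"
TARGET_DIR="whisper-web"

# Succeeds if the given directory already holds a Git checkout.
is_git_checkout() {
  [[ -d "$1/.git" ]]
}

setup_whisper_web() {
  if is_git_checkout "$TARGET_DIR"; then
    echo "Reusing existing checkout in $TARGET_DIR"
    # Switch to the WebGPU branch; Git creates a local tracking branch
    # from origin/$BRANCH if it does not exist yet.
    git -C "$TARGET_DIR" checkout "$BRANCH" || return 1
  else
    git clone --branch "$BRANCH" "$REPO_URL" "$TARGET_DIR" || return 1
  fi
  # Run in a subshell so the caller's working directory is not changed.
  (
    cd "$TARGET_DIR" || exit 1
    npm install || exit 1
    npm run dev   # prints the local URL; stop the server with Ctrl+C
  )
}
```

Source the file and run setup_whisper_web; the server URL appears in the terminal just as with the manual steps.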

Step 4. Run the Application

Go to your browser and open the URL from the terminal to see your application.

This web application supports various audio and video formats and even recording from your microphone.

To start transcribing, either provide the URL of an audio file or upload an audio or video file from your computer.

Video Tutorial

Watch on YouTube: Audio and Video to text converter.

Conclusion

Whisper WebGPU represents a significant step forward in speech recognition technology by bringing powerful, AI-driven transcription and translation capabilities directly to your browser. By utilizing OpenAI’s Whisper model and advanced tools like WebGPU, Transformers.js, and ONNX Runtime Web, this project makes real-time, offline transcription accessible to everyone while also prioritizing privacy and convenience.

If you like this tutorial, please follow me on YouTube, or join my Telegram. Thanks! :)
