Geek Out Time: “Vibe Coding” on Google Colab with OpenAI & DeepSeek
(Also on Constellar tech blog https://medium.com/the-constellar-digital-technology-blog/geek-out-time-vibe-coding-on-google-colab-with-openai-deepseek-074805f64a28)
A new kind of coding, known as “vibe coding”, has emerged. The term was popularized by Andrej Karpathy, former head of AI at Tesla and co-founder of OpenAI. This approach embraces the power of Large Language Models (LLMs) to accelerate coding by minimizing manual effort and relying on AI-generated completions. Instead of carefully crafting every line of code, developers guide AI tools with high-level instructions and iterate rapidly. With the advent of voice-powered AI interactions, coding becomes more intuitive, blurring the boundaries between ideation and execution.
Voice-powered coding is about flow. It enables developers to describe their intent naturally, relying on AI-assisted tools to translate high-level instructions into functioning code. Instead of laboriously writing every line, you explain what you need, and the AI generates the implementation. The experience transforms coding into an interactive and iterative process, reducing friction and increasing efficiency. It is fun to try out in this Geek Out Time.
The AI-Powered Evolution of Coding
The rise of AI coding assistants like GitHub Copilot, OpenAI’s GPT models, and DeepSeek has fundamentally changed software development. Instead of manually structuring every function, developers now collaborate with AI to suggest, refine, and even generate full applications. This shift is leading to a new paradigm — one where coding is more about problem-solving and less about syntax mastery.
Karpathy describes this shift as a move toward an “AI-native” development model. Engineers are increasingly adopting AI-driven workflows, acting more as orchestrators than traditional coders, guiding AI tools to build solutions efficiently. The goal is not just faster coding but an entirely new way to conceptualize and develop software.
Imagine building an application with minimal typing. You begin with a voice prompt: “Create a function that analyzes sentiment in customer reviews and visualizes it as a bar chart.” The AI generates Python code, including data preprocessing, sentiment analysis using NLP, and a visualization with Matplotlib. You refine it further by instructing: “Shade the bars from red to green based on sentiment intensity.” Within minutes, you have a working prototype.
This is the essence of voice-powered coding: rapid iteration, real-time AI assistance, and an intuitive workflow where development feels more like a creative process than a mechanical task.
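To make that concrete, here is a rough sketch of the kind of code such a prompt might come back with. This is only an illustration, not actual model output: it assumes TextBlob for the sentiment scores and Matplotlib's RdYlGn colormap for the red-to-green shading.

# Illustrative only: one plausible shape of the AI-generated prototype.
# Assumes TextBlob for sentiment scoring (pip install textblob if it is missing).
from textblob import TextBlob
import matplotlib.pyplot as plt

def plot_review_sentiment(reviews):
    """Score each review in [-1, 1] and plot a bar chart shaded red to green."""
    scores = [TextBlob(review).sentiment.polarity for review in reviews]
    # Map polarity (-1..1) onto the red-to-green colormap (0..1).
    colors = [plt.cm.RdYlGn((score + 1) / 2) for score in scores]
    plt.bar(range(len(reviews)), scores, color=colors)
    plt.xlabel("Review")
    plt.ylabel("Sentiment polarity")
    plt.title("Customer review sentiment")
    plt.show()

plot_review_sentiment([
    "Absolutely love this product!",
    "It broke after two days, very disappointed.",
    "Does the job, nothing special.",
])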
Running Voice-Powered Coding on Google Colab with OpenAI
One of the most exciting applications of AI-assisted coding is voice-to-code systems. Instead of typing, you speak your ideas, and AI translates them into executable code. Google Colab provides an excellent environment for this, where voice recognition tools integrate seamlessly with AI models.
Step 1: Install Required Packages
Run the following command in a Colab cell to install the necessary libraries:
!pip install openai scipy numpy ipython
Step 2: Set Up the AI-Powered Voice Coding System (OpenAI)
Now, copy and run the following full code in Google Colab:
import scipy.io.wavfile as wav
import numpy as np
import openai
import os
from openai import OpenAI
from IPython.display import Javascript
from google.colab import output
from base64 import b64decode

class VoiceCodeSystem:
    def __init__(self, api_key):
        self.client = OpenAI(api_key=api_key)
        self.sample_rate = 44100
        self.duration = 5  # Recording duration in seconds

    def record_audio(self):
        """Record audio using the browser's microphone"""
        print("Recording... Speak your code instructions (e.g., 'print hello')")
        # JavaScript to record audio in the browser
        record_js = """
        const sleep = ms => new Promise(r => setTimeout(r, ms));
        const record = (duration) => {
          return new Promise(async resolve => {
            const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            const mediaRecorder = new MediaRecorder(stream);
            const chunks = [];
            mediaRecorder.ondataavailable = e => chunks.push(e.data);
            mediaRecorder.start();
            await sleep(duration * 1000);
            mediaRecorder.stop();
            mediaRecorder.onstop = () => {
              const blob = new Blob(chunks, { type: 'audio/wav' });
              const reader = new FileReader();
              reader.onloadend = () => resolve(reader.result.split(',')[1]);
              reader.readAsDataURL(blob);
            };
          });
        };
        record(%d)
        """ % self.duration
        # Execute JavaScript and get the recorded audio as base64
        audio_b64 = output.eval_js(record_js)
        audio_bytes = b64decode(audio_b64)
        return audio_bytes

    def save_audio(self, audio_bytes, filename="temp.wav"):
        """Save the recording to a WAV file"""
        with open(filename, "wb") as f:
            f.write(audio_bytes)
        return filename

    def transcribe_audio(self, audio_file):
        """Use the OpenAI Whisper API to transcribe audio"""
        try:
            with open(audio_file, "rb") as file:
                transcription = self.client.audio.transcriptions.create(
                    model="whisper-1",
                    file=file
                )
            return transcription.text
        except Exception as e:
            print(f"Error in transcription: {e}")
            return None

    def generate_code_with_gpt(self, text):
        """Use GPT to generate Python code from natural language"""
        try:
            prompt = f"""Convert the following natural language instruction into Python code.
            Return only the Python code without any explanation.
            Instruction: {text}
            Python code:"""
            response = self.client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": "You are a Python code generator. Generate only valid Python code without any explanation or markdown formatting."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.7,
                max_tokens=150
            )
            return response.choices[0].message.content.strip()
        except Exception as e:
            print(f"Error generating code: {e}")
            return None

    def run_session(self):
        """Run the voice coding session"""
        variables = {}
        temp_file = "temp.wav"
        print("Voice Coding Session Started!")
        print("\nSpeak your Python code instructions naturally.")
        print("Examples:")
        print("- 'Print hello'")
        print("- 'Create a list of numbers from 1 to 5'")
        print("- 'Calculate the square of 5'")
        print("- Say 'exit' or 'stop' to end the session")
        while True:
            try:
                # Record audio
                audio_bytes = self.record_audio()
                # Save to a temporary file
                self.save_audio(audio_bytes, temp_file)
                # Transcribe audio
                text = self.transcribe_audio(temp_file)
                if text:
                    print("\nTranscribed text:", text)
                    if "exit" in text.lower() or "stop" in text.lower():
                        print("Ending voice coding session")
                        break
                    # Generate code using GPT
                    code = self.generate_code_with_gpt(text)
                    if code:
                        print("\nGenerated code:")
                        print(code)
                        print("\nOutput:")
                        try:
                            exec(code, globals(), variables)
                            print("\nCode executed successfully!")
                        except Exception as e:
                            print(f"Error executing code: {e}")
                    else:
                        print("Could not generate code from the instruction")
            except Exception as e:
                print(f"Error in session: {e}")
            finally:
                # Clean up the temporary file
                if os.path.exists(temp_file):
                    os.remove(temp_file)

def test_system(api_key):
    """Test the voice coding system with a simple instruction"""
    system = VoiceCodeSystem(api_key)
    # Test GPT code generation without voice
    test_instruction = "print hello"
    print("Testing code generation with:", test_instruction)
    code = system.generate_code_with_gpt(test_instruction)
    if code:
        print("\nGenerated test code:")
        print(code)
        return True
    return False

def start_voice_coding(api_key):
    """Initialize and start the voice coding system"""
    print("Testing system first...")
    if test_system(api_key):
        print("\nSystem test successful! Starting voice coding session...\n")
        system = VoiceCodeSystem(api_key)
        system.run_session()
    else:
        print("System test failed. Please check your API key and try again.")

# Usage example
if __name__ == "__main__":
    api_key = "sk-proj-xxxxxx"  # Replace with your actual OpenAI API key
    start_voice_coding(api_key)
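A quick note on the API key: rather than hard-coding it as in the usage example above, you can store it in Colab's Secrets panel (the key icon in the left sidebar) and read it at runtime. A minimal sketch, assuming you have saved a secret named OPENAI_API_KEY and granted this notebook access to it:

# Read the key from Colab's Secrets panel instead of pasting it into the cell.
from google.colab import userdata

api_key = userdata.get("OPENAI_API_KEY")  # the secret name here is whatever you chose
start_voice_coding(api_key)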
Run it. I asked it to print a simple hello, and the transcribed instruction, the generated code, and its output all appeared in the cell.
Alternative: Running Voice-Powered Coding on Google Colab with DeepSeek
For those looking for an alternative to OpenAI, DeepSeek can be wired into the same workflow for transcription and AI-powered code generation. Note that the endpoints and model name in the code below are placeholders, so check DeepSeek's API documentation for the current routes before running it.
Step 1: Install Required Packages
Run this in a Colab cell:
!pip install requests scipy numpy ipython
Step 2: Set Up the AI-Powered Voice Coding System (DeepSeek)
# Packages for this cell were installed in Step 1 above.
import scipy.io.wavfile as wav
import numpy as np
import requests
import os
from IPython.display import Javascript
from google.colab import output
from base64 import b64decode

class VoiceCodeSystem:
    def __init__(self, api_key):
        self.api_key = api_key
        self.sample_rate = 44100
        self.duration = 5  # Recording duration in seconds

    def record_audio(self):
        """Record audio using the browser's microphone"""
        print("Recording... Speak your code instructions (e.g., 'print hello')")
        # JavaScript to record audio in the browser
        record_js = """
        const sleep = ms => new Promise(r => setTimeout(r, ms));
        const record = (duration) => {
          return new Promise(async resolve => {
            const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            const mediaRecorder = new MediaRecorder(stream);
            const chunks = [];
            mediaRecorder.ondataavailable = e => chunks.push(e.data);
            mediaRecorder.start();
            await sleep(duration * 1000);
            mediaRecorder.stop();
            mediaRecorder.onstop = () => {
              const blob = new Blob(chunks, { type: 'audio/wav' });
              const reader = new FileReader();
              reader.onloadend = () => resolve(reader.result.split(',')[1]);
              reader.readAsDataURL(blob);
            };
          });
        };
        record(%d)
        """ % self.duration
        # Execute JavaScript and get the recorded audio as base64
        audio_b64 = output.eval_js(record_js)
        audio_bytes = b64decode(audio_b64)
        return audio_bytes

    def save_audio(self, audio_bytes, filename="temp.wav"):
        """Save the recording to a WAV file"""
        with open(filename, "wb") as f:
            f.write(audio_bytes)
        return filename

    def transcribe_audio(self, audio_file):
        """Use the DeepSeek API to transcribe audio"""
        try:
            url = "https://api.deepseek.com/v1/transcribe"  # Replace with DeepSeek's transcription endpoint
            headers = {
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "audio/wav"
            }
            with open(audio_file, "rb") as file:
                response = requests.post(url, headers=headers, data=file)
            if response.status_code == 200:
                return response.json().get("text")
            else:
                print(f"Error in transcription: {response.status_code}, {response.text}")
                return None
        except Exception as e:
            print(f"Error in transcription: {e}")
            return None

    def generate_code_with_deepseek(self, text):
        """Use the DeepSeek API to generate Python code from natural language"""
        try:
            url = "https://api.deepseek.com/v1/generate-code"  # Replace with DeepSeek's code generation endpoint
            headers = {
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            }
            data = {
                "prompt": f"Convert the following natural language instruction into Python code. Return only the Python code without any explanation.\n\nInstruction: {text}\n\nPython code:",
                "model": "deepseek-code",  # Replace with DeepSeek's code generation model
                "temperature": 0.7,
                "max_tokens": 150
            }
            response = requests.post(url, headers=headers, json=data)
            if response.status_code == 200:
                return response.json().get("code")
            else:
                print(f"Error generating code: {response.status_code}, {response.text}")
                return None
        except Exception as e:
            print(f"Error generating code: {e}")
            return None

    def run_session(self):
        """Run the voice coding session"""
        variables = {}
        temp_file = "temp.wav"
        print("Voice Coding Session Started!")
        print("\nSpeak your Python code instructions naturally.")
        print("Examples:")
        print("- 'Print hello'")
        print("- 'Create a list of numbers from 1 to 5'")
        print("- 'Calculate the square of 5'")
        print("- Say 'exit' or 'stop' to end the session")
        while True:
            try:
                # Record audio
                audio_bytes = self.record_audio()
                # Save to a temporary file
                self.save_audio(audio_bytes, temp_file)
                # Transcribe audio
                text = self.transcribe_audio(temp_file)
                if text:
                    print("\nTranscribed text:", text)
                    if "exit" in text.lower() or "stop" in text.lower():
                        print("Ending voice coding session")
                        break
                    # Generate code using DeepSeek
                    code = self.generate_code_with_deepseek(text)
                    if code:
                        print("\nGenerated code:")
                        print(code)
                        print("\nOutput:")
                        try:
                            exec(code, globals(), variables)
                            print("\nCode executed successfully!")
                        except Exception as e:
                            print(f"Error executing code: {e}")
                    else:
                        print("Could not generate code from the instruction")
            except Exception as e:
                print(f"Error in session: {e}")
            finally:
                # Clean up the temporary file
                if os.path.exists(temp_file):
                    os.remove(temp_file)

def test_system(api_key):
    """Test the voice coding system with a simple instruction"""
    system = VoiceCodeSystem(api_key)
    # Test code generation without voice
    test_instruction = "print hello"
    print("Testing code generation with:", test_instruction)
    code = system.generate_code_with_deepseek(test_instruction)
    if code:
        print("\nGenerated test code:")
        print(code)
        return True
    return False

def start_voice_coding(api_key):
    """Initialize and start the voice coding system"""
    print("Testing system first...")
    if test_system(api_key):
        print("\nSystem test successful! Starting voice coding session...\n")
        system = VoiceCodeSystem(api_key)
        system.run_session()
    else:
        print("System test failed. Please check your API key and try again.")

# Usage example
if __name__ == "__main__":
    api_key = "your-deepseek-api-key"  # Replace with your actual DeepSeek API key
    start_voice_coding(api_key)
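One more note on the DeepSeek version: the transcribe and generate-code URLs above are placeholders. At the time of writing, DeepSeek's public API is an OpenAI-compatible chat-completions interface rather than dedicated routes like those, so another option is to point the OpenAI client from the first section at DeepSeek's base URL for the code-generation step and keep Whisper (or another speech-to-text service) for transcription. A minimal sketch, assuming the base URL https://api.deepseek.com and the deepseek-chat model (verify both against DeepSeek's current documentation):

# Code generation via DeepSeek's OpenAI-compatible chat completions API.
# The base URL and model name are assumptions; check DeepSeek's docs for current values.
from openai import OpenAI  # the openai package installed in the first section

deepseek_client = OpenAI(api_key="your-deepseek-api-key",
                         base_url="https://api.deepseek.com")

def generate_code_with_deepseek_chat(text):
    """Ask DeepSeek to turn a natural-language instruction into Python code."""
    response = deepseek_client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a Python code generator. Return only valid Python code."},
            {"role": "user", "content": f"Instruction: {text}\n\nPython code:"}
        ],
        temperature=0.7,
        max_tokens=150,
    )
    return response.choices[0].message.content.strip()

print(generate_code_with_deepseek_chat("print hello"))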
Both implementations provide an effective way to transform speech into functional Python code, allowing developers to choose the best fit for their workflow.
By integrating AI into the coding process, voice-powered programming is transforming how developers build software. Whether you use OpenAI or DeepSeek, this approach enables a more fluid, creative, and efficient way to write code. That said, AI-powered coding also introduces concerns. One is over-reliance, where developers begin to accept AI-generated code without fully understanding it. Debugging AI-generated code can also be more challenging, as unintended behaviors or inefficiencies are harder to trace back. Will critical thinking and problem-solving skills erode if developers depend on AI for routine coding tasks rather than actively engaging with the logic and structure of their programs? Nevertheless, we all have to learn, adopt, and evolve in one way or another.
Enjoy experimenting and have fun!