
You Sing It: Putting Your Voice into Any Song Using Python

Pablo Schaffner Bofill

Principal Software Engineer & AI Specialist | Startup Co-Founder | Expert in Python, Full-Stack Development, & Tech Leadership | 20+ Years in Tech

Published: July 23, 2023

Have you ever wanted to sing your favorite songs but felt held back by your vocal skills? What if there was a way to imprint your voice onto any song you like without actually singing it? This article will guide you through a fascinating project using Python to "sing" any song with your voice. Ready to take the stage?

Preparing Our Project

Before we start coding, let's make sure our development environment is set up correctly. We'll be using Conda, an open-source package management system that makes it easy to install and manage Python packages and environments. This tool is incredibly useful, especially for complex projects like this one.

Setting Up Conda and Our Project Folder

First, install Miniconda or Anaconda if you haven't already. Either of these will give you access to Conda.

Once Conda is installed, create a new directory for our project:

mkdir voice-replace-project
cd voice-replace-project

This folder, 'voice-replace-project', will contain all of the Python scripts and audio files you'll be using.

Now, create a new Conda environment for the project:

conda create --name voice-replace-env python=3.9

This command creates a new environment called voice-replace-env with Python version 3.9. Feel free to change the name or Python version as necessary.

Activate the environment with the following command:

conda activate voice-replace-env

Installing the Necessary Libraries

After activating your Conda environment, install the necessary Python libraries: Spleeter for audio separation, CoquiTTS for voice synthesis, and PyDub for audio processing.

pip install spleeter
pip install pydub
pip install daal==2021.4.0
pip install TTS
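Before moving on, it can help to confirm that the libraries are visible from the active environment. The snippet below is a minimal sketch that uses only the standard library; the import names (spleeter, pydub, TTS) are assumed to match the pip packages above, and find_missing_modules is a hypothetical helper name.

```python
import importlib.util

# Import names assumed from the pip packages installed above.
REQUIRED_MODULES = ("spleeter", "pydub", "TTS")


def find_missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All libraries found.")
```

If anything is listed as missing, check that the Conda environment is activated and run the corresponding pip install command again.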

Preparing FFmpeg

For these libraries to work, especially Spleeter, you need to have FFmpeg installed on your machine. FFmpeg is a software suite to handle multimedia data. It provides command-line tools to convert, play, and record audio and video.

If you're using a Mac, you can easily install FFmpeg using Homebrew:

brew install ffmpeg

For other operating systems, you can download FFmpeg from the official FFmpeg site. Follow the instructions based on your specific operating system.

To check if FFmpeg is installed correctly, open a new terminal window (not the one where your Conda environment is active), and type ffmpeg -version. If it returns information about the installed FFmpeg version, you're all set!
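The same check can be done from Python, which is handy if you want a script to fail early with a clear message. This is a small sketch using the standard library; ffmpeg_available is a hypothetical helper name, and it only verifies that the executable is on your PATH and responds to -version.

```python
import shutil
import subprocess


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the executable is on PATH and answers to -version."""
    path = shutil.which(executable)
    if path is None:
        return False
    # Run the same command suggested above and check its exit code.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("FFmpeg found" if ffmpeg_available() else "FFmpeg not found on PATH")
```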

The Project

Now that we're prepared, we'll start exploring how to implement each of the required steps, and then merge everything into an easy-to-use terminal app.

Step 1: Separating the Vocals from the Song

Spleeter is an amazing library that can separate vocals and instrumentals from any song. Here's how you can use it to isolate the vocals:

from spleeter.separator import Separator

# Initialize separator in '2stems' mode.
separator = Separator('spleeter:2stems')

# Perform the separation.
separator.separate_to_file('path_to_your_song.mp3', 'output_directory')

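Note that 'path_to_your_song.mp3' is a placeholder for the path of the song file you want to process. This script creates two new files, 'vocals.wav' and 'accompaniment.wav'. By default, Spleeter places them in a subfolder of 'output_directory' named after the song file (for example, 'output_directory/path_to_your_song/vocals.wav').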
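To locate the separated files from a script, you can predict their paths up front. The sketch below assumes Spleeter's default output layout, '{filename}/{instrument}.{codec}' with the 'wav' codec; expected_stem_paths is a hypothetical helper, and the song path is only an example.

```python
from pathlib import Path


def expected_stem_paths(song_path, output_dir, stems=("vocals", "accompaniment")):
    """Predict where each stem is written, assuming Spleeter's default
    '{filename}/{instrument}.{codec}' layout and the 'wav' codec."""
    song_name = Path(song_path).stem  # file name without its extension
    return {stem: Path(output_dir) / song_name / f"{stem}.wav" for stem in stems}


paths = expected_stem_paths("songs/my_song.mp3", "output_directory")
print(paths["vocals"])
print(paths["accompaniment"])
```

If you change the layout when calling separate_to_file, the predicted paths need to change with it.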

Step 2: Generating Your Voice Version of the Vocals using CoquiTTS

CoquiTTS is a powerful, versatile Text-to-Speech library. We'll use its voice conversion capabilities to replace the original vocals in the song with your voice.

This process will require the vocal track extracted from the song and a sample of your voice, both in .wav format. Here's how we do it:

from TTS.api import TTS

# Initialize the CoquiTTS API.
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False, gpu=False)

# Convert the original vocal track to sound like your voice.
tts.voice_conversion_to_file(source_wav="path_to_song_vocal_track.wav", target_wav="path_to_your_voice_sample.wav", file_path="output.wav")

Note: "path_to_song_vocal_track.wav" is a placeholder for the actual path to the .wav file of the extracted vocal track. "path_to_your_voice_sample.wav" should be replaced with the actual path to the .wav sample of your voice.

This script generates an output.wav file in the current working directory. This file contains the new vocal track for your song, sung in your voice. Make sure that both the vocal track and your voice sample are in .wav format, as CoquiTTS requires this. The CLI script later in this article converts an .mp3 voice sample to .wav for you; alternatively, you can convert .mp3 files yourself with a tool like FFmpeg.
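If you prefer to do the conversion with FFmpeg directly, a minimal sketch could look like the following. The helper names build_wav_command and convert_to_wav are hypothetical; the command uses only FFmpeg's -y (overwrite) and -i (input) options and leaves the sample rate and channel count at FFmpeg's defaults.

```python
import subprocess
from pathlib import Path


def build_wav_command(source_path, wav_path=None):
    """Build an FFmpeg command that converts source_path to a .wav file."""
    source = Path(source_path)
    # Default target: same name as the source, with a .wav extension.
    target = Path(wav_path) if wav_path else source.with_suffix(".wav")
    # -y overwrites an existing target file, -i names the input file.
    return ["ffmpeg", "-y", "-i", str(source), str(target)]


def convert_to_wav(source_path, wav_path=None):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    command = build_wav_command(source_path, wav_path)
    subprocess.run(command, check=True)
    return command[-1]  # path of the .wav file that was written


print(build_wav_command("my_voice.mp3"))
```

Calling convert_to_wav("my_voice.mp3") would then write my_voice.wav next to the original file, provided FFmpeg is installed.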

Step 3: Mixing the Tracks Back Together

Now that we have the new vocal track in our voice and the original instrumental track, we can mix them back together. We'll use PyDub, a simple and easy-to-use Python library for audio processing.

Here is a Python script that does that:

from pydub import AudioSegment

# Load the instrumental and vocal tracks.
instrumental = AudioSegment.from_wav('path_to_instrumental_track.wav')
vocals = AudioSegment.from_wav('output.wav')

# Mix the two tracks together.
mixed = instrumental.overlay(vocals)

# Export the final mixed audio.
mixed.export('final_output.mp3', format='mp3')

Where 'path_to_instrumental_track.wav' is the path to the instrumental track extracted earlier.
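One detail worth knowing: the result of overlay always has the same length as the segment it is called on, so here the instrumental sets the length of the final mix. The snippet below is a conceptual sketch of what overlaying means at the sample level, not PyDub's actual implementation; mix_samples is a hypothetical helper that works on plain lists of 16-bit sample values.

```python
def mix_samples(base, overlay, sample_min=-32768, sample_max=32767):
    """Add overlay onto base sample by sample, clipping to the 16-bit range.

    The result keeps the length of base; extra overlay samples are dropped.
    """
    mixed = []
    for i, sample in enumerate(base):
        extra = overlay[i] if i < len(overlay) else 0
        mixed.append(max(sample_min, min(sample_max, sample + extra)))
    return mixed


instrumental = [1000, 2000, 30000, -30000]
vocals = [500, -2500, 10000, -10000, 9999]
print(mix_samples(instrumental, vocals))  # [1500, -500, 32767, -32768]
```

The clipped values in the example show why a mix can distort when both tracks are loud. If that happens, lowering the level of one track before mixing is a common remedy.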

Now we can bring all the steps together in a single Python script!

Creating a CLI tool

Let's take it a step further and turn our Python script into a CLI (Command-Line Interface) tool. This will make it much easier to use. Here's a Python script that accomplishes this, using the argparse library to handle command-line arguments:

import argparse
from spleeter.separator import Separator
from TTS.api import TTS
from pydub import AudioSegment


def separate_vocals(song_path, output_dir):
    """Split the song into vocals.wav and accompaniment.wav inside output_dir."""
    separator = Separator('spleeter:2stems')
    # Spleeter's default layout nests the stems in a subfolder named after the song.
    # This filename_format writes them directly into output_dir instead.
    separator.separate_to_file(song_path, output_dir, filename_format='{instrument}.{codec}')


def convert_mp3_to_wav(mp3_path, wav_path):
    """Convert an .mp3 file to .wav (PyDub relies on FFmpeg for this)."""
    audio = AudioSegment.from_mp3(mp3_path)
    audio.export(wav_path, format='wav')


def generate_voice(vocal_track_path, voice_sample_path, output_path):
    """Convert the vocal track so it sounds like the voice in the sample."""
    tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False, gpu=False)
    tts.voice_conversion_to_file(source_wav=vocal_track_path, target_wav=voice_sample_path, file_path=output_path)


def mix_tracks(instrumental_track_path, vocal_track_path, final_output_path):
    """Overlay the new vocals on the instrumental and export the mix as .mp3."""
    instrumental = AudioSegment.from_wav(instrumental_track_path)
    vocals = AudioSegment.from_wav(vocal_track_path)
    mixed = instrumental.overlay(vocals)
    mixed.export(final_output_path, format='mp3')


def main():
    parser = argparse.ArgumentParser(description="Replace the vocals in a song with your own voice.")
    parser.add_argument('-s', '--sample-voice', required=True, help="Path to the sample of your voice in .mp3 format.")
    parser.add_argument('-a', '--audio', required=True, help="Path to the song file in .mp3 format.")
    args = parser.parse_args()

    print("Separating vocals and instrumentals...")
    separate_vocals(args.audio, 'output')

    print("Converting sample voice to .wav format...")
    convert_mp3_to_wav(args.sample_voice, 'voice_sample.wav')

    print("Generating voice signature...")
    generate_voice('output/vocals.wav', 'voice_sample.wav', 'new_vocals.wav')

    print("Mixing tracks...")
    mix_tracks('output/accompaniment.wav', 'new_vocals.wav', 'final_output.mp3')

    print("Done! Check out 'final_output.mp3' for the final result.")


if __name__ == "__main__":
    main()

Save the script as replace_voice.py in your project folder. You can then run it like this:

python replace_voice.py -s your_sample_voice_file.mp3 -a your_song_file.mp3

The script will create a new file named 'final_output.mp3', which is the original song sung in your voice!
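The output name is hard-coded in the script. As one possible extension, the parser could accept an optional output path. The sketch below is an assumption layered on top of the original script: the -o/--output flag and the build_parser helper are hypothetical and are not part of the code above.

```python
import argparse


def build_parser():
    """Build the CLI parser, with a hypothetical optional --output flag."""
    parser = argparse.ArgumentParser(description="Replace the vocals in a song with your own voice.")
    parser.add_argument('-s', '--sample-voice', required=True, help="Path to the sample of your voice in .mp3 format.")
    parser.add_argument('-a', '--audio', required=True, help="Path to the song file in .mp3 format.")
    # Hypothetical extra option: falls back to the original file name.
    parser.add_argument('-o', '--output', default='final_output.mp3', help="Where to write the final mix.")
    return parser


# Parse an example argument list instead of the real command line.
args = build_parser().parse_args(['-s', 'my_voice.mp3', '-a', 'my_song.mp3'])
print(args.sample_voice, args.audio, args.output)
```

In main(), the value of args.output would then replace the literal 'final_output.mp3' passed to mix_tracks.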

Conclusion

Wow! What a fantastic journey we've been on. Now, you have a powerful tool at your disposal. You can not only sing your favorite songs but give them a whole new personal twist. And hey, who knows? You could end up discovering a hidden talent!

But wait, there's more! The voice replacement skills you've honed don't just apply to music. Think of the vast expanse of creative applications this opens up. You can provide voice-overs for characters in movies or video games. Maybe you can create voice simulations for learning experiences or even bring to life your very own virtual assistant.

One particularly exciting application could be in the realm of audio translations for videos. You could maintain the voice signatures of the original speakers, providing a seamless, authentic experience for listeners in different languages.

Let your imagination run wild! Experiment with different songs, voice textures, and tones. Who knows where this journey will take you next? Keep exploring, keep learning, and above all, keep having fun with code! Happy coding, folks!

#voicecloning #music #ai

Martin Artigues

Dubbing Actor, Application Support / Infrastructure Support - EY GDS

8 months ago

really excellent!!. Do you have any suggestions on how to improve a model to be more similar with the original voice provided? thanks!!, great job

Christian Geell

Independent Design Professional

1 year ago

Demo?


