You Sing It: Putting Your Voice into Any Song Using Python
Pablo Schaffner Bofill
Principal Software Engineer & AI Specialist | Startup Co-Founder | Expert in Python, Full-Stack Development, & Tech Leadership | 20+ Years in Tech
Have you ever wanted to sing your favorite songs but felt held back by your vocal skills? What if there was a way to imprint your voice onto any song you like without actually singing it? This article will guide you through a fascinating project using Python to "sing" any song with your voice. Ready to take the stage?
Preparing Our Project
Before we start coding, let's make sure our development environment is set up correctly. We'll be using Conda, an open-source package management system that makes it easy to install and manage Python packages and environments. This tool is incredibly useful, especially for complex projects like this one.
Setting Up Conda and Our Project Folder
First, install Miniconda or Anaconda if you haven't already. Either of these will give you access to Conda.
Once Conda is installed, create a new directory for our project:
mkdir voice-replace-project
cd voice-replace-project
This folder, 'voice-replace-project', will contain all of the Python scripts and audio files you'll be using.
Now, create a new Conda environment for the project:
conda create --name voice-replace-env python=3.9
This command creates a new environment called voice-replace-env with Python version 3.9. Feel free to change the name or Python version as necessary.
Activate the environment with the following command:
conda activate voice-replace-env
Installing the Necessary Libraries
After activating your Conda environment, we'll install the necessary Python libraries: Spleeter for audio separation, CoquiTTS for voice conversion, and PyDub for audio processing.
pip install spleeter
pip install pydub
pip install daal==2021.4.0
pip install TTS
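Before moving on, it can help to confirm that the packages are actually visible from the active environment. Here is a minimal sketch that only checks whether each import name can be found, without loading the (fairly heavy) libraries. The helper name missing_packages is made up for this example.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names used later in this article (the pip package "TTS" is imported as TTS).
    missing = missing_packages(["spleeter", "pydub", "TTS"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

If anything is reported as missing, double-check that the Conda environment is active in the terminal where you ran pip.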
Preparing FFmpeg
For these libraries to work, especially Spleeter, you need to have FFmpeg installed on your machine. FFmpeg is a software suite to handle multimedia data. It provides command-line tools to convert, play, and record audio and video.
If you're using a Mac, you can easily install FFmpeg using Homebrew:
brew install ffmpeg
For other operating systems, you can download FFmpeg from the official FFmpeg site. Follow the instructions based on your specific operating system.
To check if FFmpeg is installed correctly, open a new terminal window (not the one where your Conda environment is active), and type ffmpeg -version. If it returns information about the installed FFmpeg version, you're all set!
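If you prefer to run the same check from Python, for example at the start of your own script, here is a small sketch using only the standard library. The function name ffmpeg_version_line is an assumption for illustration. It looks for the executable on your PATH and returns the first line printed by ffmpeg -version.

```python
import shutil
import subprocess


def ffmpeg_version_line(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version_line()
    print(version if version else "FFmpeg was not found on your PATH.")
```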
The Project
Now that we're prepared, we'll walk through each of the required steps and then merge everything into an easy-to-use terminal app.
Step 1: Separating the Vocals from the Song
Spleeter is an amazing library that can separate vocals and instrumentals from any song. Here's how you can use it to isolate the vocals:
from spleeter.separator import Separator
# Initialize separator in '2stems' mode.
separator = Separator('spleeter:2stems')
# Perform the separation.
separator.separate_to_file('path_to_your_song.mp3', 'output_directory')
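Note that 'path_to_your_song.mp3' is a placeholder for the path of the song file you want to process. By default, Spleeter writes its results to a subfolder of 'output_directory' named after the song file (here, 'output_directory/path_to_your_song/'), which will contain two new files: 'vocals.wav' and 'accompaniment.wav'.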
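Since the later steps need the exact location of the separated files, here is a small sketch that works out where they should end up. It assumes Spleeter's default output layout, which, as far as I know, places the stems in a subfolder named after the input file (without its extension). The helper name spleeter_stem_paths is hypothetical.

```python
from pathlib import Path


def spleeter_stem_paths(song_path, output_dir):
    """Return the expected (vocals, accompaniment) paths for a 2stems separation.

    Assumes Spleeter's default layout: <output_dir>/<song name>/<stem>.wav
    """
    stem_dir = Path(output_dir) / Path(song_path).stem
    return stem_dir / "vocals.wav", stem_dir / "accompaniment.wav"


if __name__ == "__main__":
    vocals, accompaniment = spleeter_stem_paths("music/my_song.mp3", "output")
    print(vocals)
    print(accompaniment)
```

If you customized Spleeter's output naming, adjust the helper accordingly.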
Step 2: Generating Your Voice Version of the Vocals using CoquiTTS
CoquiTTS is a powerful, versatile Text-to-Speech library. We'll use its voice conversion capabilities to replace the original vocals in the song with your voice.
This process will require the vocal track extracted from the song and a sample of your voice, both in .wav format. Here's how we do it:
from TTS.api import TTS
# Initialize the CoquiTTS API.
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False, gpu=False)
# Convert the original vocal track to sound like your voice.
tts.voice_conversion_to_file(source_wav="path_to_song_vocal_track.wav", target_wav="path_to_your_voice_sample.wav", file_path="output.wav")
Note: "path_to_song_vocal_track.wav" is a placeholder for the actual path to the .wav file of the extracted vocal track. "path_to_your_voice_sample.wav" should be replaced with the actual path to the .wav sample of your voice.
This script generates an output.wav file in the current working directory. This file contains the new vocal track for your song, sung in your voice. Please make sure that both the vocal track and your voice sample are in .wav format, as CoquiTTS requires this. Later on, we'll extend this snippet so that an .mp3 voice sample is converted to .wav first; you can also use a tool like FFmpeg for the conversion.
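To confirm that a file really is a readable .wav before handing it to the voice conversion step, you can inspect it with Python's built-in wave module. This is only a sketch: the function name describe_wav is made up, and the wave module mainly understands plain PCM files, so other .wav encodings may raise wave.Error here even if other tools can read them.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration of a PCM .wav file."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py path_to_your_voice_sample.wav
    print(describe_wav(sys.argv[1]))
```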
Step 3: Mixing the Tracks Back Together
Now that we have the new vocal track in our voice and the original instrumental track, we can mix them back together. We'll use PyDub, a simple and easy-to-use Python library for audio processing.
Here is a Python script that does that:
from pydub import AudioSegment
# Load the instrumental and vocal tracks.
instrumental = AudioSegment.from_wav('path_to_instrumental_track.wav')
vocals = AudioSegment.from_wav('output.wav')
# Mix the two tracks together.
mixed = instrumental.overlay(vocals)
# Export the final mixed audio.
mixed.export('final_output.mp3', format='mp3')
Here, 'path_to_instrumental_track.wav' is the path to the instrumental track extracted in Step 1 (Spleeter's 'accompaniment.wav'), and 'output.wav' is the converted vocal track from Step 2.
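If you're curious what "overlaying" means at the sample level, here is a conceptual sketch in plain Python. It is not PyDub's actual implementation, just an illustration of the idea: the two tracks are summed sample by sample, the result keeps the length of the base track (PyDub's overlay likewise keeps the length of the segment it is called on), and values are clamped so they stay within the 16-bit range. The function name mix_samples is hypothetical.

```python
def mix_samples(base, overlay, sample_width_bits=16):
    """Sum two lists of integer samples, keeping the length of `base`.

    Samples of `overlay` beyond the end of `base` are dropped, and each sum
    is clamped to the signed range of the given sample width.
    """
    max_value = 2 ** (sample_width_bits - 1) - 1
    min_value = -(2 ** (sample_width_bits - 1))
    mixed = []
    for index, sample in enumerate(base):
        extra = overlay[index] if index < len(overlay) else 0
        mixed.append(max(min_value, min(max_value, sample + extra)))
    return mixed


if __name__ == "__main__":
    instrumental = [100, 200, 300]
    vocals = [1, 2]
    print(mix_samples(instrumental, vocals))
```

In practice this means the instrumental track determines the length of the final song, so a vocal track that runs longer would be cut off at the end.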
Now we can bring all the steps together in a single Python script!
Creating a CLI tool
Let's take it a step further and turn our Python script into a CLI (Command-Line Interface) tool. This will make it much easier to use. Here's a Python script that accomplishes this, using the argparse library to handle command-line arguments:
import argparse
from pathlib import Path

from spleeter.separator import Separator
from TTS.api import TTS
from pydub import AudioSegment

def separate_vocals(song_path, output_dir):
    # Split the song into vocals and accompaniment.
    separator = Separator('spleeter:2stems')
    separator.separate_to_file(song_path, output_dir)
    # By default, Spleeter writes the stems to <output_dir>/<song name>/.
    return Path(output_dir) / Path(song_path).stem

def convert_mp3_to_wav(mp3_path, wav_path):
    # CoquiTTS needs the voice sample in .wav format.
    audio = AudioSegment.from_mp3(mp3_path)
    audio.export(wav_path, format='wav')

def generate_voice(vocal_track_path, voice_sample_path, output_path):
    # Convert the original vocals so they sound like the voice sample.
    tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False, gpu=False)
    tts.voice_conversion_to_file(source_wav=vocal_track_path, target_wav=voice_sample_path, file_path=output_path)

def mix_tracks(instrumental_track_path, vocal_track_path, final_output_path):
    # Lay the new vocals over the instrumental track and export as .mp3.
    instrumental = AudioSegment.from_wav(instrumental_track_path)
    vocals = AudioSegment.from_wav(vocal_track_path)
    mixed = instrumental.overlay(vocals)
    mixed.export(final_output_path, format='mp3')

def main():
    parser = argparse.ArgumentParser(description="Replace the vocals in a song with your own voice.")
    parser.add_argument('-s', '--sample-voice', required=True, help="Path to the sample of your voice in .mp3 format.")
    parser.add_argument('-a', '--audio', required=True, help="Path to the song file in .mp3 format.")
    args = parser.parse_args()

    print("Separating vocals and instrumentals...")
    stems_dir = separate_vocals(args.audio, 'output')

    print("Converting sample voice to .wav format...")
    convert_mp3_to_wav(args.sample_voice, 'voice_sample.wav')

    print("Generating voice signature...")
    generate_voice(str(stems_dir / 'vocals.wav'), 'voice_sample.wav', 'new_vocals.wav')

    print("Mixing tracks...")
    mix_tracks(str(stems_dir / 'accompaniment.wav'), 'new_vocals.wav', 'final_output.mp3')

    print("Done! Check out 'final_output.mp3' for the final result.")

if __name__ == "__main__":
    main()
Save the script as replace_voice.py and run it like this:
python replace_voice.py -s your_sample_voice_file.mp3 -a your_song_file.mp3
The script will create a new file named 'final_output.mp3', which is the original song sung in your voice!
Conclusion
Wow! What a fantastic journey we've been on. Now, you have a powerful tool at your disposal. You can not only sing your favorite songs but give them a whole new personal twist. And hey, who knows? You could end up discovering a hidden talent!
But wait, there's more! The voice replacement skills you've honed don't just apply to music. Think of the vast expanse of creative applications this opens up. You can provide voice-overs for characters in movies or video games. Maybe you can create voice simulations for learning experiences or even bring to life your very own virtual assistant.
One particularly exciting application could be in the realm of audio translations for videos. You could maintain the voice signatures of the original speakers, providing a seamless, authentic experience for listeners in different languages.
Let your imagination run wild! Experiment with different songs, voice textures, and tones. Who knows where this journey will take you next? Keep exploring, keep learning, and above all, keep having fun with code! Happy coding, folks!