Schubert's 'Unfinished' Symphony: How AI finished it on a smartphone! + Python Code!

Did you know that machine learning can mimic Beethoven? Or that data science can turn a black-and-white film into 4K? It's not just tech talk; it's a creative revolution in full swing, baby! And it's now all happening on the device you use to ignore phone calls from unknown numbers. The possibilities are endless, and the puns are plentiful.

You know that feeling when you're listening to a symphony, and it just stops? Yeah, neither did I. But Schubert's Symphony No. 8 did just that. It stopped, unfinished, like my attempts to assemble IKEA furniture. For nearly two centuries, people have been left hanging, like bad 5G on a ski trip. Until now. And guess what? A smartphone with AI finished it. Yes, the same device you use to take selfies and play Candy Crush.

AI: The New Maestro in Town (And It Fits in Your Pocket)

Enter Huawei's AI technology, a maestro with a digital baton, ready to finish what Schubert started. It's like Beethoven meeting Siri, a blend of classical genius with modern tech-savvy. And it all happened on a Huawei smartphone. That's right, the same phone you accidentally drop in the toilet now composes symphonies.


Listen to the Encore, Now with 100% More Smartphone

How They Did It: A Symphony in Your Pocket

Chinese technology company Huawei decided to try to use AI to complete Schubert's Symphony No. 8, with the help of composer Lucas Cantor. Engineers fed music, in the form of data, into the phone's dual Neural Processing Unit. The AI then generated melodies from that information, and Cantor orchestrated those melodies into the final two movements. It's like having a personal Mozart on speed dial.

The new completion was performed in London at Cadogan Hall, in an event presented by Myleene Klass. Speaking ahead of the world premiere, Cantor said that working with the AI was "like having a collaborator who never gets tired, never runs out of ideas." But, he added, having his music performed immediately after Schubert's was slightly nerve-wracking: "It's a bit like being a comedian and having the greatest comedian in the world go on before you." Talk about a tough act to follow!


The Future of AI: Beyond Music (And Beyond Your Smartphone Screen)

Imagine a world where AI doesn't just crunch numbers; it speaks, sings, and composes. It's not just about algorithms; it's about artistry, creativity, and innovation. And it's all happening on the device you use to order pizza. Walter Ji, the president of Huawei Consumer Business Group, said: "We used the power of AI to extend the boundaries of what is humanly possible and see the positive role technology might have on modern culture." It's like having Beethoven, Shakespeare, and Einstein all rolled into one, without the funny hair or the need for a charger.


A Symphony of Possibilities (Now Playing on a Smartphone Near You)

The completion of Schubert's Unfinished Symphony is just the beginning. From text-to-speech models like Bark AI to music models that can mimic the great composers, AI is opening doors to endless creative possibilities. It's like having a personal Beethoven on your laptop, minus the grumpy attitude, and now on your smartphone too. Who knew your phone could do more than just take blurry photos of your cat?

Bark AI is a cutting-edge AI text-to-speech and music maker that's taking the tech world by storm. It's not just a tool; it's a creative partner, transforming the way we interact with sounds and music. The main difference between Bark AI and a text-to-speech generator is that Bark AI is a fully generative text-to-audio model that can generate not only speech but also music, background noise, and simple sound effects. In contrast, a text-to-speech generator focuses on converting text into spoken audio.

Plus, something else that's cool but also a little creepy: Bark can produce nonverbal communication like laughing, sighing, and crying, which a plain text-to-speech generator cannot. As an open-source project, Bark invites collaboration and innovation from nerds and hipsters around the globe. However, because it builds sound pretty much from scratch, Bark currently only produces about 15 seconds of generated content at a time. But fear not, fellow geeks and voice-over enthusiasts! I've extended the Python code so that Bark speaks longer text into one continuous file with a consistent voice. The original suggestion was to use the nltk library for sentence splitting, but it made the voice-over sound less natural when I ran it, and my segment-based method worked better!
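Here's a minimal sketch of that nonverbal trick, separate from my full script below. The bracketed cues and the speaker preset follow Bark's documented conventions; exact results vary from run to run, so treat this as an illustration rather than a guarantee.

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads the model weights on first run

# Bracketed cues like [laughs] and [sighs] are rendered as sounds
# rather than read aloud; a plain TTS engine would just speak them.
text = "I finally finished the symphony... [laughs] just kidding. [sighs]"
audio = generate_audio(text, history_prompt="v2/en_speaker_6")
write_wav("bark_nonverbal_demo.wav", SAMPLE_RATE, audio)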

I also added some audio enhancements, made it output version 2 of the latest voices (speakers 1 through 9) so you can take your pick, and export each one to individual .wav and .mp3 files, with a built-in EQ balancer depending on the voice. And because sharing is caring, I'll share the code! (I've got some cool stuff in the works for Bark: voice cloning with AI podcast producer/script-writer agents that produce full stories, sound effects, and music, all from a television or film treatment upload... and a SaaS competitor to ElevenLabs.io in the works!)

import os
import re
import numpy as np
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav, read as read_wav
from scipy.signal import butter, lfilter, freqz
from datetime import datetime
from pydub import AudioSegment  # mp3 export requires ffmpeg on the PATH
from scipy.fftpack import fft
from IPython.display import Audio, display

# Set environment variables (GPU 0, full-size models, no CPU offload)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["SUNO_USE_SMALL_MODELS"] = "0"
os.environ["SUNO_OFFLOAD_CPU"] = "0"

def butter_lowpass_filter(data, cutoff, fs, order=5):
    # Butterworth low-pass filter to tame harsh high frequencies
    b, a = butter(order, cutoff / (0.5 * fs), btype='low', analog=False)
    return lfilter(b, a, data)

def detect_and_equalize_bass(data, threshold=0.1):
    # Estimate low-frequency energy; if the voice is too boomy, roll it
    # off with a gentle 100 Hz high-pass. Note: the threshold depends on
    # the amplitude scale of the incoming samples.
    N = len(data)
    freq_data = fft(data)
    bass_content = np.abs(freq_data[:N // 20])
    bass_level = np.sum(bass_content) / N
    if bass_level > threshold:
        b, a = butter(1, 100 / (SAMPLE_RATE / 2), btype='high')
        data = lfilter(b, a, data)
    return data

def split_text_into_segments(text, max_segment_length=140):
    # Split on sentence boundaries, then pack sentences into segments
    # short enough for Bark's short generation window.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    segments = []
    current_segment = ''
    for sentence in sentences:
        if len(current_segment) + len(sentence) <= max_segment_length:
            current_segment += ' ' + sentence
        else:
            segments.append(current_segment.strip())
            current_segment = sentence
    if current_segment:
        segments.append(current_segment.strip())
    return segments

def generate_and_save_audio_segments(text_segments, speaker_id):
    # Reuse the same history prompt for every segment so the voice
    # stays consistent across the whole file
    preload_models()
    audio_segments = []
    for segment in text_segments:
        audio_array = generate_audio(segment, history_prompt=speaker_id)
        audio_segments.append(audio_array)
    return audio_segments

def normalize_audio(audio_array):
    # Scale to full int16 range for WAV export
    return (audio_array / np.max(np.abs(audio_array)) * 32767).astype(np.int16)

def combine_and_save_audio(audio_segments, base_file_name="combined_bark_audio", speaker_number=0):
    combined_audio = np.concatenate(audio_segments)
    normalized_audio = normalize_audio(combined_audio)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name_wav = f"{base_file_name}_speaker_{speaker_number}_{timestamp}.wav"
    file_name_mp3 = f"{base_file_name}_speaker_{speaker_number}_{timestamp}.mp3"

    write_wav(file_name_wav, SAMPLE_RATE, normalized_audio)
    return normalized_audio, file_name_wav, file_name_mp3

def convert_wav_to_mp3(wav_file, mp3_file):
    audio = AudioSegment.from_wav(wav_file)
    audio.export(mp3_file, format="mp3")

def play_audio(audio_array):
    display(Audio(audio_array, rate=SAMPLE_RATE))

def apply_equalization(audio, bands):
    # Crude FFT-domain EQ: for each (frequency, gain) band, add a
    # low-pass-shaped copy of the signal scaled by the gain
    equalized_audio = audio.copy()
    for frequency, gain in bands:
        b, a = butter(1, frequency / (SAMPLE_RATE / 2), btype='low')
        w, h = freqz(b, a, worN=audio.shape[0])
        equalized_audio += np.real(np.fft.ifft(np.fft.fft(audio) * np.abs(h))) * gain
    return equalized_audio

def apply_audio_processing(file_name_wav):
    rate, data = read_wav(file_name_wav)

    # Apply low-pass filter
    filtered_audio = butter_lowpass_filter(data, 3000, rate)

    # Detect and equalize bass
    equalized_audio = detect_and_equalize_bass(filtered_audio)

    # Additional equalization (optional)
    bands = [(100, 0.5), (1000, 1.2), (5000, 0.8)]
    fully_equalized_audio = apply_equalization(equalized_audio, bands)

    # Clip before casting so loud peaks don't wrap around as int16
    clipped = np.clip(fully_equalized_audio, -32768, 32767)
    write_wav(file_name_wav, rate, clipped.astype(np.int16))

# Main code
text_prompt = "... your text here ..."
for speaker_number in range(1, 10):  # v2 English speakers 1 to 9
    speaker_id = f"v2/en_speaker_{speaker_number}"
    text_segments = split_text_into_segments(text_prompt, max_segment_length=140)
    audio_segments = generate_and_save_audio_segments(text_segments, speaker_id)
    combined_audio, file_name_wav, file_name_mp3 = combine_and_save_audio(audio_segments, speaker_number=speaker_number)

    apply_audio_processing(file_name_wav)
    convert_wav_to_mp3(file_name_wav, file_name_mp3)

    print(f"Audio saved to {file_name_wav} and {file_name_mp3} using {speaker_id}")
    play_audio(combined_audio)


        
[Image: debugging the .wav and .mp3 export for clipping. Will keep posting updated code.]
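In the meantime, if you want to check an exported .wav for clipping yourself, a quick helper like this (my own sketch, not part of the script above) counts samples pinned at the int16 rails:

import numpy as np
from scipy.io.wavfile import read as read_wav

def count_clipped_samples(path):
    # Samples sitting at the int16 extremes usually mean the signal clipped
    _, data = read_wav(path)
    clipped = int(np.sum((data >= 32767) | (data <= -32768)))
    print(f"{path}: {clipped} of {len(data)} samples at full scale")

count_clipped_samples("your_output.wav")  # replace with your exported file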

So here's the deal: AI like this is not just a tool; it's a partner in creativity now, whether you like it or not. It's a bridge between the past and the present, the classical and the contemporary. It's a symphony playing a new tune, one that's both timeless and revolutionary. And it's only a matter of time before ChatGPT runs locally on the same device you use to swipe right or left on the next shallow picture that shows up. Stay thirsty, my friends...

#StayThirstyMyFriends #ArtificialIntelligence #MusicRevolution #TechInnovation #SymphonyAI #CreativeTechnology #SmartphoneGenius #ClassicalMeetsModern #OpenSourceMagic #AIComposer #FutureOfSound

