Schubert's 'Unfinished' Symphony: How AI finished it on a smartphone! + Python Code!

Did you know that machine learning can mimic Beethoven? Or that data science can turn a black-and-white film into 4K? It's not just tech talk; it's a creative revolution in full swing, baby! And it's now all happening on the device you use to ignore phone calls from unknown numbers. The possibilities are endless, and the puns are plentiful.

You know that feeling when you're listening to a symphony, and it just stops? Yeah, neither did I. But Schubert's Symphony No. 8 did just that. It stopped, unfinished, like my attempts to assemble IKEA furniture. For nearly two centuries, people have been left hanging, like bad 5G on a ski trip. Until now. And guess what? A smartphone with AI finished it. Yes, the same device you use to take selfies and play Candy Crush.

AI: The New Maestro in Town (And It Fits in Your Pocket)

Enter Huawei's AI technology, a maestro with a digital baton, ready to finish what Schubert started. It's like Beethoven meeting Siri, a blend of classical genius with modern tech-savvy. And it all happened on a Huawei smartphone. That's right, the same phone you accidentally drop in the toilet now composes symphonies.


Listen to the Encore, Now with 100% More Smartphone

How They Did It: A Symphony in Your Pocket

Chinese technology company Huawei decided to try to use AI to complete Schubert's Symphony No. 8, with the help of composer Lucas Cantor. Engineers fed music, in the form of data, into the phone's dual Neural Processing Unit. The AI then generated melodies from that information, and Cantor orchestrated those melodies into the final two movements. It's like having a personal Mozart on speed dial.

The new completion was performed in London at Cadogan Hall, in an event presented by Myleene Klass. Speaking ahead of the world premiere, Cantor said that working with the AI was "like having a collaborator who never gets tired, never runs out of ideas." But, he added, having his music performed immediately after Schubert's was slightly nerve-wracking: "It's a bit like being a comedian and having the greatest comedian in the world go on before you." Talk about a tough act to follow!


The Future of AI: Beyond Music (And Beyond Your Smartphone Screen)

Imagine a world where AI doesn't just crunch numbers; it speaks, sings, and composes. It's not just about algorithms; it's about artistry, creativity, and innovation. And it's all happening on the device you use to order pizza. Walter Ji, the president of Huawei Consumer Business Group, said: "We used the power of AI to extend the boundaries of what is humanly possible and see the positive role technology might have on modern culture." It's like having Beethoven, Shakespeare, and Einstein all rolled into one, without the funny hair or the need for a charger.


A Symphony of Possibilities (Now Playing on a Smartphone Near You)

The completion of Schubert's Unfinished Symphony is just the beginning. From text-to-speech models like Bark AI to music models that can mimic the great composers, AI is opening doors to endless creative possibilities. It's like having a personal Beethoven on your laptop, minus the grumpy attitude, and now on your smartphone too. Who knew your phone could do more than just take blurry photos of your cat?

Bark AI is a cutting-edge AI text-to-speech and music maker that's taking the tech world by storm. It's not just a tool; it's a creative partner, transforming the way we interact with sounds and music. The main difference between Bark AI and a text-to-speech generator is that Bark AI is a fully generative text-to-audio model that can generate not only speech but also music, background noise, and simple sound effects. In contrast, a text-to-speech generator focuses on converting text into spoken audio.

Plus, something else that's cool but also a little creepy: Bark can produce nonverbal communication like laughing, sighing, and crying, which a plain text-to-speech generator cannot. As an open-source project, Bark invites collaboration and innovation from nerds and hipsters around the globe. However, because it builds sound pretty much from scratch, Bark currently only produces about 15 seconds of generated content at a time. But fear not, fellow geeks and voice-over enthusiasts! I've extended the Python code so that Bark speaks longer text into one continuous file with a consistent voice. The original suggestion was to use the nltk library for sentence splitting, but it made the voice-over sound less natural when I ran it, and my segment-based method worked better!
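Here's a minimal sketch of that nonverbal trick, separate from my full script below. The bracketed cues and the speaker preset follow Bark's documented conventions; exact results vary from run to run, so treat this as an illustration rather than a guarantee.

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads the model weights on first run

# Bracketed cues like [laughs] and [sighs] are rendered as sounds
# rather than read aloud; a plain TTS engine would just speak them.
text = "I finally finished the symphony... [laughs] just kidding. [sighs]"
audio = generate_audio(text, history_prompt="v2/en_speaker_6")
write_wav("bark_nonverbal_demo.wav", SAMPLE_RATE, audio)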

I also added some audio enhancements, made it output version 2 of the latest voices (speakers 1 through 9) so you can take your pick, and export each one to individual .wav and .mp3 files, with a built-in EQ balancer depending on the voice. And because sharing is caring, I'll share the code! (I've got some cool stuff in the works for Bark: voice cloning with AI podcast producer/script-writer agents that produce full stories, sound effects, and music, all from a television or film treatment upload... and a SaaS competitor to ElevenLabs.io in the works!)

import os
import re
import numpy as np
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav, read as read_wav
from scipy.signal import butter, lfilter, freqz
from datetime import datetime
from pydub import AudioSegment  # mp3 export requires ffmpeg on the PATH
from scipy.fftpack import fft
from IPython.display import Audio, display

# Set environment variables (GPU 0, full-size models, no CPU offload)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["SUNO_USE_SMALL_MODELS"] = "0"
os.environ["SUNO_OFFLOAD_CPU"] = "0"

def butter_lowpass_filter(data, cutoff, fs, order=5):
    # Butterworth low-pass filter to tame harsh high frequencies
    b, a = butter(order, cutoff / (0.5 * fs), btype='low', analog=False)
    return lfilter(b, a, data)

def detect_and_equalize_bass(data, threshold=0.1):
    # Estimate low-frequency energy; if the voice is too boomy, roll it
    # off with a gentle 100 Hz high-pass. Note: the threshold depends on
    # the amplitude scale of the incoming samples.
    N = len(data)
    freq_data = fft(data)
    bass_content = np.abs(freq_data[:N // 20])
    bass_level = np.sum(bass_content) / N
    if bass_level > threshold:
        b, a = butter(1, 100 / (SAMPLE_RATE / 2), btype='high')
        data = lfilter(b, a, data)
    return data

def split_text_into_segments(text, max_segment_length=140):
    # Split on sentence boundaries, then pack sentences into segments
    # short enough for Bark's short generation window.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    segments = []
    current_segment = ''
    for sentence in sentences:
        if len(current_segment) + len(sentence) <= max_segment_length:
            current_segment += ' ' + sentence
        else:
            segments.append(current_segment.strip())
            current_segment = sentence
    if current_segment:
        segments.append(current_segment.strip())
    return segments

def generate_and_save_audio_segments(text_segments, speaker_id):
    # Reuse the same history prompt for every segment so the voice
    # stays consistent across the whole file
    preload_models()
    audio_segments = []
    for segment in text_segments:
        audio_array = generate_audio(segment, history_prompt=speaker_id)
        audio_segments.append(audio_array)
    return audio_segments

def normalize_audio(audio_array):
    # Scale to full int16 range for WAV export
    return (audio_array / np.max(np.abs(audio_array)) * 32767).astype(np.int16)

def combine_and_save_audio(audio_segments, base_file_name="combined_bark_audio", speaker_number=0):
    combined_audio = np.concatenate(audio_segments)
    normalized_audio = normalize_audio(combined_audio)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    file_name_wav = f"{base_file_name}_speaker_{speaker_number}_{timestamp}.wav"
    file_name_mp3 = f"{base_file_name}_speaker_{speaker_number}_{timestamp}.mp3"

    write_wav(file_name_wav, SAMPLE_RATE, normalized_audio)
    return normalized_audio, file_name_wav, file_name_mp3

def convert_wav_to_mp3(wav_file, mp3_file):
    audio = AudioSegment.from_wav(wav_file)
    audio.export(mp3_file, format="mp3")

def play_audio(audio_array):
    display(Audio(audio_array, rate=SAMPLE_RATE))

def apply_equalization(audio, bands):
    # Crude FFT-domain EQ: for each (frequency, gain) band, add a
    # low-pass-shaped copy of the signal scaled by the gain
    equalized_audio = audio.copy()
    for frequency, gain in bands:
        b, a = butter(1, frequency / (SAMPLE_RATE / 2), btype='low')
        w, h = freqz(b, a, worN=audio.shape[0])
        equalized_audio += np.real(np.fft.ifft(np.fft.fft(audio) * np.abs(h))) * gain
    return equalized_audio

def apply_audio_processing(file_name_wav):
    rate, data = read_wav(file_name_wav)

    # Apply low-pass filter
    filtered_audio = butter_lowpass_filter(data, 3000, rate)

    # Detect and equalize bass
    equalized_audio = detect_and_equalize_bass(filtered_audio)

    # Additional equalization (optional)
    bands = [(100, 0.5), (1000, 1.2), (5000, 0.8)]
    fully_equalized_audio = apply_equalization(equalized_audio, bands)

    # Clip before casting so loud peaks don't wrap around as int16
    clipped = np.clip(fully_equalized_audio, -32768, 32767)
    write_wav(file_name_wav, rate, clipped.astype(np.int16))

# Main code
text_prompt = "... your text here ..."
for speaker_number in range(1, 10):  # v2 English speakers 1 to 9
    speaker_id = f"v2/en_speaker_{speaker_number}"
    text_segments = split_text_into_segments(text_prompt, max_segment_length=140)
    audio_segments = generate_and_save_audio_segments(text_segments, speaker_id)
    combined_audio, file_name_wav, file_name_mp3 = combine_and_save_audio(audio_segments, speaker_number=speaker_number)

    apply_audio_processing(file_name_wav)
    convert_wav_to_mp3(file_name_wav, file_name_mp3)

    print(f"Audio saved to {file_name_wav} and {file_name_mp3} using {speaker_id}")
    play_audio(combined_audio)


        
[Image: debugging the .wav and .mp3 export for clipping. Will keep posting updated code.]
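In the meantime, if you want to check an exported .wav for clipping yourself, a quick helper like this (my own sketch, not part of the script above) counts samples pinned at the int16 rails:

import numpy as np
from scipy.io.wavfile import read as read_wav

def count_clipped_samples(path):
    # Samples sitting at the int16 extremes usually mean the signal clipped
    _, data = read_wav(path)
    clipped = int(np.sum((data >= 32767) | (data <= -32768)))
    print(f"{path}: {clipped} of {len(data)} samples at full scale")

count_clipped_samples("your_output.wav")  # replace with your exported file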

So here's the deal: AI like this is not just a tool; it's a partner in creativity now, whether you like it or not. It's a bridge between the past and the present, the classical and the contemporary. It's a symphony playing a new tune, one that's both timeless and revolutionary. And it's only a matter of time before ChatGPT runs locally on the same device you use to swipe right or left on the next shallow picture that shows up. Stay thirsty, my friends...

#StayThirstyMyFriends #ArtificialIntelligence #MusicRevolution #TechInnovation #SymphonyAI #CreativeTechnology #SmartphoneGenius #ClassicalMeetsModern #OpenSourceMagic #AIComposer #FutureOfSound

