The ChatGPT Moment for Music Has Arrived in 2024
Michael Spencer
A.I. Writer, researcher and curator - full-time Newsletter publication manager.
Hey Everyone,
I’m not even a huge music lover, but I’ve been thinking a lot about Suno AI vs. Udio over the past week. Udio is a generative artificial intelligence model that produces music from text prompts, and it has somewhat suddenly emerged as a competitor to Suno AI.
About a week ago, Udio came out of stealth and raised just $10M in seed funding from prominent investors and celebrities, including a16z, Instagram co-founder Mike Krieger, and musicians will.i.am and Common. So how is it already this good? You can’t expect much more in 2024, or can you? You can read their launch thread on X here.
Subscribe to AI Supremacy to get my best deep dives for as low as $2 a week.
Udio AI Demo
Is generative AI music the new consumer AI craze of summer 2024?
Udio was founded in December 2023 by four former Google DeepMind researchers (CEO David Ding, Conor Durkan, Charlie Nash, and Yaroslav Ganin), together with Andrew Sanchez.
Before Udio came out, I asked Christopher Dalla Riva of Can’t Get Much Higher to give us his take on generative AI music, specifically Suno AI. This is that story. If you are a music lover, try Chris’s Can’t Get Much Higher newsletter.
Can’t Get Much Higher Newsletter - Music Trends
“If you’re looking for data-driven analyses about the musical trends of yesterday and today, then you’ve come to the right place.”
Suno’s AI One-Man Band
The ChatGPT for music was supposed to be years away, but Suno has sped up that timeline. While its technology is impressive, musicians are concerned about how it could remake their profession.
When Rolling Stone reported on music and artificial intelligence in the summer of 2022, experts told the magazine that despite advances by Google and OpenAI, it would be years before the ChatGPT equivalent for music arrived. Then, just a few weeks ago, Suno released its V3 model to the public. It felt like years zoomed by in a matter of days.
According to the press release, Suno’s V3 model can create two minutes of “radio-quality music” from a text prompt in a matter of seconds. That is a fair summary of what Suno’s latest model does, but you can’t really understand its power until you try it. Here is what came back when I typed in “A wistful acoustic guitar song about being broken-hearted sung by a woman.”
“Echoes of Love”
Is this song going to take home a Grammy next year? No. But it is remarkably good for something generated in a few seconds, and it doesn’t even represent Suno’s full abilities. Beyond generating a song from a single string of text, Suno offers additional flexibility. You can choose whether you want a song with or without lyrics. If you choose lyrics, you can specify exactly what they should be, along with a title and musical style.
So, who is this tool for? According to Suno’s website, everyone: “Whether you're a shower singer or a charting artist, we break barriers between you and the song you dream of making. No instrument needed, just imagination.” And that description is not hyperbolic. In Rolling Stone’s profile of the company, Mikey Shulman, one of the co-founders, said that he “envisions a billion people worldwide paying 10 bucks a month to create songs with Suno.”
Would it be possible to get that many people using Suno? If you focused only on potential artists, probably not. Making music has never been more accessible than it is right now. In fact, according to Luminate, over 100k songs are now uploaded to streaming services every day! That’s more songs in a matter of weeks than were released during full decades of the 20th century. Even if you could increase that number by an order of magnitude, you would probably struggle to retain users. Artists can quickly get discouraged, given that most songs get very few plays.
Furthermore, there are more robust creator options. Take BandLab as an example. BandLab provides AI tools, mobile recording software, a social network, distribution, and much more to millions of users, and many of these features are available on its ad-supported tier. Unless Suno builds out more features, it will never compete in these independent artist spaces.
So, if we continue to take their “billion people” statement seriously, where would those users come from? When Brian Hiatt spoke with Mikey Shulman on the Rolling Stone Music Now podcast, Shulman said that he imagines those users coming from spaces that aren’t using music now: “Every single business … [would have] a media and music department behind them. The deli on the corner is going to have a jingle … That would not be possible today. They would not have the means to go hire someone to write them a jingle … That is very powerful.”
I do agree that this would be quite powerful. And there are some historical parallels to what he is getting at. In 1850, if you wanted music playing in your business, you had to hire musicians to perform. By 1950, you could just play records. Of course, the advent of recording did displace the work of some performers, but it also allowed the music industry to expand greatly. I don’t think jingle-making for every deli on Earth would be a scalable, profitable business, though. That feels more like a novelty than anything else.
I think the more realistic end users for Suno are companies that already spend a good deal of money on music. Think of companies in advertising or film. Right now, it’s probably expensive to get music made for a commercial or a movie. It would be much cheaper to have Suno do it for a flat monthly fee.
Regardless of whether one billion or one hundred people end up paying to use Suno, I can promise you one thing: they will likely face an onslaught of lawsuits. And they are preparing for this. According to another Rolling Stone article on Suno’s newest model, they “declined to reveal details of its training data, though one of its main investors, Antonio Rodriguez … is prepared for a potential lawsuit from labels and publishers.” Billboard later confirmed that Suno has no licensing agreements with major rights holders. That’s a problem. When you play around with Suno, it becomes clear that they probably trained the model on mountains of copyrighted material without permission.
To be clear, Suno has some guardrails built in to prevent users from infringing on copyrights they don’t own. For example, if you enter a notable artist’s name into the text prompt (e.g., “Matchbox 20 song”), it won’t return anything. Also, if you try to enter custom lyrics, they warn you to only enter lyrics that you hold the copyright to. But these guardrails are mostly for optics.
While the prompt “Matchbox 20 song” won’t return anything, “Matchbox 21 song” will. Suno labels the output for that prompt “alternative rock.” Nothing about the prompt “Matchbox 21 song” suggests that I am looking for alternative rock. But if Suno’s system were aware that the group Matchbox 20 was an alternative rock band from the 1990s, then it would be able to suss out what I was getting at with “Matchbox 21 song.”
“Radiance in the Shadows”
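Suno hasn’t published how its prompt filter works, so what follows is only a hypothetical illustration of the simplest kind of guardrail the Matchbox example suggests: an exact-name blocklist. The `BLOCKED_ARTISTS` set and the `is_blocked` function are invented for this sketch and are not Suno’s code. A filter like this catches the literal name but lets any near-miss through, even though the model behind it may still have learned what that near-miss points to.

```python
import re

# Hypothetical blocklist, for illustration only; not Suno's actual list or logic.
BLOCKED_ARTISTS = {"matchbox 20", "bruce springsteen", "bob dylan"}

def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so "Matchbox  20" matches "matchbox 20".
    return re.sub(r"\s+", " ", text.lower()).strip()

def is_blocked(prompt: str) -> bool:
    # Reject the prompt only if a blocked name appears verbatim in it.
    p = normalize(prompt)
    return any(name in p for name in BLOCKED_ARTISTS)

for prompt in ["Matchbox 20 song", "Matchbox 21 song"]:
    print(prompt, "->", "blocked" if is_blocked(prompt) else "allowed")
```

Running it prints `Matchbox 20 song -> blocked` and `Matchbox 21 song -> allowed`. The filter only checks the string; it has no way of knowing that “Matchbox 21” still steers the model toward late-1990s alternative rock.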
For another sign that Suno is likely trained on copyrighted material, and that it won’t prevent you from generating copyright-infringing content, look at its lyric prompt. Again, when you go to enter custom lyrics, Suno tells you not to enter anything you don’t own the copyright to, but that didn’t stop me from entering the lyrics to Bob Dylan’s “Blowin’ in the Wind” and getting back a folksy song.
“Blowin’ in the Wind”
Suno is an impressive piece of technology, and it’s clear that this technology is only going to get more impressive. Other companies like Udio and Stability have begun to release models on par with Suno. You might think that to build models this impressive, you need to infringe on copyrighted material at scale. Udio, like Suno, has guardrails in place to prevent you from making music that infringes on other people’s copyrights. Again, these guardrails are flimsy. Udio will balk if you prompt it for a “Bruce Springsteen song,” but it’s not very difficult to generate a song that sounds very similar to Bruce Springsteen.
Nevertheless, Stability has shown us that it is possible to train these models in a way that has a much higher standard for respecting copyrights. Here is part of the statement that accompanies their latest model:
Like the 1.0 model, 2.0 is trained on data from AudioSparx consisting of over 800,000 audio files containing music, sound effects, and single-instrument stems, as well as corresponding text metadata. All of AudioSparx’s artists were given the option to 'opt out' of the Stable Audio model training. To protect creator copyrights, for audio uploads, we partner with Audible Magic to utilize their content recognition (ACR) technology to power real-time content matching to prevent copyright infringement.
As someone who has spent 15 years making music, I find playing around with Suno both mind-bending and frightening. My hope is that musical artificial intelligence technology not only makes musical creation more accessible in the same ways that the drum machine and digital audio workstation did, but that it does so in a way that duly compensates artists for their work. There is a path forward to make that happen. Until that is the case, I’m not sure I can take Suno’s Mikey Shulman at his word when he says, “The use cases that we think are long lasting and enduring are the legal, moral, and ethical ones.”
Suno vs. Udio
You decide: I feel like these two consumer AI apps will be fairly popular in the second half of 2024.
Let’s Discuss
You could also just go make music: