How to compose an entire music album with Deep Neural Networks
Dr. Tristan Behrens
AI Engineer | Deep Learning | Large Language Models | Agents | Computational Music | Art | PhD in Computer Science
I have been fascinated by Artificial Intelligence for quite some time now. That might be one of the reasons I decided to do my Ph.D. in that field. AI offers so many powerful tools that are definitely changing the technological landscape and, ultimately, how we humans work.
It is true that AI has a lot of use cases. And today I will tell you that my favorite use case is Music Generation with Deep Neural Networks.
Music Generation is a great example of AI's potential for augmenting human productivity and creativity. AI is changing the way we work in creative fields. Automation is key.
Recently, I used AI to compose an entire album of electronic music. Let me tell you a little about the how behind that story.
Deep Learning with 7000 Heavy Metal songs
The whole endeavor started with a huge degree of scientific zeal. I got excited about the paper "MMM : Exploring Conditional Multi-Track Music Generation with the Transformer", but quickly realized that there was no publicly available implementation. So I spent my entire extended Easter weekend implementing the paper. I used Hugging Face Transformers for the whole Deep Learning part and the Johann Sebastian Bach Chorales dataset for the music.
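To give you an idea of what the Deep Learning part can look like, here is a minimal training sketch with Hugging Face Transformers. It is an illustration under assumptions, not my actual code: I assume the music has already been converted to MMM-style token sequences stored one per line in a file like train_tokens.txt, that a matching tokenizer has been saved as midi_tokenizer.json, and the hyperparameters are placeholders.

```python
# Minimal training sketch. File names, hyperparameters, and the
# tokenizer setup are illustrative assumptions, not the actual setup.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2Config,
                          GPT2LMHeadModel, PreTrainedTokenizerFast,
                          Trainer, TrainingArguments)

# A tokenizer previously trained on the symbolic-music token vocabulary.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="midi_tokenizer.json")
tokenizer.pad_token = "[PAD]"

# A small GPT-2-style decoder, trained from scratch on the token sequences.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=2048,  # long enough for a multi-track window
    n_layer=6,
    n_head=8,
    n_embd=512,
)
model = GPT2LMHeadModel(config)

# One token sequence per line, e.g. one (part of a) piece per sequence.
dataset = load_dataset("text", data_files={"train": "train_tokens.txt"})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mmm-model",
        num_train_epochs=10,
        per_device_train_batch_size=4,
    ),
    train_dataset=dataset["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```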
After that first step, I was convinced that the approach was solid. Although baroque chorales are a lot of fun, I wanted a bigger challenge. Fortunately, I had access to 7000 Heavy Metal songs as MIDI files. This made things slightly more complicated: I had to implement a converter that maps all the music to a format that Deep Learning can work with. I decided to restrict the instruments to drums, bass, guitar, and piano, where piano stood for all instruments that were neither drums, bass, nor guitar.
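Here is a minimal sketch of that converter idea, built on the pretty_midi library. The token names and the sixteenth-note quantization grid are illustrative assumptions, not necessarily the exact scheme I used:

```python
# Minimal sketch: map each MIDI track to one of four instrument classes
# and emit a flat token sequence. Token names are assumptions.
import pretty_midi

def instrument_class(instrument: pretty_midi.Instrument) -> str:
    """Collapse General MIDI programs into drums/bass/guitar/piano."""
    if instrument.is_drum:
        return "drums"
    if 32 <= instrument.program <= 39:  # General MIDI bass programs
        return "bass"
    if 24 <= instrument.program <= 31:  # General MIDI guitar programs
        return "guitar"
    return "piano"  # piano stands in for everything else

def midi_to_tokens(path: str, steps_per_beat: int = 4) -> list:
    """Turn one MIDI file into a token sequence, track by track."""
    midi = pretty_midi.PrettyMIDI(path)
    tokens = ["PIECE_START"]
    for instrument in midi.instruments:
        tokens.append("TRACK_START")
        tokens.append(f"INST={instrument_class(instrument)}")
        for note in instrument.notes:
            # Quantize note onsets to a sixteenth-note grid.
            beats = midi.time_to_tick(note.start) / midi.resolution
            tokens.append(f"TIME={round(beats * steps_per_beat)}")
            tokens.append(f"NOTE_ON={note.pitch}")
        tokens.append("TRACK_END")
    tokens.append("PIECE_END")
    return tokens
```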
It took a couple of iterations to find a good Neural Network with a low enough loss and a healthy degree of overfitting. The last Transformer that I trained finally convinced me.
Music Composed with AI, not by AI
Well, having a trained Neural Network does not give you the power to compose interesting music. Yet. It has to be put into production.
At this point, I would love to make clear that this whole work was about composing with AI, and not about composing by AI. In other words, I wanted to create a tool that supports a composer in their creative work. I did not want to create a tool that composes entire pieces of music without human interaction. A tool that inspires and enables, rather than a tool that replaces human work.
I built such a tool around the mentioned Neural Network. It allowed me to conditionally generate small pieces of music by repeatedly generating drums, bass, guitar, and piano in arbitrary order. Once I had found a few pieces that worked extremely well (most of them worked well), I imported them into my favorite DAW, Logic Pro. There I arranged the pieces, selected instruments, and added a few notes here and there.
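To illustrate the idea, here is a minimal sketch of such track-by-track conditional generation, reusing the (assumed) model and tokenizer from the training sketch above. The token layout and sampling parameters are illustrative:

```python
# Minimal sketch of conditional generation: the prompt holds all tracks
# generated so far, and the model continues with one new track.
import torch

def generate_track(model, tokenizer, piece_so_far, instrument,
                   max_new_tokens=512):
    """Continue the piece with one newly generated track."""
    prompt = f"{piece_so_far} TRACK_START INST={instrument}"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,   # sampling keeps the results varied
            temperature=0.9,
            top_k=50,
        )
    return tokenizer.decode(output[0])

# Generate the instruments one after another, in arbitrary order, each
# track conditioned on everything that was generated before it.
piece = "PIECE_START"
for instrument in ["drums", "bass", "guitar", "piano"]:
    piece = generate_track(model, tokenizer, piece, instrument)
```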
To my huge surprise, it did not take much time to create an entire song. In fact, the entire album took less than the 50 days between Easter and Pentecost, including the time for implementing the data preprocessing and the Deep Learning.
Finding the right song names
Of course, when preparing to publish an album you have to take good care of the song names. Why not use Deep Learning for that, too?
In Language Generation, a Deep Neural Network considers a sequence of words and then predicts the next one, usually with a good degree of randomness. Repeating this prediction step generates an entire document. This approach is called autoregression. Until recently, autoregression was not really convincing: the generated texts lacked consistency. But with the rise of Self-Attention Transformers, the metaphorical winds have changed. It actually works!
So, I wrote a short description of the album itself, finished with the words "Here is the tracklist:", and let a Neural Network predict the next words, which happened to be perfectly fine song names.
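For illustration, here is how such a completion could look with an off-the-shelf GPT-2 via the Hugging Face text-generation pipeline. The prompt text is a made-up stand-in for my actual album description:

```python
# Minimal sketch of the track-name trick. The model choice and the
# prompt are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Hexagon Machine is an electronic music project in which a human "
    "composer collaborates with a Deep Neural Network. "
    "Here is the tracklist:"
)
result = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.9)
print(result[0]["generated_text"])
```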
Considering Lyrics
I briefly considered using Deep Learning for lyrics generation too, but in the end decided against it. Do not get me wrong: Transformers can also be used to compose lyrics, and this works very well. The crucial point is that I decided to wait until vocals generation - speech synthesis conditioned on melody, articulation, and text - is a thing. I guess one of my future albums will have vocals, too.
The Album
I named the music project "Hexagon Machine". Well, I used AI to come up with that name as well. In the end, this experiment is about man-machine collaboration, so why not give the machine a say in that matter, too? The album is called "Robot Uprising MMXXI" and will be available soon. You can find Hexagon Machine on Instagram.
Thanks a lot for reading!