I Want to be a Vocaloid
Wow. Graphic design really is my passion. Take that Leon

Hear me out. I just want to make a little guy on my computer sing when I do. How hard can it be?

The plan is simple: find a way to translate my voice to MIDI and make a 3D model lip-sync to it. All in real time (preferably). Okay, there are a couple more steps than that, but the general gist can be summed up as follows:

  1. Get audio input from mic to computer
  2. Get software to convert audio to MIDI
  3. Feed MIDI to a custom sound font (voice)
  4. Pump output into automatic 3D animation/VTuber software
  5. Profit


The first step was to find a way to convert audio into MIDI. That was very easy to do on its own, but things got a bit more complicated when I tried to use my voice. Here are brief haikus to summarize 3 hours of pain spent trying to find an audio app that would work for this project:

Dubler - live voice-to-MIDI software - $219

There’s no free trial

Can return before two weeks

I don’t trust myself

Melodyne - software for analyzing audio files and creating MIDI (among many other things) - $700

The most bourgeois bitch

Unintuitive and Plain

Free trial was nice

Logic’s Built-in Software - Flex Pitch editor and MIDI conversion - in Logic Pro X

Is Already there

Just press like twelve small buttons

And just like that… ew

A2M - live voice-to-MIDI converter with standalone and Audio Unit options - free app with upgrade

She’s Not like most girls

She like actually works

Thank the friggen lord

A2M, the final and simplest of the audio products I tried, actually seemed to work. It took a bit of tinkering, but I was able to create a setup that got me to step 2 of Operation Vocaloid.

Ignore my messy bed and desktop

A Yeti microphone was plugged into an iPad running A2M and Logic Remote, which let the iPad connect to my computer as a MIDI input device. To prevent any sort of feedback, I had my headphones plugged into my computer.


For step three, I needed to find the perfect sound font to use for my virtual self. For those who don’t know (congratulations on not being a nerd by the way), a sound font is a collection of sounds that can be fed into a sampler in order to create a custom midi instrument. I COULD have painstakingly recorded and created my own font, but I instead opted to allow someone more masochistic than me to do that work. After just a couple googles, I found the perfect voice: K.K. Slider from Animal Crossing.
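For the fellow nerds: here is a rough Python sketch of what a sampler does with a sound font, under some big simplifying assumptions. The file names and the three recorded notes are made up for illustration, and real sound fonts also carry key ranges, envelopes, and loop points that this ignores. The idea is just that the sampler picks the recorded sample closest to the note you asked for and speeds it up or slows it down to hit the pitch.

```python
# Hypothetical font: MIDI note of each recording -> sample file (names are made up).
SAMPLES = {60: "voice_C4.wav", 64: "voice_E4.wav", 67: "voice_G4.wav"}


def pick_sample(note: int, samples=SAMPLES):
    """Return (file, playback_rate) for the recorded sample nearest to `note`."""
    # Choose the recording whose root note is closest to the requested note.
    root = min(samples, key=lambda r: abs(r - note))
    # Each semitone up multiplies the playback speed by 2^(1/12).
    rate = 2 ** ((note - root) / 12)
    return samples[root], rate


if __name__ == "__main__":
    for note in (48, 60, 72):
        print(note, pick_sample(note))
```

Playing a note far from any recording means a big speed change, which is part of why stretched samples start to sound like chipmunks (or, going the other way, like a dog yawning).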

With the sound font dropped into my Logic file, I was ready to start making some magic. The operative word here is WAS because… well… I hate to address the elephant in the refrigerator, but not only did he step in my Ben and Jerry’s, but also trying to make any audio, let alone voice, into MIDI is HARD.

This is because even though technology is worlds ahead of where it once was, machines are FAR behind our ears in terms of audio parsing and evaluation. Our ears have wonderful abilities to do things like separate music from background noise, adjust for bad intonation, find the target note of vibrato, and so SO much more. Machines are still working on the first one; as far as my Yeti mic is concerned, me stubbing my toe on the table is just as important as me singing a high C. Because of these shortcomings, machine-translated audio often ends up a bit… messy. Between ghost notes, unwarranted trills, random background noise bleed, and constant choppiness, there has yet to be audio software able to cleanly and accurately get a MIDI sample from a voice without a bit of human-ear-powered post-processing.
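To make the trill and ghost-note problem concrete, here is a minimal Python sketch. It is my own toy illustration, not how A2M, Dubler, or Logic actually work. The one standard fact in it is the note formula, MIDI note = 69 + 12 * log2(f / 440). The frame values, the `gate` threshold, and the median window are all assumptions I picked so the effect is easy to see.

```python
import math
from statistics import median_low

A4_HZ = 440.0   # standard concert pitch
A4_MIDI = 69    # MIDI note number for A4

# Made-up (frequency in Hz, loudness 0..1) frames: a slightly wobbly A4,
# plus one quiet low thud in the middle (the stubbed toe).
FRAMES = [(445, 0.8), (448, 0.8), (454, 0.8), (447, 0.8), (110, 0.02),
          (444, 0.8), (455, 0.8), (446, 0.8), (445, 0.8)]


def hz_to_midi(freq_hz: float) -> int:
    """Round a frequency to the nearest MIDI note number."""
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def naive_notes(frames):
    """Convert every frame to a note with no cleanup at all."""
    return [hz_to_midi(freq) for freq, _loudness in frames]


def cleaned_notes(frames, gate=0.1, window=3):
    """Drop frames quieter than `gate`, then median-filter the remaining notes."""
    loud = [hz_to_midi(freq) for freq, loudness in frames if loudness >= gate]
    half = window // 2
    smoothed = []
    for i in range(len(loud)):
        neighbors = loud[max(0, i - half):i + half + 1]
        # median_low always returns a value that is actually in the window.
        smoothed.append(median_low(neighbors))
    return smoothed


if __name__ == "__main__":
    print("naive:  ", naive_notes(FRAMES))
    print("cleaned:", cleaned_notes(FRAMES))
```

In the naive version, the wobble crosses the halfway point between A4 and the next note up, so a single held note picks up stray 70s, and the thud shows up as a ghost note at 45. The cleaned version flattens this toy example to a steady 69, but a real voice is far messier than nine hand-picked frames, which is where the human ear comes back in.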

To be honest, I had expected this, and as such it was time to go back to the drawing board and reassess my goals. Maybe live conversion from audio to MIDI wasn’t feasible, but I was still more than capable of creating melodies. In fact, I already had a couple of melodies premade and ready for lip-syncing. All I needed to do was swap out the melody instrument for my sweet sweet K.K. Slider. Actually, K.K. Slider is probably copyrighted, so let’s come up with a different guitar-playing white Jack Russell Terrier.

I present to you: J.J. Reuben

THIS is why I love AI

Now that we have our music and our AI-generated nightmare fuel, why don’t we do some animating? I might not have live music, but maybe we instead make this a project about syncing audio playback to an animation. I’ve always been fascinated by lip dubbing in animations, so why not use this as an excuse to explore that? Time for another 3-hour deep dive and another set of haikus to sum up my trials and tribulations.

Blender - open source 3D modeling and animation software - FREE

I’ve been here for hours

My algorithm is wrecked

YouTube can’t save me

Live2D Cubism - live 2D expression software primarily used for VTubing - $100/yr

All anime girls

I think I have a virus

It’s in Japanese

Procreate Dreams - iPad Animation software from creators of Procreate - $20

What I would have used

She was only twenty dollars

My card wasn’t close

I fear we must address the other elephant in the refrigerator (how did they fit two in there?): I know nothing about animation. Unlike music software, where I have, you know, almost a whole degree’s worth of experience, I have literally no experience with any drawing or animation app other than basic Procreate. Lip syncing is one of the final things you learn as an animator, and it is an incredibly nuanced and complicated art that I was crazy to think I could learn in only an afternoon.

Screw it. Improvise. Adapt. Overcome. I am gonna make it work for me. My goal now is just to make an audio track in Logic, make a video in plain ol’ Procreate, and get the two to sync together. It’s simple, but it is at least a step in the right direction.

Get ready world because I present to you: J.J. Reuben’s new hit single “At Least I Tried”


Turns out, making your goals realistic to your current circumstances is actually very rewarding and fun. I know this is very far from my original intent, but hear me out: I actually saw a project through to the end and made something. Sure, this 30-second animation is… special, to say the least, but in it are all the foundational steps for my eventual dream of that perfect Vocaloid/VTuber/little guy. I also can’t stress enough that the time I spent working on this animation was actually FUN. I know that may be a bad word for you, but after 6 hours spent stressing over making the perfect project, 30 minutes of fun on something a little less serious was like a cold Wendy’s chocolate Frosty with some crispy fries dipped in it.

What this whole ordeal reinforced in me is that it is OKAY to readjust your goals to something more realistic. Almost every project begins with a grand idea for something that could only exist for our perfect self in a perfect world. News flash: we are flawed caffeinated space monkeys who are using flawed little metal electron boxes to do our bidding. Perfect isn’t possible.

I feel that this is especially apt as we begin the second week of the New Year. This is probably the time when people are realizing that the New Year’s resolutions they set for themselves may have been a bit too idealistic. Like my original plan for real-time MIDI/VTuber singing, people often set their resolutions picturing the perfect version of themselves completing them, not their actual selves. When they finally come face to face with that elephant in their refrigerator stepping on their Halo Top ice cream (the elephant’s resolution was to go on a diet), they shut the door and give up on ever opening it again. This results in nothing but sad, frozen elephants and regret. You’re not going to become the perfect version of yourself just because a ball dropped (most men have had two drop, and look where we are). You will, however, never become a better version of yourself if you are too wary of taking the first step. So for this year, take the first step. It can be as small as saying thank you more often to customer service workers. If it is a step in the right direction, it is going to be worth it.

As possibly Thomas Edison, and equally possibly a man in a basement, once said in a manner that warranted recreation on Google, “I didn’t fail 1000 times. The light bulb was just an invention with 1000 steps.”

I will say I am thankful the monstrosity I invented today didn’t take 1000 steps. In fact, I am able to sum it up into a nice and easy 12-step program for you to enjoy and maybe try out for yourself. Now, just like any good recipe, you are more than welcome to modify, tinker, and even (gasp) go with something premade. Just so long as you are making something.

I leave you with Nate's 12-step recipe for creating your own singing monstrosity:

  1. Spend many hours trying to sing MIDI and give up
  2. Spend many more hours trying to animate in 3D or 2D and give up
  3. Find an old Logic Pro project with a melody written
  4. Replace the melody with a K.K. Slider sound font you found on the internet
  5. Export the file to be used later
  6. Open your AI-generated dog in Procreate (oh yeah, generate an AI dog)
  7. Use the lasso tool to put the mouth on another layer
  8. Start a screen recording and your music
  9. Use the selected mouth on the new layer and move it in time to the music
  10. Export the video to Final Cut Pro or iMovie or your editor of choice
  11. Sync the video with the audio
  12. Export and share your cursed creation with the world
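If eyeballing steps 9 and 11 gets old, a little arithmetic can tell you which video frame each beat lands on. This is only a sketch under assumptions that are not from my project: the tempo is constant, and the 120 BPM and 30 fps in the example are placeholder numbers. Swap in your own.

```python
def beat_frames(bpm: float, fps: float, beats: int, offset_s: float = 0.0):
    """Frame index where each beat lands, assuming a constant tempo.

    offset_s is how many seconds into the video the first beat starts.
    """
    seconds_per_beat = 60.0 / bpm
    return [round((offset_s + i * seconds_per_beat) * fps) for i in range(beats)]


if __name__ == "__main__":
    # Placeholder values: 120 BPM song, 30 fps screen recording.
    print(beat_frames(bpm=120, fps=30, beats=4))
    # Same song, but the music starts half a second into the video.
    print(beat_frames(bpm=120, fps=30, beats=4, offset_s=0.5))
```

Flap the mouth open on those frames and closed halfway between them, and your cursed creation will at least be cursed on the beat.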
