Paper Note: Make-An-Audio


The model uses an STFT-based spectrogram as its intermediate feature. A HiFi-GAN vocoder converts the spectrogram back to a waveform. Classifier-free guidance (CFG) is used.
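To make the CFG step concrete, here is a minimal sketch of how the conditional and unconditional noise predictions are usually blended at sampling time. The function name `cfg_combine`, the list-based inputs, and the scale convention (a scale of 1.0 returns the conditional prediction unchanged) are assumptions for illustration, not taken from the paper's code.

```python
def cfg_combine(eps_cond, eps_uncond, guidance_scale):
    """Blend two noise predictions with classifier-free guidance.

    eps_cond:   prediction given the text prompt (hypothetical flat list)
    eps_uncond: prediction given an empty prompt (hypothetical flat list)
    guidance_scale: 0.0 -> unconditional, 1.0 -> conditional,
                    >1.0 -> pushed further toward the prompt
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    # Start from the unconditional prediction and move along the
    # direction that the text condition adds.
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]
    uncond = [0.0, 1.0]
    print(cfg_combine(cond, uncond, 3.0))
```

Some implementations write the same idea as (1 + w) * cond - w * uncond, which only shifts the meaning of the scale by one.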

The spectrogram is encoded into a latent z by an audio encoder. A KL loss and a GAN loss are used to optimize the audio encoder. Its training data can be audio without text labels, which takes advantage of self-supervised training.
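As a rough sketch of what the KL term looks like, assume the encoder outputs a diagonal Gaussian (mean and log-variance) for z and is regularized toward a standard normal. The function names, the loss weights, and the reconstruction term are assumptions for illustration; the note itself only mentions the KL and GAN losses.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def autoencoder_loss(recon_loss, kl_loss, adv_loss, kl_weight=1e-6, adv_weight=0.5):
    """Weighted sum of the autoencoder terms.

    recon_loss is an assumed reconstruction term (not mentioned in the note);
    adv_loss stands for the generator side of the GAN loss.
    The default weights are placeholders, not values from the paper.
    """
    return recon_loss + kl_weight * kl_loss + adv_weight * adv_loss


if __name__ == "__main__":
    kl = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
    print(kl, autoencoder_loss(recon_loss=0.2, kl_loss=kl, adv_loss=0.1))
```

Since none of these terms needs a caption, the autoencoder can be trained on unlabeled audio, which is the point the note makes.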

Because the available data is insufficient, the authors came up with a good solution: constructing new data by superimposing and splicing different samples.
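The augmentation idea can be sketched as follows, assuming each sample is a list of waveform values paired with a caption. The helper names, the 0.5 mixing gain, and the caption templates ("and", "followed by") are hypothetical and only illustrate the two operations, not the paper's exact procedure.

```python
def superimpose(a, b, gain=0.5):
    """Overlay two clips; the shorter one is zero-padded at the end."""
    n = max(len(a), len(b))
    pa = list(a) + [0.0] * (n - len(a))
    pb = list(b) + [0.0] * (n - len(b))
    return [gain * (x + y) for x, y in zip(pa, pb)]


def splice(a, b):
    """Play clip a, then clip b."""
    return list(a) + list(b)


def make_sample(clip_a, cap_a, clip_b, cap_b, mode):
    """Build one new (audio, caption) pair from two existing pairs."""
    if mode == "superimpose":
        return superimpose(clip_a, clip_b), f"{cap_a} and {cap_b}"
    if mode == "splice":
        return splice(clip_a, clip_b), f"{cap_a} followed by {cap_b}"
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    dog = ([1.0, 1.0], "a dog barks")
    rain = ([1.0], "rain falls")
    print(make_sample(*dog, *rain, mode="superimpose"))
    print(make_sample(*dog, *rain, mode="splice"))
```

Each new pair comes with a caption derived from its sources, so the constructed data can be used for text-conditioned training without extra labeling.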

