Paper Note: Make-An-Audio

Xingyu Ma

Music Generation Algorithm / AIGC / AGI

Published: April 5, 2023

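The model uses a spectrogram computed with the STFT (a mel-spectrogram in the paper) as its intermediate feature. A HiFi-GAN vocoder converts the spectrogram back into a waveform. Classifier-free guidance (CFG) is used during sampling.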
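Classifier-free guidance combines two predictions from the same denoiser: one conditioned on the text prompt and one with the condition dropped. A minimal sketch on plain Python lists, assuming the denoiser outputs noise estimates; the function name `cfg_combine` and the scale value are illustrative, not taken from Make-An-Audio's code:

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: move the unconditional noise estimate toward
    (and, for scale > 1, past) the text-conditioned estimate."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical noise estimates for a 2-element latent.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], scale=3.0))  # [3.0, 1.0]
```

With `scale=1.0` this reduces to the conditional prediction; larger scales follow the prompt more closely at the cost of sample diversity.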

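An audio encoder compresses the spectrogram into a latent representation z. Besides reconstructing its input, the autoencoder is regularized with a KL loss and trained adversarially with a GAN loss. Because this stage only needs to reconstruct audio, it can be trained on audio clips that have no text labels, which takes advantage of self-supervised training.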
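The two regularizers can be written down concretely. A minimal sketch, assuming a diagonal-Gaussian encoder that outputs `mu` and `logvar`, and using the hinge GAN loss as one common choice for latent-diffusion autoencoders (not necessarily the exact loss in Make-An-Audio); the weights `kl_weight` and `adv_weight` are hypothetical:

```python
import math
from typing import List


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """KL( N(mu, exp(logvar)) || N(0, I) ) for a diagonal Gaussian posterior."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def hinge_d_loss(real_logits: List[float], fake_logits: List[float]) -> float:
    """Discriminator hinge loss: push real logits above +1 and fake logits below -1."""
    real = sum(max(0.0, 1.0 - r) for r in real_logits) / len(real_logits)
    fake = sum(max(0.0, 1.0 + f) for f in fake_logits) / len(fake_logits)
    return real + fake


def hinge_g_loss(fake_logits: List[float]) -> float:
    """Adversarial loss for the decoder: raise the discriminator's score on reconstructions."""
    return -sum(fake_logits) / len(fake_logits)


def autoencoder_loss(rec_loss: float, mu: List[float], logvar: List[float],
                     fake_logits: List[float], kl_weight: float = 1e-6,
                     adv_weight: float = 0.5) -> float:
    """Total decoder-side loss: reconstruction + weighted KL + weighted adversarial term.
    The default weights are hypothetical placeholders."""
    return (rec_loss
            + kl_weight * kl_to_standard_normal(mu, logvar)
            + adv_weight * hinge_g_loss(fake_logits))


print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5
```

None of these terms needs a caption, which is why this stage can use unlabeled audio.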

Because paired text-audio data is scarce, the authors construct new training examples by superimposing (mixing) and splicing (concatenating) different samples.
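The two composition operations are simple on raw waveforms. A minimal sketch, assuming mono clips stored as lists of samples in [-1, 1]; the caption templates ("and", "then") and the function names are hypothetical, chosen only to show that each new clip needs a matching text description:

```python
from typing import List, Tuple

Clip = List[float]       # mono waveform samples in [-1, 1]
Pair = Tuple[Clip, str]  # (audio, caption)


def superimpose(a: Clip, b: Clip) -> Clip:
    """Mix two clips sample by sample, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a_pad = a + [0.0] * (n - len(a))
    b_pad = b + [0.0] * (n - len(b))
    mixed = [x + y for x, y in zip(a_pad, b_pad)]
    peak = max((abs(v) for v in mixed), default=0.0)
    # Peak-normalize only when the sum would clip outside [-1, 1].
    return [v / peak for v in mixed] if peak > 1.0 else mixed


def splice(a: Clip, b: Clip) -> Clip:
    """Concatenate two clips in time: a plays first, then b."""
    return a + b


def compose(p: Pair, q: Pair, mode: str) -> Pair:
    """Build a new (audio, caption) pair; the caption templates are hypothetical."""
    (audio_a, cap_a), (audio_b, cap_b) = p, q
    if mode == "mix":
        return superimpose(audio_a, audio_b), f"{cap_a} and {cap_b}"
    if mode == "splice":
        return splice(audio_a, audio_b), f"{cap_a} then {cap_b}"
    raise ValueError(f"unknown mode: {mode!r}")


# Toy clips, for illustration only.
dog = ([0.5, 0.5], "a dog barks")
rain = ([0.25], "rain falls")
print(compose(dog, rain, "mix"))     # ([0.75, 0.5], 'a dog barks and rain falls')
print(compose(dog, rain, "splice"))  # ([0.5, 0.5, 0.25], 'a dog barks then rain falls')
```

Mixing describes simultaneous events and splicing describes sequential ones, so the caption has to encode that difference for the pair to stay consistent.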
