
[TTS]add diffusion module for training diffsinger #2832

Merged
merged 1 commit into from
Jan 13, 2023

Conversation

HighCWu
Contributor

@HighCWu HighCWu commented Jan 13, 2023

PR types

New features

PR changes

Models

Describe

Add diffusion module for training diffsinger.

Use the normalized mel-spectrogram as the input to GaussianDiffusion's forward pass; it returns the noised normalized mel-spectrogram together with the target noise that was added. To train the denoising model, compute the L1 or MSE loss between its prediction and the target noise, weighted by non-padding weights so padded frames do not contribute.
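For intuition, the forward (noising) step described above can be sketched in plain NumPy. This is an illustrative approximation of the standard DDPM forward process, not the PR's actual paddle code; `q_sample` and the linear beta schedule are assumptions:

```python
import numpy as np

def q_sample(x0, t, betas, rng):
    """Noise a clean sample x0 to step t; return (x_t, eps).

    Illustrative DDPM forward process (not the PR's implementation):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where eps is the regression target for the denoiser.
    """
    alpha_bar = np.cumprod(1.0 - betas)   # cumulative product of (1 - beta_i)
    eps = rng.standard_normal(x0.shape)   # the "target noise added in"
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)     # a common linear schedule (assumed)
x0 = rng.standard_normal((4, 80, 192))    # stands in for a normalized mel-spec [B, mel_ch, T]
x_t, eps = q_sample(x0, 999, betas, rng)  # at the last step, x_t is nearly pure noise
```

At small `t` the output stays close to `x0`; at large `t` it approaches the sampled noise, which is what lets inference start either from an auxiliary mel output or from pure Gaussian noise.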

A simple example of using the modules:

>>> import paddle
>>> import paddle.nn.functional as F
>>> from tqdm import tqdm
>>>
>>> denoiser = WaveNetDenoiser()
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=1000, num_max_timesteps=100)
>>> x = paddle.ones([4, 80, 192])    # [B, mel_ch, T], real mel input
>>> c = paddle.randn([4, 256, 192])  # [B, fs2_encoder_out_ch, T], fastspeech2 encoder output
>>> loss = F.mse_loss(*diffusion(x, c))
>>> loss.backward()
>>> print('MSE Loss:', loss.item())
MSE Loss: 1.6669728755950928
>>> def create_progress_callback():
...     pbar = None
...     def callback(index, timestep, num_timesteps, sample):
...         nonlocal pbar
...         if pbar is None:
...             pbar = tqdm(total=num_timesteps - index)
...         pbar.update()
...     return callback
...
>>> # ds=1000, K_step=60, scheduler=ddpm, from aux fs2 mel output
>>> ds = 1000
>>> infer_steps = 1000
>>> K_step = 60
>>> scheduler_type = 'ddpm'
>>> x_in = x
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 60/60 [00:03<00:00, 18.36it/s]
>>>
>>> # ds=100, K_step=100, scheduler=ddpm, from gaussian noise
>>> ds = 100
>>> infer_steps = 100
>>> K_step = 100
>>> scheduler_type = 'ddpm'
>>> x_in = None
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 100/100 [00:05<00:00, 18.29it/s]
>>>
>>> # ds=1000, K_step=1000, scheduler=pndm, infer_step=25, from gaussian noise
>>> ds = 1000
>>> infer_steps = 25
>>> K_step = 1000
>>> scheduler_type = 'pndm'
>>> x_in = None
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 25/25 [00:01<00:00, 19.75it/s]
>>>
>>> # ds=1000, K_step=100, scheduler=pndm, infer_step=50, from aux fs2 mel output
>>> ds = 1000
>>> infer_steps = 50
>>> K_step = 100
>>> scheduler_type = 'pndm'
>>> x_in = x
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 5/5 [00:00<00:00, 23.80it/s]
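A pattern worth noting in the progress bars above: the number of denoising steps that actually run appears to be num_inference_steps scaled by K_step / num_train_timesteps. This relation is inferred from the four outputs above, not read from the implementation:

```python
def executed_steps(ds, K_step, infer_steps):
    # Inferred relation (an assumption based on the progress-bar totals above):
    # only the fraction K_step / ds of the inference schedule falls inside the
    # shallow-diffusion range, so only that many steps execute.
    return infer_steps * K_step // ds

# the four (ds, K_step, infer_steps) -> progress-bar total cases shown above
for ds, K_step, infer_steps, shown in [
    (1000, 60, 1000, 60),
    (100, 100, 100, 100),
    (1000, 1000, 25, 25),
    (1000, 100, 50, 5),
]:
    assert executed_steps(ds, K_step, infer_steps) == shown
```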

The loss computation and denoiser training procedure follow train_text_to_image.py.
The inference process follows pipeline_stable_diffusion_img2img.py.
To implement GaussianDiffusionShallow as in diffsinger, set num_max_timesteps to K_step during training, so that time steps beyond K_step, which do not require training, are skipped.
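The non-padding weights used in the loss can be pictured as a 0/1 mask over padded frames. A minimal NumPy sketch follows; `masked_mse_loss` is a hypothetical helper for illustration, not this PR's API (the real code works on paddle tensors inside the training loop):

```python
import numpy as np

def masked_mse_loss(noise_pred, noise_target, lengths):
    """MSE between predicted and target noise, averaged only over
    non-padding frames (hypothetical helper, not the PR's actual API)."""
    B, C, T = noise_pred.shape
    # non-padding weights: 1.0 for real frames, 0.0 for padded -> [B, 1, T]
    w = (np.arange(T)[None, :] < np.asarray(lengths)[:, None]).astype(np.float64)
    w = w[:, None, :]
    se = (noise_pred - noise_target) ** 2 * w  # zero out padded frames
    return se.sum() / (w.sum() * C)            # average over valid frames only

pred = np.ones((2, 80, 6))
target = np.zeros((2, 80, 6))
# second item has only 3 valid frames; its padded frames never affect the loss
loss = masked_mse_loss(pred, target, lengths=[6, 3])
```

Without the mask, garbage values in padded frames of a batched mel-spectrogram would leak into the loss; with it, the loss above stays the same no matter what the padded region contains.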

@yt605155624 yt605155624 added this to the r1.4.0 milestone Jan 13, 2023
@yt605155624 yt605155624 changed the title add diffusion module for training diffsinger [TTS]add diffusion module for training diffsinger Jan 13, 2023
@yt605155624 yt605155624 mentioned this pull request Jan 13, 2023
Collaborator

@yt605155624 yt605155624 left a comment

LGTM

@yt605155624 yt605155624 merged commit 57b9d4b into PaddlePaddle:develop Jan 13, 2023