
[TTS]add diffusion module for training diffsinger #2832

Merged
merged 1 commit into from
Jan 13, 2023

Conversation

HighCWu
Contributor

@HighCWu HighCWu commented Jan 13, 2023

PR types

New features

PR changes

Models

Describe

Add diffusion module for training diffsinger.

Use the normalized mel-spectrogram as the input to GaussianDiffusion's forward pass; it returns the noised normalized mel-spectrogram together with the target noise that was added. To train the denoising model, compute the L1 or MSE loss between its prediction and the target noise, weighted by non-padding weights so padded frames do not contribute.
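For intuition, the forward (noising) step described above can be sketched in plain NumPy. This is an illustrative approximation of the standard DDPM forward process, not the PR's actual paddle code; `q_sample` and the linear beta schedule are assumptions:

```python
import numpy as np

def q_sample(x0, t, betas, rng):
    """Noise a clean sample x0 to step t; return (x_t, eps).

    Illustrative DDPM forward process (not the PR's implementation):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where eps is the regression target for the denoiser.
    """
    alpha_bar = np.cumprod(1.0 - betas)   # cumulative product of (1 - beta_i)
    eps = rng.standard_normal(x0.shape)   # the "target noise added in"
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)     # a common linear schedule (assumed)
x0 = rng.standard_normal((4, 80, 192))    # stands in for a normalized mel-spec [B, mel_ch, T]
x_t, eps = q_sample(x0, 999, betas, rng)  # at the last step, x_t is nearly pure noise
```

At small `t` the output stays close to `x0`; at large `t` it approaches the sampled noise, which is what lets inference start either from an auxiliary mel output or from pure Gaussian noise.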

A simple example of using the modules:

>>> import paddle
>>> import paddle.nn.functional as F
>>> from tqdm import tqdm
>>>
>>> denoiser = WaveNetDenoiser()
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=1000, num_max_timesteps=100)
>>> x = paddle.ones([4, 80, 192])    # [B, mel_ch, T], real mel input
>>> c = paddle.randn([4, 256, 192])  # [B, fs2_encoder_out_ch, T], fastspeech2 encoder output
>>> loss = F.mse_loss(*diffusion(x, c))
>>> loss.backward()
>>> print('MSE Loss:', loss.item())
MSE Loss: 1.6669728755950928
>>> def create_progress_callback():
...     pbar = None
...     def callback(index, timestep, num_timesteps, sample):
...         nonlocal pbar
...         if pbar is None:
...             pbar = tqdm(total=num_timesteps - index)
...         pbar.update()
...     return callback
...
>>> # ds=1000, K_step=60, scheduler=ddpm, from aux fs2 mel output
>>> ds = 1000
>>> infer_steps = 1000
>>> K_step = 60
>>> scheduler_type = 'ddpm'
>>> x_in = x
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 60/60 [00:03<00:00, 18.36it/s]
>>>
>>> # ds=100, K_step=100, scheduler=ddpm, from gaussian noise
>>> ds = 100
>>> infer_steps = 100
>>> K_step = 100
>>> scheduler_type = 'ddpm'
>>> x_in = None
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 100/100 [00:05<00:00, 18.29it/s]
>>>
>>> # ds=1000, K_step=1000, scheduler=pndm, infer_step=25, from gaussian noise
>>> ds = 1000
>>> infer_steps = 25
>>> K_step = 1000
>>> scheduler_type = 'pndm'
>>> x_in = None
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 25/25 [00:01<00:00, 19.75it/s]
>>>
>>> # ds=1000, K_step=100, scheduler=pndm, infer_step=50, from aux fs2 mel output
>>> ds = 1000
>>> infer_steps = 50
>>> K_step = 100
>>> scheduler_type = 'pndm'
>>> x_in = x
>>> diffusion = GaussianDiffusion(denoiser, num_train_timesteps=ds, num_max_timesteps=K_step)
>>> with paddle.no_grad():
...     sample = diffusion.inference(
...         paddle.randn(x.shape), c, x_in,
...         num_inference_steps=infer_steps,
...         scheduler_type=scheduler_type,
...         callback=create_progress_callback())
100%|█████| 5/5 [00:00<00:00, 23.80it/s]
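A pattern worth noting in the progress bars above: the number of denoising steps that actually run appears to be num_inference_steps scaled by K_step / num_train_timesteps. This relation is inferred from the four outputs above, not read from the implementation:

```python
def executed_steps(ds, K_step, infer_steps):
    # Inferred relation (an assumption based on the progress-bar totals above):
    # only the fraction K_step / ds of the inference schedule falls inside the
    # shallow-diffusion range, so only that many steps execute.
    return infer_steps * K_step // ds

# the four (ds, K_step, infer_steps) -> progress-bar total cases shown above
for ds, K_step, infer_steps, shown in [
    (1000, 60, 1000, 60),
    (100, 100, 100, 100),
    (1000, 1000, 25, 25),
    (1000, 100, 50, 5),
]:
    assert executed_steps(ds, K_step, infer_steps) == shown
```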

The loss computation and denoiser training procedure follow train_text_to_image.py.
The inference process follows pipeline_stable_diffusion_img2img.py.
To implement GaussianDiffusionShallow as in diffsinger, set num_max_timesteps to K_step during training, so that time steps beyond K_step, which do not require training, are skipped.
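The non-padding weights used in the loss can be pictured as a 0/1 mask over padded frames. A minimal NumPy sketch follows; `masked_mse_loss` is a hypothetical helper for illustration, not this PR's API (the real code works on paddle tensors inside the training loop):

```python
import numpy as np

def masked_mse_loss(noise_pred, noise_target, lengths):
    """MSE between predicted and target noise, averaged only over
    non-padding frames (hypothetical helper, not the PR's actual API)."""
    B, C, T = noise_pred.shape
    # non-padding weights: 1.0 for real frames, 0.0 for padded -> [B, 1, T]
    w = (np.arange(T)[None, :] < np.asarray(lengths)[:, None]).astype(np.float64)
    w = w[:, None, :]
    se = (noise_pred - noise_target) ** 2 * w  # zero out padded frames
    return se.sum() / (w.sum() * C)            # average over valid frames only

pred = np.ones((2, 80, 6))
target = np.zeros((2, 80, 6))
# second item has only 3 valid frames; its padded frames never affect the loss
loss = masked_mse_loss(pred, target, lengths=[6, 3])
```

Without the mask, garbage values in padded frames of a batched mel-spectrogram would leak into the loss; with it, the loss above stays the same no matter what the padded region contains.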

@yt605155624 yt605155624 added this to the r1.4.0 milestone Jan 13, 2023
@yt605155624 yt605155624 changed the title add diffusion module for training diffsinger [TTS]add diffusion module for training diffsinger Jan 13, 2023
@yt605155624 yt605155624 mentioned this pull request Jan 13, 2023
Collaborator

@yt605155624 yt605155624 left a comment

LGTM

@yt605155624 yt605155624 merged commit 57b9d4b into PaddlePaddle:develop Jan 13, 2023