
How to replace PL-BERT with XPhoneBERT? #28

Closed
bharathraj-v opened this issue Nov 14, 2023 · 3 comments
Comments

@bharathraj-v

Hi,

I'm looking to generate Hindi audio, but it's mentioned that PL-BERT doesn't work well with other languages, and that I either need to train a different PL-BERT or replace the module with XPhoneBERT.

I'm having trouble understanding how to go about replacing the module with XPhoneBERT. The XPhoneBERT repository describes using the model through transformers, but I'm unsure how to apply that here, and this issue thread suggests that the pre-trained model is not public. So how do I go about replacing PL-BERT with XPhoneBERT here?

Thanks!

@bharathraj-v bharathraj-v changed the title How do I replace PL-BERT with XPhoneBERT? How to replace PL-BERT with XPhoneBERT? Nov 15, 2023
@yl4579
Owner

yl4579 commented Nov 15, 2023

Unfortunately this is not a straightforward replacement because the phonemizers used by PL-BERT and XPhoneBERT are quite different. You will have to re-train the text aligner (ASR) with the XPhoneBERT phonemizer and also prepare your data in that format; then you can replace PL-BERT with XPhoneBERT.
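To make the mismatch concrete, here is a minimal, hypothetical sketch of the two input conventions (the symbol table below is invented for illustration, not StyleTTS2's real one): a PL-BERT-style front end indexes individual IPA characters through a fixed symbol dictionary, while XPhoneBERT expects whitespace-separated phoneme tokens produced by its own phonemizer (text2phonemesequence).

```python
# Illustration only: this symbol table is invented for the sketch.
pl_bert_symbols = {c: i for i, c in enumerate("_ heloʊə")}

def pl_bert_token_ids(ipa_string):
    # Per-character lookup, as a PL-BERT-style front end does
    return [pl_bert_symbols[c] for c in ipa_string if c in pl_bert_symbols]

def xphonebert_tokens(phoneme_string):
    # XPhoneBERT-style input: one token per whitespace-separated phoneme
    return phoneme_string.split()

ipa = "h ə l oʊ"
print(pl_bert_token_ids(ipa.replace(" ", "")))  # one id per character: 'oʊ' becomes two ids
print(xphonebert_tokens(ipa))                   # ['h', 'ə', 'l', 'oʊ'] -- 'oʊ' stays one token
```

Because token boundaries (and therefore sequence lengths) differ between the two schemes, the alignments the text aligner learned over PL-BERT tokens no longer line up, which is why the aligner must be re-trained.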

@yl4579 yl4579 closed this as completed Nov 15, 2023
@yl4579
Owner

yl4579 commented Nov 15, 2023

The model is publicly available here: https://huggingface.co/vinai/xphonebert-base

@cmp-nct

cmp-nct commented Nov 20, 2023

The readme made it sound like a drop-in replacement ;-)

@yl4579
It would be nice to get a few more steps, given that many people have never trained any audio models. It's all a bit overwhelming.

Here is a new utils.py for XPhoneBERT that acts like the previous utils.py:

import torch.nn as nn
from transformers import AutoConfig, AutoModel


class CustomXPhoneBERT(nn.Module):
    """Wrapper that returns only the last hidden state, so XPhoneBERT can
    stand in where the PL-BERT module's output is expected."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, *args, **kwargs):
        # Call the underlying encoder's forward method
        outputs = self.model(*args, **kwargs)

        # Only return the last_hidden_state
        return outputs.last_hidden_state


def load_xbert(model_name_or_path):
    # Load the configuration for 'xphonebert-base'
    config = AutoConfig.from_pretrained(model_name_or_path)

    # Load the base encoder. Note: AutoModel, not AutoModelForMaskedLM --
    # the MLM head's output has no last_hidden_state, and subclassing an
    # Auto class doesn't work because from_pretrained never returns the
    # subclass, so the overridden forward would never run.
    xbert = AutoModel.from_pretrained(model_name_or_path, config=config)

    # Return the wrapped model
    return CustomXPhoneBERT(xbert)
Inference won't be as compatible, I guess. Here is the current inference code, which relies on the English-only BERT:
    with torch.no_grad():
        input_lengths = torch.LongTensor([tokens.shape[-1]]).to(device)
        text_mask = length_to_mask(input_lengths).to(device)

        t_en = model.text_encoder(tokens, input_lengths, text_mask)
        bert_dur = model.bert(tokens, attention_mask=(~text_mask).int())
        d_en = model.bert_encoder(bert_dur).transpose(-1, -2)

        s_pred = sampler(noise=torch.randn((1, 256)).unsqueeze(1).to(device),
                         embedding=bert_dur,
                         embedding_scale=embedding_scale,
                         features=ref_s,  # reference from the same speaker as the embedding
                         num_steps=diffusion_steps).squeeze(1)
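One detail to watch when wiring XPhoneBERT into this loop: `length_to_mask` in the snippet returns True at padding positions, while Hugging Face models expect an `attention_mask` with 1 at real tokens, which is what `(~text_mask).int()` produces. A pure-Python sketch of that convention (list-based stand-ins for illustration, not the actual torch code):

```python
def length_to_mask(lengths):
    # StyleTTS2-style convention: True marks padding positions
    max_len = max(lengths)
    return [[pos >= n for pos in range(max_len)] for n in lengths]

def to_attention_mask(pad_mask):
    # Hugging Face convention: 1 = real token, 0 = padding,
    # i.e. what `(~text_mask).int()` computes on the torch side
    return [[0 if padded else 1 for padded in row] for row in pad_mask]

pad = length_to_mask([3, 5])
print(to_attention_mask(pad))  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

So the `attention_mask=(~text_mask).int()` argument should carry over to the XPhoneBERT call unchanged, as long as `tokens` are ids from XPhoneBERT's own tokenizer rather than the English-only symbol set.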
