[Bug] AttributeError: 'NoneType' object has no attribute 'load_wav' when using tts_with_vc_to_file #3143

pprobst · 2023-11-05T21:22:48Z

Describe the bug

The fix in #3108 breaks tts_with_vc_to_file, at least with VITS.

See line 463 of TTS/api.py at commit 6fef4f9:

self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name,speaker_wav=speaker_wav)

Changing that line back to its pre-0.19.1 version resolves the issue:

self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)

Please take a look at the script below for reproduction.

To Reproduce

Clone the Coqui TTS repository and install the dependencies as specified in the README file.
Then, run the following script from TTS's root directory, but replace speaker_wav with any audio file you have at hand:

#!/usr/bin/env python3

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

tts = TTS("tts_models/pt/cv/vits").to(device)

tts.tts_with_vc_to_file(
    text="A radiografia apresentou algumas lesões no fêmur esquerdo ponto parágrafo",
    speaker_wav="test_audios/1693678335_24253176-processed.wav",
    file_path="test_audios/output.wav",
)

Expected behavior

The output audio file defined in file_path is generated, saying the sentence in text with the voice cloned from speaker_wav.

Logs

> tts_models/pt/cv/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.
 > initialization of language-embedding layers.
/home/probst/.pyenv/versions/coqui-tts/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
 > Text splitted to sentences.
['A radiografia apresentou algumas lesões no fêmur esquerdo ponto parágrafo']
Traceback (most recent call last):
  File "/home/probst/Projects/TTS-iara/./test.py", line 15, in <module>
    tts.tts_with_vc_to_file(
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 488, in tts_with_vc_to_file
    wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 463, in tts_with_vc
    self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name, speaker_wav=speaker_wav)
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/utils/synthesizer.py", line 362, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/tts/utils/managers.py", line 365, in compute_embedding_from_clip
    embedding = _compute(wav_file)
                ^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/tts/utils/managers.py", line 342, in _compute
    waveform = self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)
               ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'load_wav'
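
Reading the traceback, the failing expression is self.encoder_ap.load_wav, which suggests encoder_ap is None for this model. The sketch below reproduces that failure mode in isolation. FakeSpeakerManager is a hypothetical stand-in written for illustration, not the actual TTS class, and it only mirrors the one call shown in the traceback.

```python
class FakeSpeakerManager:
    """Hypothetical stand-in for a speaker manager with no speaker encoder loaded."""

    def __init__(self, encoder_ap=None):
        # Assumption: a model without a speaker encoder leaves this as None.
        self.encoder_ap = encoder_ap

    def compute_embedding_from_clip(self, wav_file):
        # Mirrors the call at the bottom of the traceback.
        return self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)


def reproduce():
    """Return the error message raised when no encoder is available."""
    manager = FakeSpeakerManager()
    try:
        manager.compute_embedding_from_clip("any.wav")
    except AttributeError as exc:
        return str(exc)
    return None


print(reproduce())
```

The attribute lookup on None fails before any audio file is opened, so the path passed as speaker_wav does not matter for reproducing the error.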

Environment

- 🐸TTS Version: 0.19.1
- PyTorch Version: 2.1.0+cu121
- OS: Artix Linux

Not using GPU.
Installed everything through pip in a virtual environment created with pyenv.

Additional context

No response


erogol · 2023-11-08T10:17:25Z

@Aya-AlJafari can you look at this one?

TheLocalLab · 2023-11-20T02:37:38Z

If anyone is still looking through this issue, you might want to take a look at #1440

erogol · 2023-11-20T13:56:32Z

@Aya-AlJafari any updates?

eginhard · 2023-11-20T14:10:01Z

@erogol The original issue (#3067) was people trying to use tts.tts_with_vc_to_file() with XTTS, and it was "fixed" in #3109 (not #3108). But XTTS has integrated VC, so you can just call tts.tts_to_file(..., speaker_wav="..."); there is no point in passing the output through FreeVC afterwards. IMHO, #3109 should be reverted because it breaks tts.tts_with_vc_to_file() for any model that doesn't have integrated VC, i.e. all models this method is meant for.
Perhaps tts.tts_with_vc_to_file() could throw a better error message when it's called for models that already support VC.
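
One way the suggested error message could look is sketched below. This is only an illustration: ensure_external_vc_is_needed is a hypothetical helper, and detecting integrated voice cloning by looking for "xtts" in the model name is an assumption made for the example, not how the library decides it.

```python
def ensure_external_vc_is_needed(model_name: str) -> None:
    """Raise a descriptive error if the model is assumed to clone voices itself."""
    # Assumption: models with integrated voice cloning have "xtts" in their name.
    if "xtts" in model_name.lower():
        raise ValueError(
            f"{model_name} already supports voice cloning; "
            'call tts_to_file(..., speaker_wav="...") instead of tts_with_vc_to_file().'
        )


# A plain VITS model passes the check silently.
ensure_external_vc_is_needed("tts_models/pt/cv/vits")
```

Failing early with a message that names the alternative call would point users at tts_to_file() instead of leaving them with an unrelated AttributeError further down the stack.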


* Revert "fix for issue 3067"
  This reverts commit 041b4b6. Fixes #3143. The original issue (#3067) was people trying to use tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there is no point in passing it through FreeVC afterwards. So, reverting this commit because it breaks tts.tts_with_vc_to_file() for any model that doesn't have integrated VC, i.e. all models this method is meant for.
* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file
* fix: only compute spk embeddings for models that support it
  Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed because they don't support voice cloning. Now that argument is simply ignored.
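
The "only compute spk embeddings for models that support it" behaviour can be sketched roughly as follows. This is not the code merged in #3275: maybe_compute_embedding is a hypothetical helper, and checking encoder_ap for None is an assumption based on the traceback in this issue. The speaker managers are simulated with SimpleNamespace so the example runs without TTS installed.

```python
from types import SimpleNamespace


def maybe_compute_embedding(speaker_manager, speaker_wav):
    """Return a speaker embedding, or None when the model cannot compute one."""
    if speaker_wav is None or speaker_manager is None:
        return None
    # Assumption: a missing speaker encoder shows up as encoder_ap being None.
    if getattr(speaker_manager, "encoder_ap", None) is None:
        return None  # speaker_wav is silently ignored, as the commit message describes
    return speaker_manager.compute_embedding_from_clip(speaker_wav)


# Stand-in for a plain VITS model: no speaker encoder is loaded.
plain_vits = SimpleNamespace(encoder_ap=None)
result = maybe_compute_embedding(plain_vits, "reference.wav")
print(result)
```

With a guard like this, passing speaker_wav to a model without a speaker encoder becomes a no-op instead of raising the AttributeError reported above.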

pprobst added the bug label (Something isn't working) on Nov 5, 2023

eginhard mentioned this issue on Nov 20, 2023 in "Fix tts_with_vc" #3275 (merged)

erogol closed this as completed in #3275 on Nov 24, 2023
