[Bug] AttributeError: 'NoneType' object has no attribute 'load_wav' when using tts_with_vc_to_file #3143

Closed
pprobst opened this issue Nov 5, 2023 · 4 comments · Fixed by #3275
Labels
bug Something isn't working

Comments

@pprobst

pprobst commented Nov 5, 2023

Describe the bug

The fix in #3108 breaks tts_with_vc_to_file, at least with VITS.

See:

TTS/TTS/api.py

Line 463 in 6fef4f9

self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name,speaker_wav=speaker_wav)

Changing that line from:
self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name,speaker_wav=speaker_wav)

to its pre-0.19.1 version:
self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)

solves the issue.
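
To illustrate why dropping speaker_wav from the first call is enough, here is a minimal sketch of a two-stage "synthesize, then convert" flow. It is not the code in TTS/api.py: `tts_then_vc` and the two injected callables are hypothetical stand-ins, and the only point is that the reference clip is needed by the voice-conversion stage alone.

```python
import os
import tempfile
from typing import Any, Callable


def tts_then_vc(
    synthesize_to_file: Callable[..., None],
    convert_voice: Callable[..., Any],
    text: str,
    speaker_wav: str,
) -> Any:
    """Hypothetical two-stage pipeline: plain TTS first, voice conversion second."""
    # Stage 1 writes its output to a temporary WAV path.
    fd, tmp_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # Stage 1: plain synthesis; the reference clip is deliberately NOT passed.
        synthesize_to_file(text=text, file_path=tmp_path)
        # Stage 2: only here is the reference clip used, as the conversion target.
        return convert_voice(source_wav=tmp_path, target_wav=speaker_wav)
    finally:
        os.remove(tmp_path)


# Stand-in callables (assumptions for the demo) that record what they receive.
calls = []


def fake_synthesize(**kwargs):
    calls.append(("tts", sorted(kwargs)))


def fake_convert(source_wav, target_wav):
    calls.append(("vc", target_wav))
    return "converted"


result = tts_then_vc(fake_synthesize, fake_convert, "Olá", "reference.wav")
```

With this shape, a model that cannot compute speaker embeddings never sees speaker_wav, so the failing code path in the traceback is not reached.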

Please take a look at the script below for reproduction.

To Reproduce

Clone the Coqui TTS repository and install the dependencies as specified in the README file.
Then, run the following script from TTS's root directory, but replace speaker_wav with any audio file you have at hand:

#!/usr/bin/env python3

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

tts = TTS("tts_models/pt/cv/vits").to(device)

tts.tts_with_vc_to_file(
    text="A radiografia apresentou algumas lesões no fêmur esquerdo ponto parágrafo",
    speaker_wav="test_audios/1693678335_24253176-processed.wav",
    file_path="test_audios/output.wav",
)

Expected behavior

The output audio file defined in file_path is generated, saying the sentence in text with the voice cloned from speaker_wav.

Logs

> tts_models/pt/cv/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.
 > initialization of language-embedding layers.
/home/probst/.pyenv/versions/coqui-tts/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
 > Text splitted to sentences.
['A radiografia apresentou algumas lesões no fêmur esquerdo ponto parágrafo']
Traceback (most recent call last):
  File "/home/probst/Projects/TTS-iara/./test.py", line 15, in <module>
    tts.tts_with_vc_to_file(
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 488, in tts_with_vc_to_file
    wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 463, in tts_with_vc
    self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name, speaker_wav=speaker_wav)
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 403, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/api.py", line 341, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/utils/synthesizer.py", line 362, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/tts/utils/managers.py", line 365, in compute_embedding_from_clip
    embedding = _compute(wav_file)
                ^^^^^^^^^^^^^^^^^^
  File "/home/probst/Projects/TTS-iara/TTS/tts/utils/managers.py", line 342, in _compute
    waveform = self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)
               ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'load_wav'

Environment

- 🐸TTS Version: 0.19.1
- PyTorch Version: 2.1.0+cu121
- OS: Artix Linux

Not using GPU.
Installed everything through pip in a virtual environment created with pyenv.

Additional context

No response

pprobst added the bug label Nov 5, 2023
@erogol
Member

erogol commented Nov 8, 2023

@Aya-AlJafari can you look at this one?

@TheLocalLab

If anyone is still looking through this issue, you might want to take a look at #1440

@erogol
Member

erogol commented Nov 20, 2023

@Aya-AlJafari any updates?

@eginhard
Contributor

eginhard commented Nov 20, 2023

@erogol The original issue (#3067) was people trying to use tts.tts_with_vc_to_file() with XTTS, and it was "fixed" in #3108/#3109. But XTTS has integrated VC, so you can just call tts.tts_to_file(..., speaker_wav="..."); there is no point in passing the result through FreeVC afterwards. IMHO, #3109 should be reverted because it breaks tts.tts_with_vc_to_file() for any model that doesn't have integrated VC, i.e. all models this method is meant for.
Perhaps tts.tts_with_vc_to_file() could throw a better error message when it's called for models that already support VC.
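
A guard along the lines suggested here might look like the sketch below. The helper name `ensure_external_vc_needed` and the check for "xtts" in the model name are assumptions made purely for illustration; they are not part of the 🐸TTS API, and a real implementation would more likely query the loaded model for this capability.

```python
def ensure_external_vc_needed(model_name: str) -> None:
    """Raise a descriptive error for models assumed to clone voices natively.

    Hypothetical helper: matching on "xtts" in the model name is an
    illustrative assumption, not how the library detects this capability.
    """
    if "xtts" in model_name.lower():
        raise ValueError(
            f"{model_name} already supports voice cloning. "
            'Call tts_to_file(..., speaker_wav="...") directly instead of '
            "tts_with_vc_to_file()."
        )


# A VITS model needs the external voice-conversion step, so no error is raised.
ensure_external_vc_needed("tts_models/pt/cv/vits")

# An XTTS model is rejected with a message pointing at the direct call.
try:
    ensure_external_vc_needed("tts_models/multilingual/multi-dataset/xtts_v2")
except ValueError as err:
    message = str(err)
```

Failing early with a message like this would steer XTTS users to tts_to_file() without changing behavior for the models tts_with_vc_to_file() is meant for.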

eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Nov 20, 2023
This reverts commit 041b4b6.

Fixes coqui-ai#3143. The original issue (coqui-ai#3067) was people trying to use
tts.tts_with_vc_to_file() with XTTS and was "fixed" in coqui-ai#3109. But XTTS has
integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there
is no point in passing it through FreeVC afterwards. So, reverting this commit
because it breaks tts.tts_with_vc_to_file() for any model that doesn't have
integrated VC, i.e. all models this method is meant for.
erogol pushed a commit that referenced this issue Nov 24, 2023
* Revert "fix for issue 3067"

This reverts commit 041b4b6.

* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file

* fix: only compute spk embeddings for models that support it

Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
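
The idea of computing speaker embeddings only for models that support it can be sketched as follows. This is not the actual patch: the stub classes and `maybe_speaker_embedding` are hypothetical, and the sketch assumes that a missing `encoder_ap` (the attribute that is None in the traceback above) is what marks a model as unable to clone voices.

```python
from typing import List, Optional


class StubAudioProcessor:
    """Stand-in for an encoder audio processor (hypothetical, for the demo)."""

    sample_rate = 16000

    def load_wav(self, wav_file: str, sr: int) -> List[float]:
        # Pretend every clip decodes to four samples.
        return [0.0, 0.0, 0.0, 0.0]


class StubSpeakerManager:
    """Stand-in for a speaker manager; encoder_ap is None without an encoder."""

    def __init__(self, encoder_ap: Optional[StubAudioProcessor] = None):
        self.encoder_ap = encoder_ap

    def compute_embedding_from_clip(self, wav_file: str) -> List[float]:
        # Same access pattern as in the traceback: fails if encoder_ap is None.
        waveform = self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)
        return [float(len(waveform))]


def maybe_speaker_embedding(manager, speaker_wav: Optional[str]):
    """Return an embedding only when a clip is given and an encoder is available."""
    if speaker_wav is None:
        return None
    if manager is None or getattr(manager, "encoder_ap", None) is None:
        # No speaker encoder: ignore speaker_wav instead of raising.
        return None
    return manager.compute_embedding_from_clip(speaker_wav)


without_encoder = maybe_speaker_embedding(StubSpeakerManager(), "clip.wav")
with_encoder = maybe_speaker_embedding(
    StubSpeakerManager(StubAudioProcessor()), "clip.wav"
)
```

Here `without_encoder` is None rather than an AttributeError, which matches the described behavior of simply ignoring the argument.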