
[BUG🐛] Size mismatch in converted xttsv2 models #43

Open
scruffynerf opened this issue Dec 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments


## Bug Description

```
[rank0]:   File "mypath/lib/python3.10/site-packages/auralis/core/tts.py", line 85, in _load_model
[rank0]:     return MODEL_REGISTRY[config['model_type']].from_pretrained(model_name_or_path, **kwargs)
[rank0]:   File "mypath/lib/python3.10/site-packages/auralis/models/xttsv2/XTTSv2.py", line 299, in from_pretrained
[rank0]:     model.load_state_dict(hifigan_state)
[rank0]:   File "mypath/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
[rank0]:     raise RuntimeError(
[rank0]: RuntimeError: Error(s) in loading state_dict for XTTSv2Engine:
[rank0]: 	size mismatch for text_embedding.weight: copying a param with shape torch.Size([6153, 1024]) from checkpoint, the shape in current model is torch.Size([6681, 1024]).
[rank0]: 	size mismatch for text_head.weight: copying a param with shape torch.Size([6153, 1024]) from checkpoint, the shape in current model is torch.Size([6681, 1024]).
[rank0]: 	size mismatch for text_head.bias: copying a param with shape torch.Size([6153]) from checkpoint, the shape in current model is torch.Size([6681]).
```

## Minimal Reproducible Example

Use the current converter script with either
HF's drewThomasson/Morgan_freeman_xtts_model
or
HF's scruffynerf/xtts-vincent

(both of these work, and were trained using https://github.com/daswer123/xtts-finetune-webui ),

then try to use/load the resulting converted files.

@scruffynerf scruffynerf added the bug Something isn't working label Dec 18, 2024

scruffynerf commented Dec 18, 2024

Ah ha, figured it out.

Coqui XTTS v2.0.2 differs from v2.0.3 in the number of tokens:

https://huggingface.co/coqui/XTTS-v2/commit/6b8036b35d787cf43d18d640587956b9db8fd1b8

The above models were trained on v2.0.2.

The converter script needs to be aware of this, since any vocab-size difference will make the converted model fail to load: the config no longer matches the actual trained GPT section of the model.

Correct me if I'm wrong, but basically this means either the GPT config must be adjusted in this case, since it no longer matches the stock config, OR the converter should just fail and complain that only v2.0.3 models can be converted.
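The check proposed above could be sketched roughly like this: read the vocab size off the checkpoint's `text_embedding.weight` shape before loading, then either refuse non-v2.0.3 checkpoints or report the detected size so the converter can patch the GPT config. This is a hypothetical illustration, not Auralis' actual API; the function name `check_vocab_size`, the constant `EXPECTED_VOCAB_V203`, and the plain dict of shape tuples standing in for a real state dict are all assumptions.

```python
# Hypothetical pre-load check for the converter. The 6681/6153 values come
# from the size-mismatch traceback above (v2.0.3 vs. the older vocab).
EXPECTED_VOCAB_V203 = 6681  # text-token count in Coqui XTTS v2.0.3

def check_vocab_size(state_shapes: dict, strict: bool = True) -> int:
    """Return the checkpoint's vocab size.

    state_shapes maps parameter names to shape tuples (stand-in for a
    torch state_dict). With strict=True, raise if the checkpoint does
    not use the v2.0.3 vocabulary, mirroring the "fail and complain"
    option; with strict=False, just report the size so the caller can
    adjust the GPT config instead.
    """
    vocab = state_shapes["text_embedding.weight"][0]
    if strict and vocab != EXPECTED_VOCAB_V203:
        raise ValueError(
            f"checkpoint vocab size {vocab} != {EXPECTED_VOCAB_V203}; "
            "only XTTS v2.0.3 checkpoints can be converted as-is"
        )
    return vocab
```

In non-strict mode the returned size could be written into the converted model's GPT config (e.g. its text-token count field) so v2.0.2-trained checkpoints load with matching embedding shapes.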


C00reNUT commented Dec 19, 2024

Same issue here with a model trained on version 2.0.0. This might also explain the difference in quality/output in #27 when I convert a Coqui 2.0.0 model using the provided script.
