ReDimNet

This is the official implementation of the neural network architecture presented in the paper "Reshape Dimensions Network for Speaker Recognition".

Speaker Recognition NN architectures comparison (2024)

Update

2024.11.13 Refactored the model code. Added the first models pretrained on the voxblink2 dataset; for more information, see the evaluation page.
2024.07.15 Added the model builder and pretrained weights for the b0, b1, b2, b3, b5, and b6 model sizes.

Introduction

We introduce Reshape Dimensions Network (ReDimNet), a novel neural network architecture for spectrogram (audio) processing, specifically for extracting utterance-level speaker representations. ReDimNet reshapes dimensionality between 2D feature maps and 1D signal representations, enabling the integration of 1D and 2D blocks within a single model. This architecture maintains the volume of channel-timestep-frequency outputs across both 1D and 2D blocks, ensuring efficient aggregation of residual feature maps. ReDimNet scales across various model sizes, from 1 to 15 million parameters and 0.5 to 20 GMACs. Our experiments show that ReDimNet achieves state-of-the-art performance in speaker recognition while reducing computational complexity and model size compared to existing systems.
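The constant-volume idea can be illustrated with plain shape arithmetic. The following is a minimal sketch, not the repository's code: the helper names (`to_1d`, `to_2d`, `downsample_freq`), the starting shape, and the strides are all hypothetical, and it assumes that a 2D stage which divides the frequency axis by a stride multiplies the channel count by the same stride. Under that assumption, every 2D feature map flattens to the same 1D shape, which is one way residual feature maps from different stages could be aggregated directly.

```python
from math import prod


def to_1d(shape_2d):
    """Flatten (batch, channels, freq, time) to (batch, channels * freq, time)."""
    b, c, f, t = shape_2d
    return (b, c * f, t)


def to_2d(shape_1d, channels):
    """Unflatten (batch, channels * freq, time) to (batch, channels, freq, time)."""
    b, cf, t = shape_1d
    if cf % channels != 0:
        raise ValueError("1D channel count must be divisible by the 2D channel count")
    return (b, channels, cf // channels, t)


def downsample_freq(shape_2d, stride):
    """Hypothetical 2D stage: freq shrinks by `stride`, channels grow by `stride`."""
    b, c, f, t = shape_2d
    if f % stride != 0:
        raise ValueError("frequency size must be divisible by the stride")
    return (b, c * stride, f // stride, t)


# Illustrative numbers only (not taken from any released model configuration).
shape = (1, 16, 72, 200)
stages = [shape]
for stride in (2, 2, 3):
    shape = downsample_freq(shape, stride)
    stages.append(shape)

for s in stages:
    # The 1D view and the total volume are identical at every stage.
    print(s, "->", to_1d(s), "volume:", prod(s))
```

Because `channels * freq` never changes in this sketch, a 1D block can consume the flattened output of any 2D stage, and `to_2d` can restore the 2D layout for the next 2D block.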

ReDimNet architecture

Usage

Requirement

PyTorch>=2.0

Examples

import torch

# Load the model pretrained on vox2, without large-margin fine-tuning
model = torch.hub.load('IDRnD/ReDimNet', 'ReDimNet', model_name='b2', train_type='ptn', dataset='vox2')

# Load the model pretrained on vox2, with large-margin fine-tuning
model = torch.hub.load('IDRnD/ReDimNet', 'ReDimNet', model_name='b2', train_type='ft_lm', dataset='vox2')

For the full list of pretrained models, see the evaluation page (EVALUATION.md).
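Once utterance-level embeddings are available, a common way to compare two utterances in speaker verification is cosine similarity. This README does not specify a scoring method, so the following is only a sketch under that assumption: `cosine_similarity`, `same_speaker`, the toy vectors, and the `0.5` threshold are hypothetical, and how embeddings are obtained from the loaded model is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Accept the trial if the score reaches a (hypothetical) threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for speaker embeddings (assumption: plain lists of floats).
enrolled = [0.2, 0.9, -0.1, 0.4]
test_same = [0.25, 0.85, -0.05, 0.35]
test_other = [-0.7, 0.1, 0.6, -0.3]

print(same_speaker(enrolled, test_same))
print(same_speaker(enrolled, test_other))
```

In practice the threshold would be tuned on a development set for the chosen model and dataset rather than fixed in advance.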

Citation

If you find our work helpful and use this code in your research, please cite:

@inproceedings{yakovlev24_interspeech,
  title     = {Reshape Dimensions Network for Speaker Recognition},
  author    = {Ivan Yakovlev and Rostislav Makarov and Andrei Balykin and Pavel Malov and Anton Okhotnikov and Nikita Torgashov},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {3235--3239},
  doi       = {10.21437/Interspeech.2024-2116},
}

Acknowledgements

For model training, we used the wespeaker pipeline.

Some of the layers were ported from transformers.
