This project is an unofficial implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All You Need by Jiaqi Su, Yunyun Wang, Adam Finkelstein, and Zeyu Jin.
The model takes a band-limited audio signal (typically sampled at 8, 16, or 24kHz) and attempts to reconstruct the high-frequency components needed to restore a full-band signal at 48kHz. This is useful for upsampling the low-rate outputs of upstream tasks like text-to-speech and voice conversion, or for enhancing audio that was filtered to remove high-frequency noise. For more information, please see this blog post.
The example below uses a pretrained HiFi-GAN+ model to upsample a 1 second 24kHz sawtooth to 48kHz.
import torch
from hifi_gan_bwe import BandwidthExtender

# load the hosted pretrained model (downloaded on first use)
model = BandwidthExtender.from_pretrained("hifi-gan-bwe-10-42890e3-vctk-48kHz")

# synthesize 1 second of a 261.63Hz (middle C) sawtooth at 24kHz
fs = 24000
x = torch.full([fs], 261.63 / fs).cumsum(-1) % 1.0 - 0.5

# upsample to the model's 48kHz output rate
y = model(x, fs)
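To listen to the result, you can write it out with the soundfile library (the same library the project supports for synthesis output). This is a minimal sketch, assuming y is a one-dimensional float tensor at the model's 48kHz output rate; the output filename is just an example:

import soundfile

# write the upsampled signal to a 48kHz wav file for listening
soundfile.write("sawtooth_48kHz.wav", y.detach().numpy(), samplerate=48000)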
There is a Gradio demo on Hugging Face Spaces where you can upload audio clips and run the model. You can also run the model on Colab with this notebook.
The HiFi-GAN+ library can be run directly from PyPI if you have the pipx application installed. The following script uses a hosted pretrained model to upsample an MP3 file to 48kHz. The input audio can be in any format supported by the audioread library, and the output can be in any format supported by soundfile.
pipx run --python=python3.9 hifi-gan-bwe \
hifi-gan-bwe-10-42890e3-vctk-48kHz \
input.mp3 \
output.wav
If you have a Python 3.9 virtual environment, you can install the HiFi-GAN+ library into it and run synthesis, training, etc. from there.
pip install hifi-gan-bwe
hifi-synth hifi-gan-bwe-10-42890e3-vctk-48kHz input.mp3 output.wav
The following models can be loaded with BandwidthExtender.from_pretrained and used for audio upsampling. You can also download the model file from the link and use it offline (see the sketch after the table).
Name | Sample Rate | Parameters | Wandb Metrics | Notes |
---|---|---|---|---|
hifi-gan-bwe-10-42890e3-vctk-48kHz | 48kHz | 1M | bwe-10-42890e3 | Same as bwe-05, but uses bandlimited interpolation for upsampling, for reduced noise and aliasing. Uses the same parameters as resampy's kaiser_best mode. |
hifi-gan-bwe-11-d5f542d-vctk-8kHz-48kHz | 48kHz | 1M | bwe-11-d5f542d | Same as bwe-10, but trained only on 8kHz sources, for specialized upsampling. |
hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz | 48kHz | 1M | bwe-12-b086d8b | Same as bwe-10, but trained only on 16kHz sources, for specialized upsampling. |
hifi-gan-bwe-13-59f00ca-vctk-24kHz-48kHz | 48kHz | 1M | bwe-13-59f00ca | Same as bwe-10, but trained only on 24kHz sources, for specialized upsampling. |
hifi-gan-bwe-05-cd9f4ca-vctk-48kHz | 48kHz | 1M | bwe-05-cd9f4ca | Trained for 200K iterations on the VCTK speech dataset with noise augmentation from the DNS Challenge dataset. |
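If you download one of the model files above for offline use, loading it might look something like the following sketch. This assumes from_pretrained also accepts a local .pt path in addition to a hosted model name; the file path here is hypothetical:

import torch
from hifi_gan_bwe import BandwidthExtender

# load a locally downloaded model file instead of fetching the hosted one
# (assumes from_pretrained accepts a local .pt path; adjust the path to
# wherever you saved the file)
model = BandwidthExtender.from_pretrained("./models/hifi-gan-bwe-10-42890e3-vctk-48kHz.pt")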
If you want to train your own model, you can use any of the methods above to install/run the library or fork the repo and run the script commands locally. The following commands are supported:
Name | Description |
---|---|
hifi-train | Starts a new training run, pass in a name for the run. |
hifi-clone | Clone an existing training run at a given checkpoint (or the latest, if none is specified). |
hifi-export | Optimize a model for inference and export it to a PyTorch model file (.pt). |
hifi-synth | Run model inference using a trained model on a source audio file. |
For example, you might start a new training run called bwe-01 with the following command:
hifi-train 01
To train a model, you will first need to download the VCTK and DNS Challenge datasets. By default, these datasets are assumed to be in the ./data/vctk and ./data/dns directories. See train.py for how to specify your own training data directories. If you want to use a custom training dataset, you can implement a dataset wrapper in datasets.py (a rough sketch follows).
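As an illustration only (the class below is hypothetical and not the library's actual dataset interface), a custom wrapper mainly needs to yield full-band waveform tensors at the 48kHz training rate:

from pathlib import Path

import torch
import torchaudio
from torch.utils.data import Dataset


class CustomDataset(Dataset):
    """Hypothetical wrapper that yields mono 48kHz waveforms from a directory."""

    def __init__(self, path: str) -> None:
        self._paths = sorted(Path(path).glob("**/*.wav"))

    def __len__(self) -> int:
        return len(self._paths)

    def __getitem__(self, index: int) -> torch.Tensor:
        x, fs = torchaudio.load(str(self._paths[index]))
        assert fs == 48000, "training audio is assumed to be full-band 48kHz"
        return x.mean(dim=0)  # mix down to mono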
The training scripts use wandb.ai for experiment tracking and visualization. Wandb metrics can be disabled by passing --no_wandb to the training script. All of my own experiment results are publicly available at wandb.ai/brentspell/hifi-gan-bwe.
Each training run is identified by a name and a git hash (ex: bwe-01-8abbca9). The git hash is used for simple experiment tracking, reproducibility, and model provenance. Using git to manage experiments also makes it easy to change model hyperparameters by simply changing the code, making a commit, and starting the training run. This is why there is no hyperparameter configuration file in the project: I often end up having to change the code anyway to run interesting experiments.
The following script creates a pyenv virtual environment for the project and installs dependencies.
pyenv install 3.9.10
pyenv virtualenv 3.9.10 hifi-gan-bwe
pyenv activate hifi-gan-bwe
pip install -r requirements.txt
If you want to run the hifi-* scripts described above in development, you can install the package locally:
pip install -e .
You can then run tests, linting, etc. as follows:
pytest --cov=hifi_gan_bwe
black .
isort --profile=black .
flake8 .
mypy .
These checks are also included in the pre-commit configuration for the project, so you can set them up to run automatically on commit by running
pre-commit install
The original research on the HiFi-GAN+ model is not my own, and all credit goes to the paper's authors. I also referred to kan-bayashi's excellent Parallel WaveGAN implementation, specifically the WaveNet module. If you use this code, please cite the original paper:
@inproceedings{su2021bandwidth,
  title={Bandwidth extension is all you need},
  author={Su, Jiaqi and Wang, Yunyun and Finkelstein, Adam and Jin, Zeyu},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={696--700},
  year={2021},
  organization={IEEE},
  url={https://doi.org/10.1109/ICASSP39728.2021.9413575},
}
Copyright © 2022 Brent M. Spell
Licensed under the MIT License (the "License"). You may not use this package except in compliance with the License. You may obtain a copy of the License at
https://opensource.org/licenses/MIT
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.