review #1
Changes from 54 commits
**GitHub Actions workflow** `@@ -0,0 +1,18 @@`

```yaml
name: Main

on: [push, pull_request]

jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
        with:
          python-version: 3.10.12
          cache: "pip"
      - name: "installation"
        run: |
          pip install -r requirements-dev.txt
      - name: "black"
        run: black . --check --diff --color
```
**`Dockerfile`** `@@ -0,0 +1,25 @@`

```dockerfile
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

ENV NV_CUDNN_VERSION 8.6.0.163
ENV NV_CUDNN_PACKAGE_NAME "libcudnn8"

ENV NV_CUDNN_PACKAGE "$NV_CUDNN_PACKAGE_NAME=$NV_CUDNN_VERSION-1+cuda11.8"
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
RUN apt-get update && apt-get install -y --no-install-recommends \
    ${NV_CUDNN_PACKAGE} \
    unzip \
    && apt-mark hold ${NV_CUDNN_PACKAGE_NAME} \
    && rm -rf /var/lib/apt/lists/*
RUN apt-get update -y \
    && apt-get install -y python3-pip
RUN echo 'alias python=python3' >> ~/.bashrc
# export, so that processes started from the shell (e.g. python3) inherit the variable
RUN echo 'export NCCL_SOCKET_IFNAME=lo' >> ~/.bashrc


WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

ENTRYPOINT [ "bash" ]
```
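`~/.bashrc` is read by the interactive `bash` that the `ENTRYPOINT` starts, but only exported variables reach child processes such as `python3`; a bare `NAME=value` line creates a shell-local variable. A quick way to see the difference, assuming a POSIX `sh` and the coreutils `printenv` are on `PATH` (both are present in Ubuntu images); `child_sees` is a hypothetical helper for illustration:

```python
import os
import subprocess


def child_sees(script: str, name: str) -> str:
    """Run `script` in sh, then report what a child process sees for `name`."""
    # Start from an environment without `name`, so only `script` can set it.
    env = {k: v for k, v in os.environ.items() if k != name}
    result = subprocess.run(
        ["sh", "-c", f"{script}; printenv {name} || true"],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip()


# A bare assignment stays local to the shell, so the child `printenv` prints nothing.
print(repr(child_sees("NCCL_SOCKET_IFNAME=lo", "NCCL_SOCKET_IFNAME")))  # ''
# An exported assignment is inherited by child processes such as python3.
print(repr(child_sees("export NCCL_SOCKET_IFNAME=lo", "NCCL_SOCKET_IFNAME")))  # 'lo'
```

Setting the variable with an `ENV` instruction in the Dockerfile would be an alternative that does not depend on `~/.bashrc` being read at all.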
**README** `@@ -0,0 +1,73 @@`

````markdown
# PM-Unet: phase and magnitude aware model for music source separation
[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://d-a-yakovlev.github.io/test/)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OXlCZgd5KidMDZDUItOIT9ZA4IUJHXsZ?usp=sharing)

## Navigation
1. [Structure](#structure)
2. [Docker](#docker)
3. [Data](#data)
4. [Training](#training)
5. [Inference](#inference)

## Structure
- [`separator`](./separator) ‒ main source code: model and dataset implementations, plus the training code.
- [`streaming`](./streaming/demo) ‒ inference code for the TFLite version of the model.

## Docker
#### To set up the environment with Docker

If you don't have Docker installed, please follow the links to find installation instructions for [Ubuntu](https://docs.docker.com/desktop/install/linux-install/), [Mac](https://docs.docker.com/desktop/install/mac-install/) or [Windows](https://docs.docker.com/desktop/install/windows-install/).

Build the Docker image:

    docker build -t pmunet .

Run the Docker container:

    bash run_docker.sh

## Data
Dataset: [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html#musdb18-hq-uncompressed-wav).

[![Download dataset](https://img.shields.io/badge/Download%20dataset-65c73b)](https://zenodo.org/record/3338373/files/musdb18hq.zip?download=1)

The dataset consists of 150 full-length stereo tracks sampled at 44.1 kHz. Each track provides the complete mix and four stems, `vocals`, `bass`, `drums` and `other`, which serve as the targets for source separation. The dataset is split into 100 training tracks and 50 test tracks.

## Training
1. Configure arguments in `separator/config/config.py`.
2. `cd separator`.
3. Run `python3 pl_model.py`.

## Inference

### Auto local
1. Configure arguments in `separator/config/config.py`.
2. `cd separator`.
3. Run `python3 inference.py [-I path/to/mix] [-O out_dir]`. Both options are optional:
   - `-I`: path to the mixture,
   - `-O`: output directory.

By default, the script downloads the `.pt` weights file and `sample.wav` from Google Drive.

#### For example
```
python3 inference.py -I path/to/mix -O out_dir
```
After a successful run, four audio files (`vocals.wav`, `drums.wav`, `bass.wav`, `other.wav`) are written to `out_dir` (by default `separator/inference/output`).

**You can download weights manually**

Download one of the `.pt` files below:
* [LSTM-bottleneck version](https://drive.google.com/file/d/18jT2TYffdRD1fL7wecAiM5nJPM_OKpNB/view?usp=drive_link)
* [Without LSTM-bottleneck version](https://drive.google.com/file/d/1VO07OYbsnCuEJYRSuA8HhjlQnx6dbWX7/view?usp=drive_link)

### Streaming
The `streaming` directory contains scripts to convert the model to `tflite` format and to run the `tflite` model in streaming mode.

1. Configure arguments in `streaming/config/config.py`.
2. `cd streaming`.
3. `python3 runner.py`
````
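The Inference section describes `inference.py` with two optional flags, `-I` (mixture path) and `-O` (output directory), falling back to the Google Drive sample and `separator/inference/output`. The parser itself is not part of this diff; the following is a hypothetical `argparse` sketch of that interface, assuming the destination names `input`/`output` and the default output path from `InferenceConfig.default_result_dir`:

```python
import argparse
from pathlib import Path
from typing import List

# Hypothetical sketch: mirrors InferenceConfig.default_result_dir so the block is self-contained.
DEFAULT_RESULT_DIR = Path("/app/separator/inference/output")
# Stem names listed in the README's "For example" section.
STEMS = ("vocals", "drums", "bass", "other")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PM-Unet inference (illustrative sketch)")
    # None stands for "use the sample mix downloaded from Google Drive".
    parser.add_argument("-I", dest="input", type=Path, default=None, help="path to the mixture")
    parser.add_argument("-O", dest="output", type=Path, default=DEFAULT_RESULT_DIR, help="output directory")
    return parser


def expected_outputs(out_dir: Path) -> List[Path]:
    """Paths of the four stems a successful run should leave in out_dir."""
    return [out_dir / f"{stem}.wav" for stem in STEMS]


if __name__ == "__main__":
    args = build_parser().parse_args(["-I", "path/to/mix", "-O", "out_dir"])
    for path in expected_outputs(args.output):
        print(path)
```

Keeping `None` as the `-I` default lets the script distinguish "no mixture given" (download the sample) from an explicit path.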
**`requirements-dev.txt`** `@@ -0,0 +1,3 @@`

```text
black
mypy
pytest
```
**`requirements.txt`** `@@ -0,0 +1,139 @@`

```text
aiohttp==3.8.4
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
asttokens
async-timeout==4.0.2
attrs==23.1.0
audioread==3.0.0
backcall
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
cmake==3.26.4
comm
contourpy
cycler
Cython==0.29.35
debugpy
decorator
diffq==0.2.4
einops==0.6.1
executing
fast-bss-eval==0.1.4
ffmpeg-python==0.2.0
filelock==3.12.0
fonttools==4.25.0
frozenlist==1.3.3
fsspec==2023.6.0
future==0.18.3
gdown
idna==3.4
ipykernel
ipython
jedi
Jinja2==3.1.2
joblib==1.3.1
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
julius==0.2.7
jupyter_client
jupyter_core
kiwisolver
lameenc==1.4.2
lazy_loader==0.3
librosa==0.10.0.post2
lightning-utilities==0.8.0
lit==16.0.5.post0
llvmlite==0.40.1
lpips==0.1.4
MarkupSafe==2.1.3
matplotlib
matplotlib-inline
mir-eval==0.7
mkl-fft==1.3.6
mkl-random
mkl-service==2.4.0
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
munkres==1.1.4
musdb==0.4.0
museval==0.4.1
nest-asyncio
networkx==3.1
numba==0.57.1
numpy #==1.24.4
nobuco
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
omegaconf==2.3.0
openunmix==1.2.1
packaging
pandas==2.1.0
parso
pexpect
pickleshare
Pillow==9.5.0
platformdirs
ply==3.11
pooch==1.6.0
primePy==1.3
prompt-toolkit
psutil
ptyprocess
pure-eval
pyaml==23.5.9
pycparser==2.21
pyee==10.0.1
Pygments
pyparsing
PyQt5-sip==12.11.0
PySoundFile==0.9.0.post1
python-dateutil
python-ffmpeg==2.0.4
pytorch-lightning==2.0.3
pytz==2023.3
PyYAML==6.0
pyzmq
referencing==0.30.2
requests==2.31.0
rpds-py==0.10.0
scikit-learn==1.3.0
scipy==1.10.1
simplejson==3.19.1
sip
six
soundfile==0.12.1
sox==1.4.1
soxr==0.3.5
stack-data
stempeg==0.2.3
sympy==1.12
tensorflow>=2.13.0 #.*
threadpoolctl==3.1.0
toml
torch==2.0.1
torch-audiomentations==0.11.0
torch-pitch-shift==1.2.4
torchaudio==2.0.2
torchmetrics==0.11.4
torchvision==0.15.2
tornado
tqdm==4.65.0
traitlets
triton==2.0.0
typing_extensions>=4.6.1
tzdata==2023.3
urllib3==2.0.3
wcwidth
yarl==1.9.2
```
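`requirements.txt` mixes exact pins (`torch==2.0.1`), lower bounds (`tensorflow>=2.13.0`) and unpinned names, some with trailing `#` comments (`numpy #==1.24.4`). A small stdlib sketch that extracts only the exact `==` pins and compares them with what `importlib.metadata` reports as installed. It deliberately ignores ranges and environment markers, so it is a quick sanity check inside the container, not a replacement for pip's resolver; the helper names are hypothetical:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# "name==version"; comments are stripped before matching.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_exact_pins(lines: Iterable[str]) -> Dict[str, str]:
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments such as "numpy #==1.24.4"
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def mismatched(pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (name, pinned, installed) for every pin that differs; "missing" if not installed."""
    problems = []
    for name, pinned in sorted(pins.items()):
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "missing"
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for name, pinned, installed in mismatched(parse_exact_pins(req.read_text().splitlines())):
            print(f"{name}: pinned {pinned}, installed {installed}")
```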
**`run_docker.sh`** `@@ -0,0 +1,9 @@`

```bash
#!/bin/bash

app=$PWD

docker run --name pmunet -it --rm \
    --net=host --ipc=host \
    --gpus "all" \
    -v "${app}:/app" \
    pmunet
```

**Review comment** (on `app=$PWD`): `$(pwd)`
**Review comment** (translated from Russian): Every field needs a comment. Here is an example that GPT-4 generated; please polish it. That is, keep the grouping into DATA OPTIONS, MODEL OPTIONS, TRAIN OPTIONS and AUGMENTATION OPTIONS, and remove the redundancy in the comments. Where there are lines in the style of

```python
# Section
param1 : VeryLongType1 = "value1" # doc1
long_param_name : Type1 = "value1" # doc2
```

format them as

```python
# Section
param1          : VeryLongType1 = "value1"  # doc1
long_param_name : Type1         = "value1"  # doc2
```

The GPT-4-generated example attached to the comment:

```python
@dataclass
class TrainConfig:
    # The computing platform for training: 'cuda' for NVIDIA GPUs or 'cpu' for CPU-based training.
    device: str = "cuda"
    # Directory path where the MUSDB18-HQ dataset is stored or to be downloaded.
    musdb_path: str = "musdb18hq"
    # Directory path for saving training metadata, like track names and lengths.
    metadata_train_path: str = "metadata"
    # Directory path for saving testing metadata.
    metadata_test_path: str = "metadata1"
    # Length (in seconds) of each audio segment used during training. Shorter segments can help in learning finer details.
    segment: int = 5
    # Batch size for training, determining how many samples are processed together. Affects memory usage and gradient updates.
    batch_size: int = 6
    # Whether to shuffle the training dataset at the beginning of each epoch. Shuffling can help in reducing overfitting.
    shuffle_train: bool = True
    # Whether to shuffle the validation dataset. Generally not needed as validation performance is independent of data order.
    shuffle_valid: bool = False
    # Whether to drop the last incomplete batch in case the dataset size isn't divisible by the batch size.
    drop_last: bool = True
    # Number of worker processes used for loading data. Higher numbers can speed up data loading.
    num_workers: int = 2
    # Strategy for monitoring metrics to save model checkpoints: 'min' for saving when a monitored metric is minimized.
    metric_monitor_mode: str = "min"
    # Number of best-performing model weights to save based on the monitored metric.
    save_top_k_model_weights: int = 1
    # PM_Unet model configurations
    # Specific sources to target in source separation, e.g., separating drums, bass, etc.
    model_source: tuple = ("drums", "bass", "other", "vocals")
    # The depth of the U-Net architecture, affecting its capacity and complexity.
    model_depth: int = 4
    # Number of initial channels in U-Net layers, influencing the model's learning capability.
    model_channel: int = 28
    # Indicates whether the input audio should be treated as mono (True) or stereo (False).
    is_mono: bool = False
    # Whether to utilize masking within the model for source separation.
    mask_mode: bool = False
    # Mode of skip connections in U-Net ('concat' for concatenation, 'add' for summation).
    skip_mode: str = "concat"
    # Number of bins used in Fast Fourier Transform (FFT) during audio processing.
    nfft: int = 4096
    # Determines whether to use LSTM layers as bottleneck in the U-Net architecture.
    bottlneck_lstm: bool = True
    # Number of LSTM layers if bottleneck is enabled, adding to the model's complexity and ability to capture temporal dependencies.
    layers: int = 2
    # Flag to decide whether to apply Short Time Fourier Transform (STFT) on input audio.
    stft_flag: bool = True
    # Augmentation parameters
    # Maximum number of samples to shift in time during data augmentation, helping the model generalize over temporal variations.
    shift: int = 8192
    # Probability of applying pitch shift augmentation to the audio, introducing pitch variability in training data.
    pitchshift_proba: float = 0.2
    # Range of semitone shifts for pitch shift augmentation applied specifically to vocal tracks.
    vocals_min_semitones: int = -5
    vocals_max_semitones: int = 5
    # Range of semitone shifts for pitch shift augmentation applied to non-vocal tracks.
    other_min_semitones: int = -2
    other_max_semitones: int = 2
    # Flag to enable pitch shift augmentation on non-vocal sources.
    pitchshift_flag_other: bool = False
    # Probability of applying time stretching or compression to alter the speed of the audio without changing its pitch.
    time_change_proba: float = 0.2
    # Factors for time stretching/compression, defining the range and intensity of this augmentation.
    time_change_factors: tuple = (0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3)
    # Probability of remixing audio tracks, useful for training the model to recognize and separate blended sounds.
    remix_proba: float = 1
    # Number of tracks to group together for the remix augmentation.
    remix_group_size: int = batch_size
    # Probability and range of scaling the amplitude of audio tracks, introducing dynamic range variability.
    scale_proba: float = 1
    scale_min: float = 0.25
    scale_max: float = 1.25
    # Probability of applying a fade effect to the audio, simulating natural starts and ends of sounds.
    fade_mask_proba: float = 0.1
    # Probability of doubling one channel's audio to both channels, simulating mono audio in a stereo setup.
    double_proba: float = 0.1
    # Probability of reversing a segment of the audio track, useful for capturing reversed audio patterns.
    reverse_proba: float = 0.2
    # Probability and depth of mixing tracks together to create mashups, aiding the model in learning from complex blends of sounds.
    mushap_proba: float = 0.0
    mushap_depth: int = 2
    # Loss function parameters
    # Multiplicative factors for different components of the loss function, affecting the emphasis on different aspects of the training objective.
    factor: int = 1
    c_factor: int = 1
    # Number of FFT bins for calculating loss, impacting the granularity of frequency-based loss computation.
    loss_nfft: tuple = (4096,)
    # Gamma parameter for adjusting the focus of the loss function on certain aspects of the audio spectrum.
    gamma: float = 0.3
    # Learning rate for the optimizer, a crucial parameter for training convergence.
    lr: float = 0.5 * 3e-3
    # Period of the cosine annealing schedule in learning rate adjustment, influencing the adaptation of learning rate over time.
    T_0: int = 40
    # Maximum number of training epochs, defining the total duration of the training process.
    max_epochs: int = 100
    # Precision of training computations, e.g., '16' for 16-bit floating-point precision, reducing memory usage and potentially speeding up training.
    precision: str = 16
    # Gradient clipping value to prevent exploding gradients, aiding in training stability.
    grad_clip: float = 0.5
```
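One detail worth capturing in the new field comments: `remix_group_size: int = batch_size` is evaluated once, when the class body runs, so it captures the default `6` and does not follow a `batch_size` passed to the constructor. A reduced, hypothetical stand-in for `TrainConfig` (only the two fields involved) showing the behaviour, plus one possible fix via `__post_init__`, assuming `None` is acceptable as a "same as `batch_size`" sentinel:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainConfigSketch:
    """Reduced, hypothetical stand-in for TrainConfig with only the fields involved."""

    batch_size: int = 6
    # Evaluated once at class-definition time: always 6, whatever batch_size is passed.
    remix_group_size: int = batch_size


@dataclass
class TrainConfigFixed:
    batch_size: int = 6
    # None means "same as batch_size"; resolved right after the generated __init__.
    remix_group_size: Optional[int] = None

    def __post_init__(self) -> None:
        if self.remix_group_size is None:
            self.remix_group_size = self.batch_size


print(TrainConfigSketch(batch_size=4).remix_group_size)  # 6
print(TrainConfigFixed(batch_size=4).remix_group_size)  # 4
```

With the sentinel, an explicit `remix_group_size` still takes precedence, while the default tracks whatever `batch_size` the config was built with.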
**`separator/config/config.py`** `@@ -0,0 +1,97 @@`

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class TrainConfig:
    device: str = "cuda"

    # datasets
    musdb_path: str = "musdb18hq"
    metadata_train_path: str = "metadata"
    metadata_test_path: str = "metadata1"
    segment: int = 5

    # dataloaders
    batch_size: int = 6
    shuffle_train: bool = True
    shuffle_valid: bool = False
    drop_last: bool = True
    num_workers: int = 2

    # checkpoint_callback
    metric_monitor_mode: str = "min"
    save_top_k_model_weights: int = 1

    # PM_Unet model
    model_source: tuple = ("drums", "bass", "other", "vocals")
    model_depth: int = 4
    model_channel: int = 28
    is_mono: bool = False
    mask_mode: bool = False
    skip_mode: str = "concat"
    nfft: int = 4096
    bottlneck_lstm: bool = True
    layers: int = 2
    stft_flag: bool = True
    # augments
    shift: int = 8192
    pitchshift_proba: float = 0.2
    vocals_min_semitones: int = -5
    vocals_max_semitones: int = 5
    other_min_semitones: int = -2
    other_max_semitones: int = 2
    pitchshift_flag_other: bool = False
    time_change_proba: float = 0.2
    time_change_factors: tuple = (0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3)
    remix_proba: float = 1
    remix_group_size: int = batch_size
    scale_proba: float = 1
    scale_min: float = 0.25
    scale_max: float = 1.25
    fade_mask_proba: float = 0.1
    double_proba: float = 0.1
    reverse_proba: float = 0.2
    mushap_proba: float = 0.0
    mushap_depth: int = 2

    # loss: if there are artifacts when listening, increase these params
    factor: int = 1
    c_factor: int = 1
    loss_nfft: tuple = (4096,)
    gamma: float = 0.3
    # lr
    lr: float = 0.5 * 3e-3
    T_0: int = 40

    # lightning
    max_epochs: int = 100
    precision: Union[int, str] = 16  # "bf16-mixed"
    grad_clip: float = 0.5


@dataclass
class InferenceConfig:
    GDRIVE_PREFIX = "https://drive.google.com/uc?id="

    device: str = "cpu"

    # weights
    weights_dir: Path = Path("/app/separator/inference/weights")
    gdrive_weights_LSTM: str = f"{GDRIVE_PREFIX}18jT2TYffdRD1fL7wecAiM5nJPM_OKpNB"
    gdrive_weights_conv: str = f"{GDRIVE_PREFIX}1VO07OYbsnCuEJYRSuA8HhjlQnx6dbWX7"

    # inference instance
    segment: int = 7
    overlap: float = 0.2
    offset: Union[int, None] = None
    duration: Union[int, None] = None

    # inference
    sample_rate: int = 44100
    num_channels: int = 2
    default_result_dir: str = "/app/separator/inference/output"
    default_input_dir: str = "/app/separator/inference/input"
    # adele
    gdrive_mix: str = f"{GDRIVE_PREFIX}1zJpyW1fYxHKXDcDH9s5DiBCYiRpraDB3"
```
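`InferenceConfig` sets `segment = 7` (seconds), `overlap = 0.2` and `sample_rate = 44100`. The code that consumes these fields is not part of this diff; assuming the common overlap-add scheme, in which consecutive windows of `segment` seconds advance by `(1 - overlap) * segment` seconds, the window layout works out as follows. `chunk_bounds` is a hypothetical helper, not the repository's implementation:

```python
from typing import List, Tuple


def chunk_bounds(num_samples: int, sample_rate: int = 44100, segment: float = 7.0,
                 overlap: float = 0.2) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the signal."""
    seg_len = round(segment * sample_rate)        # 7 s * 44100 Hz = 308700 samples
    stride = seg_len - round(overlap * seg_len)   # 80% of a window = 246960 samples
    bounds = []
    for start in range(0, num_samples, stride):
        bounds.append((start, min(start + seg_len, num_samples)))
        if start + seg_len >= num_samples:
            break  # this window already reaches the end of the track
    return bounds


# A 20-second track at 44.1 kHz has 882000 samples per channel.
for start, end in chunk_bounds(20 * 44100):
    print(start, end)
# 0 308700
# 246960 555660
# 493920 802620
# 740880 882000
```

Each pair of neighbouring windows shares 61740 samples (1.4 s); in such schemes the separated outputs are usually cross-faded or averaged over that shared region.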
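**Review comment** (translated from Russian): Why go through all this trouble installing cuDNN by hand? You could use a prebuilt image that already includes cuDNN, for example `nvcr.io/nvidia/tensorrt:22.08-py3`. It may be worth looking for a more suitable one.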
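Whichever base image is chosen, note that `requirements.txt` pins `torch==2.0.1` together with `nvidia-cudnn-cu11==8.5.0.96`; PyPI builds of PyTorch 2.0.x normally load the cuDNN from that pip package, so the system `libcudnn8` pinned in the Dockerfile (8.6.0.163) may not be the one actually in use. A small check to run inside the container, assuming nothing beyond an importable `torch`; `cudnn_report` is a hypothetical helper:

```python
from typing import Dict, Optional


def cudnn_report() -> Dict[str, Optional[object]]:
    """Report which CUDA/cuDNN versions PyTorch actually uses (None where unavailable)."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": None, "cudnn": None, "gpu": None}
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # CUDA version the wheel was built against
        "cudnn": torch.backends.cudnn.version(),  # e.g. 8500 for cuDNN 8.5.0; None if not loaded
        "gpu": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cudnn_report().items():
        print(f"{key}: {value}")
```

If the reported cuDNN version matches the pip package rather than the apt one, the manual `libcudnn8` install adds image size without affecting PyTorch.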