review #1

Merged: 68 commits, Jan 16, 2024

Changes from 54 commits

Commits (68)
fa276ca
Initial commit
maks00170 Oct 9, 2023
b5a193f
init_commit
maks00170 Oct 9, 2023
3ceef8d
clean pl.py
maks00170 Oct 9, 2023
754051f
init stream demo
d-a-yakovlev Oct 9, 2023
30b21c5
streaming: demo structure
d-a-yakovlev Oct 9, 2023
f3ce9a6
streaming: IPython commenting
d-a-yakovlev Oct 9, 2023
b65e32e
update
maks00170 Oct 27, 2023
10c2578
Update README.md
maks00170 Oct 27, 2023
4e50daf
add req
maks00170 Oct 27, 2023
1bb5043
merge
maks00170 Oct 27, 2023
9127466
del pl
maks00170 Oct 27, 2023
9a95a9c
Update README.md
maks00170 Oct 27, 2023
c3b6044
Update README.md
maks00170 Oct 27, 2023
f266701
up readme
maks00170 Oct 27, 2023
2a1191c
refactor dataset.py - flake8 and del distrib dep
d-a-yakovlev Nov 8, 2023
7f5bd64
refactor dataset.py - print -> logging
d-a-yakovlev Nov 8, 2023
c63da0e
update code: 1)add comments 2)black, 3)refactor code 4)test notebook
maks00170 Nov 9, 2023
fcc35f4
update Readme
maks00170 Nov 9, 2023
be68670
relative path -> absolute, some annotations [NTC]
d-a-yakovlev Nov 10, 2023
c21557d
separator: add config
d-a-yakovlev Nov 14, 2023
5e01862
separator: update config
d-a-yakovlev Nov 14, 2023
3d096e9
separator: eval script, update config, gdown to requirements
d-a-yakovlev Nov 14, 2023
bc57088
separator: one download default sample
d-a-yakovlev Nov 14, 2023
97f6f10
separator: eval fix
d-a-yakovlev Nov 14, 2023
6b64282
streaming: raw converter script
d-a-yakovlev Nov 16, 2023
510fff9
general: update README
d-a-yakovlev Nov 16, 2023
a6a9506
general: update README
d-a-yakovlev Nov 16, 2023
67cae64
streaming: add stream class
d-a-yakovlev Nov 16, 2023
cda2178
separator: config,pl_model,readme update & delete test
maks00170 Nov 16, 2023
60ab5a2
general: init dockerfile
d-a-yakovlev Nov 16, 2023
88822e2
general: add docker section, docker run script, fix configs paths
d-a-yakovlev Nov 22, 2023
42f850c
general: add colab link stand, fix run_docker
d-a-yakovlev Nov 22, 2023
a659d93
general: minor fix in Dockerfile
d-a-yakovlev Nov 23, 2023
87d6ccd
separator: eval->inference, del distrib, config
maks00170 Nov 27, 2023
87dae59
separator: black
maks00170 Nov 27, 2023
97578fb
general: docker tune
d-a-yakovlev Nov 28, 2023
ddb459b
init_weights
maks00170 Dec 17, 2023
4cc3311
general: minor change with {} works in Win
d-a-yakovlev Dec 17, 2023
95e8ad1
separator: cleaning and add some attributes in model
d-a-yakovlev Dec 17, 2023
78f396f
streaming: deprecated clean
d-a-yakovlev Dec 17, 2023
06057a8
streaming: converter script
d-a-yakovlev Dec 17, 2023
e08c9b1
general: slightly updated readme
d-a-yakovlev Dec 17, 2023
163df3e
general: add CI
d-a-yakovlev Dec 18, 2023
fe8d679
general: configuring CI 1
d-a-yakovlev Dec 18, 2023
c063333
general: configuring CI 2
d-a-yakovlev Dec 18, 2023
2344f3d
streaming: pushing all sources
d-a-yakovlev Dec 18, 2023
d18a30a
streaming: take out deprecated 2
d-a-yakovlev Dec 18, 2023
234c634
general: fix typos in readme
d-a-yakovlev Dec 18, 2023
1639898
general: black apply
d-a-yakovlev Dec 18, 2023
98df7c2
general: configuring CI 2
d-a-yakovlev Dec 18, 2023
6b87c1d
general: configuring CI 3
d-a-yakovlev Dec 18, 2023
8a328c6
general: configuring CI 4
d-a-yakovlev Dec 18, 2023
8a8e6d5
streaming: take out trash 3
d-a-yakovlev Dec 18, 2023
062f28b
Merge branch 'main' into project
Lallapallooza Dec 18, 2023
def592a
refac: full - run_docker, almost - dataset, inference
d-a-yakovlev Dec 29, 2023
2f869eb
refac: full - runner, stream_class -> tf_lite_stream, almost - converter
d-a-yakovlev Dec 29, 2023
6cec29a
refac: runner
d-a-yakovlev Dec 29, 2023
1fdc0da
rafactor_separator: modules, PM_unet, pl_model,STFT
maks00170 Dec 29, 2023
f46b0e0
black and refac config
maks00170 Dec 29, 2023
d8ea2f8
one del space config
maks00170 Dec 29, 2023
d4c60de
refac stream config
maks00170 Dec 30, 2023
08fefd1
ref comment config
maks00170 Dec 30, 2023
ae7eeea
refac: docker debugged 1
d-a-yakovlev Jan 9, 2024
fd090eb
refac: configuring CI 5
d-a-yakovlev Jan 9, 2024
20d2466
refac: inference output 1
d-a-yakovlev Jan 15, 2024
0d7eb3b
refac: inference output 2
d-a-yakovlev Jan 15, 2024
50c93aa
refac: inference output 3
d-a-yakovlev Jan 16, 2024
26b6b2d
refac: fix comment
maks00170 Jan 16, 2024
18 changes: 18 additions & 0 deletions .github/workflows/main.yaml
@@ -0,0 +1,18 @@
name: Main

on: [push, pull_request]

jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
        with:
          python-version: 3.10.12
          cache: "pip"
      - name: "installation"
        run: |
          pip install -r requirements-dev.txt
      - name: "black"
        run: black . --check --diff --color
25 changes: 25 additions & 0 deletions Dockerfile
@@ -0,0 +1,25 @@
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
> Review comment (Collaborator, Author): Why go through these contortions installing cuDNN by hand? You could take a ready-made image that already ships cuDNN, e.g. nvcr.io/nvidia/tensorrt:22.08-py3, or look for an even more suitable one.

ENV NV_CUDNN_VERSION 8.6.0.163
ENV NV_CUDNN_PACKAGE_NAME "libcudnn8"

ENV NV_CUDNN_PACKAGE "$NV_CUDNN_PACKAGE_NAME=$NV_CUDNN_VERSION-1+cuda11.8"
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
RUN apt-get update && apt-get install -y --no-install-recommends \
${NV_CUDNN_PACKAGE} \
unzip \
&& apt-mark hold ${NV_CUDNN_PACKAGE_NAME} \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update -y \
&& apt-get install -y python3-pip
RUN echo 'alias python=python3' >> ~/.bashrc
RUN echo 'NCCL_SOCKET_IFNAME=lo' >> ~/.bashrc


WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

ENTRYPOINT [ "bash" ]
73 changes: 73 additions & 0 deletions README.md
@@ -0,0 +1,73 @@
# PM-Unet: phase and magnitude aware model for music source separation
[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://d-a-yakovlev.github.io/test/)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OXlCZgd5KidMDZDUItOIT9ZA4IUJHXsZ?usp=sharing)

## Navigation
1. [Structure](#structure)
2. [Docker](#docker)
3. [Training](#training)
4. [Inference](#inference)

## Structure
- [`separator`](./separator) ‒ main source code with the model and dataset implementations and the training code.
- [`streaming`](./streaming/demo) ‒ source code for running inference with the TF-Lite version of the model.

## Docker
#### To set up the environment with Docker

If you don't have Docker installed, please follow the links to find installation instructions for [Ubuntu](https://docs.docker.com/desktop/install/linux-install/), [Mac](https://docs.docker.com/desktop/install/mac-install/) or [Windows](https://docs.docker.com/desktop/install/windows-install/).

Build docker image:

docker build -t pmunet .

Run docker image:

bash run_docker.sh

## Data
The dataset used is [MUSDB18-HQ](https://sigsep.github.io/datasets/musdb.html#musdb18-hq-uncompressed-wav).

[![Download dataset](https://img.shields.io/badge/Download%20dataset-65c73b)](https://zenodo.org/record/3338373/files/musdb18hq.zip?download=1)

The dataset consists of 150 full-length stereo tracks sampled at 44.1 kHz. Each track provides the complete mix plus four stems: "vocals", "bass", "drums" and "other", any of which can serve as a target in the context of source separation. The dataset is split into 100 training tracks and 50 validation tracks.
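Since `musdb` is pinned in `requirements.txt`, a minimal sketch of loading stems with it (assuming the dataset is unpacked to `musdb18hq/`) could look like this:

```
import musdb

# Load the uncompressed-WAV variant of MUSDB18-HQ (hence is_wav=True).
mus = musdb.DB(root="musdb18hq", subsets="train", is_wav=True)

track = mus.tracks[0]
mix = track.audio                        # ndarray, shape (num_samples, 2), 44.1 kHz
vocals = track.targets["vocals"].audio   # other keys: "drums", "bass", "other"
print(track.name, mix.shape, track.rate)
```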

## Training
1. Configure arguments in `separator/config/config.py`.
2. `cd separator`.
3. Run `python3 pl_model.py`.

## Inference

### Auto local
1. Configure arguments in `separator/config/config.py`.
2. `cd separator`.
3. `python3 inference.py [-I path] [-O dir]`
   - `-I` ‒ path to the input mixture,
   - `-O` ‒ output directory; both are optional.

By default the script downloads a `.pt` weights file and `sample.wav` from Google Drive.

#### For example
```
python3 inference.py -I path/to/mix -O out_dir
```
After a successful run, four audio files (`vocals.wav`, `drums.wav`, `bass.wav`, `other.wav`) will appear in `out_dir`, by default `separator/inference/output`.

**You can download weights manually**

Download one of the `.pt` files below:
* [LSTM-bottleneck version](https://drive.google.com/file/d/18jT2TYffdRD1fL7wecAiM5nJPM_OKpNB/view?usp=drive_link)
* [Without LSTM-bottleneck version](https://drive.google.com/file/d/1VO07OYbsnCuEJYRSuA8HhjlQnx6dbWX7/view?usp=drive_link)

### Streaming
The `streaming` section contains scripts to convert the model to the `tflite` format and to run the `tflite` model in "stream mode"; a minimal sketch follows the steps below.

1. Configure arguments in `streaming/config/config.py`.
2. `cd streaming`.
3. `python3 runner.py`
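For illustration, here is a sketch of one streaming step with TensorFlow's `tf.lite.Interpreter`; the model path and chunk shape are assumptions, since the real values come from `streaming/config/config.py` and the converter script:

```
import numpy as np
import tensorflow as tf

# Hypothetical model path; the converter script produces the actual file.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def process_chunk(chunk: np.ndarray) -> np.ndarray:
    # Feed one audio chunk (shaped like inp["shape"]) through the model.
    interpreter.set_tensor(inp["index"], chunk.astype(np.float32))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])
```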
3 changes: 3 additions & 0 deletions requirements-dev.txt
@@ -0,0 +1,3 @@
black
mypy
pytest
139 changes: 139 additions & 0 deletions requirements.txt
@@ -0,0 +1,139 @@
aiohttp==3.8.4
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
asttokens
async-timeout==4.0.2
attrs==23.1.0
audioread==3.0.0
backcall
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
cmake==3.26.4
comm
contourpy
cycler
Cython==0.29.35
debugpy
decorator
diffq==0.2.4
einops==0.6.1
executing
fast-bss-eval==0.1.4
ffmpeg-python==0.2.0
filelock==3.12.0
fonttools==4.25.0
frozenlist==1.3.3
fsspec==2023.6.0
future==0.18.3
gdown
idna==3.4
ipykernel
ipython
jedi
Jinja2==3.1.2
joblib==1.3.1
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
julius==0.2.7
jupyter_client
jupyter_core
kiwisolver
lameenc==1.4.2
lazy_loader==0.3
librosa==0.10.0.post2
lightning-utilities==0.8.0
lit==16.0.5.post0
llvmlite==0.40.1
lpips==0.1.4
MarkupSafe==2.1.3
matplotlib
matplotlib-inline
mir-eval==0.7
mkl-fft==1.3.6
mkl-random
mkl-service==2.4.0
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
munkres==1.1.4
musdb==0.4.0
museval==0.4.1
nest-asyncio
networkx==3.1
numba==0.57.1
numpy #==1.24.4
nobuco
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
omegaconf==2.3.0
openunmix==1.2.1
packaging
pandas==2.1.0
parso
pexpect
pickleshare
Pillow==9.5.0
platformdirs
ply==3.11
pooch==1.6.0
primePy==1.3
prompt-toolkit
psutil
ptyprocess
pure-eval
pyaml==23.5.9
pycparser==2.21
pyee==10.0.1
Pygments
pyparsing
PyQt5-sip==12.11.0
PySoundFile==0.9.0.post1
python-dateutil
python-ffmpeg==2.0.4
pytorch-lightning==2.0.3
pytz==2023.3
PyYAML==6.0
pyzmq
referencing==0.30.2
requests==2.31.0
rpds-py==0.10.0
scikit-learn==1.3.0
scipy==1.10.1
simplejson==3.19.1
sip
six
soundfile==0.12.1
sox==1.4.1
soxr==0.3.5
stack-data
stempeg==0.2.3
sympy==1.12
tensorflow>=2.13.0 #.*
threadpoolctl==3.1.0
toml
torch==2.0.1
torch-audiomentations==0.11.0
torch-pitch-shift==1.2.4
torchaudio==2.0.2
torchmetrics==0.11.4
torchvision==0.15.2
tornado
tqdm==4.65.0
traitlets
triton==2.0.0
typing_extensions>=4.6.1
tzdata==2023.3
urllib3==2.0.3
wcwidth
yarl==1.9.2
9 changes: 9 additions & 0 deletions run_docker.sh
@@ -0,0 +1,9 @@
#!/bin/bash

app=$PWD
> Review comment (Collaborator, Author): use `$(pwd)`.


docker run --name pmunet -it --rm \
--net=host --ipc=host \
--gpus "all" \
-v ${app}:/app \
pmunet
97 changes: 97 additions & 0 deletions separator/config/config.py
> Review comment (Collaborator, Author): Every field needs a comment. Below is an example of what GPT-4 generated; rework it so it reads better, i.e. keep the grouping:
>
> DATA OPTIONS
>
> MODEL OPTIONS
>
> TRAIN OPTIONS
>
> AUGMENTATION OPTIONS
>
> Also remove the redundancy in the comments. Wherever lines appear in the style
>
> # Section
> param1 : VeryLongType1 = "value1" # doc1
> long_param_name : Type1 = "value1" # doc2
>
> format them as
>
> # Section
> param1          : VeryLongType1 = "value1" # doc1
> long_param_name : Type1         = "value1" # doc2
@dataclass
class TrainConfig:
    # The computing platform for training: 'cuda' for NVIDIA GPUs or 'cpu' for CPU-based training.
    device: str = "cuda"

    # Directory path where the MUSDB18-HQ dataset is stored or to be downloaded.
    musdb_path: str = "musdb18hq"

    # Directory path for saving training metadata, like track names and lengths.
    metadata_train_path: str = "metadata"

    # Directory path for saving testing metadata.
    metadata_test_path: str = "metadata1"

    # Length (in seconds) of each audio segment used during training. Shorter segments can help in learning finer details.
    segment: int = 5

    # Batch size for training, determining how many samples are processed together. Affects memory usage and gradient updates.
    batch_size: int = 6

    # Whether to shuffle the training dataset at the beginning of each epoch. Shuffling can help in reducing overfitting.
    shuffle_train: bool = True

    # Whether to shuffle the validation dataset. Generally not needed as validation performance is independent of data order.
    shuffle_valid: bool = False

    # Whether to drop the last incomplete batch in case the dataset size isn't divisible by the batch size.
    drop_last: bool = True

    # Number of worker processes used for loading data. Higher numbers can speed up data loading.
    num_workers: int = 2

    # Strategy for monitoring metrics to save model checkpoints: 'min' for saving when a monitored metric is minimized.
    metric_monitor_mode: str = "min"

    # Number of best-performing model weights to save based on the monitored metric.
    save_top_k_model_weights: int = 1

    # PM_Unet model configurations
    # Specific sources to target in source separation, e.g., separating drums, bass, etc.
    model_source: tuple = ("drums", "bass", "other", "vocals")

    # The depth of the U-Net architecture, affecting its capacity and complexity.
    model_depth: int = 4

    # Number of initial channels in U-Net layers, influencing the model's learning capability.
    model_channel: int = 28

    # Indicates whether the input audio should be treated as mono (True) or stereo (False).
    is_mono: bool = False

    # Whether to utilize masking within the model for source separation.
    mask_mode: bool = False

    # Mode of skip connections in U-Net ('concat' for concatenation, 'add' for summation).
    skip_mode: str = "concat"

    # Number of bins used in Fast Fourier Transform (FFT) during audio processing.
    nfft: int = 4096

    # Determines whether to use LSTM layers as bottleneck in the U-Net architecture.
    bottlneck_lstm: bool = True

    # Number of LSTM layers if bottleneck is enabled, adding to the model's complexity and ability to capture temporal dependencies.
    layers: int = 2

    # Flag to decide whether to apply Short Time Fourier Transform (STFT) on input audio.
    stft_flag: bool = True

    # Augmentation parameters
    # Maximum number of samples to shift in time during data augmentation, helping the model generalize over temporal variations.
    shift: int = 8192

    # Probability of applying pitch shift augmentation to the audio, introducing pitch variability in training data.
    pitchshift_proba: float = 0.2

    # Range of semitone shifts for pitch shift augmentation applied specifically to vocal tracks.
    vocals_min_semitones: int = -5
    vocals_max_semitones: int = 5

    # Range of semitone shifts for pitch shift augmentation applied to non-vocal tracks.
    other_min_semitones: int = -2
    other_max_semitones: int = 2

    # Flag to enable pitch shift augmentation on non-vocal sources.
    pitchshift_flag_other: bool = False

    # Probability of applying time stretching or compression to alter the speed of the audio without changing its pitch.
    time_change_proba: float = 0.2

    # Factors for time stretching/compression, defining the range and intensity of this augmentation.
    time_change_factors: tuple = (0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3)

    # Probability of remixing audio tracks, useful for training the model to recognize and separate blended sounds.
    remix_proba: float = 1

    # Number of tracks to group together for the remix augmentation.
    remix_group_size: int = batch_size

    # Probability and range of scaling the amplitude of audio tracks, introducing dynamic range variability.
    scale_proba: float = 1
    scale_min: float = 0.25
    scale_max: float = 1.25

    # Probability of applying a fade effect to the audio, simulating natural starts and ends of sounds.
    fade_mask_proba: float = 0.1

    # Probability of doubling one channel's audio to both channels, simulating mono audio in a stereo setup.
    double_proba: float = 0.1

    # Probability of reversing a segment of the audio track, useful for capturing reversed audio patterns.
    reverse_proba: float = 0.2

    # Probability and depth of mixing tracks together to create mashups, aiding the model in learning from complex blends of sounds.
    mushap_proba: float = 0.0
    mushap_depth: int = 2

    # Loss function parameters
    # Multiplicative factors for different components of the loss function, affecting the emphasis on different aspects of the training objective.
    factor: int = 1
    c_factor: int = 1

    # Number of FFT bins for calculating loss, impacting the granularity of frequency-based loss computation.
    loss_nfft: tuple = (4096,)

    # Gamma parameter for adjusting the focus of the loss function on certain aspects of the audio spectrum.
    gamma: float = 0.3

    # Learning rate for the optimizer, a crucial parameter for training convergence.
    lr: float = 0.5 * 3e-3

    # Period of the cosine annealing schedule in learning rate adjustment, influencing the adaptation of learning rate over time.
    T_0: int = 40

    # Maximum number of training epochs, defining the total duration of the training process.
    max_epochs: int = 100

    # Precision of training computations, e.g., '16' for 16-bit floating-point precision, reducing memory usage and potentially speeding up training.
    precision: str = 16

    # Gradient clipping value to prevent exploding gradients, aiding in training stability.
    grad_clip: float = 0.5

@@ -0,0 +1,97 @@
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class TrainConfig:
device: str = "cuda"

# datasets
musdb_path: str = "musdb18hq"
metadata_train_path: str = "metadata"
metadata_test_path: str = "metadata1"
segment: int = 5

# dataloaders
batch_size: int = 6
shuffle_train: bool = True
shuffle_valid: bool = False
drop_last: bool = True
num_workers: int = 2

# checkpoint_callback
metric_monitor_mode: str = "min"
save_top_k_model_weights: int = 1

# PM_Unet model
model_source: tuple = ("drums", "bass", "other", "vocals")
model_depth: int = 4
model_channel: int = 28
is_mono: bool = False
mask_mode: bool = False
skip_mode: str = "concat"
nfft: int = 4096
bottlneck_lstm: bool = True
layers: int = 2
stft_flag: bool = True
# augments
shift: int = 8192
pitchshift_proba: float = 0.2
vocals_min_semitones: int = -5
vocals_max_semitones: int = 5
other_min_semitones: int = -2
other_max_semitones: int = 2
pitchshift_flag_other: bool = False
time_change_proba: float = 0.2
time_change_factors: tuple = (0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3)
remix_proba: float = 1
remix_group_size: int = batch_size
scale_proba: float = 1
scale_min: float = 0.25
scale_max: float = 1.25
fade_mask_proba: float = 0.1
double_proba: float = 0.1
reverse_proba: float = 0.2
mushap_proba: float = 0.0
mushap_depth: int = 2

# loss: if there are artifacts when listening, increase these params
factor: int = 1
c_factor: int = 1
loss_nfft: tuple = (4096,)
gamma: float = 0.3
# lr
lr: float = 0.5 * 3e-3
T_0: int = 40

# lightning
max_epochs: int = 100
precision: str = 16 # "bf16-mixed"
grad_clip: float = 0.5


@dataclass
class InferenceConfig:
GDRIVE_PREFIX = "https://drive.google.com/uc?id="

device: str = "cpu"

# weights
weights_dir: Path = Path("/app/separator/inference/weights")
gdrive_weights_LSTM: str = f"{GDRIVE_PREFIX}18jT2TYffdRD1fL7wecAiM5nJPM_OKpNB"
gdrive_weights_conv: str = f"{GDRIVE_PREFIX}1VO07OYbsnCuEJYRSuA8HhjlQnx6dbWX7"

# inference instance
segment: int = 7
overlap: float = 0.2
offset: Union[int, None] = None
duration: Union[int, None] = None

# inference
sample_rate: int = 44100
num_channels: int = 2
default_result_dir: str = "/app/separator/inference/output"
default_input_dir: str = "/app/separator/inference/input"
# adele
gdrive_mix: str = f"{GDRIVE_PREFIX}1zJpyW1fYxHKXDcDH9s5DiBCYiRpraDB3"
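As a usage illustration (not part of the diff), a minimal sketch of fetching the LSTM weights with `gdown`, which `requirements.txt` includes; the import path and local filename here are hypothetical choices:

```
import gdown

from config import InferenceConfig  # assumed import path inside separator/

cfg = InferenceConfig()
cfg.weights_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical filename; the real inference script may name the file differently.
weights_path = cfg.weights_dir / "pm_unet_lstm.pt"
gdown.download(cfg.gdrive_weights_LSTM, str(weights_path), quiet=False)
```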