
review #1

Merged
merged 68 commits into from
Jan 16, 2024

Conversation

Lallapallooza
Collaborator

No description provided.

Dockerfile Outdated
@@ -0,0 +1,25 @@
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
Collaborator Author

Why all these contortions installing cuDNN by hand? You can take a ready-made image that ships cuDNN, for example
nvcr.io/nvidia/tensorrt:22.08-py3

A more suitable one can probably be found.

run_docker.sh Outdated
@@ -0,0 +1,9 @@
#!/bin/bash

app=$PWD
Collaborator Author

Use $(pwd) here.
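For context, $(pwd) is command substitution (it runs the pwd builtin), while $PWD is an inherited shell variable; both resolve the current directory. A minimal sketch of the suggested line, with the rest of run_docker.sh left out:

```shell
#!/bin/bash
# Command substitution, as suggested in the review, instead of the $PWD variable.
# $(pwd) queries the pwd builtin each time; $PWD trusts the inherited environment.
app=$(pwd)
echo "$app"
```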

Collaborator Author

Comments are needed on every field. Here is an example of what GPT-4 generated; fix it up so it reads better, i.e. keep the grouping:

DATA OPTIONS

MODEL OPTIONS

TRAIN OPTIONS

AUGMENTATION OPTIONS

And trim the redundancy in the comments.

Wherever lines appear in the style

# Section
param1 : VeryLongType1 = "value1" # doc1
long_param_name : Type1 = "value1" # doc2

format them as

# Section
param1          : VeryLongType1 = "value1" # doc1
long_param_name : Type1         = "value1" # doc2
@dataclass
class TrainConfig:
    # The computing platform for training: 'cuda' for NVIDIA GPUs or 'cpu' for CPU-based training.
    device: str = "cuda"

    # Directory path where the MUSDB18-HQ dataset is stored or to be downloaded.
    musdb_path: str = "musdb18hq"

    # Directory path for saving training metadata, like track names and lengths.
    metadata_train_path: str = "metadata"

    # Directory path for saving testing metadata.
    metadata_test_path: str = "metadata1"

    # Length (in seconds) of each audio segment used during training. Shorter segments can help in learning finer details.
    segment: int = 5

    # Batch size for training, determining how many samples are processed together. Affects memory usage and gradient updates.
    batch_size: int = 6

    # Whether to shuffle the training dataset at the beginning of each epoch. Shuffling can help in reducing overfitting.
    shuffle_train: bool = True

    # Whether to shuffle the validation dataset. Generally not needed as validation performance is independent of data order.
    shuffle_valid: bool = False

    # Whether to drop the last incomplete batch in case the dataset size isn't divisible by the batch size.
    drop_last: bool = True

    # Number of worker processes used for loading data. Higher numbers can speed up data loading.
    num_workers: int = 2

    # Strategy for monitoring metrics to save model checkpoints: 'min' for saving when a monitored metric is minimized.
    metric_monitor_mode: str = "min"

    # Number of best-performing model weights to save based on the monitored metric.
    save_top_k_model_weights: int = 1

    # PM_Unet model configurations
    # Specific sources to target in source separation, e.g., separating drums, bass, etc.
    model_source: tuple = ("drums", "bass", "other", "vocals")

    # The depth of the U-Net architecture, affecting its capacity and complexity.
    model_depth: int = 4

    # Number of initial channels in U-Net layers, influencing the model's learning capability.
    model_channel: int = 28

    # Indicates whether the input audio should be treated as mono (True) or stereo (False).
    is_mono: bool = False

    # Whether to utilize masking within the model for source separation.
    mask_mode: bool = False

    # Mode of skip connections in U-Net ('concat' for concatenation, 'add' for summation).
    skip_mode: str = "concat"

    # Number of bins used in Fast Fourier Transform (FFT) during audio processing.
    nfft: int = 4096

    # Determines whether to use LSTM layers as the bottleneck in the U-Net architecture.
    bottlneck_lstm: bool = True

    # Number of LSTM layers if bottleneck is enabled, adding to the model's complexity and ability to capture temporal dependencies.
    layers: int = 2

    # Flag to decide whether to apply Short Time Fourier Transform (STFT) on input audio.
    stft_flag: bool = True

    # Augmentation parameters
    # Maximum number of samples to shift in time during data augmentation, helping the model generalize over temporal variations.
    shift: int = 8192

    # Probability of applying pitch shift augmentation to the audio, introducing pitch variability in training data.
    pitchshift_proba: float = 0.2

    # Range of semitone shifts for pitch shift augmentation applied specifically to vocal tracks.
    vocals_min_semitones: int = -5
    vocals_max_semitones: int = 5

    # Range of semitone shifts for pitch shift augmentation applied to non-vocal tracks.
    other_min_semitones: int = -2
    other_max_semitones: int = 2

    # Flag to enable pitch shift augmentation on non-vocal sources.
    pitchshift_flag_other: bool = False

    # Probability of applying time stretching or compression to alter the speed of the audio without changing its pitch.
    time_change_proba: float = 0.2

    # Factors for time stretching/compression, defining the range and intensity of this augmentation.
    time_change_factors: tuple = (0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3)

    # Probability of remixing audio tracks, useful for training the model to recognize and separate blended sounds.
    remix_proba: float = 1

    # Number of tracks to group together for the remix augmentation.
    remix_group_size: int = batch_size

    # Probability and range of scaling the amplitude of audio tracks, introducing dynamic range variability.
    scale_proba: float = 1
    scale_min: float = 0.25
    scale_max: float = 1.25

    # Probability of applying a fade effect to the audio, simulating natural starts and ends of sounds.
    fade_mask_proba: float = 0.1

    # Probability of doubling one channel's audio to both channels, simulating mono audio in a stereo setup.
    double_proba: float = 0.1

    # Probability of reversing a segment of the audio track, useful for capturing reversed audio patterns.
    reverse_proba: float = 0.2

    # Probability and depth of mixing tracks together to create mashups, aiding the model in learning from complex blends of sounds.
    mushap_proba: float = 0.0
    mushap_depth: int = 2

    # Loss function parameters
    # Multiplicative factors for different components of the loss function, affecting the emphasis on different aspects of the training objective.
    factor: int = 1
    c_factor: int = 1

    # Number of FFT bins for calculating loss, impacting the granularity of frequency-based loss computation.
    loss_nfft: tuple = (4096,)

    # Gamma parameter for adjusting the focus of the loss function on certain aspects of the audio spectrum.
    gamma: float = 0.3

    # Learning rate for the optimizer, a crucial parameter for training convergence.
    lr: float = 0.5 * 3e-3

    # Period of the cosine annealing schedule in learning rate adjustment, influencing the adaptation of learning rate over time.
    T_0: int = 40

    # Maximum number of training epochs, defining the total duration of the training process.
    max_epochs: int = 100

    # Precision of training computations, e.g., '16' for 16-bit floating-point precision, reducing memory usage and potentially speeding up training.
    precision: str = "16"

    # Gradient clipping value to prevent exploding gradients, aiding in training stability.
    grad_clip: float = 0.5
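As a side note on why a flat dataclass like this is convenient: per-run overrides need no edits to the file. A minimal sketch, with TrainConfig abbreviated to three of the fields above:

```python
from dataclasses import dataclass, replace

@dataclass
class TrainConfig:
    # Abbreviated stand-in for the full config above.
    device: str = "cuda"
    batch_size: int = 6
    lr: float = 0.5 * 3e-3

# Defaults for a full run; replace() returns a copy with selected overrides,
# leaving the base config untouched.
base = TrainConfig()
debug = replace(base, device="cpu", batch_size=2)
print(debug.device, debug.batch_size, base.batch_size)
```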

channels=2,
ext=File.EXT,
):
"""
Collaborator Author

Change it to:

"""
A dataset class for audio source separation, compatible with WAV (or MP3) files.
This class allows training with arbitrary sources, where each audio track is represented by a separate folder within the specified root directory. Each folder should contain audio files for different sources, named as `{source}.{ext}`.

Args:
    root (Path or str): The root directory of the dataset where audio tracks are stored.
    metadata (dict): Metadata information generated by the `build_metadata` function. It contains details like track names and lengths.
    sources (list[str]): A list of source names to be separated, e.g., ['drums', 'vocals'].
    segment (Optional[float]): The length of each audio segment in seconds. If `None`, the entire track is used.
    shift (Optional[float]): The stride in seconds between samples. Determines the overlap between consecutive audio segments.
    normalize (bool): If True, normalizes the input audio based on the entire track's statistics, not just individual segments.
    samplerate (int): The target sample rate. Audio files with a different sample rate will be resampled to this rate.
    channels (int): The target number of audio channels. If different, audio will be converted accordingly.
    ext (str): The file extension of the audio files, default is '.wav'.

Note:
    The `samplerate` and `channels` parameters are used to ensure consistency across the dataset. They allow on-the-fly conversion of audio properties to match the target specifications.
"""


def resolve_weigths(self):
    if self.model_bottlneck_lstm:
        self.weights_path = self.config.weights_dir / "weight_LSTM.pt"
Collaborator Author

Hardcoding weight filenames is bad.

)

if model.bottlneck_lstm:
    weights_path = config.weights_dir / "weight_LSTM.pt"
Collaborator Author

Hardcoding weight names is bad; pass the path as a parameter.
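One way to act on this, sketched with hypothetical names (resolve_weights and the conv_name default are not from the repo, only the LSTM filename is):

```python
from pathlib import Path

def resolve_weights(weights_dir: Path, use_lstm: bool,
                    lstm_name: str = "weight_LSTM.pt",
                    conv_name: str = "weight_conv.pt") -> Path:
    # Filenames arrive as parameters with defaults, so callers can override
    # them instead of relying on names hardcoded inside the function.
    return weights_dir / (lstm_name if use_lstm else conv_name)

print(resolve_weights(Path("weights"), use_lstm=True))
```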

def istft(self, z):
    return self.model.stft.istft(z, self.length_wave)

SEGMENT_WAVE = 44100
Collaborator Author

Hardcoding is bad; pass this as a parameter.
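A sketch of passing the value in via argparse instead of a module-level constant; the flag name is illustrative, and the default preserves the current 44100-sample behaviour:

```python
import argparse

parser = argparse.ArgumentParser()
# Default keeps today's behaviour; callers can override it per run.
parser.add_argument("--segment-wave", type=int, default=44100,
                    help="segment length in samples (sample rate * seconds)")
args = parser.parse_args([])  # empty list: use defaults for this demo
print(args.segment_wave)
```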

)

model_path = str(
    args.out_dir + f"/{args.class_name}_outer_stft_{SEGMENT_WAVE / 44100:.1f}"
Collaborator Author

Hardcoding is bad; pass this as a parameter.

Collaborator
@d-a-yakovlev Dec 29, 2023

model_filename = f"{args.class_name}_outer_stft_{config.segment_duration:.1f}"
model_path = args.out_dir + '/' + model_filename

ok?


if start_converter:
    subprocess.Popen(
        ["python3", "/app/converter.py"], executable="/bin/bash", shell=True
Collaborator Author

Hardcoding is bad; pass this as a parameter.
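A sketch of the same idea for the subprocess call; start_converter and its parameter are hypothetical names. The list form also removes the need for shell=True, and sys.executable avoids hardcoding "python3":

```python
import subprocess
import sys

def start_converter(converter_script: str) -> subprocess.Popen:
    # The script path arrives as a parameter instead of being hardcoded;
    # a plain argument list is passed straight to the interpreter, no shell.
    return subprocess.Popen([sys.executable, converter_script])

# Usage (path shown only as an example):
# proc = start_converter("/app/converter.py")
```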

Collaborator Author

Give this a proper name: tf_lite_stream.py

@maks00170 maks00170 merged commit 3bea291 into review Jan 16, 2024
2 checks passed