NTU Machine Learning (2018, Spring) Final Project

Freesound General-Purpose Audio Tagging Challenge

Team Members:

何適楷, 黃廉弼, 蔡仲閔, 陳致維

Dependencies

This project runs on Python 3 with the following packages:

keras == 2.2.0
librosa == 0.6.1
numpy == 1.14.0
pandas == 0.22.0
matplotlib == 2.1.2
requests == 2.18.4
simplejson == 3.15.0
scikit-learn == 0.19.1

Usage

Folder and Dataset

Download the dataset from Kaggle and place the audio files in ./input/audio_train and ./input/audio_test, with train.csv and sample_submission.csv in ./input. The expected layout is:

.
├── ...
├── src                    
│   ├── train.sh              # training script
│   ├── predict.sh            # prediction script
│   ├── data_gen.py           # data generator
│   └── ...
├── input 
│   ├── train.csv             # training list
│   ├── sample_submission.csv # testing list
│   ├── audio_train           # folder containing training data (wav files)
│   ├── audio_test            # folder containing test data (wav files)
│   └── mfcc                  # output folder for mfcc_test.npy
└── ...
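Before running the scripts, it can help to confirm that the files from the layout above are in place. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_inputs` and the list of required entries are assumptions based on the tree shown above.

```python
from pathlib import Path

# Entries the tree above places under ./input (assumed to be required up front;
# the mfcc folder is described as an output folder, so it is not checked here).
EXPECTED_INPUTS = ["train.csv", "sample_submission.csv", "audio_train", "audio_test"]


def missing_inputs(input_dir="./input"):
    """Return the expected entries that do not exist under input_dir."""
    root = Path(input_dir)
    return [name for name in EXPECTED_INPUTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing under ./input:", ", ".join(missing))
    else:
        print("Input folder looks complete.")
```

Run it from the project root so that the relative path ./input resolves to the folder shown in the tree.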

Data Preprocessing (Data Generator)

Run the data generator to preprocess and extend the dataset.

For prediction only:

python3 ./final/src/data_gen.py --test_only 1

For training and prediction:

# The data generator takes two parameters: --strech and --num
python3 ./final/src/data_gen.py --strech 1.1 --num 5
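This README does not describe what --strech and --num do inside data_gen.py. One plausible reading, shown here purely as an assumption, is that --strech bounds a time-stretch rate and --num sets how many augmented copies are made per clip. The sketch below illustrates that idea with NumPy only; the function names, the rate range, and the seeding are hypothetical. It uses plain linear-interpolation resampling, which changes pitch along with duration, whereas a phase-vocoder method such as `librosa.effects.time_stretch` keeps the pitch.

```python
import numpy as np


def naive_stretch(samples, rate):
    """Resample a 1-D clip so it plays `rate` times faster (rate > 1 shortens it)."""
    n_out = int(round(len(samples) / rate))
    old_positions = np.arange(len(samples))
    # Spread n_out read positions evenly over the original clip.
    new_positions = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(new_positions, old_positions, samples)


def augment(samples, max_rate, num, seed=0):
    """Return `num` stretched copies with rates drawn from [1/max_rate, max_rate)."""
    rng = np.random.RandomState(seed)
    rates = rng.uniform(1.0 / max_rate, max_rate, size=num)
    return [naive_stretch(samples, r) for r in rates]


if __name__ == "__main__":
    clip = np.sin(np.linspace(0, 2 * np.pi * 5, 1000))  # toy 1000-sample clip
    copies = augment(clip, max_rate=1.1, num=5)
    print([len(c) for c in copies])  # five lengths close to 1000
```

Each copy keeps the label of the clip it came from, so the training list grows by a factor of num + 1 under this reading.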

Training

To train a model:

# Pass the name of the model to train as the argument:
cd ./final/src && bash train.sh 1d_conv
cd ./final/src && bash train.sh 2d_mfcc

Ensemble & Predict

cd ./final/src && bash predict.sh

This writes 1d_2d_ensembled_submission.csv to the same directory.
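How predict.sh combines the two models is not described here. A common approach, sketched below as an assumption rather than the project's actual method, is to average the class probabilities of both models and keep the three most likely labels per clip, joined by spaces as in the challenge's submission format. The function name, the `weight_a` parameter, and the toy label list are illustrative.

```python
import numpy as np


def ensemble_top3(prob_a, prob_b, labels, weight_a=0.5):
    """Blend two (n_clips, n_classes) probability arrays and return top-3 label strings."""
    blended = weight_a * np.asarray(prob_a) + (1.0 - weight_a) * np.asarray(prob_b)
    # Sort class indices by descending blended probability and keep the first three.
    top3 = np.argsort(-blended, axis=1)[:, :3]
    return [" ".join(labels[i] for i in row) for row in top3]


if __name__ == "__main__":
    labels = ["Bark", "Cello", "Flute", "Saxophone"]  # toy subset of classes
    prob_1d = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.2, 0.3, 0.4]]  # stand-in for the 1D model
    prob_2d = [[0.1, 0.6, 0.2, 0.1], [0.3, 0.1, 0.2, 0.4]]  # stand-in for the 2D model
    for line in ensemble_top3(prob_1d, prob_2d, labels):
        print(line)
```

Averaging probabilities tends to help when the two models make different mistakes, which is likely when one sees raw waveforms and the other sees MFCC features.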

Crawling the Kaggle Ranking

To check our ranking among NTU teams, we wrote a Python crawler:

python3 ./final/ranking.py

Model

1D Convolution

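We sample each audio file and feed the raw data directly into a 1D-CNN model. The data passes through several layers, with MaxPooling used to speed up training. The architecture is quite simple, but its drawback is a very long training time: because we do not preprocess the audio, the data size is huge, so the data has to be loaded into the model batch by batch through data_generator. The number of file I/O operations rises sharply as a result, which drags down overall training speed.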

Model	epoch	learning_rate	optimizer	loss_function	activation_function
1D Convolution	50	0.0001	Adam	categorical_crossentropy	relu
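To make the role of the pooling layers concrete, the sketch below runs one 1D convolution with relu followed by max pooling in plain NumPy. It is an illustration under assumed settings (kernel size, filter count, and pool size are made up), not the project's Keras model; it only shows how pooling shrinks the sequence that later layers must process.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid-padding, stride-1 convolution followed by relu.

    x: (length, in_channels); kernels: (k, in_channels, out_channels).
    """
    k = kernels.shape[0]
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, kernels.shape[2]))
    for t in range(out_len):
        # Contract the window of k samples against every filter at once.
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, pool):
    """Non-overlapping max pooling along the time axis; drops a trailing remainder."""
    n = x.shape[0] // pool
    return x[:n * pool].reshape(n, pool, x.shape[1]).max(axis=1)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    wave = rng.randn(16000, 1)           # stand-in for raw audio samples
    filters = rng.randn(9, 1, 16) * 0.1  # assumed: 9-tap kernels, 16 filters
    features = conv1d_relu(wave, filters, np.zeros(16))
    pooled = max_pool1d(features, pool=16)
    print(features.shape, pooled.shape)  # (15992, 16) (999, 16)
```

With a pool size of 16, each pooling stage cuts the time axis by a factor of 16, so the layers after it do far less work per clip even though the raw input is long.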

2D Convolution on MFCC

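Here we take the more common approach: we first preprocess the data with MFCC and then feed it into a 2D-CNN model, which turns the task into something close to a familiar image recognition problem. The advantage is that MFCC-processed data is not only smaller but also closer to the way the human ear distinguishes sounds, so this model achieves quite good results.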

Model	epoch	learning_rate	optimizer	loss_function	activation_function
2D Convolution	50	0.0001	Adam	categorical_crossentropy	relu
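The size reduction mentioned above can be seen in a simplified MFCC computation. The sketch below uses NumPy only and is an assumption-laden illustration: the frame size, hop length, number of mel bands, and number of coefficients are made-up values, and the scaling differs from `librosa.feature.mfcc`, which a real pipeline using the pinned librosa version would more likely call. The steps are the standard ones: framing, windowing, power spectrum, mel filterbank, log, and a DCT-II.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale: (n_mels, n_fft // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc(signal, sr, n_fft=2048, hop=512, n_mels=40, n_mfcc=20):
    """Return an (n_mfcc, n_frames) array of unnormalized MFCCs."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Row j of `index` selects the samples of frame j.
    index = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[index] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # DCT-II basis over the mel bands, keeping the first n_mfcc coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return (log_mel @ dct_basis.T).T


if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    features = mfcc(tone, sr)
    print(tone.shape, features.shape)     # (44100,) (20, 83)
```

Under these assumed settings, one second of audio shrinks from 44100 samples to a 20 x 83 array (1660 values), which a 2D-CNN can treat like a small single-channel image.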
