Audio Pattern Discovery

Pattern discovery in audio collections, implemented in Rust.

The program extracts interesting regions from wav files and then clusters them using hierarchical clustering under dynamic time warping. Below we see some extracted and clustered dolphin whistles.

Method

From each file we extract the cepstrum [1] in the following manner:

1. Extract sliding windows from the signal
1. Compute the DFT for each window
1. Convolve the DFT with a triangular window, using a stride of half the filter size
1. Compute the log of the filtered window
1. Compute the cepstrum by taking the discrete cosine transform

The parameters needed so far are:

dft window
dft step
triangular window size
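
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in spectrogram.rs: the function names (`dft_magnitudes`, `triangular_filter`, `dct`, `cepstrum`), the naive O(n^2) DFT, the unnormalized DCT-II, and the small constant added before the log are all assumptions made for the example.

```rust
use std::f64::consts::PI;

/// Magnitude spectrum of one window via a naive DFT (bins 0..=n/2).
fn dft_magnitudes(window: &[f64]) -> Vec<f64> {
    let n = window.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0_f64, 0.0_f64);
            for (t, &x) in window.iter().enumerate() {
                let angle = -2.0 * PI * (k as f64) * (t as f64) / (n as f64);
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Slide a triangular filter of width `size` over the spectrum,
/// moving by half the filter width each time.
fn triangular_filter(spectrum: &[f64], size: usize) -> Vec<f64> {
    assert!(size > 0, "filter size must be positive");
    let stride = (size / 2).max(1);
    let center = (size as f64 - 1.0) / 2.0;
    let mut out = Vec::new();
    let mut start = 0;
    while start + size <= spectrum.len() {
        let mut acc = 0.0_f64;
        for i in 0..size {
            // The weight peaks at the filter center and falls off linearly.
            let w = 1.0 - (i as f64 - center).abs() / (center + 1.0);
            acc += w * spectrum[start + i];
        }
        out.push(acc);
        start += stride;
    }
    out
}

/// Unnormalized DCT-II.
fn dct(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    (0..x.len())
        .map(|k| {
            x.iter()
                .enumerate()
                .map(|(i, &v)| v * ((PI / n) * (i as f64 + 0.5) * (k as f64)).cos())
                .sum::<f64>()
        })
        .collect()
}

/// One cepstrum frame per sliding window.
fn cepstrum(samples: &[f64], dft_window: usize, dft_step: usize, tri_size: usize) -> Vec<Vec<f64>> {
    assert!(dft_step > 0, "dft step must be positive");
    let mut frames = Vec::new();
    let mut start = 0;
    while start + dft_window <= samples.len() {
        let spectrum = dft_magnitudes(&samples[start..start + dft_window]);
        let filtered = triangular_filter(&spectrum, tri_size);
        // The small constant avoids ln(0); it is an assumption of this sketch.
        let logged: Vec<f64> = filtered.iter().map(|v| (v + 1e-10).ln()).collect();
        frames.push(dct(&logged));
        start += dft_step;
    }
    frames
}
```

The three arguments after `samples` correspond to the dft window, dft step and triangular window size listed above.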

We then find slices where something interesting happens:

1. For each cepstrum frame compute its variance
1. Smooth the variances in each sequence using a moving average
1. Extract long sequences of high variances

The parameters needed for the interesting detector are:

percentile of variance to find variance threshold
min size of subsequence
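
A rough sketch of such a detector is shown below. The function names, the centered moving average with a hypothetical `smooth` width, and the nearest-rank percentile are assumptions for illustration; the actual implementation may differ in all of these.

```rust
/// Variance of the coefficients in one cepstrum frame.
fn variance(frame: &[f64]) -> f64 {
    let n = frame.len() as f64;
    let mean = frame.iter().sum::<f64>() / n;
    frame.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n
}

/// Centered moving average; the window is truncated at the borders.
fn moving_average(xs: &[f64], width: usize) -> Vec<f64> {
    (0..xs.len())
        .map(|i| {
            let lo = i.saturating_sub(width / 2);
            let hi = (i + width / 2 + 1).min(xs.len());
            xs[lo..hi].iter().sum::<f64>() / (hi - lo) as f64
        })
        .collect()
}

/// Nearest-rank percentile for `p` in [0, 1]; `xs` must not be empty.
fn percentile(xs: &[f64], p: f64) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let idx = ((sorted.len() - 1) as f64 * p).round() as usize;
    sorted[idx]
}

/// Returns half-open frame ranges `(start, end)` whose smoothed variance
/// stays above the percentile threshold for at least `min_len` frames.
fn interesting_regions(
    frames: &[Vec<f64>],
    smooth: usize,
    p: f64,
    min_len: usize,
) -> Vec<(usize, usize)> {
    if frames.is_empty() {
        return Vec::new();
    }
    let vars: Vec<f64> = frames.iter().map(|f| variance(f)).collect();
    let smoothed = moving_average(&vars, smooth);
    let threshold = percentile(&smoothed, p);

    let mut regions = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &v) in smoothed.iter().enumerate() {
        match (v > threshold, start) {
            // A run of high variance begins.
            (true, None) => start = Some(i),
            // The run ends; keep it only if it is long enough.
            (false, Some(s)) => {
                if i - s >= min_len {
                    regions.push((s, i));
                }
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that reaches the end of the sequence.
    if let Some(s) = start {
        if smoothed.len() - s >= min_len {
            regions.push((s, smoothed.len()));
        }
    }
    regions
}
```

Here `p` plays the role of the variance percentile and `min_len` the minimum subsequence size.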

Optionally, we can reduce the dimensionality further by adding an autoencoder. The one used here has only one hidden layer.
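
To illustrate how a single hidden layer reduces dimensionality, here is a small forward-pass sketch. The struct layout, the sigmoid activation and the linear decoder are assumptions for this example and are not taken from neural.rs; training is omitted.

```rust
/// Hypothetical autoencoder with a single hidden layer.
struct AutoEncoder {
    w_enc: Vec<Vec<f64>>, // hidden x input
    b_enc: Vec<f64>,      // one bias per hidden unit
    w_dec: Vec<Vec<f64>>, // input x hidden
    b_dec: Vec<f64>,      // one bias per output unit
}

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Computes `w * x + b` row by row.
fn affine(w: &[Vec<f64>], b: &[f64], x: &[f64]) -> Vec<f64> {
    w.iter()
        .zip(b)
        .map(|(row, bias)| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + bias)
        .collect()
}

impl AutoEncoder {
    /// Maps a frame to the lower-dimensional hidden representation.
    fn encode(&self, x: &[f64]) -> Vec<f64> {
        affine(&self.w_enc, &self.b_enc, x)
            .into_iter()
            .map(sigmoid)
            .collect()
    }

    /// Reconstructs a frame from the hidden representation.
    fn decode(&self, h: &[f64]) -> Vec<f64> {
        affine(&self.w_dec, &self.b_dec, h)
    }
}
```

After training, only `encode` would be needed to produce the reduced frames that go into clustering.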

We then cluster all sequences using dynamic time warping as the distance. The warping window can be restricted by a Sakoe-Chiba band [2]. Furthermore, we can weigh the errors INSERTION, DELETION and MATCH with separate weights [3]. Clustering stops at a distance threshold, which is estimated from a given percentage.
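
The following sketch shows one way to combine a Sakoe-Chiba band with weighted steps. The `Weights` struct, the Euclidean frame distance, and the mapping of the vertical and horizontal steps to insertion and deletion are assumptions for illustration; alignments.rs additionally keeps backtracking information, which is left out here.

```rust
/// Euclidean distance between two frames.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Hypothetical container for the three step weights.
struct Weights {
    insertion: f64,
    deletion: f64,
    matching: f64,
}

/// DTW distance where cell (i, j) is only filled if |i - j| <= band.
fn dtw(x: &[Vec<f64>], y: &[Vec<f64>], band: usize, w: &Weights) -> f64 {
    let (n, m) = (x.len(), y.len());
    // Widen the band so that the final cell (n, m) stays reachable.
    let band = band.max(n.abs_diff(m));
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    for i in 1..=n {
        let lo = i.saturating_sub(band).max(1);
        let hi = (i + band).min(m);
        for j in lo..=hi {
            let d = euclidean(&x[i - 1], &y[j - 1]);
            // Which step counts as insertion or deletion is a convention.
            let ins = cost[i - 1][j] + w.insertion * d;
            let del = cost[i][j - 1] + w.deletion * d;
            let mat = cost[i - 1][j - 1] + w.matching * d;
            cost[i][j] = ins.min(del).min(mat);
        }
    }
    cost[n][m]
}
```

A narrow band reduces the number of cells that are computed and forbids extreme warping; setting all three weights to 1.0 gives plain DTW.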

We cluster using agglomerative clustering with average linkage, also known as UPGMA [4].
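
A naive sketch of average-linkage clustering with a stopping threshold is given below. It assumes a precomputed, symmetric distance matrix (for example of pairwise DTW distances) and returns flat clusters of sequence indices; the function names are hypothetical, and building the dendrogram is left out.

```rust
/// Mean pairwise distance between the members of two clusters.
fn average_linkage(dist: &[Vec<f64>], a: &[usize], b: &[usize]) -> f64 {
    let mut total = 0.0_f64;
    for &i in a {
        for &j in b {
            total += dist[i][j];
        }
    }
    total / (a.len() * b.len()) as f64
}

/// Repeatedly merges the two closest clusters until the closest pair
/// is farther apart than `threshold`.
fn upgma(dist: &[Vec<f64>], threshold: f64) -> Vec<Vec<usize>> {
    // Every sequence starts in its own cluster.
    let mut clusters: Vec<Vec<usize>> = (0..dist.len()).map(|i| vec![i]).collect();
    loop {
        let mut best: Option<(usize, usize, f64)> = None;
        for a in 0..clusters.len() {
            for b in (a + 1)..clusters.len() {
                let d = average_linkage(dist, &clusters[a], &clusters[b]);
                if best.map_or(true, |(_, _, bd)| d < bd) {
                    best = Some((a, b, d));
                }
            }
        }
        match best {
            Some((a, b, d)) if d <= threshold => {
                // b > a, so removing b leaves index a valid.
                let merged = clusters.remove(b);
                clusters[a].extend(merged);
            }
            // No pair left, or the closest pair is too far apart.
            _ => return clusters,
        }
    }
}
```

Recomputing every linkage in each round keeps the sketch short but is slow for large collections; caching the cluster distances would be the usual optimisation.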

After this we generate an audio file for each cluster, which contains all instances of the cluster. We also generate a LaTeX document with the dendrograms of the clustering and a classification experiment showing that the models for each cluster model the data. The output of the tool is summarised in a result HTML page.

Usage

To generate the report and all the clusters, run:

./generate_report.sh FOLDER

The folder should contain wav files; it is searched recursively. To configure the program, use the file in project/config. To change the LaTeX templates, use the project/templates folder.

Source Code

audio.rs: Read and write audio
discovery.rs: Discovery parameters
main.rs: Tying it all together
reporting.rs: LaTeX/HTML/GraphViz templating
alignments.rs: DTW code with backtracking and alignment path information
clustering.rs: Hierarchical clustering
numerics.rs: All numeric methods
spectrogram.rs: Implements spectrogram and slicing
neural.rs: Implements an autoencoder with one hidden layer

Output Folder

The results will be generated in the output folder:

result.html: Summary of output with all links to the tool
log.txt: Logs of the run
img: All image files, including the tikz files for the dendrograms and the png files for the spectrograms
encoder: Binary dump of the autoencoder
docs: The final pdf with all images and the log
audio: All interesting regions and clusters as wav files

Requirements

Latex
Rust and Cargo
