Pattern Discovery in Audio Collections in Rust
The program extracts interesting regions from WAV files and then clusters them using hierarchical clustering, with dynamic time warping (DTW) as the distance measure. Below we see some extracted and clustered dolphin whistles.
From each file we extract the cepstrum [1] as follows:
- Extract sliding windows
- Compute the DFT of each window
- Convolve the DFT with a triangular window, using a stride of half the filter size
- Take the log of each filtered window
- Compute the cepstrum by applying the discrete cosine transform (DCT)
The parameters needed so far are:
- DFT window size
- DFT step
- triangular window size
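The cepstrum steps above can be sketched in dependency-free Rust as follows. This is an illustration, not the code in spectrogram.rs: the function names (`dft_magnitude`, `triangular_filter`, `dct2`, `cepstrum`) are hypothetical, and several details are assumptions, namely a naive O(n²) DFT instead of an FFT, no analysis window (such as Hamming) applied to the frames, the magnitude spectrum restricted to bins 0..=n/2, a small epsilon added before the log, and an unnormalised DCT-II. `dft_window`, `dft_step` and `tri_size` stand for the three parameters listed above.

```rust
use std::f64::consts::PI;

/// Magnitude of the DFT for bins 0..=n/2.
/// ASSUMPTION: naive O(n^2) DFT, used only to keep the sketch dependency-free.
fn dft_magnitude(frame: &[f64]) -> Vec<f64> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0_f64, 0.0_f64);
            for (t, &x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Slides a triangular window of `size` bins over the spectrum with a stride of
/// half the window size and returns the weighted sum at each position.
fn triangular_filter(spectrum: &[f64], size: usize) -> Vec<f64> {
    let stride = (size / 2).max(1);
    let half = (size as f64 - 1.0) / 2.0;
    // Triangle weights peaking at the centre of the window.
    let weights: Vec<f64> = (0..size)
        .map(|i| 1.0 - (i as f64 - half).abs() / (half + 1.0))
        .collect();
    let mut out = Vec::new();
    let mut start = 0;
    while start + size <= spectrum.len() {
        let v: f64 = spectrum[start..start + size]
            .iter()
            .zip(&weights)
            .map(|(s, w)| s * w)
            .sum();
        out.push(v);
        start += stride;
    }
    out
}

/// Unnormalised DCT-II.
fn dct2(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    (0..x.len())
        .map(|k| {
            x.iter()
                .enumerate()
                .map(|(i, &v)| v * (PI / n * (i as f64 + 0.5) * k as f64).cos())
                .sum::<f64>()
        })
        .collect()
}

/// Sliding window -> |DFT| -> triangular filters -> log -> DCT, one vector per frame.
fn cepstrum(signal: &[f64], dft_window: usize, dft_step: usize, tri_size: usize) -> Vec<Vec<f64>> {
    assert!(dft_step > 0 && tri_size > 0);
    let mut frames = Vec::new();
    let mut start = 0;
    while start + dft_window <= signal.len() {
        let spectrum = dft_magnitude(&signal[start..start + dft_window]);
        let filtered = triangular_filter(&spectrum, tri_size);
        // ASSUMPTION: epsilon avoids ln(0) on silent bins.
        let logged: Vec<f64> = filtered.iter().map(|v| (v + 1e-10).ln()).collect();
        frames.push(dct2(&logged));
        start += dft_step;
    }
    frames
}

fn main() {
    // Synthetic 1 kHz tone sampled at 8 kHz, standing in for a WAV file.
    let signal: Vec<f64> = (0..4000)
        .map(|t| (2.0 * PI * 1000.0 * t as f64 / 8000.0).sin())
        .collect();
    let frames = cepstrum(&signal, 256, 128, 8);
    println!("{} frames of {} coefficients", frames.len(), frames[0].len());
}
```

For real recordings an FFT would replace the quadratic DFT; the naive version only keeps the example self-contained.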
We then find slices where something interesting happens:
- For each cepstrum frame, compute its variance
- Smooth the variances in each sequence using a moving average
- Extract long runs of high variance
The parameters needed for the interest detector are:
- the percentile of the variance used to set the variance threshold
- the minimum length of a subsequence
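The interest detector can be sketched in the same style. All names are hypothetical, and the sketch assumes that the variance is taken across the coefficients of each cepstrum frame, that smoothing is a centred moving average whose width `smooth` is an extra parameter not listed above, that the percentile is read from a rounded index into the sorted smoothed variances, and that a region is a run of frames whose smoothed variance lies strictly above that threshold.

```rust
/// Variance of the coefficients of one cepstrum frame.
fn frame_variance(frame: &[f64]) -> f64 {
    let n = frame.len() as f64;
    let mean = frame.iter().sum::<f64>() / n;
    frame.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n
}

/// Centred moving average over `w` frames, truncated at the edges.
/// ASSUMPTION: the smoothing width is a hypothetical extra parameter.
fn moving_average(xs: &[f64], w: usize) -> Vec<f64> {
    let half = w / 2;
    (0..xs.len())
        .map(|i| {
            let lo = i.saturating_sub(half);
            let hi = (i + half + 1).min(xs.len());
            xs[lo..hi].iter().sum::<f64>() / (hi - lo) as f64
        })
        .collect()
}

/// Value at percentile `p` (0..=100), using a rounded index into a sorted copy.
fn percentile(xs: &[f64], p: f64) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

/// Returns (start, end) frame ranges, end exclusive, where the smoothed variance
/// stays above the percentile threshold for at least `min_len` frames.
fn interesting_regions(frames: &[Vec<f64>], smooth: usize, pct: f64, min_len: usize) -> Vec<(usize, usize)> {
    let variances: Vec<f64> = frames.iter().map(|f| frame_variance(f)).collect();
    let smoothed = moving_average(&variances, smooth);
    let threshold = percentile(&smoothed, pct);
    let mut regions = Vec::new();
    let mut start = None;
    for (i, &v) in smoothed.iter().enumerate() {
        match (v > threshold, start) {
            (true, None) => start = Some(i), // a run of high variance begins
            (false, Some(s)) => {
                // The run ends; keep it only if it is long enough.
                if i - s >= min_len {
                    regions.push((s, i));
                }
                start = None;
            }
            _ => {}
        }
    }
    // A run may still be open at the end of the file.
    if let Some(s) = start {
        if smoothed.len() - s >= min_len {
            regions.push((s, smoothed.len()));
        }
    }
    regions
}

fn main() {
    // 20 toy frames: frames 6..14 have high variance, the rest are flat.
    let frames: Vec<Vec<f64>> = (0..20)
        .map(|i| if (6..14).contains(&i) { vec![3.0, -3.0, 3.0, -3.0] } else { vec![0.0; 4] })
        .collect();
    println!("{:?}", interesting_regions(&frames, 3, 50.0, 3));
}
```

With these toy frames only frames 6 to 13 exceed the median of the smoothed variance, so the sketch reports the single region (6, 14); raising `min_len` above 8 would discard it.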
The dimensionality can optionally be reduced further by adding an autoencoder. The one used here has a single hidden layer.
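A single-hidden-layer autoencoder can be sketched as below. This is not the network in neural.rs: the sigmoid hidden layer, linear output layer, squared reconstruction error, per-sample gradient descent and the toy linear congruential generator used for initialisation are all assumptions chosen for illustration. After training, `encode` returns the lower-dimensional code that would stand in for each cepstrum frame.

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

struct Autoencoder {
    w1: Vec<Vec<f64>>, // hidden x input
    b1: Vec<f64>,
    w2: Vec<Vec<f64>>, // input x hidden
    b2: Vec<f64>,
}

impl Autoencoder {
    fn new(input: usize, hidden: usize) -> Self {
        // ASSUMPTION: small deterministic weights in [-0.1, 0.1) from a toy LCG.
        let mut seed: u64 = 42;
        let mut rnd = || {
            seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            ((seed >> 33) as f64 / (1u64 << 31) as f64 - 0.5) * 0.2
        };
        let w1: Vec<Vec<f64>> = (0..hidden).map(|_| (0..input).map(|_| rnd()).collect::<Vec<f64>>()).collect();
        let w2: Vec<Vec<f64>> = (0..input).map(|_| (0..hidden).map(|_| rnd()).collect::<Vec<f64>>()).collect();
        Autoencoder { w1, b1: vec![0.0; hidden], w2, b2: vec![0.0; input] }
    }

    /// Hidden code: sigmoid(W1 x + b1).
    fn encode(&self, x: &[f64]) -> Vec<f64> {
        self.w1.iter().zip(&self.b1)
            .map(|(row, b)| sigmoid(row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + b))
            .collect()
    }

    /// Linear reconstruction: W2 h + b2.
    fn decode(&self, h: &[f64]) -> Vec<f64> {
        self.w2.iter().zip(&self.b2)
            .map(|(row, b)| row.iter().zip(h).map(|(w, hi)| w * hi).sum::<f64>() + b)
            .collect()
    }

    /// One gradient step on 0.5 * ||decode(encode(x)) - x||^2; returns the loss before the step.
    fn train_step(&mut self, x: &[f64], lr: f64) -> f64 {
        let h = self.encode(x);
        let y = self.decode(&h);
        let d_out: Vec<f64> = y.iter().zip(x).map(|(yi, xi)| yi - xi).collect();
        // Backpropagate into the hidden layer before W2 is modified.
        let d_hid: Vec<f64> = (0..h.len())
            .map(|j| {
                let g: f64 = (0..x.len()).map(|i| self.w2[i][j] * d_out[i]).sum();
                g * h[j] * (1.0 - h[j]) // sigmoid derivative
            })
            .collect();
        for i in 0..x.len() {
            for j in 0..h.len() {
                self.w2[i][j] -= lr * d_out[i] * h[j];
            }
            self.b2[i] -= lr * d_out[i];
        }
        for j in 0..h.len() {
            for k in 0..x.len() {
                self.w1[j][k] -= lr * d_hid[j] * x[k];
            }
            self.b1[j] -= lr * d_hid[j];
        }
        0.5 * d_out.iter().map(|d| d * d).sum::<f64>()
    }
}

fn main() {
    // Two toy 4-dimensional frames compressed to a 2-unit code.
    let data = vec![vec![1.0, 0.0, 1.0, 0.0], vec![0.0, 1.0, 0.0, 1.0]];
    let mut ae = Autoencoder::new(4, 2);
    for epoch in 0..2000 {
        let loss: f64 = data.iter().map(|x| ae.train_step(x, 0.1)).sum();
        if epoch % 500 == 0 {
            println!("epoch {epoch}: loss {loss:.4}");
        }
    }
    println!("code for first frame: {:?}", ae.encode(&data[0]));
}
```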
We then cluster all sequences using dynamic time warping (DTW). The warping window can be restricted by a Sakoe-Chiba band [2]. Furthermore, the INSERTION, DELETION and MATCH operations can each be weighted separately [3]. Clustering stops at a threshold that is estimated from a percentage.
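The band-limited, weighted DTW can be sketched as the dynamic program below. The names and conventions are illustrative rather than taken from alignments.rs (which also tracks the alignment path): it assumes Euclidean distance between frames, multiplies the local cost by the weight of the step taken, and treats a step along `y` only as an insertion and a step along `x` only as a deletion. Cells outside the Sakoe-Chiba band are never filled, so when the sequence lengths differ by more than the band radius the distance is infinite.

```rust
/// Euclidean distance between two frames.
fn euclid(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// DTW distance between frame sequences `x` and `y`.
/// `band`: Sakoe-Chiba radius; cells with |i - j| > band stay at infinity.
/// ASSUMPTION: each weight scales the local cost of its step type.
fn dtw(x: &[Vec<f64>], y: &[Vec<f64>], band: usize, w_ins: f64, w_del: f64, w_match: f64) -> f64 {
    let (n, m) = (x.len(), y.len());
    let mut d = vec![vec![f64::INFINITY; m + 1]; n + 1];
    d[0][0] = 0.0;
    for i in 1..=n {
        // Only visit columns inside the band around the diagonal.
        let lo = i.saturating_sub(band).max(1);
        let hi = (i + band).min(m);
        for j in lo..=hi {
            let cost = euclid(&x[i - 1], &y[j - 1]);
            let matched = d[i - 1][j - 1] + w_match * cost; // advance both
            let inserted = d[i][j - 1] + w_ins * cost; // advance y only
            let deleted = d[i - 1][j] + w_del * cost; // advance x only
            d[i][j] = matched.min(inserted).min(deleted);
        }
    }
    d[n][m]
}

fn main() {
    let x = vec![vec![0.0], vec![1.0], vec![2.0]];
    let y = vec![vec![0.0], vec![1.0], vec![1.0], vec![2.0]];
    println!("band 2: {}", dtw(&x, &y, 2, 1.0, 1.0, 1.0));
    println!("band 0: {}", dtw(&x, &y, 0, 1.0, 1.0, 1.0));
}
```

Here `y` only repeats a value of `x`, so with a wide enough band the distance is 0, while a band of 0 forbids the one extra step needed and yields infinity.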
Clustering is agglomerative with average linkage, also known as UPGMA [4].
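Average-linkage (UPGMA) clustering with a stopping threshold can be sketched as a naive O(n³) loop over a precomputed distance matrix, which would for instance hold the pairwise DTW distances between all extracted regions. How the tool derives its threshold from a percentage is not stated here; the sketch assumes it is that percentile of all pairwise distances, computed by the hypothetical helper `threshold_from_percentage`. It returns flat clusters rather than the dendrogram the tool renders.

```rust
/// ASSUMPTION: the stopping threshold is the `pct`-th percentile of all pairwise distances.
fn threshold_from_percentage(dist: &[Vec<f64>], pct: f64) -> f64 {
    let mut ds = Vec::new();
    for i in 0..dist.len() {
        for j in (i + 1)..dist.len() {
            ds.push(dist[i][j]);
        }
    }
    ds.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((pct / 100.0) * (ds.len() - 1) as f64).round() as usize;
    ds[idx]
}

/// Agglomerative clustering with average linkage (UPGMA) on a symmetric distance matrix.
/// Merging stops once the closest pair of clusters is farther apart than `threshold`.
fn upgma(dist: &[Vec<f64>], threshold: f64) -> Vec<Vec<usize>> {
    let mut clusters: Vec<Vec<usize>> = (0..dist.len()).map(|i| vec![i]).collect();
    loop {
        let mut best: Option<(usize, usize, f64)> = None;
        for a in 0..clusters.len() {
            for b in (a + 1)..clusters.len() {
                // Average linkage: mean distance over all item pairs across the two clusters.
                let sum: f64 = clusters[a]
                    .iter()
                    .flat_map(|&i| clusters[b].iter().map(move |&j| dist[i][j]))
                    .sum();
                let avg = sum / (clusters[a].len() * clusters[b].len()) as f64;
                if best.map_or(true, |(_, _, d)| avg < d) {
                    best = Some((a, b, avg));
                }
            }
        }
        match best {
            Some((a, b, d)) if d <= threshold => {
                let merged = clusters.remove(b); // b > a, so index a is unaffected
                clusters[a].extend(merged);
            }
            _ => return clusters,
        }
    }
}

fn main() {
    // Two tight pairs of items that are far from each other.
    let dist = vec![
        vec![0.0, 1.0, 8.0, 9.0],
        vec![1.0, 0.0, 7.0, 8.0],
        vec![8.0, 7.0, 0.0, 2.0],
        vec![9.0, 8.0, 2.0, 0.0],
    ];
    let threshold = threshold_from_percentage(&dist, 25.0);
    println!("threshold {threshold}: {:?}", upgma(&dist, threshold));
}
```

With this matrix the 25th percentile of the six pairwise distances is 2.0, so the two tight pairs merge while the average distance between them (8.0) stops the process, giving `[[0, 1], [2, 3]]`.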
Afterwards, the tool generates an audio file for each cluster that contains all instances of the cluster. It also produces a LaTeX document with the dendrograms of the clustering and a classification experiment showing that the model of each cluster fits the data. The output of the tool is summarised in a result HTML page.
To generate the report and all clusters, run:
./generate_report.sh FOLDER
FOLDER should contain WAV files; it is searched recursively.
To configure the program, use the file in project/config.
To change the LaTeX templates, use the project/templates folder.
Source files:
- audio.rs: Read and write audio
- discovery.rs: Discovery parameters
- main.rs: Tying it all together
- reporting.rs: LaTeX/HTML/GraphViz templating
- alignments.rs: DTW code with backtracking and alignment path information
- clustering.rs: Hierarchical clustering
- numerics.rs: All numerics methods
- spectrogram.rs: Implements spectrogram and slicing
- neural.rs: Implements a one-layer autoencoder
The results are generated in the output folder:
- result.html: Summary of the output, with links to everything the tool produces
- log.txt: The logs of the run
- img: All image files, including the TikZ files for the dendrograms and the PNG files for the spectrograms
- encoder: Binary dump of the autoencoder
- docs: The final PDF with all images and the log
- audio: All interesting regions and clusters as WAV files
Requirements:
- LaTeX
- Rust and Cargo