This repo separates two speakers in a telephone recording.
If your recording has more than two speakers, I can't guarantee that my method will work.
In addition, to get good results, please try to make sure that each speaker talks for roughly the same total length of time.
1. Split the waveform into audio clips by removing silence
2. Compute an id-vector for each clip using a pre-trained speaker recognition model
3. Cluster the clips' id-vectors with K-means, using K=2
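
The three steps above can be sketched in plain Python. This is only a minimal illustration, not the repo's actual code: the function names (`split_on_silence`, `toy_embedding`, `kmeans_two`), the frame length, and the energy threshold are all assumptions, and `toy_embedding` is a stand-in (zero-crossing rate) for the pre-trained speaker recognition model, which is not reproduced here. The two "speakers" are synthetic tones.

```python
import math


def split_on_silence(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Step 1: return (start, end) sample ranges of non-silent runs.

    A frame counts as silent when its RMS energy is below `threshold`
    (an assumed value; tune it for real recordings).
    """
    clips, start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i  # a voiced run begins
        elif start is not None:
            if i - start >= min_frames:  # drop very short bursts
                clips.append((start * frame_len, i * frame_len))
            start = None
    if start is not None and n_frames - start >= min_frames:
        clips.append((start * frame_len, n_frames * frame_len))
    return clips


def toy_embedding(clip):
    """Step 2 placeholder: zero-crossing rate as a 1-D "id-vector".

    Hypothetical stand-in; the real pipeline would use the output of
    the pre-trained speaker recognition model instead.
    """
    crossings = sum(1 for a, b in zip(clip, clip[1:]) if (a >= 0) != (b >= 0))
    return [crossings / len(clip)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_two(vectors, iters=20):
    """Step 3: Lloyd's K-means with K=2; returns one 0/1 label per vector."""
    # Deterministic init: the first vector, then the vector farthest from it.
    first = vectors[0]
    centers = [list(first), list(max(vectors, key=lambda v: sq_dist(v, first)))]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        labels = [0 if sq_dist(v, centers[0]) <= sq_dist(v, centers[1]) else 1
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def tone(freq, n=4000, rate=8000):
    """Synthetic "speaker": a sine tone of n samples at the given frequency."""
    return [0.5 * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# Two synthetic speakers taking turns, separated by silence.
silence = [0.0] * 1600
signal = tone(200) + silence + tone(800) + silence + tone(200) + silence + tone(800)

clips = split_on_silence(signal)
vectors = [toy_embedding(signal[s:e]) for s, e in clips]
labels = kmeans_two(vectors)
print(list(zip(clips, labels)))
```

Clips that share a label are attributed to the same speaker. With real audio, the embedding step dominates the quality of the result, which is why a speaker recognition model is used there rather than a hand-made feature.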
1. The pre-trained speaker recognition model comes from WeidiXie's repo VGG-Speaker-Recognition. Thanks for open-sourcing it!
2. My method is essentially unsupervised, so you can also try a supervised method, or even an end-to-end one. You can get more information about speaker diarization from here.