Add diarization recipe v3 #347

Merged
merged 13 commits into from
Aug 20, 2024

Conversation

xx205
Collaborator

@xx205 xx205 commented Aug 11, 2024

Add diarization recipe v3 for the VoxConverse dataset.

Highlights

  • update silero-vad from v3.1 to v5.1
  • new clustering method for diarization: umap+hdbscan
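As a rough illustration of the umap+hdbscan idea (not the code added in this PR), the sketch below projects speaker embeddings to a lower dimension with UMAP and then groups them with HDBSCAN. It assumes the `umap-learn` and `hdbscan` packages are installed; the function name `cluster_embeddings`, its default parameter values, and the small-input guard are hypothetical choices made for illustration.

```python
def cluster_embeddings(embeddings, n_components=8, n_neighbors=15,
                       min_dist=0.1, min_cluster_size=4, seed=0):
    """Return one integer label per embedding (hypothetical helper)."""
    n = len(embeddings)
    # Assumed guard: with too few segments to form two clusters,
    # treat the whole recording as a single speaker.
    if n < 2 * min_cluster_size:
        return [0] * n

    # Imported lazily so the heavy optional dependencies are only
    # required when clustering is actually performed.
    import numpy as np
    import umap      # provided by the umap-learn package
    import hdbscan

    x = np.asarray(embeddings, dtype=np.float32)

    # Step 1: non-linear dimensionality reduction with a cosine metric.
    reducer = umap.UMAP(
        n_components=min(n_components, n - 2),
        n_neighbors=min(n_neighbors, n - 1),
        min_dist=min_dist,
        metric="cosine",
        random_state=seed,
    )
    low_dim = reducer.fit_transform(x)

    # Step 2: density-based clustering in the reduced space.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size,
                                allow_single_cluster=True)
    return clusterer.fit_predict(low_dim).tolist()
```

HDBSCAN labels points it cannot assign to any cluster as noise (`-1`), so a real pipeline needs a policy for those segments, for example assigning each one to the nearest cluster.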

Results

  • Dev set

    | system | MISS | FA | SC | DER |
    |---|---|---|---|---|
    | This repo (with oracle SAD) | 2.3 | 0.0 | 1.3 | 3.6 |
    | This repo (with system SAD) | 3.4 | 0.6 | 1.4 | 5.4 |
    | DIHARD 2019 baseline [1] | 11.1 | 1.4 | 11.3 | 23.8 |
    | DIHARD 2019 baseline w/ SE [1] | 9.3 | 1.3 | 9.7 | 20.2 |
    | (SyncNet ASD only) [1] | 2.2 | 4.1 | 4.0 | 10.4 |
    | (AVSE ASD only) [1] | 2.0 | 5.9 | 4.6 | 12.4 |
    | (proposed) [1] | 2.4 | 2.3 | 3.0 | 7.7 |
  • Test set

    | system | MISS | FA | SC | DER |
    |---|---|---|---|---|
    | This repo (with oracle SAD) | 1.6 | 0.0 | 1.9 | 3.5 |
    | This repo (with system SAD) | 3.8 | 1.7 | 1.8 | 7.4 |
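For readers checking the tables: DER (diarization error rate) is the sum of missed speech (MISS), false alarm (FA), and speaker confusion (SC), all in percent. The snippet below recomputes it for the two dev-set rows of this repo; the helper name `der` is only for illustration.

```python
def der(miss, fa, sc):
    """Diarization error rate as the sum of its three components (in %)."""
    # Round to one decimal to match the precision used in the tables.
    return round(miss + fa + sc, 1)


# (MISS, FA, SC) values copied from the dev-set rows of this repo.
dev_rows = {
    "oracle SAD": (2.3, 0.0, 1.3),
    "system SAD": (3.4, 0.6, 1.4),
}

for name, parts in dev_rows.items():
    print(f"{name}: DER = {der(*parts)}")
```

For some rows the sum differs from the listed DER by 0.1 (for example 9.3 + 1.3 + 9.7 = 20.3 against a listed 20.2), presumably because each column was rounded separately.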

Footnotes

  1. Spot the conversation: speaker diarisation in the wild, https://arxiv.org/pdf/2007.01216.pdf

@xx205 xx205 requested review from wsstriving and czy97 August 12, 2024 16:27
@czy97
Collaborator

czy97 commented Aug 19, 2024

The news part should be updated
image

@czy97
Collaborator

czy97 commented Aug 19, 2024

I think it is better to link the local directory and path.sh file directly if we reuse them.

Collaborator

@czy97 czy97 left a comment

Well Done!

* Refer to [voxceleb sv recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb/v2)
* [pretrained model path](https://wespeaker-1256283475.cos.ap-shanghai.myqcloud.com/models/voxceleb/voxceleb_resnet34_LM.onnx)
* Speaker activity detection model: oracle SAD (from ground truth annotation) or system SAD (VAD model pretrained by silero, https://github.com/snakers4/silero-vad)
* Clustering method: spectral clustering
Collaborator

The clustering method should be umap + dbscan?

@@ -29,7 +29,7 @@
from wespeaker.cli.utils import get_args
from wespeaker.models.speaker_model import get_speaker_model
from wespeaker.utils.checkpoint import load_checkpoint
-from wespeaker.diar.spectral_clusterer import cluster
+from wespeaker.diar.umap_clusterer import cluster
Collaborator

@JiJiJiang I am not sure whether we should change the client script.

Collaborator

Yes, just keep it as the better one.


import torch
import silero_vad
from wespeaker.utils.file_utils import read_scp


def get_args():
    parser = argparse.ArgumentParser(description='')
Collaborator

Should we also edit the v1 and v2 versions if we change the arguments of this script?

Collaborator

Yes, and also update the results if we switch to silero-vad v5.1.
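For context on the silero-vad dependency discussed in this thread, here is a minimal sketch of obtaining speech segments with the `silero-vad` pip package, assuming its v5 interface (`load_silero_vad`, `read_audio`, `get_speech_timestamps`). It is not the recipe's script; the helper names `to_seconds` and `detect_speech` and the 16 kHz default are assumptions made for illustration.

```python
def to_seconds(timestamps, sampling_rate=16000):
    """Convert {'start': n, 'end': m} sample offsets to (start, end) seconds."""
    return [(t["start"] / sampling_rate, t["end"] / sampling_rate)
            for t in timestamps]


def detect_speech(wav_path, sampling_rate=16000):
    """Run silero-vad on one file and return speech segments in seconds."""
    # Imported lazily: silero-vad (and torch) are only needed when
    # VAD is actually run.
    from silero_vad import get_speech_timestamps, load_silero_vad, read_audio

    model = load_silero_vad()
    wav = read_audio(wav_path, sampling_rate=sampling_rate)
    # By default the timestamps are expressed in samples, not seconds.
    stamps = get_speech_timestamps(wav, model, sampling_rate=sampling_rate)
    return to_seconds(stamps, sampling_rate)
```

Keeping the unit conversion in a separate function means any recipe version that changes VAD arguments can reuse it without touching the model call.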

@czy97 czy97 merged commit 5ac089e into wenet-e2e:master Aug 20, 2024
4 checks passed
@xx205 xx205 deleted the voxconverse_v3 branch August 20, 2024 14:52

3 participants