-
-
Notifications
You must be signed in to change notification settings - Fork 763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix instantaneous speaker count exceeding max_speakers
or detected number of clusters
#1351
Fix instantaneous speaker count exceeding max_speakers
or detected number of clusters
#1351
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## develop #1351 +/- ##
===========================================
- Coverage 33.00% 32.79% -0.21%
===========================================
Files 65 65
Lines 4109 4135 +26
===========================================
Hits 1356 1356
- Misses 2753 2779 +26
☔ View full report in Codecov by Sentry. |
6e273e7
to
3124b28
Compare
3124b28
to
e289d18
Compare
e289d18
to
ddc19d5
Compare
ddc19d5
to
65f4a88
Compare
Lots is going on in this PR. Can we just focus on fixing |
@hbredin Okay, no problem, I will move the stuff regarding the k-means fallback into the separate branch. |
6b118f9
to
6391cd6
Compare
@hbredin I only left the first commit. Now the PR only does the following:
The other stuff (falling back to k-means when no good cluster assignment is found by agglomerative clustering) is indeed another part of the problem, I can make a separate PR for this in separate branch arguing why I think it is necessary later. |
Two options:
|
@hbredin good catch, went with the option 1. BTW, I am willing to provide some basic tests for applicability of basic pipelines to the test suite in some separate PR. Currently, the test coverage of the entire |
# quick sanity check | ||
assert ( | ||
num_different_speakers >= min_speakers | ||
and num_different_speakers <= max_speakers | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that assert
ever fail?
If it does, we should handle it or this will raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hbredin Good catch. With the removal of the k-means fallback which used to be included in this PR (and which I use in my application anyway), this can definitely fail, and surely will in some circumstances due to the min_cluster_size
parameter. The best we can do here I think is to issue a warning.
# during counting, we could possibly overcount the number of instantaneous | ||
# speakers due to segmentation errors, so we cap the maximum instantaneous number | ||
# of speakers by the number of detected clusters | ||
count.data = np.minimum(count.data, num_different_speakers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs further investigation on my side (e.g. to check that this change does not degrade the overall performance on my usual benchmarks) which might take some time...
Indeed, there might be cases where the embedding/clustering step would (wrongly) decide that there is only 1 speaker in the recording, while the segmentation step actually (correctly) detects 2 overlapping speakers for certain frames.
680438f
to
602d702
Compare
@hbredin I removed the assert. In the meanwhile, did you have time to run benchmarks? |
I just ran a few benchmarks (with neither That being said, I do think you have a point in the case where Also, I think having the change in both |
602d702
to
d49f5fe
Compare
@hbredin Sorry for taking too long to get back to this. Your last comment is accounted for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments and a question.
44568f8
to
0d72a4e
Compare
d1be399
to
05a333c
Compare
@hbredin Sorry for such a large delay. Primary work was really heavy last couple of months. I have considered your last comments. |
🎉 Merged! Thanks @flyingleafe! |
The number of labels in the diarization result was based on the result of
speaker_count
method, which did not account formax_speakers
variable or the actual detected number of speakers after clusterization. This led to the appearance of additional erroneous speakers in the diarization result, so the output could have more labels than providedmax_speakers
or more labels than number of clusters detected by the clusterization algorithm. This PR fixes this.Additionally, the extra binarization step was removed from
speaker_count
method - now it accepts segmentations in already binarized form, so no extra binarization step in the pipeline is done.