Avoid extra chunk in speech recognition #29539
Thanks for working on this!
The change looks reasonable to me, but let's get a second review from @sanchit-gandhi to confirm the desired behaviour here
LGTM, did you run the slow whisper tests? Pipeline + model? 🤗
@ArthurZucker I ran the slow pipeline tests with
If you can run the
I'm running on a Mac CPU. The Whisper tests took a while and there are a number of failures, but I don't expect any of them to be related; perhaps the assertions are for GPU results and there are sometimes precision differences?
I'll run them on GPU just to be sure!
Thanks! FTR I now ran on main and got the same failures :)
Could you also rebase on the main branch? I have a lot of failing tests:
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_numpy_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_torch_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_zero_mean_unit_variance_normalization_trunc_np_longest - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_generate_longform_with_prompt_ids - IndexError: index -1 is out of bounds for dimension 0 with size 0
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_no_non_prompt_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_language_detection - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation_multilingual - huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-660397cf-42f5812e3b4c73a62732db88;cde84af6-e519-411b-83f4-394a7f8b638d)
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_small_en_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_non_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_specaugment_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_batch_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation_longform - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
========================================================================================== 47 failed, 419 passed, 236 skipped, 163 warnings in 698.76s (0:11:38) ===========================================================================================
Force-pushed from 697eeb4 to a7ccb75
Done!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Up :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@ArthurZucker kindly ping :D
Awesome work, thanks @jonatanklosko! Note that this PR only changes the pipeline, not the Whisper model class. So given @jonatanklosko has confirmed these slow tests pass, this is good to merge!
I was confused by this conditional before, but now that I'm revisiting this logic I'm more convinced that it is not necessary.
I think it becomes clear if we look at the test I changed. Suppose we use a chunk length of 100, a left context of 20, and a right context of 10. If the input has length 100, the current logic returns two chunks with lengths 100 and 30, respectively. However, the input fits perfectly into a single chunk, so I don't see why using two chunks would be helpful. The conditional really only makes an off-by-one distinction: if the input had length 99, the current logic does return a single chunk (which makes sense). From my understanding, it only makes sense to use two chunks if the input has length 101, since then it inherently does not fit.
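To make the arithmetic concrete, here is a minimal sketch of the windowing described above (a hypothetical `chunk_starts` helper, not the actual pipeline code); each window advances by the chunk length minus the two context sizes:

```python
# Hypothetical sketch of the chunking arithmetic discussed above, not the
# actual pipeline implementation. Only the stopping condition mirrors the
# behaviour being debated.
def chunk_starts(inputs_len, chunk_len, stride_left, stride_right):
    step = chunk_len - stride_left - stride_right  # how far each window advances
    starts = []
    for chunk_start in range(0, inputs_len, step):
        starts.append(chunk_start)
        # Stop once this window already reaches the end of the input. With the
        # old strict comparison (">"), an input of exactly chunk_len samples
        # would produce a second, redundant chunk.
        if chunk_start + chunk_len >= inputs_len:
            break
    return starts

# chunk_len=100, stride_left=20, stride_right=10 -> the window advances by 70
print(chunk_starts(100, 100, 20, 10))  # [0]      one chunk, input fits exactly
print(chunk_starts(99, 100, 20, 10))   # [0]      one chunk
print(chunk_starts(101, 100, 20, 10))  # [0, 70]  two chunks, input does not fit
```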
For more context, see the PR that introduced it: #21612. I believe the actual fix (for the linked issue) in that PR is the missing `if is_last: break`. I'm guessing the condition was introduced to make the existing test pass, and I think it's the test that was wrong.

I ran `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py` locally and it passed.

cc @ArthurZucker