
Avoid extra chunk in speech recognition #29539

Merged
merged 1 commit into from
May 22, 2024

Conversation

jonatanklosko
Contributor

This conditional confused me before, and now that I'm revisiting some of this logic I'm more convinced that it is not necessary.

I think it's clear if we look at the test I changed. Suppose we use a chunk length of 100, a left context of 20, and a right context of 10. If the input has length 100, the current logic returns two chunks with lengths 100 and 30 respectively. However, the input fits perfectly as a single chunk, so I don't see a reason why using two chunks would be helpful. The conditional really only makes an off-by-one distinction: if the input had length 99, the current logic does return a single chunk (which makes sense). From my understanding, two chunks only make sense if the input has length 101, since it then inherently does not fit.
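To make the arithmetic concrete, here is a minimal sketch of strided chunking (a hypothetical simplification for illustration, not the actual pipeline code; the function name and stopping condition are mine):

```python
def chunk_starts(input_len, chunk_len, stride_left, stride_right):
    # Consecutive chunks overlap by the left/right context, so each new
    # chunk starts `step` samples after the previous one.
    step = chunk_len - stride_left - stride_right
    starts = []
    for start in range(0, input_len, step):
        starts.append(start)
        # Stop as soon as one chunk covers the rest of the input;
        # without this check we would emit a redundant trailing chunk.
        if start + chunk_len >= input_len:
            break
    return starts

# chunk_len=100, left=20, right=10 -> step=70
print(chunk_starts(100, 100, 20, 10))  # [0]     input fits in one chunk
print(chunk_starts(99, 100, 20, 10))   # [0]     also a single chunk
print(chunk_starts(101, 100, 20, 10))  # [0, 70] genuinely needs two chunks
```

With the `>=` check, an input of exactly `chunk_len` yields a single chunk, matching the behavior this PR argues for; only at length 101 does a second chunk become necessary.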

For more context, see the PR that introduced it, #21612. I believe the actual fix (for the linked issue) in that PR is the missing `if is_last: break`. I'm guessing the condition was introduced to make the existing test pass, and I think it's the test that was wrong.

I ran `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py` locally and it passed.

cc @ArthurZucker

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for working on this!

The change looks reasonable to me, but let's get a second review from @sanchit-gandhi to confirm the desired behaviour here

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM, did you run the slow whisper tests? Pipeline + model? 🤗

@jonatanklosko
Contributor Author

@ArthurZucker I ran the slow pipeline tests with `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py`, since the change is specific to that pipeline. If there are other relevant tests, let me know which :)

@ArthurZucker
Collaborator

ArthurZucker commented Mar 25, 2024

If you can run the Whisper slow tests, that would be amazing! `RUN_SLOW=1 pytest tests/models/whisper`.
Are you running this on GPU?

@jonatanklosko
Contributor Author

I'm running on a Mac CPU. The Whisper tests took a while; there are a number of failures, but I don't expect any of them to be related. Perhaps that's because the assertions are for GPU results and there are sometimes precision differences?

@ArthurZucker
Collaborator

I'll run them on GPU just to be sure!

@jonatanklosko
Contributor Author

Thanks! FTR, I now ran on main and got the same failures :)

@ArthurZucker
Collaborator

Could you also rebase on the main branch? I have a lot of failing tests:

FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_numpy_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_torch_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_zero_mean_unit_variance_normalization_trunc_np_longest - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_generate_longform_with_prompt_ids - IndexError: index -1 is out of bounds for dimension 0 with size 0
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_no_non_prompt_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_language_detection - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation_multilingual - huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-660397cf-42f5812e3b4c73a62732db88;cde84af6-e519-411b-83f4-394a7f8b638d)
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_small_en_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_non_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_specaugment_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_batch_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation_longform - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
========================================================================================== 47 failed, 419 passed, 236 skipped, 163 warnings in 698.76s (0:11:38) ===========================================================================================

@jonatanklosko
Contributor Author

Done!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@jonatanklosko
Contributor Author

Up :)


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@jonatanklosko
Contributor Author

@ArthurZucker kindly ping :D

Contributor

@sanchit-gandhi sanchit-gandhi left a comment


Awesome work, thanks @jonatanklosko! Note that this PR only changes the pipeline, not the Whisper model class. So given that @jonatanklosko has confirmed these slow tests pass, this is good to merge!

@sanchit-gandhi sanchit-gandhi merged commit 1518508 into huggingface:main May 22, 2024
17 checks passed
@jonatanklosko jonatanklosko deleted the jk-whisper-chunking branch May 22, 2024 13:30
itazap pushed a commit that referenced this pull request May 24, 2024
itazap pushed a commit that referenced this pull request May 30, 2024
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 11, 2024
4 participants