Avoid extra chunk in speech recognition #29539
Thanks for working on this!
The change looks reasonable to me, but let's get a second review from @sanchit-gandhi to confirm the desired behaviour here
LGTM, did you run the slow whisper tests? Pipeline + model? 🤗
@ArthurZucker I ran the slow pipeline tests with
If you can run the
I'm running on a Mac CPU. The Whisper tests took a while and there are a number of failures, but I don't expect any of them to be related; perhaps the assertions are for GPU results and there are sometimes precision differences?
I'll run them on GPU just to be sure!
Thanks! FTR I now ran on main and got the same failures :)
Could you also rebase on the main branch? I have a lot of failing tests:
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_numpy_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_torch_integration - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_feature_extraction_whisper.py::WhisperFeatureExtractionTest::test_zero_mean_unit_variance_normalization_trunc_np_longest - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_generate_longform_with_prompt_ids - IndexError: index -1 is out of bounds for dimension 0 with size 0
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_generate_with_prompt_ids_and_no_non_prompt_forced_decoder_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_language_detection - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_generation_multilingual - huggingface_hub.utils._errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-660397cf-42f5812e3b4c73a62732db88;cde84af6-e519-411b-83f4-394a7f8b638d)
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_large_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_small_en_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_speculative_decoding_non_distil - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_batched_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_en_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_logits_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_specaugment_librispeech - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_batch_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_tiny_token_timestamp_generation_longform - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_hard_prev_cond - ImportError: To support decoding audio files, please install 'librosa' and 'soundfile'.
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_multi_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_prompt_ids - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelIntegrationTests::test_whisper_longform_single_batch_prev_cond - datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_from_config - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_left_padding - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_generate_use_cache - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_flash_attn_2_inference_padding_right - RuntimeError: FlashAttention only support fp16 and bf16 data type
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
========================================================================================== 47 failed, 419 passed, 236 skipped, 163 warnings in 698.76s (0:11:38) ===========================================================================================
Force-pushed from 697eeb4 to a7ccb75
Done!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Up :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@ArthurZucker kindly ping :D
Awesome work, thanks @jonatanklosko! Note that this PR only changes the pipeline, not the Whisper model class. So given @jonatanklosko has confirmed these slow tests pass, this is good to merge!
I was confused by this conditional before, but now that I'm revisiting this logic I'm more convinced that it is not necessary.
I think it becomes clear if we look at the test I changed. Suppose we use a chunk length of 100, a left context of 20, and a right context of 10. If the input has length 100, the current logic returns two chunks with lengths 100 and 30, respectively. However, the input fits perfectly into a single chunk, so I don't see why using two chunks would be helpful. The conditional really only makes an off-by-one distinction: if the input had length 99, the current logic does return a single chunk (which makes sense). From my understanding, it only makes sense to use two chunks if the input has length 101, since then it inherently does not fit.
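To make the arithmetic concrete, here is a minimal sketch of the windowing described above (a hypothetical `chunk_starts` helper, not the actual pipeline code); each window advances by the chunk length minus the two context sizes:

```python
# Hypothetical sketch of the chunking arithmetic discussed above, not the
# actual pipeline implementation. Only the stopping condition mirrors the
# behaviour being debated.
def chunk_starts(inputs_len, chunk_len, stride_left, stride_right):
    step = chunk_len - stride_left - stride_right  # how far each window advances
    starts = []
    for chunk_start in range(0, inputs_len, step):
        starts.append(chunk_start)
        # Stop once this window already reaches the end of the input. With the
        # old strict comparison (">"), an input of exactly chunk_len samples
        # would produce a second, redundant chunk.
        if chunk_start + chunk_len >= inputs_len:
            break
    return starts

# chunk_len=100, stride_left=20, stride_right=10 -> the window advances by 70
print(chunk_starts(100, 100, 20, 10))  # [0]      one chunk, input fits exactly
print(chunk_starts(99, 100, 20, 10))   # [0]      one chunk
print(chunk_starts(101, 100, 20, 10))  # [0, 70]  two chunks, input does not fit
```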
For more context, see the PR that introduced it: #21612. I believe the actual fix (for the linked issue) in that PR is the missing `if is_last: break`. I'm guessing the condition was introduced to make the existing test pass, and I think it's the test that was wrong.

I ran `RUN_SLOW=1 pytest tests/pipelines/test_pipelines_automatic_speech_recognition.py` locally and it passed.

cc @ArthurZucker