Uniformize model processors #31368

molbap · 2024-06-11T10:05:12Z

What does this PR do?

Adds uniformized processors following #30511, #31197 in particular, and #31198.

Adds support for:

molbap · 2024-09-24T10:16:03Z

@amyeroberts I finished digging through the feature extractors for audio. It was not trivial, and it seems something was missing test-wise. Now everything seems to work 🧹 🧹

molbap · 2024-09-24T13:24:57Z

One slow model test fails because of the issue addressed in #33678

amyeroberts · 2024-09-25T17:10:58Z

@molbap Re this comment - is this PR ready for review now?

molbap · 2024-09-26T07:15:53Z

It should be! The failing tests are unrelated and should be fixed in #33678

molbap · 2024-09-27T09:09:30Z

I fixed a concat bug I had noticed in another PR, #33678, where I fixed it for instructblip but it remained for blip_2 🫣

amyeroberts

Thanks! Just a few small comments

tests/test_processing_common.py

src/transformers/models/altclip/processing_altclip.py

src/transformers/models/blip_2/processing_blip_2.py

src/transformers/models/bridgetower/processing_bridgetower.py

src/transformers/models/donut/processing_donut.py


HuggingFaceDocBuilderDev · 2024-10-01T09:13:10Z

Hey! 🤗 Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

- Add the run-slow label to the PR
- When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma-separated list of all the models to be tested, e.g. [run-slow] model_to_test_1, model_to_test_2
  - If the pull request affects a lot of models, put at most 10 models in the commit message
- A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.
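
As a rough illustration of the commit-message format the bot describes, the sketch below builds a `[run-slow]` message from a list of model names and enforces the 10-model limit. `build_run_slow_message` and `MAX_MODELS` are hypothetical names invented for this example; they are not part of transformers or its CI tooling.

```python
MAX_MODELS = 10  # limit stated in the bot message


def build_run_slow_message(models):
    """Return a '[run-slow] a, b, c' commit message (hypothetical helper)."""
    # Drop empty entries and surrounding whitespace.
    names = [m.strip() for m in models if m.strip()]
    if not names:
        raise ValueError("at least one model name is required")
    if len(names) > MAX_MODELS:
        raise ValueError(
            f"at most {MAX_MODELS} models per commit message, got {len(names)}"
        )
    return "[run-slow] " + ", ".join(names)


message = build_run_slow_message(["blip", "blip_2", "donut"])
print(message)
```

The resulting string would then be used as the message of an empty commit, for example with `git commit --allow-empty -m "<message>"` followed by a push.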

amyeroberts

Thanks!


* add initial design for uniform processors + align model
* add uniform processors for altclip + chinese_clip
* add uniform processors for blip + blip2
* fix mutable default 👀
* add configuration test
* handle structured kwargs w defaults + add test
* protect torch-specific test
* fix style
* fix
* rebase
* update processor to generic kwargs + test
* fix style
* add sensible kwargs merge
* update test
* fix assertEqual
* move kwargs merging to processing common
* rework kwargs for type hinting
* just get Unpack from extensions
* run-slow[align]
* handle kwargs passed as nested dict
* add from_pretrained test for nested kwargs handling
* [run-slow]align
* update documentation + imports
* update audio inputs
* protect audio types, silly
* try removing imports
* make things simpler
* simplerer
* move out kwargs test to common mixin
* [run-slow]align
* skip tests for old processors
* [run-slow]align, clip
* !$#@!! protect imports, darn it
* [run-slow]align, clip
* [run-slow]align, clip
* update common processor testing
* add altclip
* add chinese_clip
* add pad_size
* [run-slow]align, clip, chinese_clip, altclip
* remove duplicated tests
* fix
* add blip, blip2, bridgetower
  Added tests for bridgetower which override common. Also modified common tests to force center cropping if existing
* fix
* update doc
* improve documentation for default values
* add model_max_length testing
  This parameter depends on tokenizers received.
* Raise if kwargs are specified in two places
* fix
* removed copied from
* match defaults
* force padding
* fix tokenizer test
* clean defaults
* move tests to common
* add missing import
* fix
* adapt bridgetower tests to shortest edge
* uniformize donut processor + tests
* add wav2vec2
* extend common testing to audio processors
* add testing + bert version
* propagate common kwargs to different modalities
* BC order of arguments
* check py version
* revert kwargs merging
* add draft overlap test
* update
* fix blip2 and wav2vec due to updates
* fix copies
* ensure overlapping kwargs do not disappear
* replace .pop by .get to handle duplicated kwargs
* fix copies
* fix missing import
* add clearly wav2vec2_bert to uniformized models
* fix copies
* increase number of features
* fix style
* [run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert
* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
* fix concatenation
* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
* Update tests/test_processing_common.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* 🧹
* address comments
* clean up + tests
* [run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
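
Several of these commit titles ("Raise if kwargs are specified in two places", "propagate common kwargs to different modalities", "replace .pop by .get to handle duplicated kwargs") describe routing behaviour that the conversation does not show in code. The following is only a sketch of one plausible reading of those titles. `route_kwargs`, `COMMON_KEYS` and `MODALITY_KEYS` are hypothetical names for this example, the key sets are made up, and none of it is the library's actual implementation.

```python
# Hypothetical key sets: "padding" is deliberately valid for two modalities.
COMMON_KEYS = {"return_tensors"}
MODALITY_KEYS = {
    "text_kwargs": {"padding", "max_length"},
    "audio_kwargs": {"padding", "sampling_rate"},
}


def route_kwargs(**kwargs):
    """Split flat and nested kwargs into one dict per modality (sketch)."""
    routed = {}
    for group, keys in MODALITY_KEYS.items():
        # Read with .get rather than .pop so that a key shared by several
        # modalities is still visible when the next modality is processed.
        nested = kwargs.get(group, {})
        merged = dict(nested)
        for key in keys:
            if key in kwargs:
                if key in nested:
                    # Same key given flat and inside the nested dict.
                    raise ValueError(
                        f"{key!r} was passed both directly and inside {group!r}"
                    )
                merged[key] = kwargs[key]
        # Common kwargs are copied into every modality.
        for key in COMMON_KEYS:
            if key in kwargs:
                merged[key] = kwargs[key]
        routed[group] = merged
    return routed


routed = route_kwargs(padding=True, return_tensors="pt")
print(routed)
```

In this sketch, had the flat `padding` value been removed with `.pop` while handling `text_kwargs`, it would no longer be available for `audio_kwargs`; reading with `.get` avoids that.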

molbap added 30 commits June 3, 2024 09:38

add initial design for uniform processors + align model

b85036f

add uniform processors for altclip + chinese_clip

1336931

add uniform processors for blip + blip2

691a298

fix mutable default 👀

bb8ac70
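
The commit itself is not shown in this conversation, so the following is only a general illustration of the mutable-default pitfall in Python that such a fix typically addresses. Both function names are hypothetical.

```python
def make_kwargs_shared(overrides=None, defaults={}):
    """BUGGY: the default dict is created once and shared by every call."""
    defaults.update(overrides or {})
    return defaults


def make_kwargs_safe(overrides=None, defaults=None):
    """Safe variant: build a fresh dict on every call."""
    merged = dict(defaults or {})
    merged.update(overrides or {})
    return merged


first = make_kwargs_shared({"padding": True})
second = make_kwargs_shared()  # no overrides, yet "padding" leaks in
print(second)  # {'padding': True}

print(make_kwargs_safe({"padding": True}))  # {'padding': True}
print(make_kwargs_safe())  # {}
```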

add configuration test

cd8c601

handle structured kwargs w defaults + add test

f00c852
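
A minimal sketch of what "structured kwargs with defaults" could look like, assuming kwargs are grouped per modality and call-time values override per-processor defaults. The classes, `DEFAULTS` and `merge_kwargs` are defined locally for illustration and are simplified assumptions, not the code added by this commit.

```python
from typing import TypedDict


class TextKwargs(TypedDict, total=False):
    padding: bool
    max_length: int


class ImagesKwargs(TypedDict, total=False):
    do_resize: bool
    size: dict


# Hypothetical per-processor defaults, grouped by modality.
DEFAULTS = {
    "text_kwargs": {"padding": False},
    "images_kwargs": {"do_resize": True},
}

MODALITIES = {"text_kwargs": TextKwargs, "images_kwargs": ImagesKwargs}


def merge_kwargs(defaults, **kwargs):
    """Start from the defaults, then route each call-time kwarg by its key."""
    merged = {name: dict(defaults.get(name, {})) for name in MODALITIES}
    for key, value in kwargs.items():
        targets = [
            name for name, spec in MODALITIES.items()
            if key in spec.__annotations__
        ]
        if not targets:
            raise TypeError(f"unexpected keyword argument: {key!r}")
        for name in targets:
            merged[name][key] = value
    return merged


merged = merge_kwargs(DEFAULTS, padding=True, max_length=16)
print(merged)
```

Copying the defaults into fresh dicts before merging keeps `DEFAULTS` itself unchanged between calls.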

protect torch-specific test

693036f

fix style

766da3a

fix

844394d

rebase

7d860a0

update processor to generic kwargs + test

7cb9925

fix style

ad4cbf7

add sensible kwargs merge

def56cd

update test

2e6b7e1

fix assertEqual

c19bbc6

move kwargs merging to processing common

3c38119

rework kwargs for type hinting

81ae819

just get Unpack from extensions

ce4abcd
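
For readers unfamiliar with `Unpack`, here is a small sketch of typing `**kwargs` with a `TypedDict`, as standardized in PEP 692. `ToyProcessorKwargs` and `process` are hypothetical names. The import sits under `TYPE_CHECKING` and the annotation is a string, so the example runs even when `typing_extensions` is not installed.

```python
from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    # Only needed by static type checkers, not at runtime.
    from typing_extensions import Unpack


class ToyProcessorKwargs(TypedDict, total=False):
    padding: bool
    max_length: int
    return_tensors: str


def process(text: str, **kwargs: "Unpack[ToyProcessorKwargs]") -> dict:
    """Toy function: a type checker can now validate each keyword argument."""
    max_length = kwargs.get("max_length", len(text))
    return {"text": text[:max_length], "padding": kwargs.get("padding", False)}


result = process("hello world", max_length=5)
print(result)  # {'text': 'hello', 'padding': False}
```

With this annotation, a call such as `process("x", max_length="five")` would be flagged by a static type checker, although nothing is enforced at runtime.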

run-slow[align]

3acdf28

handle kwargs passed as nested dict

404239f
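
A sketch of what accepting kwargs "as a nested dict" might mean, assuming the same option can be given either flat (`padding=True`) or grouped under a modality name (`text_kwargs={"padding": True}`). `split_kwargs` and the key sets are hypothetical. As an arbitrary simplification, a key supplied in both forms is not treated specially here and the flat value wins.

```python
TEXT_KEYS = {"padding", "max_length"}
IMAGE_KEYS = {"do_resize", "size"}


def split_kwargs(**kwargs):
    """Normalize flat and nested kwargs into one dict per modality (sketch)."""
    out = {"text_kwargs": {}, "images_kwargs": {}}
    # Nested form: text_kwargs={...}, images_kwargs={...}
    for group in out:
        nested = kwargs.pop(group, None)
        if nested:
            out[group].update(nested)
    # Flat form: route each remaining key to its modality.
    for key, value in kwargs.items():
        if key in TEXT_KEYS:
            out["text_kwargs"][key] = value
        elif key in IMAGE_KEYS:
            out["images_kwargs"][key] = value
        else:
            raise TypeError(f"unexpected keyword argument: {key!r}")
    return out


flat = split_kwargs(padding=True, do_resize=False)
nested = split_kwargs(
    text_kwargs={"padding": True}, images_kwargs={"do_resize": False}
)
print(flat == nested)  # True
```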

add from_pretrained test for nested kwargs handling

603be40

[run-slow]align

71c9d6c

update documentation + imports

26383c5

update audio inputs

4521f4f

protect audio types, silly

b96eb64

try removing imports

9c5c01c

make things simpler

3ccb505

simplerer

142acf3

move out kwargs test to common mixin

60a5730

[run-slow]align

be6c141

molbap added 2 commits September 24, 2024 12:08

increase number of features

ce19cfb

fix style

138140b

molbap added the run-slow label Sep 24, 2024

[run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert

50b3e45

molbap mentioned this pull request Sep 24, 2024

Fix position embeddings singular/plural #33678

Merged

molbap added 4 commits September 27, 2024 10:11

Merge branch 'main' into uniform_processors_3

add0fd5

[run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

483af1e

fix concatenation

ebc9aea

[run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

c2268c6

molbap requested a review from amyeroberts September 27, 2024 09:17

amyeroberts reviewed Sep 30, 2024

molbap and others added 5 commits October 1, 2024 09:34

Update tests/test_processing_common.py

d5c107b

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

🧹

99f9a2e

Merge branch 'uniform_processors_3' of github.com:molbap/transformers into uniform_processors_3

7e39ca3

address comments

acfc61d

clean up + tests

d80d48d

molbap requested a review from amyeroberts October 1, 2024 09:22

amyeroberts approved these changes Oct 1, 2024

[run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

90aef58

molbap merged commit 50290cf into huggingface:main Oct 2, 2024
36 checks passed

molbap deleted the uniform_processors_3 branch October 2, 2024 08:41
