Uniformize model processors #31368

Merged

merged 107 commits on Oct 2, 2024

Changes from 101 commits

Commits (107 total)
b85036f
add initial design for uniform processors + align model
molbap Jun 3, 2024
1336931
add uniform processors for altclip + chinese_clip
molbap Jun 3, 2024
691a298
add uniform processors for blip + blip2
molbap Jun 3, 2024
bb8ac70
fix mutable default :eyes:
molbap Jun 3, 2024
cd8c601
add configuration test
molbap Jun 3, 2024
f00c852
handle structured kwargs w defaults + add test
molbap Jun 3, 2024
693036f
protect torch-specific test
molbap Jun 3, 2024
766da3a
fix style
molbap Jun 3, 2024
844394d
fix
molbap Jun 3, 2024
7d860a0
rebase
molbap Jun 3, 2024
7cb9925
update processor to generic kwargs + test
molbap Jun 3, 2024
ad4cbf7
fix style
molbap Jun 3, 2024
def56cd
add sensible kwargs merge
molbap Jun 3, 2024
2e6b7e1
update test
molbap Jun 3, 2024
c19bbc6
fix assertEqual
molbap Jun 4, 2024
3c38119
move kwargs merging to processing common
molbap Jun 4, 2024
81ae819
rework kwargs for type hinting
molbap Jun 5, 2024
ce4abcd
just get Unpack from extensions
molbap Jun 7, 2024
3acdf28
run-slow[align]
molbap Jun 7, 2024
404239f
handle kwargs passed as nested dict
molbap Jun 7, 2024
603be40
add from_pretrained test for nested kwargs handling
molbap Jun 7, 2024
71c9d6c
[run-slow]align
molbap Jun 7, 2024
26383c5
update documentation + imports
molbap Jun 7, 2024
4521f4f
update audio inputs
molbap Jun 7, 2024
b96eb64
protect audio types, silly
molbap Jun 7, 2024
9c5c01c
try removing imports
molbap Jun 7, 2024
3ccb505
make things simpler
molbap Jun 7, 2024
142acf3
simplerer
molbap Jun 7, 2024
60a5730
move out kwargs test to common mixin
molbap Jun 10, 2024
be6c141
[run-slow]align
molbap Jun 10, 2024
84135d7
skip tests for old processors
molbap Jun 10, 2024
ce967ac
[run-slow]align, clip
molbap Jun 10, 2024
f78ec52
!$#@!! protect imports, darn it
molbap Jun 10, 2024
52fd5ad
[run-slow]align, clip
molbap Jun 10, 2024
8f21abe
Merge branch 'main' into uniform_processors_1
molbap Jun 10, 2024
d510030
[run-slow]align, clip
molbap Jun 10, 2024
b2f0336
fix conflicts
molbap Jun 10, 2024
40c8a0b
update common processor testing
molbap Jun 10, 2024
2e19860
add altclip
molbap Jun 10, 2024
06b7ae2
add chinese_clip
molbap Jun 10, 2024
2e58518
add pad_size
molbap Jun 10, 2024
aa7a68c
[run-slow]align, clip, chinese_clip, altclip
molbap Jun 10, 2024
f0ca955
remove duplicated tests
molbap Jun 10, 2024
7f61246
fix
molbap Jun 10, 2024
0372def
Merge branch 'uniform_processors_2' into uniform_processors_3
molbap Jun 11, 2024
a8249e7
add blip, blip2, bridgetower
molbap Jun 11, 2024
c283836
fix
molbap Jun 11, 2024
fd43bcd
update doc
molbap Jun 11, 2024
b2cd7c9
improve documentation for default values
molbap Jun 11, 2024
bcbd646
add model_max_length testing
molbap Jun 11, 2024
39c1587
Raise if kwargs are specified in two places
molbap Jun 11, 2024
1f73bdf
fix
molbap Jun 11, 2024
934e612
Merge branch 'uniform_processors_1' into uniform_processors_2
molbap Jun 11, 2024
7372b53
Merge branch 'uniform_processors_1' into uniform_processors_3
molbap Jun 11, 2024
41c2e2a
Merge branch 'uniform_processors_2' into uniform_processors_3
molbap Jun 11, 2024
b3f98ba
Merge branch 'main' into uniform_processors_1
molbap Jun 11, 2024
bd7e745
Merge branch 'uniform_processors_1' into uniform_processors_3
molbap Jun 11, 2024
0411b79
removed copied from
molbap Jun 11, 2024
ee57813
Merge branch 'main' into uniform_processors_2
molbap Jun 11, 2024
bab441f
Merge branch 'main' into uniform_processors_2
molbap Jun 14, 2024
4fd60cf
match defaults
molbap Jun 17, 2024
34d0b61
force padding
molbap Jun 17, 2024
10d727b
fix tokenizer test
molbap Jun 17, 2024
986ed9f
clean defaults
molbap Jun 17, 2024
3c265d1
move tests to common
molbap Jun 17, 2024
e71a7f5
Merge branch 'main' into uniform_processors_3
molbap Jun 17, 2024
962ddb5
merge + add pad token if not found
molbap Jun 17, 2024
dc56fc6
add missing import
molbap Jun 17, 2024
f2388f7
fix
molbap Jun 17, 2024
2660c7a
adapt bridgetower tests to shortest edge
molbap Jun 17, 2024
45dd38f
uniformize donut processor + tests
molbap Jun 17, 2024
c327925
add wav2vec2
molbap Jun 18, 2024
a777253
extend common testing to audio processors
molbap Jun 18, 2024
a3ab5bd
add testing + bert version
molbap Jun 18, 2024
17e18c5
propagate common kwargs to different modalities
molbap Jun 21, 2024
3bf804f
BC order of arguments
molbap Jun 21, 2024
2c8180a
check py version
molbap Jun 25, 2024
91e5045
revert kwargs merging
molbap Jul 15, 2024
bb256aa
add draft overlap test
molbap Jul 16, 2024
f7cc03b
Merge branch 'main' into uniform_processors_3
molbap Aug 9, 2024
bae0b5a
Merge branch 'main' into uniform_processors_3
molbap Aug 14, 2024
0d19af4
update
molbap Aug 14, 2024
59037cd
fix blip2 and wav2vec due to updates
molbap Aug 14, 2024
972f29c
fix copies
molbap Aug 14, 2024
ba3d4f5
ensure overlapping kwargs do not disappear
molbap Aug 14, 2024
ffb864a
replace .pop by .get to handle duplicated kwargs
molbap Aug 14, 2024
3afde1d
fix copies
molbap Aug 14, 2024
d337d3c
Merge branch 'main' into uniform_processors_3
molbap Sep 19, 2024
bcc8c52
rebase on main + fix wav2vec2_bert test missing import
molbap Sep 19, 2024
1444902
fix missing import
molbap Sep 19, 2024
35cc35f
add clearly wav2vec2_bert to uniformized models
molbap Sep 19, 2024
d489a7d
Merge branch 'main' into uniform_processors_3
molbap Sep 20, 2024
f282384
Merge branch 'main' into uniform_processors_3
molbap Sep 23, 2024
6f55001
fix copies
molbap Sep 23, 2024
ce19cfb
increase number of features
molbap Sep 24, 2024
138140b
fix style
molbap Sep 24, 2024
50b3e45
[run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert
molbap Sep 24, 2024
add0fd5
Merge branch 'main' into uniform_processors_3
molbap Sep 27, 2024
483af1e
[run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
molbap Sep 27, 2024
ebc9aea
fix concatenation
molbap Sep 27, 2024
c2268c6
[run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
molbap Sep 27, 2024
d5c107b
Update tests/test_processing_common.py
molbap Oct 1, 2024
99f9a2e
:broom:
molbap Oct 1, 2024
7e39ca3
Merge branch 'uniform_processors_3' of github.com:molbap/transformers…
molbap Oct 1, 2024
acfc61d
address comments
molbap Oct 1, 2024
d80d48d
clean up + tests
molbap Oct 1, 2024
90aef58
[run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, …
molbap Oct 2, 2024
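
The commits above mention a "sensible kwargs merge", handling kwargs passed as nested dicts, and raising if a kwarg is specified in two places. The sketch below illustrates that merge order (per-modality defaults, then tokenizer init kwargs, then call-site kwargs) as a minimal Python illustration; it is not the PR's actual _merge_kwargs implementation, and every name in it is hypothetical.

# Hypothetical sketch of the merge order described in the commits above:
# per-modality defaults < tokenizer init kwargs < call-site kwargs,
# with a kwarg passed both nested and flat raising an error.
def merge_processing_kwargs(defaults, tokenizer_init_kwargs, call_kwargs):
    merged = {modality: dict(kw) for modality, kw in defaults.items()}
    for modality, kw in merged.items():
        # Tokenizer init kwargs (e.g. model_max_length) override hardcoded defaults.
        for key in list(kw):
            if key in tokenizer_init_kwargs:
                kw[key] = tokenizer_init_kwargs[key]
        # Nested call-site kwargs ({"text_kwargs": {...}}) win over both...
        for key, value in call_kwargs.get(modality, {}).items():
            if key in call_kwargs:
                # ...but the same key passed nested *and* flat is ambiguous.
                raise ValueError(f"Keyword argument {key!r} was specified in two places.")
            kw[key] = value
        # Flat call-site kwargs apply to every modality that knows the key.
        kw.update({k: v for k, v in call_kwargs.items() if k in kw})
    return merged

# Example: a flat call-site kwarg overrides a default for the matching modality.
defaults = {"text_kwargs": {"padding": False, "stride": 0}, "images_kwargs": {}}
merged = merge_processing_kwargs(defaults, {"padding": True}, {"stride": 8})
assert merged["text_kwargs"] == {"padding": True, "stride": 8}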
6 changes: 0 additions & 6 deletions src/transformers/models/altclip/processing_altclip.py
@@ -80,12 +80,6 @@ def __call__(
            The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings
            (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set
            `is_split_into_words=True` (to lift the ambiguity with a batch of sequences).
-            return_tensors (`str` or [`~utils.TensorType`], *optional*):
-                If set, will return tensors of a particular framework. Acceptable values are:
-                    - `'tf'`: Return TensorFlow `tf.constant` objects.
-                    - `'pt'`: Return PyTorch `torch.Tensor` objects.
-                    - `'np'`: Return NumPy `np.ndarray` objects.
-                    - `'jax'`: Return JAX `jnp.ndarray` objects.
        Returns:
            [`BatchEncoding`]: A [`BatchEncoding`] with the following fields:
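
The six deleted lines above drop the per-model return_tensors documentation; the argument itself keeps working, since it is now handled by the shared processing-kwargs plumbing. A minimal usage sketch, assuming the BAAI/AltCLIP checkpoint:

from transformers import AltCLIPProcessor

# return_tensors is still accepted; it is now routed through the shared
# processing-kwargs handling instead of being documented per model.
processor = AltCLIPProcessor.from_pretrained("BAAI/AltCLIP")  # illustrative checkpoint
inputs = processor(text=["a photo of a cat"], padding=True, return_tensors="pt")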
129 changes: 60 additions & 69 deletions src/transformers/models/blip/processing_blip.py
@@ -18,10 +18,32 @@

from typing import List, Optional, Union

+
+try:
+    from typing import Unpack
+except ImportError:
+    from typing_extensions import Unpack
+
from ...image_utils import ImageInput
-from ...processing_utils import ProcessorMixin
-from ...tokenization_utils_base import BatchEncoding, PaddingStrategy, PreTokenizedInput, TextInput, TruncationStrategy
-from ...utils import TensorType
+from ...processing_utils import ProcessingKwargs, ProcessorMixin
+from ...tokenization_utils_base import BatchEncoding, PreTokenizedInput, TextInput
+
+
+class BlipProcessorKwargs(ProcessingKwargs, total=False):
+    _defaults = {
+        "text_kwargs": {
+            "add_special_tokens": True,
+            "padding": False,
+            "stride": 0,
+            "return_overflowing_tokens": False,
+            "return_special_tokens_mask": False,
+            "return_offsets_mapping": False,
+            "return_token_type_ids": False,
+            "return_length": False,
+            "verbose": True,
+        },
+        "images_kwargs": {},
+    }


class BlipProcessor(ProcessorMixin):
@@ -51,84 +73,53 @@ def __init__(self, image_processor, tokenizer, **kwargs):
    def __call__(
        self,
        images: ImageInput = None,
-        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
-        add_special_tokens: bool = True,
-        padding: Union[bool, str, PaddingStrategy] = False,
-        truncation: Union[bool, str, TruncationStrategy] = None,
-        max_length: Optional[int] = None,
-        stride: int = 0,
-        pad_to_multiple_of: Optional[int] = None,
-        return_attention_mask: Optional[bool] = None,
-        return_overflowing_tokens: bool = False,
-        return_special_tokens_mask: bool = False,
-        return_offsets_mapping: bool = False,
-        return_token_type_ids: bool = False,
-        return_length: bool = False,
-        verbose: bool = True,
-        return_tensors: Optional[Union[str, TensorType]] = None,
-        **kwargs,
+        text: Optional[Union[str, List[str], TextInput, PreTokenizedInput]] = None,
+        audio=None,
+        videos=None,
+        **kwargs: Unpack[BlipProcessorKwargs],
    ) -> BatchEncoding:
        """
        This method uses [`BlipImageProcessor.__call__`] method to prepare image(s) for the model, and
        [`BertTokenizerFast.__call__`] to prepare text for the model.

        Please refer to the docstring of the above two methods for more information.
        Args:
            images (`ImageInput`):
                The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch
                tensor. Both channels-first and channels-last formats are supported.
            text (`TextInput`, `PreTokenizedInput`, `List[TextInput]`, `List[PreTokenizedInput]`):
                The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings
                (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set
                `is_split_into_words=True` (to lift the ambiguity with a batch of sequences).
-            return_tensors (`str` or [`~utils.TensorType`], *optional*):
-                If set, will return tensors of a particular framework. Acceptable values are:
-                    - `'tf'`: Return TensorFlow `tf.constant` objects.
-                    - `'pt'`: Return PyTorch `torch.Tensor` objects.
-                    - `'np'`: Return NumPy `np.ndarray` objects.
-                    - `'jax'`: Return JAX `jnp.ndarray` objects.
        """
        if images is None and text is None:
            raise ValueError("You have to specify either images or text.")

-        # Get only text
-        if images is None:
-            self.current_processor = self.tokenizer
-            text_encoding = self.tokenizer(
-                text=text,
-                add_special_tokens=add_special_tokens,
-                padding=padding,
-                truncation=truncation,
-                max_length=max_length,
-                stride=stride,
-                pad_to_multiple_of=pad_to_multiple_of,
-                return_attention_mask=return_attention_mask,
-                return_overflowing_tokens=return_overflowing_tokens,
-                return_special_tokens_mask=return_special_tokens_mask,
-                return_offsets_mapping=return_offsets_mapping,
-                return_token_type_ids=return_token_type_ids,
-                return_length=return_length,
-                verbose=verbose,
-                return_tensors=return_tensors,
-                **kwargs,
-            )
-            return text_encoding
-
-        # add pixel_values
-        encoding_image_processor = self.image_processor(images, return_tensors=return_tensors)
+        text_encoding = None
+
+        # add pixel_values encoding. If we also have text_encoding, update image encoding and return it.
+        # else, return the text encoding.
+        output_kwargs = self._merge_kwargs(
+            BlipProcessorKwargs,
+            tokenizer_init_kwargs=self.tokenizer.init_kwargs,
+            **kwargs,
+        )
        if text is not None:
-            text_encoding = self.tokenizer(
-                text=text,
-                add_special_tokens=add_special_tokens,
-                padding=padding,
-                truncation=truncation,
-                max_length=max_length,
-                stride=stride,
-                pad_to_multiple_of=pad_to_multiple_of,
-                return_attention_mask=return_attention_mask,
-                return_overflowing_tokens=return_overflowing_tokens,
-                return_special_tokens_mask=return_special_tokens_mask,
-                return_offsets_mapping=return_offsets_mapping,
-                return_token_type_ids=return_token_type_ids,
-                return_length=return_length,
-                verbose=verbose,
-                return_tensors=return_tensors,
-                **kwargs,
-            )
-        else:
-            text_encoding = None
-
-        if text_encoding is not None:
-            encoding_image_processor.update(text_encoding)
-
-        return encoding_image_processor
+            text_encoding = self.tokenizer(text, **output_kwargs["text_kwargs"])
+        if images is not None:
+            encoding_image_processor = self.image_processor(images, **output_kwargs["images_kwargs"])
+
+            if text_encoding is not None:
+                encoding_image_processor.update(text_encoding)
+            return encoding_image_processor
+
+        return text_encoding

    def batch_decode(self, *args, **kwargs):
        """
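
With the new signature, tokenizer and image-processor options are type-hinted through Unpack[BlipProcessorKwargs] and dispatched per modality by _merge_kwargs. A hedged usage sketch: the checkpoint is real but illustrative, the image is a stand-in array, and the nested form follows the "handle kwargs passed as nested dict" commit above.

import numpy as np
from transformers import BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
image = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a real image

# Flat kwargs, checked against BlipProcessorKwargs and routed to the tokenizer:
inputs = processor(images=image, text="a photo", padding="max_length", max_length=16, return_tensors="np")

# Equivalent nested form, dispatched per modality by _merge_kwargs:
inputs = processor(
    images=image,
    text="a photo",
    text_kwargs={"padding": "max_length", "max_length": 16},
    common_kwargs={"return_tensors": "np"},
)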
135 changes: 67 additions & 68 deletions src/transformers/models/blip_2/processing_blip_2.py
@@ -18,22 +18,43 @@

from typing import List, Optional, Union

+
+try:
+    from typing import Unpack
+except ImportError:
+    from typing_extensions import Unpack
+
from ...image_utils import ImageInput
-from ...processing_utils import ProcessorMixin
+from ...processing_utils import ProcessingKwargs, ProcessorMixin
from ...tokenization_utils_base import (
    AddedToken,
    BatchEncoding,
-    PaddingStrategy,
    PreTokenizedInput,
    TextInput,
-    TruncationStrategy,
)
-from ...utils import TensorType, logging
+from ...utils import logging


logger = logging.get_logger(__name__)


+class Blip2ProcessorKwargs(ProcessingKwargs, total=False):
+    _defaults = {
+        "text_kwargs": {
+            "add_special_tokens": True,
+            "padding": False,
+            "stride": 0,
+            "return_overflowing_tokens": False,
+            "return_special_tokens_mask": False,
+            "return_offsets_mapping": False,
+            "return_token_type_ids": False,
+            "return_length": False,
+            "verbose": True,
+        },
+        "images_kwargs": {},
+    }
+
+
class Blip2Processor(ProcessorMixin):
    r"""
    Constructs a BLIP-2 processor which wraps a BLIP image processor and an OPT/T5 tokenizer into a single processor.
@@ -67,83 +88,55 @@ def __init__(self, image_processor, tokenizer, num_query_tokens=None, **kwargs):
    def __call__(
        self,
        images: ImageInput = None,
-        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
-        add_special_tokens: bool = True,
-        padding: Union[bool, str, PaddingStrategy] = False,
-        truncation: Union[bool, str, TruncationStrategy] = None,
-        max_length: Optional[int] = None,
-        stride: int = 0,
-        pad_to_multiple_of: Optional[int] = None,
-        return_attention_mask: Optional[bool] = None,
-        return_overflowing_tokens: bool = False,
-        return_special_tokens_mask: bool = False,
-        return_offsets_mapping: bool = False,
-        return_token_type_ids: bool = False,
-        return_length: bool = False,
-        verbose: bool = True,
-        return_tensors: Optional[Union[str, TensorType]] = None,
-        **kwargs,
+        text: Optional[Union[str, List[str], TextInput, PreTokenizedInput]] = None,
+        audio=None,
+        videos=None,
+        **kwargs: Unpack[Blip2ProcessorKwargs],
    ) -> BatchEncoding:
        """
        This method uses [`BlipImageProcessor.__call__`] method to prepare image(s) for the model, and
        [`BertTokenizerFast.__call__`] to prepare text for the model.

        Please refer to the docstring of the above two methods for more information.
        Args:
            images (`ImageInput`):
                The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch
                tensor. Both channels-first and channels-last formats are supported.
            text (`TextInput`, `PreTokenizedInput`, `List[TextInput]`, `List[PreTokenizedInput]`):
                The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings
                (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set
                `is_split_into_words=True` (to lift the ambiguity with a batch of sequences).
-            return_tensors (`str` or [`~utils.TensorType`], *optional*):
-                If set, will return tensors of a particular framework. Acceptable values are:
-                    - `'tf'`: Return TensorFlow `tf.constant` objects.
-                    - `'pt'`: Return PyTorch `torch.Tensor` objects.
-                    - `'np'`: Return NumPy `np.ndarray` objects.
-                    - `'jax'`: Return JAX `jnp.ndarray` objects.
        """
        if images is None and text is None:
            raise ValueError("You have to specify either images or text.")

-        # Get only text
-        if images is None:
-            self.current_processor = self.tokenizer
-            text_encoding = self.tokenizer(
-                text=text,
-                add_special_tokens=add_special_tokens,
-                padding=padding,
-                truncation=truncation,
-                max_length=max_length,
-                stride=stride,
-                pad_to_multiple_of=pad_to_multiple_of,
-                return_attention_mask=return_attention_mask,
-                return_overflowing_tokens=return_overflowing_tokens,
-                return_special_tokens_mask=return_special_tokens_mask,
-                return_offsets_mapping=return_offsets_mapping,
-                return_token_type_ids=return_token_type_ids,
-                return_length=return_length,
-                verbose=verbose,
-                return_tensors=return_tensors,
-                **kwargs,
-            )
-            return text_encoding
-
-        # add pixel_values
-        encoding_image_processor = self.image_processor(images, return_tensors=return_tensors)
-
+        output_kwargs = self._merge_kwargs(
+            Blip2ProcessorKwargs,
+            tokenizer_init_kwargs=self.tokenizer.init_kwargs,
+            **kwargs,
+        )
+        text_encoding = None
+        # BC for explicit return_tensors
+        if "return_tensors" in output_kwargs["common_kwargs"]:
+            return_tensors = output_kwargs["common_kwargs"].pop("return_tensors", None)
+        else:
+            return_tensors = None
        if text is not None:
            if isinstance(text, str):
                text = [text]
            elif not isinstance(text, list) and not isinstance(text[0], str):
                raise ValueError("Invalid input text. Please provide a string, or a list of strings")

            text_encoding = {}
-            _text_encoding = self.tokenizer(
-                text=text,
-                add_special_tokens=add_special_tokens,
-                padding=padding,
-                truncation=truncation,
-                max_length=max_length,
-                stride=stride,
-                pad_to_multiple_of=pad_to_multiple_of,
-                return_attention_mask=return_attention_mask,
-                return_overflowing_tokens=return_overflowing_tokens,
-                return_special_tokens_mask=return_special_tokens_mask,
-                return_offsets_mapping=return_offsets_mapping,
-                return_token_type_ids=return_token_type_ids,
-                return_length=return_length,
-                verbose=verbose,
-                return_tensors=None,  # hardcode "None" here for prepending image tokens
-                **kwargs,
-            )
+
+            return_tensors = output_kwargs["text_kwargs"].pop("return_tensors", None)
+            _text_encoding = self.tokenizer(text, **output_kwargs["text_kwargs"], return_tensors=None)
+            output_kwargs["text_kwargs"]["return_tensors"] = return_tensors

        # if we know how many query tokens, expand text inside processor. We need this hacky manipulation
        # because BLIP expects image tokens to be at the beginning even before BOS token
@@ -168,10 +161,16 @@ def __call__(
        else:
            text_encoding = None

-        if text_encoding is not None:
-            encoding_image_processor.update(text_encoding)
+        # add pixel_values encoding. If we also have text_encoding, update image encoding and return it.
+        # else, return the text encoding.
+        if images is not None:
+            encoding_image_processor = self.image_processor(images, **output_kwargs["images_kwargs"])
+            if text_encoding is not None:
+                encoding_image_processor.update(text_encoding)
+            return encoding_image_processor

-        return encoding_image_processor
+        return text_encoding

    # Copied from transformers.models.blip.processing_blip.BlipProcessor.batch_decode with BertTokenizerFast->PreTrainedTokenizer
    def batch_decode(self, *args, **kwargs):
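
The pop-and-restore of return_tensors around the tokenizer call exists because BLIP-2 prepends image query tokens even before the BOS token, which is only practical on plain Python lists. A standalone, hedged illustration of the same pattern; the tokenizer, token id, and query count are stand-ins, not BLIP-2's actual values.

from transformers import AutoTokenizer
from transformers.tokenization_utils_base import BatchEncoding

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
num_query_tokens = 32  # illustrative value
image_token_id = tokenizer.eos_token_id  # stand-in for the real image token id

# Tokenize to plain lists first (return_tensors=None) so ids can be prepended.
encoding = tokenizer(["a photo of a cat"], return_tensors=None)
encoding["input_ids"] = [[image_token_id] * num_query_tokens + ids for ids in encoding["input_ids"]]
encoding["attention_mask"] = [[1] * num_query_tokens + m for m in encoding["attention_mask"]]

# Only now convert to the caller's requested tensor type.
batch = BatchEncoding(encoding, tensor_type="np")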