Falcon port #24523
Conversation
Thanks for working on this! Some initial comments.
Update: Slightly delayed because there are some breaking architecture changes between the different Falcon checkpoints - I'm merging the various layers and using config variables to switch between the behaviours.
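For a rough sense of the config-switch approach, something like the following (the flag names match the final configuration_falcon.py, but the branching shown is an illustrative sketch, not the merged implementation):

```python
# Illustrative sketch: checkpoint variants are selected via config flags rather
# than separate modeling classes. The flag names are real config fields; the
# branching below is a simplification for illustration only.
from transformers import FalconConfig

config_7b_style = FalconConfig(multi_query=True, parallel_attn=True, new_decoder_architecture=False)
config_40b_style = FalconConfig(new_decoder_architecture=True)

def describe_layout(config: FalconConfig) -> str:
    if config.new_decoder_architecture:
        return "40B-style decoder block (grouped KV heads, parallel attention)"
    if config.multi_query:
        return "7B-style decoder block (multi-query attention)"
    return "original RW-style decoder block"
```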
feel free to ping me for a review anytime!
Thanks a lot for working on this! Already clean 🤗
- Tokenizer: it's missing from the auto mapping, and I have no idea which tokenizer is used or how it will be converted to a fast tokenizer. If it was trained with `tokenizers` and has no dedicated class, this should be specified somewhere.
- Tests: let's improve the integration tests. Not every architecture variant that can be enabled by the config is tested, so that should be a bit clearer, and we need some bfloat16 conversion / generation tests (see the sketch below). I don't have the issue numbers at hand, but let's link the related issues and mark them as resolved, since we received several that are linked to these models!
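As a rough sketch of the kind of bfloat16 generation check being asked for (the checkpoint name and prompt are examples, not part of the PR's test suite):

```python
import torch
from transformers import AutoTokenizer, FalconForCausalLM

# Example checkpoint; any Falcon checkpoint compatible with the ported code would do.
checkpoint = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = FalconForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

inputs = tokenizer("My favorite food is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```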
expected_output = (
    "My favorite food is pizza. I love it so much that I have a pizza party every year for my birthday."
)
Suggested change:
-expected_output = (
+EXPECTED_OUTPUT = (
     "My favorite food is pizza. I love it so much that I have a pizza party every year for my birthday."
 )
Also, can we do a batch with padding to make sure rotary works as expected (see the sketch below)? Plus let's test both the small and the big checkpoints.
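A sketch of what such a batched, padded check might look like (checkpoint name and prompts are placeholders):

```python
import torch
from transformers import AutoTokenizer, FalconForCausalLM

checkpoint = "tiiuae/falcon-7b"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left-pad so generation continues from real tokens

model = FalconForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

prompts = ["My favorite food is", "Once upon a time there was a"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)
```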
Will see what I can add!
Hi @Rocketknight1
best
Hey all! The main modeling code should be ready for final review now. Thanks @ArthurZucker for the comprehensive review - it was really helpful! There's one bug left that's causing a failing test, but I think it's a one-line fix that I can track down tomorrow. This may also be the issue that's causing assisted generation to fail, but those tests are currently skipped. I also need to figure out porting the tokenizer, and then once this is merged I'll need to prepare the repos to transition over to the library code. cc @amyeroberts for core maintainer review!
Very nice 🤗 Thanks for all the work porting this model!
Mostly just small comments, mainly missing or incorrect model docstrings that need to be added or fixed.
self.word_embeddings = nn.Embedding(config.vocab_size, self.embed_dim)

# Transformer blocks
self.h = nn.ModuleList([FalconDecoderLayer(config) for _ in range(config.num_hidden_layers)])
    self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
    model = FalconModel(config=config)
    model.to(torch_device)
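The fragment above is cut off; a typical completion inside the model tester class, in the usual transformers ModelTester style, might look like this (the method name and assertions are assumptions, not the PR's actual test code):

```python
def create_and_check_model(
    self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
    # Hypothetical completion: run a forward pass and check the output shape.
    model = FalconModel(config=config)
    model.to(torch_device)
    model.eval()
    result = model(input_ids, attention_mask=input_mask)
    self.parent.assertEqual(
        result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size)
    )
```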
Would love to enable and test torch_xla as a target device backend on Falcon. Happy to do it as a follow-up PR or as part of this PR. Any suggestions?
cc @JackCaoG
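For reference, a minimal sketch of running the ported model on a torch_xla device (assumes torch_xla is installed; the checkpoint is an example):

```python
import torch_xla.core.xla_model as xm
from transformers import AutoTokenizer, FalconForCausalLM

device = xm.xla_device()  # e.g. a TPU core
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = FalconForCausalLM.from_pretrained("tiiuae/falcon-7b").to(device)

inputs = tokenizer("Hello world", return_tensors="pt").to(device)
outputs = model(**inputs)
xm.mark_step()  # flush the lazily-built XLA graph
print(outputs.logits.shape)
```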
@miladm sounds great, but I think it should definitely go in a follow-up PR after this is merged! We want to make sure the initial launch and the transition from remote_code checkpoints to in-library checkpoints goes smoothly, and then we can start adding features like that.
@miladm The PR is now merged - feel free to start on any follow-up PR!
Thanks for all the work porting this!
Just small comments on my side - mainly that the config docstring doesn't match the config init. I also saw at least one comment from another reviewer still outstanding in the PR - so these should be resolved too before merging.
Only outstanding questions I have are:
- Have we loaded the remote model and this port and checked their outputs are equivalent?
- What is the default model code used when calling e.g. AutoModel.from_pretrained(hub_checkpoint)? I'm assuming the one on this branch?
vocab_size (`int`, *optional*, defaults to 65024):
    Vocabulary size of the Falcon model. Defines the number of different tokens that can be represented by the
    `inputs_ids` passed when calling [`FalconModel`]
hidden_size (`int`, *optional*, defaults to 64):
The docstring doesn't match the default values or some of the variable names.
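For illustration, the corrected entries would follow the usual config-docstring style, with defaults matching the actual `__init__` signature (the 4544 default below is taken from the released configuration and should be double-checked):

```python
    vocab_size (`int`, *optional*, defaults to 65024):
        Vocabulary size of the Falcon model. Defines the number of different tokens that can be represented by
        the `inputs_ids` passed when calling [`FalconModel`].
    hidden_size (`int`, *optional*, defaults to 4544):
        Dimension of the hidden representations.
```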
Fixed!
new_decoder_architecture=False,
multi_query=True,
parallel_attn=True,
From a quick search, I think `new_decoder_architecture` and `parallel_attn` are new params, so they need documentation.
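A sketch of the requested documentation, in the config-docstring style (the wording is illustrative, not the merged text):

```python
    new_decoder_architecture (`bool`, *optional*, defaults to `False`):
        Whether to use the new (Falcon-40B-style) decoder architecture. If `True`, the `multi_query` and
        `parallel_attn` arguments are ignored, as the new decoder always uses parallel attention.
    parallel_attn (`bool`, *optional*, defaults to `True`):
        Whether to compute attention in parallel with the feedforward layer. If `False`, they are computed
        consecutively, as in the original Transformer architecture.
```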
Documented!
self.sin_cached: torch.Tensor | None = None

def cos_sin(self, seq_len: int, device="cpu", dtype=torch.bfloat16) -> torch.Tensor:
    if seq_len != self.seq_len_cached:
You're completely right - my mind slipped. I was thinking about NaNs, not Nones 🤦‍♀️ Ironically, that's what was in my head at the time.
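For context, a simplified sketch of the cos/sin caching that the fragment above belongs to (an illustration, not the exact merged implementation):

```python
import torch

class RotaryEmbeddingSketch(torch.nn.Module):
    """Simplified illustration of cached rotary cos/sin tables."""

    def __init__(self, head_dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self.seq_len_cached = -1
        self.cos_cached = None
        self.sin_cached = None

    def cos_sin(self, seq_len: int, device="cpu", dtype=torch.bfloat16):
        # Recompute the cached tables only when the sequence length changes.
        if seq_len != self.seq_len_cached:
            t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
            freqs = torch.einsum("i,j->ij", t, self.inv_freq.to(device))
            emb = torch.cat((freqs, freqs), dim=-1)
            self.cos_cached = emb.cos().to(dtype)
            self.sin_cached = emb.sin().to(dtype)
            self.seq_len_cached = seq_len
        return self.cos_cached, self.sin_cached
```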
[`PreTrainedTokenizer.__call__`] for details.

[What are input IDs?](../glossary#input-ids)
past_key_values (`Tuple[Tuple[torch.Tensor]]` of length `config.n_layers`):
I don't think there is an `n_layers` attribute?
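Presumably the docstring should reference the attribute that does exist, e.g. (sketch only):

```python
    past_key_values (`Tuple[Tuple[torch.Tensor]]` of length `config.num_hidden_layers`):
        Contains precomputed key and value hidden states of the attention blocks, which can be used to speed up
        sequential decoding.
```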
Fixed!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial commit
* Update src/transformers/models/falcon/configuration_falcon.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/falcon/configuration_falcon.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleanup config docstring
* Update src/transformers/models/falcon/configuration_falcon.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Convert to relative imports
* Remove torch < 1.8 warning
* Restructure cos_sin header
* qkv -> query, key, value
* Refactor attention calculation
* Add a couple of config variables to account for the different checkpoints
* Successful merging of the code paths!
* Fix misplaced line in the non-parallel attention path
* Update config and tests
* Add a pad_token_id when testing
* Support output_attentions when alibi is None
* make fixup
* Skip KV cache shape test
* No more _keys_to_ignore_on_load_missing
* Simplify self attention a bit
* Simplify self attention a bit
* make fixup
* stash commit
* Some more attention mask updates
* Should pass all tests except assisted generation!
* Add big model generation test
* make fixup
* Add temporary workaround for test
* Test overrides for assisted generation
* Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Test overrides for assisted generation
* Add generation demo
* Update copyright
* Make the docstring model actually small
* Add module-level docstring
* Remove all assertions
* Add copied from bloom
* Reformat the QKV layer
* Add copied from bloom
* Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused line and reformat
* No single letter variables
* Cleanup return names
* Add copied from line
* Remove the deprecated arguments blocks
* Change the embeddings test to an alibi on/off test
* Remove position_ids from FalconForQA
* Remove old check for token type IDs
* Fix the alibi path when multi_query is False
* Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update config naming
* Fix typo for new_decoder_architecture
* Add some comments
* Fix docstring
* Fix docstring
* Create range in the right dtype from the start
* Review comment cleanup
* n_head_kv -> num_kv_heads
* self.alibi -> self.use_alibi
* self.num_kv -> self.num_kv_heads
* Reorder config args
* Made alibi arguments Optional
* Add all model docstrings
* Add extra checkpoints
* Add author info for Falcon
* Stop removing token_type_ids because our checkpoints shouldn't return it anymore
* Add one hopeful comment for the future
* Fix typo
* Update tests, fix cache issue for generation
* Use -1e9 instead of -inf to avoid float overflow
* Recompute the rotary embeddings much less often
* Re-enable disabled tests
* One final fix to attention mask calculation, and update tests
* Cleanup targeting falcon-40b equivalency
* Post-rebase docs update
* Update docstrings, especially in the config
* More descriptive variable names, and comments where we can't rename them

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Enable the `use_flash_attention` configuration flag for Falcon models. When `use_flash_attention` is set to `true`, the [FalconAttention.forward()](https://github.com/huggingface/transformers/blob/c965d302791cf935d6ea7776428749be678cf509/src/transformers/models/falcon/modeling_falcon.py#L281) method is replaced with a variant that uses Tri Dao's flash_attention instead of PyTorch's `scaled_dot_product_attention` function. At the moment the patch only works for falcon-7b, but technically it will also work for falcon-40b with the right configuration.

The Falcon model situation is currently a bit messy: the Falcon model was recently added to Hugging Face transformers (see [PR transformers#24523](huggingface/transformers#24523)), but the Falcon models on the Hugging Face Hub still use the code shipped together with the weights (a PR to change this [was reverted](https://huggingface.co/tiiuae/falcon-7b/discussions/66)). Falcon-7b and Falcon-40b each use slightly different code, which was unified in the HF transformers implementation, where the variant can be selected via a configuration member called `new_decoder_architecture` (see [configuration_falcon.py#L65-L67](https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon/configuration_falcon.py#L65-L67)). The HF Falcon implementation also uses different names in the configuration class, e.g. compare the new [configuration_falcon.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon/configuration_falcon.py) with the old [configuration_RW.py](https://huggingface.co/tiiuae/falcon-7b/blob/main/configuration_RW.py).

Model configurations compatible with the HF Falcon implementation can be found here: 7B: [config.json](https://huggingface.co/tiiuae/falcon-7b/blob/4e2d06f0a7c6370ebabbc30c6f59377ae8f73d76/config.json), 40B: [config.json](https://huggingface.co/tiiuae/falcon-40b/blob/f1ba7d328c06aa6fbb4a8afd3c756f46d7e6b232/config.json).
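As a rough sketch of the core substitution involved (the helper names and tensor-layout handling below are illustrative assumptions; `flash_attn_func` is the real entry point of the flash-attn package):

```python
import torch.nn.functional as F
from flash_attn import flash_attn_func

def sdpa_path(query, key, value):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    return F.scaled_dot_product_attention(query, key, value, is_causal=True)

def flash_path(query, key, value):
    # flash_attn_func expects (batch, seq_len, num_heads, head_dim) tensors in fp16/bf16.
    q, k, v = (t.transpose(1, 2) for t in (query, key, value))
    out = flash_attn_func(q, k, v, causal=True)
    return out.transpose(1, 2)  # back to (batch, num_heads, seq_len, head_dim)
```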
This PR adds the Falcon model to the main library. It's still a work in progress, and integration tests / model checkpoints still need to be added!
TODO:
- AutoTokenizer for all checkpoints
- output_attentions
- output_hidden_states