
Falcon port #24523

Merged · merged 82 commits into main from falcon_port on Jul 11, 2023
Conversation

@Rocketknight1 (Member) commented Jun 27, 2023

This PR adds the Falcon model to the main library. It's still a work in progress, and integration tests / model checkpoints still need to be added!

TODO:

  • Migrate custom code checkpoints to the new architecture
  • Confirm tokenizer can be loaded correctly with AutoTokenizer for all checkpoints
  • Upload a ported 1B model.
  • Add integration tests for the 1B model
  • Add support for output_attention
  • Add support for output_hidden_states
  • Ensure all tests pass
  • Ensure any other issues addressed (see comments on Slack)
  • Address review comments
  • Ensure tokenizers are ported correctly (token_type_ids issue; see the sketch after this list)
  • Upload library ports of all Falcon checkpoints and migrate/redirect to them
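
For the tokenizer items above, a minimal sketch of the kind of check involved; the checkpoint name is an assumption, and the final behaviour depends on the ported tokenizers:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")  # assumed checkpoint
    encoded = tok("Hello Falcon!")

    # Per the commit list below, ported checkpoints shouldn't return
    # token_type_ids anymore, so the check is simply:
    assert "token_type_ids" not in encoded
    print(list(encoded.keys()))  # expect input_ids and attention_mask only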

@sgugger (Collaborator) left a comment

Thanks for working on this! Some initial comments.

Resolved review comments on:
README.md
src/transformers/models/falcon/configuration_falcon.py (4 threads)
src/transformers/models/falcon/modeling_falcon.py (5 threads)
@Rocketknight1 (Member, Author) commented

Update: Slightly delayed because there are some breaking architecture changes between the different Falcon checkpoints - I'm merging the various layers and using config variables to switch between the behaviours.
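
To make that approach concrete, a schematic sketch of a decoder layer whose behaviour is switched by a config-style flag; names are illustrative, not the actual Falcon code (the real model has separate layer norms):

    import torch
    import torch.nn as nn

    class ToyDecoderLayer(nn.Module):
        def __init__(self, hidden_size: int, parallel_attn: bool):
            super().__init__()
            self.parallel_attn = parallel_attn  # the config flag picks the path
            self.ln = nn.LayerNorm(hidden_size)
            self.attn = nn.Linear(hidden_size, hidden_size)  # stand-in for attention
            self.mlp = nn.Linear(hidden_size, hidden_size)   # stand-in for the MLP

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.ln(x)
            if self.parallel_attn:
                # Parallel path: attention and MLP both read the same normed input.
                return x + self.attn(h) + self.mlp(h)
            # Sequential path: the MLP runs on the attention output.
            a = x + self.attn(h)
            return a + self.mlp(self.ln(a))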

@HuggingFaceDocBuilderDev commented Jun 29, 2023

The documentation is not available anymore as the PR was closed or merged.

@ArthurZucker (Collaborator) commented

Feel free to ping me for a review anytime!

@ArthurZucker (Collaborator) left a comment

Thanks a lot for working on this! Already clean 🤗

  • Tokenizer: it's missing from the auto mapping, and I have no idea which one is used or how it will be converted to a fast tokenizer. If it was trained with `tokenizers` and has no tokenizer class, this should be specified somewhere.
  • Tests: let's improve our integration tests. Not all architectures that can be enabled by the config are tested, and this should be a bit clearer. Some bfloat16 conversion / generation tests are also needed. I don't have the issues at hand, but let's link them and mark them as resolved, since we received some linked to these models! (A sketch of such a test follows below.)
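
A hedged sketch of the kind of bfloat16 generation test suggested above; the checkpoint name and prompt are placeholders, not the final test:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "tiiuae/falcon-rw-1b"  # assumed small checkpoint
    tok = AutoTokenizer.from_pretrained(ckpt)
    # Load directly in bfloat16 to exercise the reduced-precision path.
    model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

    inputs = tok("My favorite food is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))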

Resolved review comments on:
docs/source/en/model_doc/falcon.md
src/transformers/models/falcon/__init__.py
src/transformers/models/falcon/configuration_falcon.py (2 threads)
src/transformers/models/falcon/modeling_falcon.py (2 threads)
tests/models/falcon/test_modeling_falcon.py (2 threads)
Comment on lines 366 to 426
expected_output = (
"My favorite food is pizza. I love it so much that I have a pizza party every year for my birthday."
)

Suggested change:
- expected_output = (
-     "My favorite food is pizza. I love it so much that I have a pizza party every year for my birthday."
- )
+ EXPECTED_OUTPUT = (
+     "My favorite food is pizza. I love it so much that I have a pizza party every year for my birthday."
+ )

Also, can we do a batched test with padding to make sure rotary works as expected? Plus let's test both the small and big models. (A sketch follows below.)
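
A minimal sketch of the requested batched, padded generation check; the checkpoint, prompts, and pad-token choice are assumptions:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "tiiuae/falcon-rw-1b"  # assumed small checkpoint
    tok = AutoTokenizer.from_pretrained(ckpt, padding_side="left")
    tok.pad_token = tok.eos_token  # assumption: no pad token defined, so reuse EOS
    model = AutoModelForCausalLM.from_pretrained(ckpt)

    # Left padding shifts the short prompt to a nonzero start position,
    # which exercises the rotary position handling under padding.
    batch = tok(["My favorite food is", "Hi"], padding=True, return_tensors="pt")
    out = model.generate(**batch, max_new_tokens=20, do_sample=False)
    print(tok.batch_decode(out, skip_special_tokens=True))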

@Rocketknight1 (Member, Author) replied:

Will see what I can add!

@WilliamTambellini (Contributor) commented Jun 30, 2023

Hi @Rocketknight1,
Would this PR allow exporting Falcon to ONNX? As of today, using the latest release (4.30.1):

Traceback (most recent call last):
  File "hf2onnx.py", line 99, in <module>
    model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature= feature)
  File "site-packages/transformers/onnx/features.py", line 728, in check_supported_model_or_raise
    model_features = FeaturesManager.get_supported_features_for_model_type(model_type, model_name=model_name)
  File "site-packages/transformers/onnx/features.py", line 575, in get_supported_features_for_model_type
    raise KeyError(
KeyError: "refinedwebmodel is not supported yet. Only ['albert', 'bart', 'beit', 'bert', 'big-bird', 'bigbird-pegasus', 'blenderbot', 'blenderbot-small', 'bloom', 'camembert', 'clip', 'codegen', 'convbert', 'convnext', 'data2vec-text', 'data2vec-vision', 'deberta', 'deberta-v2', 'deit', 'detr', 'distilbert', 'electra', 'flaubert', 'gpt2', 'gptj', 'gpt-neo', 'groupvit', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv3', 'levit', 'longt5', 'longformer', 'marian', 'mbart', 'mobilebert', 'mobilenet-v1', 'mobilenet-v2', 'mobilevit', 'mt5', 'm2m-100', 'owlvit', 'perceiver', 'poolformer', 'rembert', 'resnet', 'roberta', 'roformer', 'segformer', 'squeezebert', 'swin', 't5', 'vision-encoder-decoder', 'vit', 'whisper', 'xlm', 'xlm-roberta', 'yolos'] are supported. If you want to support refinedwebmodel please propose a PR or open up an issue."

Best
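
For reference, a sketch of the lookup that raises above; it would only succeed once the Falcon model type is registered with the ONNX FeaturesManager (the checkpoint name is an assumption, and ONNX support is not part of this PR):

    from transformers import AutoModelForCausalLM
    from transformers.onnx import FeaturesManager

    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b", trust_remote_code=True  # assumed checkpoint
    )
    # This is the lookup that raises the KeyError in the traceback above; it
    # succeeds only for model types in the supported list.
    model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(
        model, feature="causal-lm"
    )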

@Rocketknight1 (Member, Author) commented Jul 4, 2023

Hey all! The main modeling code should be ready for final review now. Thanks @ArthurZucker for the comprehensive review - it was really helpful! There's one bug left that's causing a failing test, but I think it's a one-line fix that I can track down tomorrow. This may also be the issue that's causing assisted generation to fail, but those tests are currently skipped.

I also need to figure out porting the tokenizer, and then once this is merged I'll need to prepare the repos to transition over to the library code.

cc @amyeroberts for core maintainer review!

@amyeroberts (Collaborator) left a comment

Very nice 🤗 Thanks for all the work porting this model!

Mostly just small comments, mainly missing or incorrect docstrings for the model that need to be added.

Resolved review comments on:
README.md
src/transformers/models/falcon/configuration_falcon.py
src/transformers/models/falcon/modeling_falcon.py (4 threads)
self.word_embeddings = nn.Embedding(config.vocab_size, self.embed_dim)

# Transformer blocks
self.h = nn.ModuleList([FalconDecoderLayer(config) for _ in range(config.num_hidden_layers)])
Collaborator:
☹️

Resolved review comments on:
src/transformers/models/falcon/modeling_falcon.py (2 threads)
self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
model = FalconModel(config=config)
model.to(torch_device)
@miladm commented Jul 5, 2023

Would love to enable and test torch_xla as a target device backend for Falcon. Happy to do it as a follow-up PR or as part of this one. Any suggestions?

cc @JackCaoG

@Rocketknight1 (Member, Author) replied:

@miladm sounds great, but I think it should definitely go in a follow-up PR after this is merged! We want to make sure the initial launch and the transition from remote_code checkpoints to in-library checkpoints goes smoothly, and then we can start adding features like that.

@Rocketknight1 (Member, Author) replied:

@miladm The PR is now merged - feel free to start on any follow-up PR!
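
For anyone picking up that follow-up, a hypothetical sketch of the usual torch_xla device pattern applied to the ported model; this is not part of the PR, and the checkpoint name is assumed:

    import torch_xla.core.xla_model as xm
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "tiiuae/falcon-rw-1b"  # assumed checkpoint
    device = xm.xla_device()  # target an XLA device (e.g. a TPU core)
    model = AutoModelForCausalLM.from_pretrained(ckpt).to(device)
    tok = AutoTokenizer.from_pretrained(ckpt)

    inputs = tok("Hello", return_tensors="pt").to(device)
    outputs = model(**inputs)
    xm.mark_step()  # materialize the lazily-traced graph on the device
    print(outputs.logits.shape)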

@amyeroberts (Collaborator) left a comment

Thanks for all the work porting this!

Just small comments on my side - mainly that the config docstring doesn't match the config init. I also saw at least one comment from another reviewer still outstanding in the PR - so these should be resolved too before merging.

The only outstanding questions I have are:

  • Have we loaded the remote model and this port and checked their outputs are equivalent? (A sketch of such a check follows this list.)
  • What is the default model code used when calling e.g. AutoModel.from_pretrained(hub_checkpoint)? I'm assuming the one on this branch?
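
As referenced above, a sketch of what such an equivalence check could look like; the checkpoint name and tolerances are assumptions, and it presumes the Hub config resolves to the in-library code when trust_remote_code=False:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "tiiuae/falcon-7b"  # assumed checkpoint
    tok = AutoTokenizer.from_pretrained(ckpt)
    inputs = tok("Hello world", return_tensors="pt")

    remote = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
    ported = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=False)

    with torch.no_grad():
        remote_logits = remote(**inputs).logits
        ported_logits = ported(**inputs).logits

    # Tolerances are a guess; fp32 outputs should match closely if equivalent.
    torch.testing.assert_close(remote_logits, ported_logits, rtol=1e-4, atol=1e-4)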

Resolved review comment on tests/models/falcon/test_modeling_falcon.py
vocab_size (`int`, *optional*, defaults to 65024):
Vocabulary size of the Falcon model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`FalconModel`]
hidden_size (`int`, *optional*, defaults to 64):
Collaborator:

The docstring doesn't match the default values or some of the variable names.

@Rocketknight1 (Member, Author) replied:

Fixed!

Comment on lines +96 to +98
new_decoder_architecture=False,
multi_query=True,
parallel_attn=True,
Collaborator:

From a quick search, I think `new_decoder_architecture` and `parallel_attn` are new params, so they need documentation.

@Rocketknight1 (Member, Author) replied:

Documented!
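
For context, a paraphrased sketch of how these arguments might be documented; the exact wording that landed may differ:

    # Paraphrased sketch of the added config documentation (wording assumed):
    #   multi_query (`bool`, *optional*, defaults to `True`):
    #       Whether to use multi-query attention (one key/value head shared
    #       across all query heads).
    #   parallel_attn (`bool`, *optional*, defaults to `True`):
    #       Whether to compute attention and the MLP in parallel from the same
    #       layer-norm output, as in the original Falcon checkpoints.
    #   new_decoder_architecture (`bool`, *optional*, defaults to `False`):
    #       Whether to use the newer (Falcon-40B-style) decoder architecture;
    #       if `True`, the `multi_query` and `parallel_attn` flags are ignored.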

self.sin_cached: torch.Tensor | None = None

def cos_sin(self, seq_len: int, device="cpu", dtype=torch.bfloat16) -> torch.Tensor:
if seq_len != self.seq_len_cached:
Collaborator:

You're completely right - my mind blipped. I was thinking about NaNs, not Nones 🤦‍♀️ Ironically, that's exactly what was in my head at the time.
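
To make the caching pattern under discussion concrete, a simplified sketch; it is not the exact Falcon implementation:

    import torch

    class RotaryCache:
        """Simplified sketch of the cos/sin caching pattern quoted above."""

        def __init__(self, head_dim: int, base: float = 10000.0):
            self.inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
            self.seq_len_cached = -1
            self.cos_cached: torch.Tensor | None = None
            self.sin_cached: torch.Tensor | None = None

        def cos_sin(self, seq_len: int, device="cpu", dtype=torch.bfloat16):
            # Recompute only when the sequence length changes; repeated calls
            # with the same seq_len hit the cache.
            if seq_len != self.seq_len_cached:
                self.seq_len_cached = seq_len
                t = torch.arange(seq_len, device=device, dtype=torch.float32)
                freqs = torch.outer(t, self.inv_freq.to(device))
                emb = torch.cat((freqs, freqs), dim=-1)
                self.cos_cached = emb.cos().to(dtype)
                self.sin_cached = emb.sin().to(dtype)
            return self.cos_cached, self.sin_cached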

Resolved review comment on src/transformers/models/falcon/modeling_falcon.py
[`PreTrainedTokenizer.__call__`] for details.

[What are input IDs?](../glossary#input-ids)
past_key_values (`Tuple[Tuple[torch.Tensor]]` of length `config.n_layers`):
Collaborator:

I don't think there is an `n_layers` attribute?

@Rocketknight1 (Member, Author) replied:

Fixed!

@Rocketknight1 Rocketknight1 merged commit b3ab3fa into main Jul 11, 2023
@Rocketknight1 Rocketknight1 deleted the falcon_port branch July 11, 2023 12:36
@ydshieh ydshieh mentioned this pull request Jul 11, 2023
Lorenzobattistela pushed a commit to Lorenzobattistela/transformers that referenced this pull request Jul 13, 2023
* Initial commit

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleanup config docstring

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert to relative imports

* Remove torch < 1.8 warning

* Restructure cos_sin header

* qkv -> query, key, value

* Refactor attention calculation

* Add a couple of config variables to account for the different checkpoints

* Successful merging of the code paths!

* Fix misplaced line in the non-parallel attention path

* Update config and tests

* Add a pad_token_id when testing

* Support output_attentions when alibi is None

* make fixup

* Skip KV cache shape test

* No more _keys_to_ignore_on_load_missing

* Simplify self attention a bit

* Simplify self attention a bit

* make fixup

* stash commit

* Some more attention mask updates

* Should pass all tests except assisted generation!

* Add big model generation test

* make fixup

* Add temporary workaround for test

* Test overrides for assisted generation

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Test overrides for assisted generation

* Add generation demo

* Update copyright

* Make the docstring model actually small

* Add module-level docstring

* Remove all assertions

* Add copied from bloom

* Reformat the QKV layer

* Add copied from bloom

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused line and reformat

* No single letter variables

* Cleanup return names

* Add copied from line

* Remove the deprecated arguments blocks

* Change the embeddings test to an alibi on/off test

* Remove position_ids from FalconForQA

* Remove old check for token type IDs

* Fix the alibi path when multi_query is False

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update config naming

* Fix typo for new_decoder_architecture

* Add some comments

* Fix docstring

* Fix docstring

* Create range in the right dtype from the start

* Review comment cleanup

* n_head_kv -> num_kv_heads

* self.alibi -> self.use_alibi

* self.num_kv -> self.num_kv_heads

* Reorder config args

* Made alibi arguments Optional

* Add all model docstrings

* Add extra checkpoints

* Add author info for Falcon

* Stop removing token_type_ids because our checkpoints shouldn't return it anymore

* Add one hopeful comment for the future

* Fix typo

* Update tests, fix cache issue for generation

* Use -1e9 instead of -inf to avoid float overflow

* Recompute the rotary embeddings much less often

* Re-enable disabled tests

* One final fix to attention mask calculation, and update tests

* Cleanup targeting falcon-40b equivalency

* Post-rebase docs update

* Update docstrings, especially in the config

* More descriptive variable names, and comments where we can't rename them

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
andreaskoepf added a commit to LAION-AI/Open-Assistant that referenced this pull request Jul 19, 2023
Enable the `use_flash_attention` configuration flag for Falcon models.
When `use_flash_attention` is set to `true`, the
[FalconAttention.forward()](https://github.com/huggingface/transformers/blob/c965d302791cf935d6ea7776428749be678cf509/src/transformers/models/falcon/modeling_falcon.py#L281)
method is replaced with a variant that uses Tri Dao's flash_attention
instead of PyTorch's `scaled_dot_product_attention` function.

At the moment the patch only works for falcon-7b, but technically it will
also work for falcon-40b with the right configuration. The Falcon model
situation is currently a bit messy: the Falcon model was recently added
to Hugging Face transformers (see [PR
transformers#24523](huggingface/transformers#24523)),
but the Falcon models on the Hugging Face Hub still use the code that is
shipped together with the weights (a PR to change this [was
reverted](https://huggingface.co/tiiuae/falcon-7b/discussions/66)).
Falcon-7b and 40b both use slightly different code (this was unified in
the HF transformers implementation, where it can be controlled via a
configuration member called `new_decoder_architecture`; see
[configuration_falcon.py#L65-L67](https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon/configuration_falcon.py#L65-L67)).
The HF Falcon implementation also uses different names in the configuration
class; e.g. compare the new
[configuration_falcon.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon/configuration_falcon.py)
and the old
[configuration_RW.py](https://huggingface.co/tiiuae/falcon-7b/blob/main/configuration_RW.py).

Model configurations compatible with the HF Falcon implementation can be found here:
7B:
[config.json](https://huggingface.co/tiiuae/falcon-7b/blob/4e2d06f0a7c6370ebabbc30c6f59377ae8f73d76/config.json)
40B:
[config.json](https://huggingface.co/tiiuae/falcon-40b/blob/f1ba7d328c06aa6fbb4a8afd3c756f46d7e6b232/config.json)
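
A minimal illustration of the monkey-patching mechanism described above; it only shows how `FalconAttention.forward` can be swapped at runtime, and it delegates back to the original rather than reproducing the actual flash-attention variant:

    from transformers.models.falcon import modeling_falcon

    _original_forward = modeling_falcon.FalconAttention.forward

    def patched_forward(self, *args, **kwargs):
        # The real patch computes attention with Tri Dao's flash_attn here;
        # this sketch just delegates so it stays correct and self-contained.
        return _original_forward(self, *args, **kwargs)

    # Swap the method on the class so every FalconAttention instance uses it.
    modeling_falcon.FalconAttention.forward = patched_forward
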
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023