upgrade other-than-pytorch folders under examples/

daitran-moreh committed Aug 28, 2024
1 parent c0d0614 commit 781a215
Showing 248 changed files with 3,515 additions and 2,024 deletions.
22 changes: 11 additions & 11 deletions examples/README.md
@@ -17,9 +17,9 @@ limitations under the License.

We host a wide range of example scripts for multiple learning frameworks. Simply choose your favorite: [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow), [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch) or [JAX/Flax](https://github.com/huggingface/transformers/tree/main/examples/flax).

We also have some [research projects](https://github.com/huggingface/transformers/tree/main/examples/research_projects), as well as some [legacy examples](https://github.com/huggingface/transformers/tree/main/examples/legacy). Note that unlike the main examples these are not actively maintained, and may require specific older versions of dependencies in order to run.

While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data, allowing you to tweak and edit them as required.
While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the-box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data, allowing you to tweak and edit them as required.

Please discuss on the [forum](https://discuss.huggingface.co/) or in an [issue](https://github.com/huggingface/transformers/issues) a feature you would like to implement in an example before submitting a PR; we welcome bug fixes, but since we want to keep the examples as simple as possible it's unlikely that we will merge a pull request adding more functionality at the cost of readability.

@@ -97,16 +97,16 @@ and run the example command as usual afterward.

## Running the Examples on Remote Hardware with Auto-Setup

[run_on_remote.py](./run_on_remote.py) is a script that launches any example on remote self-hosted hardware,
with automatic hardware and environment setup. It uses [Runhouse](https://github.com/run-house/runhouse) to launch
on self-hosted hardware (e.g. in your own cloud account or on-premise cluster) but there are other options
for running remotely as well. You can easily customize the example used, command line arguments, dependencies,
and type of compute hardware, and then run the script to automatically launch the example.

You can refer to
[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup)
You can refer to
[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/latest/api/python/cluster.html#hardware-setup)
for more information about hardware and dependency setup with Runhouse, or this
[Colab tutorial](https://colab.research.google.com/drive/1sh_aNQzJX5BKAdNeXthTNGxKz7sM9VPc) for a more in-depth
walkthrough.

You can run the script with the following commands:
@@ -119,7 +119,7 @@ pip install runhouse
python run_on_remote.py \
--example pytorch/text-generation/run_generation.py \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--model_name_or_path=openai-community/gpt2 \
--prompt "I am a language model and"

# For byo (bring your own) cluster:
@@ -131,4 +131,4 @@ python run_on_remote.py --instance <instance> --provider <provider> \
--example <example> <args>
```

You can also adapt the script to your own needs.
20 changes: 20 additions & 0 deletions examples/diff-conversion/README.md
@@ -0,0 +1,20 @@
# Using the `diff_converter` linter

`pip install libcst` is a must!

Run `sh examples/diff-conversion/convert_examples.sh` to get the converted outputs.

The diff converter is a new `linter` specific to `transformers`. It allows us to unpack inheritance in Python and convert a modular `diff` file like `diff_gemma.py` into a standalone `single model single file`.

Examples of possible usage are available in `examples/diff-conversion`; see `diff_gemma` for a full model example.

`python utils/diff_model_converter.py --files_to_parse "/Users/arthurzucker/Work/transformers/examples/diff-conversion/diff_my_new_model2.py"`

## How it works
We use the `libcst` parser to produce an AST representation of the `diff_xxx.py` file. For any imports that are made from `transformers.models.modeling_xxxx`, we parse the source code of that module and build a class dependency mapping, which allows us to unpack the dependencies of the diff file.
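
As a rough illustration of that traversal, the sketch below collects the `transformers.models.*` imports and the class-to-base mapping with `libcst`. It is a simplified example, not the actual `utils/diff_model_converter.py` logic; the `DiffVisitor` name and the inline toy source are made up for the example.

```python
import libcst as cst
from libcst.helpers import get_full_name_for_node

# Toy diff-file source, standing in for something like diff_dummy.py
SOURCE = """
from transformers.models.llama.modeling_llama import LlamaModel

class DummyModel(LlamaModel):
    pass
"""


class DiffVisitor(cst.CSTVisitor):
    """Record `transformers.models.*` imports and class -> base-class mappings."""

    def __init__(self):
        self.model_imports = []  # modules whose source we would parse for dependencies
        self.class_bases = {}    # class name -> list of base class names

    def visit_ImportFrom(self, node: cst.ImportFrom) -> None:
        module = get_full_name_for_node(node.module) if node.module else None
        if module and module.startswith("transformers.models"):
            self.model_imports.append(module)

    def visit_ClassDef(self, node: cst.ClassDef) -> None:
        self.class_bases[node.name.value] = [
            get_full_name_for_node(base.value) for base in node.bases
        ]


visitor = DiffVisitor()
cst.parse_module(SOURCE).visit(visitor)
print(visitor.model_imports)  # ['transformers.models.llama.modeling_llama']
print(visitor.class_bases)    # {'DummyModel': ['LlamaModel']}
```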

The code from the `diff` file and the class dependency mapping are "merged" to produce the single model, single file.
We use ruff to automatically remove any duplicate imports left over from the merge.
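
For the import cleanup step, a minimal sketch of the kind of ruff invocation involved could look like the following; the file name is hypothetical and the real converter may select different rules or call ruff another way.

```python
# Illustrative only: drop unused/duplicated imports from a converted file with ruff.
import subprocess

# "modeling_dummy.py" is a hypothetical converter output used for this example.
subprocess.run(["ruff", "check", "--select", "F401", "--fix", "modeling_dummy.py"])
```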

## Why use libcst instead of the native AST?
The native AST is powerful, but it does not keep docstrings, comments, or code formatting. That is why we went with `libcst`.
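
To make the difference concrete, here is a small comparison (assuming Python 3.9+ for `ast.unparse`): a parse/unparse round trip through the standard library `ast` drops the comment, while `libcst` keeps the source exactly as written.

```python
import ast

import libcst as cst

source = "x = 1  # a comment we want to keep in the generated file\n"

# The standard library AST loses comments and formatting on a round trip.
print(ast.unparse(ast.parse(source)))  # -> x = 1

# libcst builds a lossless concrete syntax tree, so the original source is preserved.
print(cst.parse_module(source).code)   # -> x = 1  # a comment we want to keep ...
```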
10 changes: 10 additions & 0 deletions examples/diff-conversion/convert_examples.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# Iterate over each file in the current directory
for file in examples/diff-conversion/diff_*; do
# Check if it's a regular file
if [ -f "$file" ]; then
# Call the Python script with the file name as an argument
python utils/diff_model_converter.py --files_to_parse "$file"
fi
done
44 changes: 44 additions & 0 deletions examples/diff-conversion/diff_dummy.py
@@ -0,0 +1,44 @@
from math import log
from typing import List, Optional, Tuple, Union

import torch

from transformers import Cache
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.llama.modeling_llama import LlamaModel


def _pre_process_input(input_ids):
print(log(input_ids))
return input_ids


# example where we need some deps and some functions
class DummyModel(LlamaModel):
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
) -> Union[Tuple, CausalLMOutputWithPast]:
input_ids = _pre_process_input(input_ids)

return super().forward(
None,
attention_mask,
position_ids,
past_key_values,
inputs_embeds,
use_cache,
output_attentions,
output_hidden_states,
return_dict,
cache_position,
)
14 changes: 14 additions & 0 deletions examples/diff-conversion/diff_my_new_model.py
@@ -0,0 +1,14 @@
from transformers.models.llama.configuration_llama import LlamaConfig


# Example where we only want to add a new config argument and its documentation
# here there is no `ARG` so we take the parent docstring
class MyNewModelConfig(LlamaConfig):
r"""
mlp_bias (`bool`, *optional*, defaults to `False`)
"""

def __init__(self, mlp_bias=True, new_param=0, **super_kwargs):
self.mlp_bias = mlp_bias
self.new_param = new_param
        super().__init__(**super_kwargs)
31 changes: 31 additions & 0 deletions examples/diff-conversion/diff_my_new_model2.py
@@ -0,0 +1,31 @@
from transformers.models.gemma.modeling_gemma import GemmaForSequenceClassification
from transformers.models.llama.configuration_llama import LlamaConfig


# Example where we only want to modify the docstring
class MyNewModel2Config(LlamaConfig):
r"""
    This is the configuration class to store the configuration of a [`GemmaModel`]. It is used to instantiate a Gemma
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the Gemma-7B.
e.g. [google/gemma-7b](https://huggingface.co/google/gemma-7b)
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 256000):
Vocabulary size of the Gemma model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`GemmaModel`]
```python
>>> from transformers import GemmaModel, GemmaConfig
>>> # Initializing a Gemma gemma-7b style configuration
>>> configuration = GemmaConfig()
>>> # Initializing a model from the gemma-7b style configuration
>>> model = GemmaModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""


# Example where all the dependencies are fetched to just copy the entire class
class MyNewModel2ForSequenceClassification(GemmaForSequenceClassification):
pass
30 changes: 30 additions & 0 deletions examples/diff-conversion/diff_new_model.py
@@ -0,0 +1,30 @@
# Example where we only want to overwrite the defaults of an init

from transformers.models.gemma.configuration_gemma import GemmaConfig


class NewModelConfig(GemmaConfig):
def __init__(
self,
vocab_size=256030,
hidden_size=64,
intermediate_size=90,
num_hidden_layers=28,
num_attention_heads=16,
num_key_value_heads=16,
head_dim=256,
hidden_act="gelu_pytorch_tanh",
hidden_activation=None,
max_position_embeddings=1500,
initializer_range=0.02,
rms_norm_eps=1e-6,
use_cache=True,
pad_token_id=0,
eos_token_id=1,
bos_token_id=2,
tie_word_embeddings=True,
rope_theta=10000.0,
attention_bias=False,
attention_dropout=0.0,
):
super().__init__(self)
38 changes: 38 additions & 0 deletions examples/diff-conversion/diff_super.py
@@ -0,0 +1,38 @@
from typing import List, Optional, Tuple, Union

import torch

from transformers import Cache
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.llama.modeling_llama import LlamaModel


# example where we need some deps and some functions
class SuperModel(LlamaModel):
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
) -> Union[Tuple, CausalLMOutputWithPast]:
out = super().forward(
input_ids,
attention_mask,
position_ids,
past_key_values,
inputs_embeds,
use_cache,
output_attentions,
output_hidden_states,
return_dict,
cache_position,
)
out.logits *= 2**4
return out
8 changes: 5 additions & 3 deletions examples/flax/_tests_requirements.txt
@@ -1,8 +1,10 @@
datasets >= 1.1.3
pytest
datasets >= 1.13.3
pytest<8.0.1
conllu
nltk
rouge-score
seqeval
tensorboard
evaluate >= 0.2.0
torch
accelerate
2 changes: 1 addition & 1 deletion examples/flax/conftest.py
@@ -21,7 +21,7 @@


# allow having multiple repository checkouts and not needing to remember to rerun
# 'pip install -e .[dev]' when switching between checkouts and running tests.
# `pip install -e '.[dev]'` when switching between checkouts and running tests.
git_repo_path = abspath(join(dirname(dirname(dirname(__file__))), "src"))
sys.path.insert(1, git_repo_path)

6 changes: 3 additions & 3 deletions examples/flax/image-captioning/README.md
@@ -1,7 +1,7 @@
# Image Captioning (vision-encoder-text-decoder model) training example

The following example showcases how to finetune a vision-encoder-text-decoder model for image captioning
using the JAX/Flax backend, leveraging 🤗 Transformers library's [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/visionencoderdecoder#transformers.FlaxVisionEncoderDecoderModel).
using the JAX/Flax backend, leveraging 🤗 Transformers library's [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder#transformers.FlaxVisionEncoderDecoderModel).

JAX/Flax allows you to trace pure functions and compile them into efficient, fused accelerator code on both GPU and TPU.
Models written in JAX/Flax are **immutable** and updated in a purely functional
@@ -10,7 +10,7 @@ way which enables simple and efficient model parallelism.
`run_image_captioning_flax.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets
library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.

For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files and you also will find examples of these below.
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets#json-files and you also will find examples of these below.

### Download COCO dataset (2017)
This example uses COCO dataset (2017) through a custom dataset script, which requires users to manually download the
@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
python3 create_model_from_encoder_decoder_models.py \
--output_dir model \
--encoder_model_name_or_path google/vit-base-patch16-224-in21k \
--decoder_model_name_or_path gpt2
--decoder_model_name_or_path openai-community/gpt2
```

### Train the model
@@ -37,15 +37,15 @@ class ModelArguments:
encoder_model_name_or_path: str = field(
metadata={
"help": (
"The encoder model checkpoint for weights initialization."
"The encoder model checkpoint for weights initialization. "
"Don't set if you want to train an encoder model from scratch."
)
},
)
decoder_model_name_or_path: str = field(
metadata={
"help": (
"The decoder model checkpoint for weights initialization."
"The decoder model checkpoint for weights initialization. "
"Don't set if you want to train a decoder model from scratch."
)
},