upgrade other-than-pytorch folders under examples/

daitran-moreh committed Aug 28, 2024
1 parent c0d0614 commit 781a215
Showing 248 changed files with 3,515 additions and 2,024 deletions.
22 changes: 11 additions & 11 deletions examples/README.md
@@ -17,9 +17,9 @@ limitations under the License.

We host a wide range of example scripts for multiple learning frameworks. Simply choose your favorite: [TensorFlow](https://github.com/huggingface/transformers/tree/main/examples/tensorflow), [PyTorch](https://github.com/huggingface/transformers/tree/main/examples/pytorch) or [JAX/Flax](https://github.com/huggingface/transformers/tree/main/examples/flax).

We also have some [research projects](https://github.com/huggingface/transformers/tree/main/examples/research_projects), as well as some [legacy examples](https://github.com/huggingface/transformers/tree/main/examples/legacy). Note that unlike the main examples these are not actively maintained, and may require specific older versions of dependencies in order to run.

While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data, allowing you to tweak and edit them as required.
While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the-box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data, allowing you to tweak and edit them as required.

Please discuss on the [forum](https://discuss.huggingface.co/) or in an [issue](https://github.com/huggingface/transformers/issues) a feature you would like to implement in an example before submitting a PR; we welcome bug fixes, but since we want to keep the examples as simple as possible it's unlikely that we will merge a pull request adding more functionality at the cost of readability.

@@ -97,16 +97,16 @@ and run the example command as usual afterward.

## Running the Examples on Remote Hardware with Auto-Setup

[run_on_remote.py](./run_on_remote.py) is a script that launches any example on remote self-hosted hardware,
with automatic hardware and environment setup. It uses [Runhouse](https://github.com/run-house/runhouse) to launch
on self-hosted hardware (e.g. in your own cloud account or on-premise cluster) but there are other options
for running remotely as well. You can easily customize the example used, command line arguments, dependencies,
and type of compute hardware, and then run the script to automatically launch the example.

You can refer to
[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup)
You can refer to
[hardware setup](https://runhouse-docs.readthedocs-hosted.com/en/latest/api/python/cluster.html#hardware-setup)
for more information about hardware and dependency setup with Runhouse, or this
[Colab tutorial](https://colab.research.google.com/drive/1sh_aNQzJX5BKAdNeXthTNGxKz7sM9VPc) for a more in-depth
walkthrough.

You can run the script with the following commands:
@@ -119,7 +119,7 @@ pip install runhouse
python run_on_remote.py \
--example pytorch/text-generation/run_generation.py \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--model_name_or_path=openai-community/gpt2 \
--prompt "I am a language model and"

# For byo (bring your own) cluster:
@@ -131,4 +131,4 @@ python run_on_remote.py --instance <instance> --provider <provider> \
--example <example> <args>
```

You can also adapt the script to your own needs.
20 changes: 20 additions & 0 deletions examples/diff-conversion/README.md
@@ -0,0 +1,20 @@
# Using the `diff_converter` linter

`pip install libcst` is a must!

Run `sh examples/diff-conversion/convert_examples.sh` to get the converted outputs.

The diff converter is a new `linter` specific to `transformers`. It allows us to unpack inheritance in Python and convert a modular `diff` file like `diff_gemma.py` into a standalone `single model single file`.

Examples of possible usage are available in `examples/diff-conversion`; see `diff_gemma` for a full model example.

`python utils/diff_model_converter.py --files_to_parse "/Users/arthurzucker/Work/transformers/examples/diff-conversion/diff_my_new_model2.py"`

## How it works
We use the `libcst` parser to produce an AST representation of the `diff_xxx.py` file. For any imports that are made from `transformers.models.modeling_xxxx`, we parse the source code of that module and build a class dependency mapping, which allows us to unpack the dependencies of the diff file.
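
As a rough illustration of that traversal, the sketch below collects the `transformers.models.*` imports and the class-to-base mapping with `libcst`. It is a simplified example, not the actual `utils/diff_model_converter.py` logic; the `DiffVisitor` name and the inline toy source are made up for the example.

```python
import libcst as cst
from libcst.helpers import get_full_name_for_node

# Toy diff-file source, standing in for something like diff_dummy.py
SOURCE = """
from transformers.models.llama.modeling_llama import LlamaModel

class DummyModel(LlamaModel):
    pass
"""


class DiffVisitor(cst.CSTVisitor):
    """Record `transformers.models.*` imports and class -> base-class mappings."""

    def __init__(self):
        self.model_imports = []  # modules whose source we would parse for dependencies
        self.class_bases = {}    # class name -> list of base class names

    def visit_ImportFrom(self, node: cst.ImportFrom) -> None:
        module = get_full_name_for_node(node.module) if node.module else None
        if module and module.startswith("transformers.models"):
            self.model_imports.append(module)

    def visit_ClassDef(self, node: cst.ClassDef) -> None:
        self.class_bases[node.name.value] = [
            get_full_name_for_node(base.value) for base in node.bases
        ]


visitor = DiffVisitor()
cst.parse_module(SOURCE).visit(visitor)
print(visitor.model_imports)  # ['transformers.models.llama.modeling_llama']
print(visitor.class_bases)    # {'DummyModel': ['LlamaModel']}
```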

The code from the `diff` file and the class dependency mapping are "merged" to produce the single model, single file.
We use ruff to automatically remove any duplicate imports left over from the merge.
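
For the import cleanup step, a minimal sketch of the kind of ruff invocation involved could look like the following; the file name is hypothetical and the real converter may select different rules or call ruff another way.

```python
# Illustrative only: drop unused/duplicated imports from a converted file with ruff.
import subprocess

# "modeling_dummy.py" is a hypothetical converter output used for this example.
subprocess.run(["ruff", "check", "--select", "F401", "--fix", "modeling_dummy.py"])
```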

## Why use libcst instead of the native AST?
The native AST is powerful, but it does not keep docstrings, comments, or code formatting. That is why we went with `libcst`.
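
To make the difference concrete, here is a small comparison (assuming Python 3.9+ for `ast.unparse`): a parse/unparse round trip through the standard library `ast` drops the comment, while `libcst` keeps the source exactly as written.

```python
import ast

import libcst as cst

source = "x = 1  # a comment we want to keep in the generated file\n"

# The standard library AST loses comments and formatting on a round trip.
print(ast.unparse(ast.parse(source)))  # -> x = 1

# libcst builds a lossless concrete syntax tree, so the original source is preserved.
print(cst.parse_module(source).code)   # -> x = 1  # a comment we want to keep ...
```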
10 changes: 10 additions & 0 deletions examples/diff-conversion/convert_examples.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# Iterate over each file in the current directory
for file in examples/diff-conversion/diff_*; do
# Check if it's a regular file
if [ -f "$file" ]; then
# Call the Python script with the file name as an argument
python utils/diff_model_converter.py --files_to_parse "$file"
fi
done
44 changes: 44 additions & 0 deletions examples/diff-conversion/diff_dummy.py
@@ -0,0 +1,44 @@
from math import log
from typing import List, Optional, Tuple, Union

import torch

from transformers import Cache
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.llama.modeling_llama import LlamaModel


def _pre_process_input(input_ids):
print(log(input_ids))
return input_ids


# example where we need some deps and some functions
class DummyModel(LlamaModel):
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
) -> Union[Tuple, CausalLMOutputWithPast]:
input_ids = _pre_process_input(input_ids)

return super().forward(
None,
attention_mask,
position_ids,
past_key_values,
inputs_embeds,
use_cache,
output_attentions,
output_hidden_states,
return_dict,
cache_position,
)
14 changes: 14 additions & 0 deletions examples/diff-conversion/diff_my_new_model.py
@@ -0,0 +1,14 @@
from transformers.models.llama.configuration_llama import LlamaConfig


# Example where we only want to add a new config argument and its documentation
# here there is no `ARG` so we take the parent docstring
class MyNewModelConfig(LlamaConfig):
r"""
mlp_bias (`bool`, *optional*, defaults to `False`)
"""

def __init__(self, mlp_bias=True, new_param=0, **super_kwargs):
self.mlp_bias = mlp_bias
self.new_param = new_param
        super().__init__(**super_kwargs)
31 changes: 31 additions & 0 deletions examples/diff-conversion/diff_my_new_model2.py
@@ -0,0 +1,31 @@
from transformers.models.gemma.modeling_gemma import GemmaForSequenceClassification
from transformers.models.llama.configuration_llama import LlamaConfig


# Example where we only want to modify the docstring
class MyNewModel2Config(LlamaConfig):
r"""
    This is the configuration class to store the configuration of a [`GemmaModel`]. It is used to instantiate a Gemma
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the Gemma-7B.
e.g. [google/gemma-7b](https://huggingface.co/google/gemma-7b)
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 256000):
Vocabulary size of the Gemma model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`GemmaModel`]
```python
>>> from transformers import GemmaModel, GemmaConfig
>>> # Initializing a Gemma gemma-7b style configuration
>>> configuration = GemmaConfig()
>>> # Initializing a model from the gemma-7b style configuration
>>> model = GemmaModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""


# Example where all the dependencies are fetched to just copy the entire class
class MyNewModel2ForSequenceClassification(GemmaForSequenceClassification):
pass
30 changes: 30 additions & 0 deletions examples/diff-conversion/diff_new_model.py
@@ -0,0 +1,30 @@
# Example where we only want to overwrite the defaults of an init

from transformers.models.gemma.configuration_gemma import GemmaConfig


class NewModelConfig(GemmaConfig):
def __init__(
self,
vocab_size=256030,
hidden_size=64,
intermediate_size=90,
num_hidden_layers=28,
num_attention_heads=16,
num_key_value_heads=16,
head_dim=256,
hidden_act="gelu_pytorch_tanh",
hidden_activation=None,
max_position_embeddings=1500,
initializer_range=0.02,
rms_norm_eps=1e-6,
use_cache=True,
pad_token_id=0,
eos_token_id=1,
bos_token_id=2,
tie_word_embeddings=True,
rope_theta=10000.0,
attention_bias=False,
attention_dropout=0.0,
):
super().__init__(self)
38 changes: 38 additions & 0 deletions examples/diff-conversion/diff_super.py
@@ -0,0 +1,38 @@
from typing import List, Optional, Tuple, Union

import torch

from transformers import Cache
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers.models.llama.modeling_llama import LlamaModel


# example where we need some deps and some functions
class SuperModel(LlamaModel):
def forward(
self,
input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
) -> Union[Tuple, CausalLMOutputWithPast]:
out = super().forward(
input_ids,
attention_mask,
position_ids,
past_key_values,
inputs_embeds,
use_cache,
output_attentions,
output_hidden_states,
return_dict,
cache_position,
)
out.logits *= 2**4
return out
8 changes: 5 additions & 3 deletions examples/flax/_tests_requirements.txt
@@ -1,8 +1,10 @@
datasets >= 1.1.3
pytest
datasets >= 1.13.3
pytest<8.0.1
conllu
nltk
rouge-score
seqeval
tensorboard
evaluate >= 0.2.0
torch
accelerate
2 changes: 1 addition & 1 deletion examples/flax/conftest.py
@@ -21,7 +21,7 @@


# allow having multiple repository checkouts and not needing to remember to rerun
# 'pip install -e .[dev]' when switching between checkouts and running tests.
# `pip install -e '.[dev]'` when switching between checkouts and running tests.
git_repo_path = abspath(join(dirname(dirname(dirname(__file__))), "src"))
sys.path.insert(1, git_repo_path)

6 changes: 3 additions & 3 deletions examples/flax/image-captioning/README.md
@@ -1,7 +1,7 @@
# Image Captioning (vision-encoder-text-decoder model) training example

The following example showcases how to finetune a vision-encoder-text-decoder model for image captioning
using the JAX/Flax backend, leveraging 🤗 Transformers library's [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/visionencoderdecoder#transformers.FlaxVisionEncoderDecoderModel).
using the JAX/Flax backend, leveraging 🤗 Transformers library's [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder#transformers.FlaxVisionEncoderDecoderModel).

JAX/Flax allows you to trace pure functions and compile them into efficient, fused accelerator code on both GPU and TPU.
Models written in JAX/Flax are **immutable** and updated in a purely functional
@@ -10,7 +10,7 @@ way which enables simple and efficient model parallelism.
`run_image_captioning_flax.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets
library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.

For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files and you also will find examples of these below.
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets#json-files and you also will find examples of these below.

### Download COCO dataset (2017)
This example uses COCO dataset (2017) through a custom dataset script, which requires users to manually download the
@@ -34,7 +34,7 @@ Next, we create a [FlaxVisionEncoderDecoderModel](https://huggingface.co/docs/tr
python3 create_model_from_encoder_decoder_models.py \
--output_dir model \
--encoder_model_name_or_path google/vit-base-patch16-224-in21k \
--decoder_model_name_or_path gpt2
--decoder_model_name_or_path openai-community/gpt2
```

### Train the model
@@ -37,15 +37,15 @@ class ModelArguments:
encoder_model_name_or_path: str = field(
metadata={
"help": (
"The encoder model checkpoint for weights initialization."
"The encoder model checkpoint for weights initialization. "
"Don't set if you want to train an encoder model from scratch."
)
},
)
decoder_model_name_or_path: str = field(
metadata={
"help": (
"The decoder model checkpoint for weights initialization."
"The decoder model checkpoint for weights initialization. "
"Don't set if you want to train a decoder model from scratch."
)
},