
GPT-J-6B #13022

Merged 133 commits on Aug 31, 2021
Changes from 76 commits

Commits
f01d47f
Test GPTJ implementation
StellaAthena Aug 4, 2021
44ccee7
fix conflicts
StellaAthena Aug 4, 2021
fe991bf
Fixed conflicts
StellaAthena Aug 4, 2021
54f2b33
Update __init__.py
StellaAthena Aug 4, 2021
e2ce2a3
Update __init__.py
StellaAthena Aug 4, 2021
e59e579
change GPT_J to GPTJ
kurumuz Aug 4, 2021
e2329b4
fix missing imports and typos
kurumuz Aug 4, 2021
1bee4ee
use einops for now
kurumuz Aug 4, 2021
03b7278
Use torch ops instead of einsum
kurumuz Aug 4, 2021
8034f2c
remove einops deps
kurumuz Aug 4, 2021
f86b47b
Merge pull request #1 from kurumuz/gptj_fixes
StellaAthena Aug 4, 2021
0a344ba
Merge branch 'huggingface:master' into master
StellaAthena Aug 5, 2021
194d024
Update configuration_auto.py
StellaAthena Aug 5, 2021
06f07da
Added GPT J
StellaAthena Aug 5, 2021
979bff8
Update gptj.rst
StellaAthena Aug 5, 2021
30635c1
Update __init__.py
StellaAthena Aug 5, 2021
bae5e27
Update test_modeling_gptj.py
StellaAthena Aug 5, 2021
1bcf933
Added GPT J
StellaAthena Aug 5, 2021
12a12a7
Changed configs to match GPT2 instead of GPT Neo
StellaAthena Aug 5, 2021
4efbbec
Removed non-existent sequence model
StellaAthena Aug 5, 2021
6877889
Update configuration_auto.py
StellaAthena Aug 5, 2021
cfaaae4
Update configuration_auto.py
StellaAthena Aug 5, 2021
e9860e9
Update configuration_auto.py
StellaAthena Aug 5, 2021
e8a2333
Update modeling_gptj.py
StellaAthena Aug 5, 2021
3bd2879
Update modeling_gptj.py
StellaAthena Aug 5, 2021
8c524f7
Progress on updating configs to agree with GPT2
StellaAthena Aug 6, 2021
f0c0a31
Update modeling_gptj.py
StellaAthena Aug 6, 2021
1ad512b
num_layers -> n_layer
StellaAthena Aug 6, 2021
89b8724
layer_norm_eps -> layer_norm_epsilon
StellaAthena Aug 6, 2021
76fc4e1
attention_layers -> num_hidden_layers
StellaAthena Aug 6, 2021
6284c7e
Update modeling_gptj.py
StellaAthena Aug 6, 2021
2d5cc30
attention_pdrop -> attn_pdrop
StellaAthena Aug 6, 2021
1ddbb63
hidden_act -> activation_function
StellaAthena Aug 6, 2021
b46551d
Update configuration_gptj.py
StellaAthena Aug 6, 2021
60daf97
Update configuration_gptj.py
StellaAthena Aug 6, 2021
1c9ba25
Update configuration_gptj.py
StellaAthena Aug 6, 2021
7f52c42
Update configuration_gptj.py
StellaAthena Aug 6, 2021
33380ca
Update configuration_gptj.py
StellaAthena Aug 6, 2021
05b2b3b
Update modeling_gptj.py
StellaAthena Aug 6, 2021
d6b86f8
Update modeling_gptj.py
StellaAthena Aug 6, 2021
2b633be
Update modeling_gptj.py
StellaAthena Aug 6, 2021
dde732d
Update modeling_gptj.py
StellaAthena Aug 6, 2021
89818be
Update modeling_gptj.py
StellaAthena Aug 6, 2021
0bc0c25
Update modeling_gptj.py
StellaAthena Aug 6, 2021
e9bc670
fix layernorm and lm_head size
kurumuz Aug 6, 2021
0ac5c64
Merge branch 'huggingface:master' into master
StellaAthena Aug 6, 2021
3553d8e
Update docs/source/model_doc/gptj.rst
StellaAthena Aug 6, 2021
6b6e41d
removed claim that GPT J uses local attention
StellaAthena Aug 6, 2021
5fc60db
Removed GPTJForSequenceClassification
StellaAthena Aug 6, 2021
d70f1f8
Update src/transformers/models/gptj/configuration_gptj.py
StellaAthena Aug 6, 2021
878518a
Removed unsupported boilerplate
StellaAthena Aug 6, 2021
a48ee07
Update tests/test_modeling_gptj.py
StellaAthena Aug 6, 2021
f189b6d
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 6, 2021
6059ef6
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 6, 2021
8793237
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 6, 2021
d5d758e
Update tests/test_modeling_gptj.py
StellaAthena Aug 6, 2021
58b77c1
Update tests/test_modeling_gptj.py
StellaAthena Aug 6, 2021
513fa3e
Update tests/test_modeling_gptj.py
StellaAthena Aug 6, 2021
9876057
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 6, 2021
de6c5af
Update __init__.py
StellaAthena Aug 7, 2021
3f21e98
Update configuration_gptj.py
StellaAthena Aug 7, 2021
316cc95
Update modeling_gptj.py
StellaAthena Aug 7, 2021
4be1484
Corrected indentation
StellaAthena Aug 7, 2021
80f4658
Remove stray backslash
EricHallahan Aug 8, 2021
f4d70d2
Delete .DS_Store
leogao2 Aug 8, 2021
28caefb
Delete .DS_Store
leogao2 Aug 8, 2021
6633eba
Delete .DS_Store
leogao2 Aug 8, 2021
2f92631
Delete .DS_Store
leogao2 Aug 8, 2021
23356a0
Delete .DS_Store
leogao2 Aug 8, 2021
dda2643
Update docs to match
leogao2 Aug 8, 2021
a31f11a
Remove tf loading
leogao2 Aug 8, 2021
cbf8dd1
Remove config.jax
leogao2 Aug 8, 2021
fed0955
Remove stray `else:` statement
EricHallahan Aug 8, 2021
0ae4be5
Remove references to `load_tf_weights_in_gptj`
EricHallahan Aug 8, 2021
3c6161d
Adapt tests to match output from GPT-J 6B
EricHallahan Aug 8, 2021
dd4f02d
Apply suggestions from code review
StellaAthena Aug 9, 2021
752595f
Default `activation_function` to `gelu_new`
EricHallahan Aug 9, 2021
7a032e5
Fix part of the config documentation
EricHallahan Aug 9, 2021
455c311
Revert "Update configuration_auto.py"
EricHallahan Aug 10, 2021
49ba5cc
Revert "Update configuration_auto.py"
EricHallahan Aug 10, 2021
3ebf87b
Revert "Update configuration_auto.py"
EricHallahan Aug 10, 2021
c844dcb
Revert "Update configuration_auto.py"
EricHallahan Aug 10, 2021
9af5cff
Hyphenate GPT-J
EricHallahan Aug 10, 2021
74a9777
Undid sorting of the models alphabetically
StellaAthena Aug 10, 2021
4a40d00
Reverting previous commit
StellaAthena Aug 10, 2021
176ec56
Merge branch 'master' into master
patil-suraj Aug 13, 2021
24ac25a
fix style and quality issues
patil-suraj Aug 13, 2021
d7ac30f
Update docs/source/model_doc/gptj.rst
StellaAthena Aug 14, 2021
6857e93
Update src/transformers/__init__.py
StellaAthena Aug 14, 2021
94db694
Update tests/test_modeling_gptj.py
StellaAthena Aug 14, 2021
5fa31e0
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
f38e019
Update src/transformers/__init__.py
StellaAthena Aug 14, 2021
e4a5f5a
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
2d0a2a0
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
7443fcb
Update src/transformers/models/gptj/configuration_gptj.py
StellaAthena Aug 14, 2021
b3c1a20
Update src/transformers/models/gptj/configuration_gptj.py
StellaAthena Aug 14, 2021
0bedd33
Update src/transformers/models/gptj/configuration_gptj.py
StellaAthena Aug 14, 2021
2507592
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
cd4713f
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
f0a3c0a
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
f046728
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 14, 2021
3ae0298
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 17, 2021
27aeb5f
Replaced GPTJ-specific code with generic code
StellaAthena Aug 17, 2021
8c8ee6b
Update src/transformers/models/gptj/modeling_gptj.py
StellaAthena Aug 17, 2021
53b801a
Made the code always use rotary positional encodings
StellaAthena Aug 17, 2021
ae18ff5
Update index.rst
StellaAthena Aug 17, 2021
c4d11ca
Fix documentation
StellaAthena Aug 17, 2021
4a37a78
Combine attention classes
EricHallahan Aug 17, 2021
d5e5f84
Removed `config.rotary_dim` from tests
StellaAthena Aug 17, 2021
3522e07
Merge branch 'huggingface:master' into master
StellaAthena Aug 17, 2021
c27e587
Update test_modeling_gptj.py
StellaAthena Aug 17, 2021
9eebb6f
Update test_modeling_gptj.py
StellaAthena Aug 17, 2021
ff301d3
Fix formatting
EricHallahan Aug 17, 2021
ff1eb1d
Removed deprecated argument `layer_id` to `GPTJAttention`
StellaAthena Aug 18, 2021
4c86bbc
Update modeling_gptj.py
StellaAthena Aug 18, 2021
1f99941
Update modeling_gptj.py
StellaAthena Aug 18, 2021
b6021cf
Fix code quality
EricHallahan Aug 18, 2021
d2c85a2
Restore model functionality
EricHallahan Aug 19, 2021
223bda1
Save `lm_head.weight` in checkpoints
EricHallahan Aug 22, 2021
ad567a9
Fix crashes when loading with reduced precision
EricHallahan Aug 22, 2021
af0c01d
refactor `self._attn(...)` and rename layer weights
Aug 23, 2021
c90eb3b
make sure logits are in fp32 for sampling
patrickvonplaten Aug 23, 2021
3897823
improve docs
patrickvonplaten Aug 23, 2021
504b339
Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist
EricHallahan Aug 25, 2021
7dcc7c5
Merge branch 'huggingface:master' into master
StellaAthena Aug 25, 2021
8cbaa1f
Added GPT-J to the README
StellaAthena Aug 25, 2021
bca938d
Fix doc/readme consistency
EricHallahan Aug 25, 2021
71d3300
Merge branch 'master' into master
StellaAthena Aug 27, 2021
22f8131
Add rough parallelization support
EricHallahan Aug 27, 2021
1d69e42
Clean up loose ends
EricHallahan Aug 30, 2021
bce04cb
Merge branch 'master' into master
EricHallahan Aug 30, 2021
4784eae
Fix index.rst
EricHallahan Aug 30, 2021
3466cd0
fix merge conflicts
patrickvonplaten Aug 31, 2021
72 changes: 38 additions & 34 deletions docs/source/index.rst
@@ -194,104 +194,106 @@ Supported models
Luan, Dario Amodei** and Ilya Sutskever**.
31. :doc:`GPT Neo <model_doc/gpt_neo>` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
<https://github.com/EleutherAI/gpt-neo>`__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
32. :doc:`Hubert <model_doc/hubert>` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
32. :doc:`GPT J <model_doc/gptj>` (from EleutherAI) released in the repository `kingoflolz/mesh-transformer-jax
<https://github.com/kingoflolz/mesh-transformer-jax>`__ by Ben Wang and Aran Komatsuzaki.
33. :doc:`Hubert <model_doc/hubert>` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
Representation Learning by Masked Prediction of Hidden Units <https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu,
Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
33. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
34. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
<https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
34. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
35. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
35. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
36. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
<https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
36. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
37. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
37. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
38. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
38. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
39. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
by Hao Tan and Mohit Bansal.
39. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
40. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
Machine Translation <https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi
Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman
Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
40. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
41. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
Translator Team.
41. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
42. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
42. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
43. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
43. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
44. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
44. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
45. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
45. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
46. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
Jianfeng Lu, Tie-Yan Liu.
46. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
47. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
47. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
48. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
Mohammad Saleh and Peter J. Liu.
48. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
49. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
49. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
50. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
50. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in
51. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in
pre-trained language models <https://arxiv.org/pdf/2010.12821.pdf>`__ by Hyung Won Chung, Thibault Févry, Henry
Tsai, M. Johnson, Sebastian Ruder.
51. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
52. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
52. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer:
53. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer:
Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and
Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
53. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
54. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
`fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
54. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
55. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
55. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
56. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
56. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
57. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
57. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
58. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
58. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
59. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
59. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
60. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
60. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
61. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
Zhou, Abdelrahman Mohamed, Michael Auli.
61. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
62. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
62. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
63. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
63. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
64. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
64. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `​XLNet: Generalized Autoregressive
65. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `​XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
65. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
66. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.

@@ -363,6 +365,8 @@ Flax), PyTorch, and/or TensorFlow.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| GPT J | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
62 changes: 62 additions & 0 deletions docs/source/model_doc/gptj.rst
@@ -0,0 +1,62 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

GPT J
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The GPT J model was released in the `kingoflolz/mesh-transformer-jax <https://github.com/kingoflolz/mesh-transformer-jax>`__ repository by Ben Wang and Aran Komatsuzaki. It is a GPT2-like causal language model trained on the
`Pile <https://pile.eleuther.ai/>`__ dataset.

This model was contributed by `Stella Biderman <https://huggingface.co/stellaathena>`__.

Generation
_______________________________________________________________________________________________________________________

The :obj:`generate()` method can be used to generate text using the GPT J model.

.. code-block::

    >>> from transformers import GPTJForCausalLM, GPT2Tokenizer
    >>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
    >>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-j-6B")

    >>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
    ... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
    ... "researchers was the fact that the unicorns spoke perfect English."

    >>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    >>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
    >>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
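The sampling call above scales the logits by the temperature before softmax; one of this PR's commits ("make sure logits are in fp32 for sampling") keeps that step in full precision to avoid overflow and rounding artifacts. As a rough illustration of what ``do_sample=True`` with a temperature does — a standalone sketch, not the transformers implementation — :

```python
import math
import random

def sample_next_token(logits, temperature=0.9, rng=None):
    """Sample a token index from a logits vector with temperature scaling.

    Keeping this arithmetic in full precision (fp32/fp64) matters in
    practice: half-precision exponentials overflow and round badly.
    """
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperatures sharpen the distribution toward the arg-max token; ``temperature=1.0`` samples from the unscaled softmax.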


GPTJConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJConfig
:members:

GPTJModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJModel
:members: forward


GPTJForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJForCausalLM
:members: forward