
v0.5.0: GPT Neo + misc fixes

@minimaxir released this 19 Apr 01:15

aitextgen has been updated to support GPT Neo and fix a few outstanding generation issues! However, the process introduced a few breaking changes.

Breaking Changes

Loading Models

While making model loading architecture-agnostic for GPT Neo support, it turned out that aitextgen had been loading models in an unofficial way; this has now been addressed. You must now specify model_folder, the folder where pytorch_model.bin and config.json are located (with those exact filenames).

Assuming the model is located in the trained_model folder:

Old:

ai2 = aitextgen(model="trained_model/pytorch_model.bin",
                tokenizer_file="aitextgen.tokenizer.json",
                config="trained_model/config.json")

New:

ai2 = aitextgen(model_folder="trained_model",
                tokenizer_file="aitextgen.tokenizer.json")

All notebooks and documentation have been updated with this new workflow, and an assert will be raised if the old behavior is still used.

Incorrect tokenization for Colab-trained GPT-2 tokenizers

There was an underlying issue due to a recent change in tokenizers which broke the implementation of the default GPT-2 tokenizer by preventing it from tokenizing <|endoftext|> tokens correctly. As a result, this broke the truncation of generated text at the <|endoftext|> token.

Only models trained on line-by-line texts with the Colab GPT-2 Notebook were affected by this; unfortunately, the only fix is to retrain the model with v0.5.0.

Other Major Changes/Fixes

GPT Neo support

GPT Neo is now supported! The Colab Notebook was updated to indicate how to finetune the smaller versions of the model.
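For example, here is a minimal sketch of loading one of the smaller GPT Neo models by its Hugging Face model hub name (the EleutherAI/gpt-neo-125M name and the generation settings below are illustrative assumptions, not part of this release's docs):

from aitextgen import aitextgen

# Load a small pretrained GPT Neo model from the Hugging Face model hub
# (model name assumed here), then generate a short sample.
ai = aitextgen(model="EleutherAI/gpt-neo-125M")
ai.generate(n=1, prompt="The meaning of life is", max_length=256)

Finetuning then uses the same ai.train() workflow as GPT-2, as covered in the updated Colab Notebook.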

Out of the box, all variants of GPT Neo have a 2048-token context window (versus GPT-2's 1024), allowing double the generation length, and the pretrained models are trained on much more recent data. Finetuning a GPT Neo model takes about 2x as long per step as a GPT-2 model: notable, since increasing the context window normally causes training time to scale quadratically rather than linearly. GPT Neo also appears to converge faster.

However, in terms of text-generation quality, it's currently unclear whether GPT Neo is "better" than GPT-2, especially on short-form content. Future releases of aitextgen will analyze this more closely.

DeepSpeed support [BETA] (#103)

Thanks to the team at pytorch-lightning, DeepSpeed support has been added to aitextgen, allowing training of larger models (>1.5B params) on multiple GPUs. However, this isn't fully tested, so more documentation is pending!

Misc changes

  • Added a nonempty_output param to generate(), default True: if the output is empty (possible with short-form content), it is skipped when generating multiple texts, or regenerated when generating a single text. If min_length is specified, the same behavior applies to texts below the minimum length after processing (see the sketch after this list).

  • Bumped minimum versions of transformers and pytorch-lightning.

  • Completed another pass of notebooks and documentation.

  • Forced single-GPU training on Windows to avoid bugs (#116).

  • Calling the aitextgen instance will now print the model type and number of params to the console, which is helpful for debugging.
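A minimal sketch of the new nonempty_output parameter in use, reusing the trained_model folder and tokenizer file from the loading example above (the n and min_length values are illustrative):

from aitextgen import aitextgen

ai2 = aitextgen(model_folder="trained_model",
                tokenizer_file="aitextgen.tokenizer.json")

# With nonempty_output=True (the default), empty generated texts are
# skipped when generating multiple samples, and texts shorter than
# min_length after processing are handled the same way.
ai2.generate(n=5, min_length=20, nonempty_output=True)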