gptj config changes to enable finetuning of gpt-j-6B and gpt-j-xl (#3785)

* sync

* clear notebook outputs and linting
anapt authored Feb 16, 2023
1 parent b818b20 commit f8ae4ba
Showing 11 changed files with 868 additions and 157 deletions.
@@ -2,6 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "cf87f233",
"metadata": {},
"source": [
"# Train EleutherAI GPT-J with PyTorch 1.8.1 and Pipeline Parallelism Using the SageMaker Model Parallelism Library\n",
@@ -31,6 +32,7 @@
},
{
"cell_type": "markdown",
"id": "77eb83a9",
"metadata": {},
"source": [
"## SageMaker Distributed Training \n",
@@ -67,6 +69,7 @@
},
{
"cell_type": "markdown",
"id": "7aa9251a",
"metadata": {},
"source": [
"### SageMaker Model Parallel configuration\n",
@@ -90,6 +93,7 @@
},
{
"cell_type": "markdown",
"id": "2b1f0327",
"metadata": {},
"source": [
"#### Additional Resources\n",
@@ -104,6 +108,7 @@
},
{
"cell_type": "markdown",
"id": "f571615e",
"metadata": {},
"source": [
"#### Amazon SageMaker Initialization\n",
@@ -117,6 +122,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "95aaf0c2",
"metadata": {},
"outputs": [],
"source": [
@@ -126,6 +132,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4408ceae",
"metadata": {
"scrolled": true
},
@@ -137,6 +144,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ebe376a8",
"metadata": {},
"outputs": [],
"source": [
@@ -146,6 +154,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "02a7d9e3",
"metadata": {
"scrolled": true
},
@@ -157,6 +166,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "64d2c112",
"metadata": {},
"outputs": [],
"source": [
@@ -177,6 +187,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "452456a3",
"metadata": {},
"outputs": [],
"source": [
@@ -216,6 +227,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "129a6da2",
"metadata": {},
"outputs": [],
"source": [
@@ -224,13 +236,15 @@
},
{
"cell_type": "markdown",
"id": "bebdc6e9",
"metadata": {},
"source": [
"## Training Dataset"
]
},
{
"cell_type": "markdown",
"id": "02d2ac3a",
"metadata": {},
"source": [
"The training script fine-tunes GPT-J on the `sst2` dataset. \n",
@@ -242,6 +256,7 @@
},
{
"cell_type": "markdown",
"id": "d676e00b",
"metadata": {},
"source": [
"## Setup Hyperparameters\n",
@@ -254,6 +269,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "175f94c2",
"metadata": {},
"outputs": [],
"source": [
@@ -264,6 +280,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "695b8ab3",
"metadata": {},
"outputs": [],
"source": [
@@ -295,6 +312,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1fcd2760",
"metadata": {},
"outputs": [],
"source": [
@@ -314,6 +332,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "efd36f09",
"metadata": {},
"outputs": [],
"source": [
@@ -323,6 +342,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "e2f03456",
"metadata": {},
"outputs": [],
"source": [
@@ -364,6 +384,7 @@
},
{
"cell_type": "markdown",
"id": "7ef20b10",
"metadata": {},
"source": [
"## Setup SageMaker Training Job"
@@ -372,6 +393,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c43285cb",
"metadata": {},
"outputs": [],
"source": [
@@ -387,6 +409,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "bb9616e3",
"metadata": {},
"outputs": [],
"source": [
@@ -419,6 +442,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f4cabfd4",
"metadata": {},
"outputs": [],
"source": [
@@ -431,6 +455,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b9f05c07",
"metadata": {},
"outputs": [],
"source": [
@@ -459,6 +484,7 @@
},
{
"cell_type": "markdown",
"id": "45eb0cde",
"metadata": {},
"source": [
"If you receive a `ResourceLimitExceeded` error message when running the following cell, you can request an increase on the default quota by contacting [AWS support](https://console.aws.amazon.com/support). Open the [AWS Support Center](https://console.aws.amazon.com/support), and then choose Create case. Choose Service limit increase. For Limit Type choose SageMaker Training Jobs. Complete the rest of the form and submit."
@@ -467,6 +493,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2601cc8a",
"metadata": {
"scrolled": true
},
@@ -484,6 +511,7 @@
},
{
"cell_type": "markdown",
"id": "31b51fd8",
"metadata": {},
"source": [
"## Accessing the Training Logs\n",
@@ -511,6 +539,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "3de8f1d2",
"metadata": {},
"outputs": [],
"source": []