Faster, more space-efficient tutorials (#1124)
* Speed up notebook tests

* Black fix

* Mock rest of variables

* Undo URL changes

* Update conda deps

* Notebooks also plot images

* Fix undefined variable

* Test with serial data loading

* Use tempfile for all data download directories

* Encode timeout in notebook

* Share datasets across processes

* Fix missing import

* Pretrained Weights: use EuroSAT100

* Transforms: use EuroSAT100

* Trainers: use EuroSAT100

* Blacken

* MPLBACKEND is already Agg by default on Linux

* Indices: use EuroSAT100

* Pretrained Weights: add output

* Pretrained Weights: add output

* Trainers: save output

* Pretrained Weights: ResNet 50 -> 18

* Trainers: better graph

* Indices: add missing plot

* Cache downloads

* Small edit

* Revert "Cache downloads"

This reverts commit 5276c53.

* Revert "Revert "Cache downloads""

This reverts commit 137c69e.

* env only

* half env

* Variable with no braces

* Set tmpdir elsewhere

* Give up on tmpdir caching

* Trainers: clear output

* lightning.pytorch package import

* nbstripout

* Rerun upon failure

* Re-add caching

* Rerun failures on release branch too
adamjstewart authored and calebrob6 committed Apr 10, 2023
1 parent cfe4541 commit ce4c4b1
Showing 12 changed files with 849 additions and 6,960 deletions.
6 changes: 2 additions & 4 deletions .github/workflows/release.yaml
@@ -72,12 +72,10 @@ jobs:
       - name: Install pip dependencies
         if: steps.cache.outputs.cache-hit != 'true'
         run: |
-          pip install .[datasets,docs,tests]
+          pip install .[docs,tests] planetary_computer pystac pytest-rerunfailures
           pip list
       - name: Run notebook checks
-        env:
-          MLHUB_API_KEY: ${{ secrets.MLHUB_API_KEY }}
-        run: pytest --nbmake docs/tutorials --durations=10
+        run: pytest --nbmake --durations=10 --reruns=10 docs/tutorials
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
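The new `--reruns=10` flag comes from the pytest-rerunfailures plugin added above: a failed notebook is retried up to ten times before the job fails, which turns flaky downloads into transient hiccups. The core retry behavior can be sketched in plain Python (`run_with_reruns` is a hypothetical helper for illustration, not part of this repo or of pytest-rerunfailures):

```python
def run_with_reruns(test, reruns=10):
    """Run `test`, retrying on failure, like pytest-rerunfailures' --reruns.

    Returns the number of attempts used; re-raises the last exception if
    every attempt (1 initial run + `reruns` retries) fails.
    """
    for attempt in range(1, reruns + 2):
        try:
            test()
            return attempt
        except Exception:
            if attempt == reruns + 1:
                raise
```

In CI this means a single transient network failure no longer fails the whole notebook check.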
6 changes: 2 additions & 4 deletions .github/workflows/tutorials.yaml
@@ -30,12 +30,10 @@ jobs:
       - name: Install pip dependencies
         if: steps.cache.outputs.cache-hit != 'true'
         run: |
-          pip install .[datasets,docs,tests]
+          pip install .[docs,tests] planetary_computer pystac pytest-rerunfailures
           pip list
       - name: Run notebook checks
-        env:
-          MLHUB_API_KEY: ${{ secrets.MLHUB_API_KEY }}
-        run: pytest --nbmake --nbmake-timeout=3000 docs/tutorials --durations=10
+        run: pytest --nbmake --durations=10 --reruns=10 docs/tutorials
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
99 changes: 50 additions & 49 deletions docs/tutorials/benchmarking.ipynb
@@ -33,7 +33,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -58,7 +58,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {
     "gather": {
      "logged": 1629238744113
@@ -90,12 +90,11 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "data_root = tempfile.gettempdir()\n",
-    "naip_root = os.path.join(data_root, \"naip\")\n",
+    "naip_root = os.path.join(tempfile.gettempdir(), \"naip\")\n",
     "naip_url = (\n",
     "    \"https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/\"\n",
     ")\n",
@@ -118,12 +118,11 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "chesapeake_root = os.path.join(data_root, \"chesapeake\")\n",
-    "\n",
+    "chesapeake_root = os.path.join(tempfile.gettempdir(), \"chesapeake\")\n",
     "chesapeake = ChesapeakeDE(chesapeake_root, download=True)"
    ]
   },
@@ -143,7 +141,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {
     "gather": {
      "logged": 1629238744228
@@ -167,6 +165,34 @@
     "    return toc - tic, i"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following variables can be modified to control the number of samples drawn per epoch."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "nbmake": {
+     "mock": {
+      "batch_size": 1,
+      "length": 1,
+      "size": 1,
+      "stride": 1000000
+     }
+    }
+   },
+   "outputs": [],
+   "source": [
+    "size = 1000\n",
+    "length = 888\n",
+    "batch_size = 12\n",
+    "stride = 500"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
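The added cell above carries `nbmake` metadata with a `mock` mapping: when the notebook runs under test, the tiny mocked values stand in for the interactive defaults, so a benchmark "epoch" touches a single 1 x 1 sample instead of 888 windows of 1000 x 1000 pixels. Conceptually the override works like a dictionary merge (a sketch of the idea, not nbmake's actual implementation):

```python
# Values a reader sees when running the notebook interactively
defaults = {"size": 1000, "length": 888, "batch_size": 12, "stride": 500}

# Values injected from the cell's nbmake "mock" metadata during testing
mock = {"size": 1, "length": 1, "batch_size": 1, "stride": 1000000}

def apply_mock(defaults, mock):
    """Return the variables a test run sees: defaults overridden by mocks."""
    return {**defaults, **mock}

config = apply_mock(defaults, mock)
```

The huge mocked `stride` guarantees the grid sampler produces only one window per scene under test.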
@@ -183,7 +209,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {
     "gather": {
      "logged": 1629248963725
@@ -197,24 +223,15 @@
    "outputId": "edcc8199-bd09-4832-e50c-7be8ac78995b",
    "tags": []
   },
-  "outputs": [
-   {
-    "name": "stdout",
-    "output_type": "stream",
-    "text": [
-     "296.582683801651 74\n",
-     "54.20210099220276 74\n"
-    ]
-   }
-  ],
+  "outputs": [],
   "source": [
    "for cache in [False, True]:\n",
    "    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)\n",
    "    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)\n",
    "    dataset = chesapeake & naip\n",
-   "    sampler = RandomGeoSampler(dataset, size=1000, length=888)\n",
+   "    sampler = RandomGeoSampler(dataset, size=size, length=length)\n",
    "    dataloader = DataLoader(\n",
-   "        dataset, batch_size=12, sampler=sampler, collate_fn=stack_samples\n",
+   "        dataset, batch_size=batch_size, sampler=sampler, collate_fn=stack_samples\n",
    "    )\n",
    "    duration, count = time_epoch(dataloader)\n",
    "    print(duration, count)"
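`RandomGeoSampler` draws `length` windows of `size` x `size` pixels at random positions in the scene. A minimal pure-Python sketch of that sampling pattern (a pixel-space simplification of torchgeo's CRS-aware sampler, for intuition only):

```python
import random

def random_windows(width, height, size, length):
    """Yield `length` random (x, y) top-left corners of size x size windows,
    all fully inside a width x height raster."""
    for _ in range(length):
        x = random.randint(0, width - size)
        y = random.randint(0, height - size)
        yield x, y
```

Because positions are independent draws, windows may overlap arbitrarily, which is fine for training.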
@@ -236,7 +253,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {
     "gather": {
      "logged": 1629239313388
@@ -250,24 +267,15 @@
    "outputId": "159ce99f-a438-4ecc-d218-9b9e28d02055",
    "tags": []
   },
-  "outputs": [
-   {
-    "name": "stdout",
-    "output_type": "stream",
-    "text": [
-     "391.90197944641113 74\n",
-     "118.0611424446106 74\n"
-    ]
-   }
-  ],
+  "outputs": [],
   "source": [
    "for cache in [False, True]:\n",
    "    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)\n",
    "    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)\n",
    "    dataset = chesapeake & naip\n",
-   "    sampler = GridGeoSampler(dataset, size=1000, stride=500)\n",
+   "    sampler = GridGeoSampler(dataset, size=size, stride=stride)\n",
    "    dataloader = DataLoader(\n",
-   "        dataset, batch_size=12, sampler=sampler, collate_fn=stack_samples\n",
+   "        dataset, batch_size=batch_size, sampler=sampler, collate_fn=stack_samples\n",
    "    )\n",
    "    duration, count = time_epoch(dataloader)\n",
    "    print(duration, count)"
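`GridGeoSampler` instead tiles the scene deterministically: windows of `size` pixels spaced `stride` apart, so adjacent windows overlap whenever `stride < size`. A pixel-space sketch of the traversal (again a simplification of torchgeo's CRS-aware version):

```python
def grid_windows(width, height, size, stride):
    """Yield (x, y) top-left corners of a regular grid of size x size
    windows over a width x height raster, stepping by `stride`."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield x, y
```

This is the sampler the mocked `stride = 1000000` targets: with such a large step, each raster yields at most one window under test.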
@@ -289,7 +297,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {
     "gather": {
      "logged": 1629249843438
@@ -303,22 +311,15 @@
    "outputId": "497f6869-1ab7-4db7-bbce-e943b493ca41",
    "tags": []
   },
-  "outputs": [
-   {
-    "name": "stdout",
-    "output_type": "stream",
-    "text": [
-     "230.51380324363708 74\n",
-     "53.99923872947693 74\n"
-    ]
-   }
-  ],
+  "outputs": [],
   "source": [
    "for cache in [False, True]:\n",
    "    chesapeake = ChesapeakeDE(chesapeake_root, cache=cache)\n",
    "    naip = NAIP(naip_root, crs=chesapeake.crs, res=chesapeake.res, cache=cache)\n",
    "    dataset = chesapeake & naip\n",
-   "    sampler = RandomBatchGeoSampler(dataset, size=1000, batch_size=12, length=888)\n",
+   "    sampler = RandomBatchGeoSampler(\n",
+   "        dataset, size=size, batch_size=batch_size, length=length\n",
+   "    )\n",
    "    dataloader = DataLoader(dataset, batch_sampler=sampler, collate_fn=stack_samples)\n",
    "    duration, count = time_epoch(dataloader)\n",
    "    print(duration, count)"
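`RandomBatchGeoSampler` yields whole batches of random windows at once, which is why the `DataLoader` above takes it as `batch_sampler` rather than `sampler`. A pixel-space sketch of the difference (assumed simplification; torchgeo batches windows from the same file to amortize I/O):

```python
import random

def random_batch_windows(width, height, size, batch_size, length):
    """Yield lists of `batch_size` random (x, y) windows, for roughly
    `length` windows per epoch in total."""
    for _ in range(length // batch_size):
        yield [
            (random.randint(0, width - size), random.randint(0, height - size))
            for _ in range(batch_size)
        ]
```

With the notebook's defaults (`length=888`, `batch_size=12`) this gives 74 batches per epoch, matching the batch counts the samplers above report.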
@@ -349,10 +350,10 @@
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
-  "name": "ipython",
+  "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.9.7"
+  "version": "3.10.8"
  },
  "nteract": {
   "version": "nteract-front-end@1.0.0"