adds TPU to CI. #981

erip · 2020-04-25T22:40:46Z

Fixes #963

Description: Adds TPU test runner.

Check list:

New tests are added (if a new feature is added)
New doc strings: description and/or example code are in RST format
Documentation is updated (if required)

erip · 2020-04-25T22:52:16Z

First pass will fail since there are no marked tests. Do you want a silly test to satisfy CI or do we have some small but useful test we'd like to write?

vfdev-5 · 2020-04-25T22:56:51Z

@erip yes, we recently made create_supervised_trainer support TPUs, so just a basic integration test could be a good one?
Like this one: https://github.com/pytorch/ignite/blob/master/tests/ignite/engine/test_create_supervised.py#L83
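For illustration, a basic integration test of the kind suggested here might look roughly like the sketch below. It is not the test that was eventually merged: the test name, the tiny linear model, and the data are hypothetical, and it assumes a custom `tpu` pytest marker is registered in the project configuration. The optional dependencies are imported with `pytest.importorskip`, so the test is skipped explicitly when torch, torch_xla, or ignite is missing.

```python
import pytest


@pytest.mark.tpu  # custom marker; assumed to be registered in setup.cfg
def test_create_supervised_trainer_on_xla():
    # Skip explicitly (rather than fail) when optional dependencies are absent.
    torch = pytest.importorskip("torch")
    xm = pytest.importorskip("torch_xla.core.xla_model")
    engine = pytest.importorskip("ignite.engine")

    device = xm.xla_device()
    model = torch.nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    trainer = engine.create_supervised_trainer(
        model, optimizer, torch.nn.functional.mse_loss, device=device
    )

    # A single (x, y) batch is enough for a smoke test.
    x = torch.tensor([[1.0], [2.0]])
    y = torch.tensor([[3.0], [5.0]])
    state = trainer.run([(x, y)], max_epochs=1)

    # One batch and one epoch give one iteration; the default output is loss.item().
    assert state.iteration == 1
    assert isinstance(state.output, float)
```

With a marker like this, the TPU job could select only these tests with `pytest -m tpu`, while other jobs deselect them with `-m "not tpu"`.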

erip · 2020-04-25T22:58:47Z

Excellent! I also foresee the docker pull being a tremendous bottleneck in CI. I think I can add the official cache action to CI... I'll try that, too.

vfdev-5 · 2020-04-25T23:02:14Z

@erip just noticed on my desktop, this xla image is 17.9GB :(

erip · 2020-04-25T23:13:00Z

So I noticed... GitHub actions died from lack of space. 😄
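As a rough illustration of the space problem, a pre-flight check along the following lines could fail fast before attempting the pull. This is only a sketch: `has_room_for_image` and its `headroom` factor are hypothetical, and the only figure taken from this thread is the 17.9GB image size.

```python
import shutil

IMAGE_SIZE_GB = 17.9  # size reported in this thread for the xla image


def has_room_for_image(path=".", image_gb=IMAGE_SIZE_GB, headroom=2.0):
    """Return True if free space at `path` covers the image with headroom.

    `headroom` is an assumed safety factor: downloading and extracting
    layers can temporarily need more space than the final image size.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= image_gb * headroom


if __name__ == "__main__":
    print("enough space for the xla image:", has_room_for_image())
```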

vfdev-5 · 2020-04-25T23:18:47Z

@erip otherwise, can you try installing everything manually in a conda env, like this:

conda install pytorch cpuonly -c pytorch-nightly

VERSION="1.5"
curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
python pytorch-xla-env-setup.py --version $VERSION

and then set their env vars somewhere to use the CPU

EDIT: THIS WON'T WORK

vfdev-5 · 2020-04-25T23:21:54Z

I checked the output of conda list in this xla image, and it seems they included everything there :)

vfdev-5 · 2020-04-26T01:43:44Z

@erip you can install torch_xla like this on Ubuntu with conda support:

## Install conda env for python 3.6 (xla works on 3.6 only)
conda create -y -n py36 python=3.6
conda activate py36

## Install gsutil
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
apt-get install -y apt-transport-https ca-certificates gnupg curl
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
apt-get update && apt-get install -y google-cloud-sdk

## Install openblas and mkl
apt-get install -y libopenblas-dev
conda install -y mkl

## Download torch & xla
gsutil cp gs://tpu-pytorch/wheels/torch-1.5-cp36-cp36m-linux_x86_64.whl .
gsutil cp gs://tpu-pytorch/wheels/torch_xla-1.5-cp36-cp36m-linux_x86_64.whl .

## Install torch & xla
pip install torch-1.5-cp36-cp36m-linux_x86_64.whl
pip install torch_xla-1.5-cp36-cp36m-linux_x86_64.whl

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/pkgs/mkl-2020.0-166/lib/
export XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"

to check if everything is OK

python -c "import torch_xla; print(torch_xla.__version__)"
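The same environment setup and sanity check can also be sketched from Python, for example in a small helper that runs before the tests. The helper names below are hypothetical; the two XRT values are copied from the export lines above, and the availability check only looks the package up without importing it.

```python
import importlib.util
import os

# Values copied from the export lines above (XLA running on CPU).
XRT_CPU_ENV = {
    "XRT_DEVICE_MAP": "CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0",
    "XRT_WORKERS": "localservice:0;grpc://localhost:40934",
}


def configure_xla_cpu(environ=os.environ):
    """Set the XRT variables without overriding values that already exist."""
    for key, value in XRT_CPU_ENV.items():
        environ.setdefault(key, value)
    return environ


def torch_xla_available():
    """Return True when torch_xla can be found, without importing it."""
    return importlib.util.find_spec("torch_xla") is not None


if __name__ == "__main__":
    configure_xla_cpu()
    print("torch_xla importable:", torch_xla_available())
```

Using `setdefault` keeps any values that the CI job has already exported, so the helper only fills in what is missing.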

erip · 2020-04-26T02:03:45Z

It kind of looks like the TPU action isn't being triggered on force pushes... I'll take a look at that in the morning, but it's probably something to do with the on: line only being pull_request.

vfdev-5 · 2020-04-26T09:45:25Z

@erip I made some changes since your last commit. It now installs a usable torch_xla:
https://github.com/pytorch/ignite/pull/981/checks?check_run_id=619491582
What remains is to write some tests that make sense. If you can add that, it would be great!

tests/ignite/engine/test_create_supervised.py

erip · 2020-04-26T16:26:54Z

Since this PR only adds CI and a simple test (w/o new functionality), I don't think there are any documentation things required. WDYT @vfdev-5?

vfdev-5

LGTM! Thanks @erip !

add TPU github action. TODO: add test marked "tpu"

8478fa9

vfdev-5 added 2 commits April 26, 2020 10:17

Update tpu.yml

22f4da2

Update tpu.yml

f8b5934

vfdev-5 changed the base branch from master to tpu April 26, 2020 08:54

vfdev-5 added 2 commits April 26, 2020 11:01

Update tpu.yml

4c63d56

Update tpu.yml

e868e00

vfdev-5 changed the base branch from tpu to master April 26, 2020 09:05

vfdev-5 added 6 commits April 26, 2020 11:05

Merge branch 'master' into feature/add-tpu-ci

08b6288

Delete tpu.yml

5935d12

Update tpu-tests.yaml

49f2dc9

Update tpu-tests.yaml

aa97400

Update tpu-tests.yaml

6408f8a

Update tpu-tests.yaml

ad13175

vfdev-5 and others added 7 commits April 26, 2020 11:48

Update tpu-tests.yaml

62fe689

add TPU test.

0f4e91c

add pytest TPU marker to the setup.cfg

d071fe2

uncomment pytest invocation.

1a00b3c

fix test to explicitly skip.

0d70a96

add coverage details.

bf052c9

autopep8 fix

e9a00a1

vfdev-5 reviewed Apr 26, 2020

tests/ignite/engine/test_create_supervised.py

re-add missing marker for TPU test.

352c39b

erip changed the title ~~[WIP] adds TPU to CI.~~ adds TPU to CI. Apr 26, 2020

vfdev-5 approved these changes Apr 26, 2020

vfdev-5 merged commit 7491c83 into pytorch:master Apr 26, 2020

erip deleted the feature/add-tpu-ci branch April 26, 2020 17:14

lezwon mentioned this pull request May 9, 2020

add TPU tests Lightning-AI/pytorch-lightning#1246

Closed
