
adds TPU to CI. #981

Merged
merged 19 commits into from
Apr 26, 2020

Conversation

erip
Contributor

@erip erip commented Apr 25, 2020

Fixes #963

Description: Adds TPU test runner.

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@erip
Contributor Author

erip commented Apr 25, 2020

First pass will fail since there are no marked tests. Do you want a silly test to satisfy CI or do we have some small but useful test we'd like to write?

@vfdev-5
Collaborator

vfdev-5 commented Apr 25, 2020

@erip yes, we recently made create_supervised_trainer support TPUs, so a basic integration test could be a good fit?
Like this one: https://github.com/pytorch/ignite/blob/master/tests/ignite/engine/test_create_supervised.py#L83
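
A basic integration check along these lines could look roughly like the sketch below. It is only an illustration, not the test that was merged: the function name run_supervised_step_on_xla and the toy data are hypothetical, and it assumes torch, torch_xla and ignite versions contemporary with this thread, where xm.xla_device() returns the XLA device. When any of the three packages is missing, the function returns None instead of failing.

```python
import importlib.util


def has_modules(*names):
    """Return True only if every named top-level package can be found."""
    return all(importlib.util.find_spec(name) is not None for name in names)


def run_supervised_step_on_xla():
    """Run one tiny training epoch on an XLA device.

    Returns the last loss value, or None when torch, torch_xla or ignite
    is not installed (hypothetical helper, for illustration only).
    """
    if not has_modules("torch", "torch_xla", "ignite"):
        return None

    import torch
    import torch_xla.core.xla_model as xm
    from ignite.engine import create_supervised_trainer

    device = xm.xla_device()
    model = torch.nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    trainer = create_supervised_trainer(
        model, optimizer, torch.nn.functional.mse_loss, device=device
    )

    # One batch of toy data; the trainer moves it to the device itself.
    x = torch.tensor([[1.0], [2.0]])
    y = torch.tensor([[3.0], [5.0]])
    state = trainer.run([(x, y)], max_epochs=1)
    return state.output


if __name__ == "__main__":
    print(run_supervised_step_on_xla())
```

In a real test suite the early return would more likely be a skip marker, so that the test is reported as skipped rather than silently passing.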

@erip
Contributor Author

erip commented Apr 25, 2020

Excellent! I also foresee the docker pull being a tremendous bottleneck in CI. I think I can add the official cache action to CI... I'll try that, too.

@vfdev-5
Collaborator

vfdev-5 commented Apr 25, 2020

@erip just noticed on my desktop, this xla image is 17.9GB :(

@erip
Contributor Author

erip commented Apr 25, 2020

So I noticed... GitHub actions died from lack of space. 😄

@vfdev-5
Collaborator

vfdev-5 commented Apr 25, 2020

@erip otherwise, can you try installing everything manually in a conda env, like this:

conda install pytorch cpuonly -c pytorch-nightly
VERSION="1.5"
curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
python pytorch-xla-env-setup.py --version $VERSION

and then set their env vars somewhere to use the CPU.

EDIT: THIS WON'T WORK

@vfdev-5
Collaborator

vfdev-5 commented Apr 25, 2020

I checked the output of conda list in this xla image, and it seems they included everything there :)

@vfdev-5
Collaborator

vfdev-5 commented Apr 26, 2020

@erip you can install torch_xla like this on Ubuntu with conda support:

## Install conda env for python 3.6 (xla works on 3.6 only)
conda create -y -n py36 python=3.6
conda activate py36

## Install gsutil
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
apt-get install -y apt-transport-https ca-certificates gnupg curl
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
apt-get update && apt-get install -y google-cloud-sdk

## Install openblas and mkl
apt-get install -y libopenblas-dev
conda install -y mkl

## Download torch & xla
gsutil cp gs://tpu-pytorch/wheels/torch-1.5-cp36-cp36m-linux_x86_64.whl .
gsutil cp gs://tpu-pytorch/wheels/torch_xla-1.5-cp36-cp36m-linux_x86_64.whl .

## Install torch & xla
pip install torch-1.5-cp36-cp36m-linux_x86_64.whl
pip install torch_xla-1.5-cp36-cp36m-linux_x86_64.whl

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/pkgs/mkl-2020.0-166/lib/
export XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
export XRT_WORKERS="localservice:0;grpc://localhost:40934"
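
If the two XRT variables need to be set from Python rather than from the shell (for example in a test setup file), a minimal sketch could look like the following. The values are copied from the exports above; the helper name configure_xla_cpu is hypothetical. Setting the variables before torch_xla is imported is the cautious order.

```python
import os

# Values copied from the shell exports in this comment.
XLA_CPU_ENV = {
    "XRT_DEVICE_MAP": "CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0",
    "XRT_WORKERS": "localservice:0;grpc://localhost:40934",
}


def configure_xla_cpu(environ=None):
    """Fill in the XRT variables without overwriting values already set.

    `environ` defaults to os.environ; passing a plain dict is convenient
    for trying the helper out. (Hypothetical helper, not part of torch_xla.)
    """
    env = os.environ if environ is None else environ
    for key, value in XLA_CPU_ENV.items():
        env.setdefault(key, value)
    return env


if __name__ == "__main__":
    configure_xla_cpu()
    print(os.environ["XRT_DEVICE_MAP"])
```

Using setdefault keeps any value that the CI job has already exported, so the shell configuration still takes precedence.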

To check that everything is OK:

python -c "import torch_xla; print(torch_xla.__version__)"
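
A slightly more defensive variant of this check, sketched here as an illustration, reports a missing package instead of raising ImportError. The function name xla_version is hypothetical.

```python
import importlib
import importlib.util


def xla_version():
    """Return torch_xla.__version__, or None if torch_xla is not installed."""
    if importlib.util.find_spec("torch_xla") is None:
        return None
    module = importlib.import_module("torch_xla")
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = xla_version()
    print(version if version is not None else "torch_xla is not installed")
```

This only confirms that the package imports; it does not exercise an XLA device.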

@erip
Contributor Author

erip commented Apr 26, 2020

It kind of looks like the TPU action isn't being triggered on force pushes... I'll take a look at that in the morning, but it probably has something to do with the on: line only listing pull_request.

@vfdev-5 vfdev-5 changed the base branch from master to tpu April 26, 2020 08:54
@vfdev-5 vfdev-5 changed the base branch from tpu to master April 26, 2020 09:05
@vfdev-5
Collaborator

vfdev-5 commented Apr 26, 2020

@erip I made some changes since your last commit. It installs a usable torch xla:
https://github.com/pytorch/ignite/pull/981/checks?check_run_id=619491582
What remains is to write some meaningful tests. If you can add them, that would be great!

@erip erip changed the title [WIP] adds TPU to CI. adds TPU to CI. Apr 26, 2020
@erip
Contributor Author

erip commented Apr 26, 2020

Since this PR only adds CI and a simple test (w/o new functionality), I don't think any documentation changes are required. WDYT @vfdev-5?

Collaborator

@vfdev-5 vfdev-5 left a comment

LGTM! Thanks @erip !

@vfdev-5 vfdev-5 merged commit 7491c83 into pytorch:master Apr 26, 2020
@erip erip deleted the feature/add-tpu-ci branch April 26, 2020 17:14

Successfully merging this pull request may close these issues.

Setup CI as running on TPU
2 participants