Dockerize CI + Release builds #1234

Merged
merged 1 commit into from
Aug 30, 2022

powderluv
Collaborator

Gets both CI and Release builds integrated in one workflow.

Tested the callout with in-tree / out-of-tree and torch-mlir
release packages

TODO: add the correct CMake commands in the functions.
Mount ccache and pip cache as required

Output from building all the CI and Release builds in one go:

anush@denali ~/github/torch-mlir % TM_PACKAGES="out-of-tree in-tree torch-mlir" TM_TORCH_SRC=ON TM_PYTHON_VERSIONS="cp310-cp310"  ./build_tools/python_deploy/build_linux_packages.sh                                   
                                                                                                                                                                                                                     
Setting torch-mlir Python Package version to: 0.0.1                                                                                                                                                                  
Running on host                                                                                                                                                                                                      
Launching docker image stellaraccident/manylinux2014_x86_64-bazel-5.1.0:latest                                                                                                                                       
Outputting to /home/anush/github/torch-mlir/build_tools/python_deploy/wheelhouse                                                                                                                                     
Setting torch-mlir Python Package version to: 0.0.1                                                                                                                                                                  
Running in docker                                                                                                                                                                                                    
Using python versions: cp310-cp310                                                                                                                                                                                   
******************** BUILDING PACKAGE out-of-tree ********************                                                                                                                                               
:::: Python version Python 3.10.4                                                                                                                                                                                    
:::: Clean build dir out-of-tree ecp310-cp310                                                                                                                                                                        
:::: Build out-of-tree Torch from source: ON                                                                                                                                                                         
:::: Test out-of-tree                                                                                                                                                                                                
******************** BUILDING PACKAGE in-tree ********************                                                                                                                                                   
:::: Python version Python 3.10.4                                                                                                                                                                                    
:::: Clean build dir in-tree ecp310-cp310                                                                                                                                                                            
:::: Build in-tree Torch from source: ON                                                                                                                                                                             
:::: Test in-tree                                                                                                                                                                                                    
******************** BUILDING PACKAGE torch-mlir ********************                                                                                                                                                
:::: Python version Python 3.10.4                                                                                                                                                                                    
:::: Clean wheels torch_mlir cp310-cp310                                                                                                                                                                             
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/nightly/cpu                                                                                                                            
Looking in links: https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html                                                                                                                                    
Collecting torch                                                                                                                                                                                                     
  Downloading https://download.pytorch.org/whl/nightly/cpu/torch-1.13.0.dev20220816%2Bcpu-cp310-cp310-linux_x86_64.whl (195.1 MB)                                                                                    
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 195.1/195.1 MB 9.9 MB/s eta 0:00:00                                                                                                                                     
Collecting numpy                                                                                                                     
....
  adding 'torch_mlir-0.0.1.dist-info/WHEEL'
  adding 'torch_mlir-0.0.1.dist-info/top_level.txt'
  adding 'torch_mlir-0.0.1.dist-info/RECORD'
  removing build/bdist.linux-x86_64/wheel
  Building wheel for torch-mlir (pyproject.toml): finished with status 'done'
  Created wheel for torch-mlir: filename=torch_mlir-0.0.1-cp310-cp310-linux_x86_64.whl size=35159325 sha256=3f00551a9f426b8f58457a1306a143ac9f66865113dbaf2050b082e2ae543654
  Stored in directory: /root/.cache/pip/wheels/b9/1a/c9/fd0f4b77f13d9149c3c25bd33ce4a61807bcc222a8b04a5ea9
Successfully built torch-mlir
WARNING: You are using pip version 22.0.4; however, version 22.2.2 is available.
You should consider upgrading via the '/opt/python/cp310-cp310/bin/python -m pip install --upgrade pip' command.


If we want to use Ubuntu 22.04 for the CI:

anush@denali ~/github/torch-mlir % TM_PACKAGES="out-of-tree in-tree" TM_TORCH_SRC=ON TM_PYTHON_VERSIONS="cp310-cp310" TM_DOCKER_IMAGE="ubuntu:22.04" ./build_tools/python_deploy/build_linux_packages.sh

Setting torch-mlir Python Package version to: 0.0.1
Running on host
Launching docker image ubuntu:22.04
Outputting to /home/anush/github/torch-mlir/build_tools/python_deploy/wheelhouse
Unable to find image 'ubuntu:22.04' locally
22.04: Pulling from library/ubuntu
d19f32bd9e41: Pull complete 
Digest: sha256:34fea4f31bf187bc915536831fd0afc9d214755bf700b5cdb1336c82516d154e
Status: Downloaded newer image for ubuntu:22.04
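
Both invocations above drive the script purely through environment variables. As a rough sketch of that pattern (not the actual `build_linux_packages.sh`), the function below reads `TM_PACKAGES`, `TM_PYTHON_VERSIONS` and `TM_DOCKER_IMAGE` with defaults and loops over the requested packages. The function name `dispatch_builds`, the default Python version, and the "would build" messages are assumptions for illustration; the package names and the default image are taken from the logs above.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the real build_linux_packages.sh.
# dispatch_builds is a hypothetical name; it prints what it would do.
dispatch_builds() {
  packages="${TM_PACKAGES:-torch-mlir out-of-tree in-tree}"
  python_versions="${TM_PYTHON_VERSIONS:-cp310-cp310}"   # assumed default
  image="${TM_DOCKER_IMAGE:-stellaraccident/manylinux2014_x86_64-bazel-5.1.0:latest}"

  echo "Launching docker image ${image}"
  echo "Using python versions: ${python_versions}"
  for package in $packages; do
    echo "******************** BUILDING PACKAGE ${package} ********************"
    for python_version in $python_versions; do
      case "$package" in
        torch-mlir)  echo ":::: would build the wheel for ${python_version}" ;;
        out-of-tree) echo ":::: would build and test out-of-tree for ${python_version}" ;;
        in-tree)     echo ":::: would build and test in-tree for ${python_version}" ;;
        *)           echo "Unrecognized package '${package}'" >&2; return 1 ;;
      esac
    done
  done
}

# Mirror the Ubuntu 22.04 CI invocation; the subshell keeps the overrides local.
( TM_PACKAGES="out-of-tree in-tree"; TM_DOCKER_IMAGE="ubuntu:22.04"; dispatch_builds )
```

Keeping the defaults in `${VAR:-default}` form means a bare invocation builds everything, while CI can narrow the set without editing the script.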

This was referenced Aug 16, 2022
@powderluv changed the title from "WIP: Dockerize CI + Release builds" to "Dockerize CI + Release builds" on Aug 29, 2022
@powderluv
Collaborator Author

Gets both CI and Release builds integrated in one workflow.
Mounts ccache and pip cache as required for fast iterative builds.
The current Release Docker builds still run with root permissions; this should
be fixed in the future so that they run as the same user as on the host.
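
For readers unfamiliar with cache mounts, here is a minimal sketch of what mounting ccache and the pip cache into the container could look like. It only prints a `docker run` command and does not run it. The helper name `print_docker_cmd` and the container path `/main_checkout/torch-mlir` are assumptions; the `.ccache` directory in the checkout and `/root/.cache/pip` are inferred from the clean-up command and the pip log in this PR, so treat them as a guess at the layout rather than the script's actual flags.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) a docker command with cache mounts.
# All container-side paths here are assumptions for illustration.
print_docker_cmd() {
  repo_root="$1"   # host checkout of torch-mlir
  image="$2"       # docker image to launch
  cat <<EOF
docker run --rm \\
  -v "${repo_root}:/main_checkout/torch-mlir" \\
  -v "${repo_root}/.ccache:/root/.ccache" \\
  -v "${HOME}/.cache/pip:/root/.cache/pip" \\
  -e CCACHE_DIR=/root/.ccache \\
  "${image}" \\
  /main_checkout/torch-mlir/build_tools/python_deploy/build_linux_packages.sh
EOF
}

print_docker_cmd "$(pwd)" "ubuntu:22.04"
```

Because both caches live on the host, they survive container teardown, which is what makes the incremental timings in the test plan possible.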

There may be some corner cases left, especially when switching
build types.

Docker build TEST plan:

tl;dr:
Build everything: Releases (Python 3.8, 3.9, 3.10) and CIs.
TM_PACKAGES="torch-mlir out-of-tree in-tree" 2.57s user 2.49s system 0% cpu 30:33.11 total

Out of Tree + PyTorch binaries:

Fresh build (purged cache):
TM_PACKAGES="out-of-tree" 0.47s user 0.51s system 0% cpu 5:24.99 total

Incremental with ccache:
TM_PACKAGES="out-of-tree" 0.09s user 0.08s system 0% cpu 34.817 total

Out of Tree + PyTorch from source

Incremental
TM_PACKAGES="out-of-tree" TM_USE_PYTORCH_BINARY=OFF 1.58s user 1.81s system 2% cpu 1:59.61 total

In-Tree + PyTorch binaries:

Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree" 0.53s user 0.49s system 0% cpu 6:23.35 total

Fresh build, but with prior ccache
TM_PACKAGES="in-tree" 0.45s user 0.66s system 0% cpu 3:57.47 total

Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree" 0.16s user 0.09s system 0% cpu 2:18.52 total

In-Tree + PyTorch from source

Fresh build and tests: (purge ccache)
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF 2.03s user 2.28s system 0% cpu 11:11.86 total

Fresh build, but with prior ccache
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF 1.58s user 1.88s system 1% cpu 4:53.15 total

Incremental in-tree with all tests and regression tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF 1.09s user 1.10s system 1% cpu 3:29.84 total

Incremental without tests
TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF TM_SKIP_TESTS=ON 1.52s user 1.42s system 3% cpu 1:15.82 total

In-Tree + Out of Tree + PyTorch binaries:
TM_PACKAGES="out-of-tree in-tree" 0.25s user 0.18s system 0% cpu 3:01.91 total
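
The timings above appear to be zsh `time` output, where the last field is wall-clock time. User and system time are near zero presumably because the real work happens inside the container, so `total` is the number to compare. A small sketch for turning those `total` values into seconds and a speedup ratio follows; `to_seconds` and `speedup` are hypothetical helper names, and the inputs are copied from the list above.

```shell
#!/bin/sh
# Convert an elapsed time such as "34.817", "5:24.99" or "1:02:03.4" to seconds.
to_seconds() {
  printf '%s\n' "$1" |
    awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Ratio of two elapsed times (slower first), rounded to one decimal place.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}

echo "out-of-tree, incremental vs. purged cache: $(speedup 5:24.99 34.817)x faster"
echo "in-tree, prior ccache vs. purged ccache:   $(speedup 6:23.35 3:57.47)x faster"
```

With the numbers in this test plan, that works out to roughly 9.3x for the out-of-tree incremental build and about 1.6x for the in-tree build with a warm ccache.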

To clear all artifacts:
rm -rf build build_oot llvm-build libtorch docker_venv externals/pytorch/build .ccache
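
Since that `rm -rf` is destructive, a dry-run wrapper can help when trying it for the first time. This is a sketch under assumptions: `clean_artifacts` and its `--force` argument are hypothetical and not part of the PR; only the artifact paths come from the command above.

```shell
#!/bin/sh
# Artifact paths copied from the clean-up command above.
ARTIFACTS="build build_oot llvm-build libtorch docker_venv externals/pytorch/build .ccache"

# Hypothetical helper: lists artifacts by default, deletes only with --force.
clean_artifacts() {
  root="$1"
  mode="${2:-dry-run}"
  for rel in $ARTIFACTS; do
    path="${root}/${rel}"
    [ -e "$path" ] || continue          # skip artifacts that do not exist
    if [ "$mode" = "--force" ]; then
      rm -rf -- "$path"
      echo "removed ${rel}"
    else
      echo "would remove ${rel}"
    fi
  done
}

clean_artifacts "$(pwd)"                # dry run: prints, deletes nothing
```

Running `clean_artifacts "$(pwd)" --force` from the repository root would then perform the actual removal.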

@silvasean
Contributor

This seems reasonable to me. @sjain-stanford ?

Member

@sjain-stanford left a comment

Thanks @powderluv . This looks great. I will give the "local" build+test flow a try later today (very excited!). The main request I have is - since we set out to "dockerize CI" - it'd be good to also see GHA CI workflows updated to use these docker flows. This will validate the requirements fully, and ensure any cache issues or other GHA issues can be addressed alongside this PR.

@powderluv
Collaborator Author

Happy to add the GHA pieces in the follow-on, but I wanted to get the base functionality in first so we don't have a mega commit and can easily revert just the GHA changes if something goes haywire.

@powderluv force-pushed the docker-ci branch 2 times, most recently from 2055b25 to 6a8a345 on August 30, 2022 14:02
@powderluv
Collaborator Author

powderluv commented Aug 30, 2022

I have also added the GHA workflows now in a follow-on commit. CI is currently running in #1313, and Release builds pass (https://github.com/llvm/torch-mlir/runs/8090506802?check_suite_focus=true).

Member

@sjain-stanford left a comment

LG(reat)TM. From my local testing, I can confirm that re-runs are blazing fast (utilize pip cache)! Left some minor comments to get this going.

> I have also added the GHA workflows now in a follow on commit. It is currently running CI etc. #1313

It seems #1313 doesn't have GHA workflows yet - I'm showing this PR replicated there, could you PTAL? Again, thanks for working on the follow-on commit to validate the GHA workflows as well.

build_tools/python_deploy/build_linux_packages.sh (outdated, resolved)
build_tools/docker/Dockerfile (resolved)
build_tools/python_deploy/build_linux_packages.sh (outdated, resolved)
build_tools/python_deploy/build_linux_packages.sh (outdated, resolved)
build_tools/python_deploy/build_linux_packages.sh (outdated, resolved)
build_tools/python_deploy/build_linux_packages.sh (outdated, resolved)
docs/development.md (outdated, resolved)
@powderluv merged commit 9f061ea into llvm:main on Aug 30, 2022
@powderluv deleted the docker-ci branch on August 30, 2022 18:07
powderluv added a commit that referenced this pull request Aug 30, 2022
Now that #1234 has landed and anyone can run CI / Release builds locally, move GHA to use the same flow.
Collaborator

@ashay left a comment

Sorry for jumping in late, but I just wanted to say thanks for adding the instructions to docs/development.md! Having never quite figured out how to get Docker working, I was concerned that I'd never be able to make use of this change, but the setup instructions are very helpful.

# Location to store Release wheels
TM_OUTPUT_DIR="${TM_OUTPUT_DIR:-${this_dir}/wheelhouse}"
# What "packages to build"
TM_PACKAGES="${TM_PACKAGES:-torch-mlir out-of-tree in-tree}"
Collaborator

I think this change (building all three packages) is causing a timeout in the Release build [https://github.com/llvm/torch-mlir/runs/8105971333?check_suite_focus=true].

@powderluv
Collaborator Author

Yeah, I noticed a timeout yesterday too, and a rerun ran faster. There was no functional change for the release build, but if you noticed anything that got added that could affect it, please let me know.

@powderluv
Collaborator Author

Actually, looking at the code some more, we did change the docker settings to use --ipc=host and bumped the ulimit so the tests can pass. So it is possible that the host this VM is running on is slow with respect to IPC.

Opened #1322 to investigate
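
For context, the two settings mentioned above correspond to `docker run` options along these lines. This is a sketch, not the flags from the actual script: the comment does not say which resource limit was bumped or to what value, so `nofile` and `32768` below are assumptions, as is the helper name `docker_test_flags`.

```shell
#!/bin/sh
# Hypothetical helper: assembles extra `docker run` flags for the test phase.
# Which ulimit was raised is NOT stated in this PR; nofile/32768 are assumed.
docker_test_flags() {
  nofile_limit="${1:-32768}"
  printf '%s %s' "--ipc=host" "--ulimit nofile=${nofile_limit}:${nofile_limit}"
}

# Print the resulting command line without running docker.
echo "docker run --rm $(docker_test_flags) <image> <command>"
```

`--ipc=host` shares the host's IPC namespace (including shared memory) with the container, which is one place where a slow or constrained host could show up in test run times.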

AmosLewis pushed a commit to AmosLewis/torch-mlir that referenced this pull request Sep 2, 2022
powderluv added a commit that referenced this pull request Sep 3, 2022
* Move CIs to use docker builds

Now that #1234 has landed and anyone can run CI / Release builds locally, move GHA to use the same flow.

* update names

* Update comments
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>