Dockerize CI + Release builds #1234
Gets both CI and Release builds integrated in one workflow. There may be some corner cases left, especially when switching build types etc.

Docker build TEST plan:

tl;dr:

Out of Tree + PyTorch binaries:
* Fresh build (purged cache):
* Incremental with ccache:

Out of Tree + PyTorch from source:
* Incremental

In-Tree + PyTorch binaries:
* Fresh build and tests: (purge ccache)
* Fresh build/ but with prior ccache
* Incremental in-tree with all tests and regression tests

In-Tree + PyTorch from source:
* Fresh build and tests: (purge ccache)
* Fresh build/ but with prior ccache
* Incremental in-tree with all tests and regression tests
* Incremental without tests

In-tree + out-of-tree + PyTorch binaries

To clear all artifacts:
This seems reasonable to me. @sjain-stanford?
Thanks @powderluv . This looks great. I will give the "local" build+test flow a try later today (very excited!). The main request I have is - since we set out to "dockerize CI" - it'd be good to also see GHA CI workflows updated to use these docker flows. This will validate the requirements fully, and ensure any cache issues or other GHA issues can be addressed alongside this PR.
Happy to add the GHA pieces in a follow-on, but I wanted to get the base functionality in first so we don't have a mega commit and it's easy to revert just the GHA changes if something goes haywire.
2055b25 to 6a8a345
I have also added the GHA workflows now in a follow on commit. It is currently running CI etc. #1313 and Release builds pass (https://github.com/llvm/torch-mlir/runs/8090506802?check_suite_focus=true).
LG(reat)TM. From my local testing, I can confirm that re-runs are blazing fast (utilize pip cache)! Left some minor comments to get this going.
> I have also added the GHA workflows now in a follow on commit. It is currently running CI etc. #1313
It seems #1313 doesn't have GHA workflows yet - I'm showing this PR replicated there, could you PTAL? Again, thanks for working on the follow-on commit to validate the GHA workflows as well.
Gets both CI and Release builds integrated in one workflow.
Mount ccache and pip cache as required for fast iterative builds.
Current Release docker builds still run with root perms; fix it in the future to run as the same user.
There may be some corner cases left, especially when switching build types etc.

Docker build TEST plan:

tl;dr: Build everything: Releases (Python 3.8, 3.9, 3.10) and CIs.
  TM_PACKAGES="torch-mlir out-of-tree in-tree"
  2.57s user 2.49s system 0% cpu 30:33.11 total

Out of Tree + PyTorch binaries:
* Fresh build (purged cache):
  TM_PACKAGES="out-of-tree"
  0.47s user 0.51s system 0% cpu 5:24.99 total
* Incremental with ccache:
  TM_PACKAGES="out-of-tree"
  0.09s user 0.08s system 0% cpu 34.817 total

Out of Tree + PyTorch from source:
* Incremental
  TM_PACKAGES="out-of-tree" TM_USE_PYTORCH_BINARY=OFF
  1.58s user 1.81s system 2% cpu 1:59.61 total

In-Tree + PyTorch binaries:
* Fresh build and tests: (purge ccache)
  TM_PACKAGES="in-tree"
  0.53s user 0.49s system 0% cpu 6:23.35 total
* Fresh build/ but with prior ccache
  TM_PACKAGES="in-tree"
  0.45s user 0.66s system 0% cpu 3:57.47 total
* Incremental in-tree with all tests and regression tests
  TM_PACKAGES="in-tree"
  0.16s user 0.09s system 0% cpu 2:18.52 total

In-Tree + PyTorch from source:
* Fresh build and tests: (purge ccache)
  TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
  2.03s user 2.28s system 0% cpu 11:11.86 total
* Fresh build/ but with prior ccache
  TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
  1.58s user 1.88s system 1% cpu 4:53.15 total
* Incremental in-tree with all tests and regression tests
  TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF
  1.09s user 1.10s system 1% cpu 3:29.84 total
* Incremental without tests
  TM_PACKAGES="in-tree" TM_USE_PYTORCH_BINARY=OFF TM_SKIP_TESTS=ON
  1.52s user 1.42s system 3% cpu 1:15.82 total

In-tree + out-of-tree + PyTorch binaries:
  TM_PACKAGES="out-of-tree in-tree"
  0.25s user 0.18s system 0% cpu 3:01.91 total

To clear all artifacts:
  rm -rf build build_oot llvm-build libtorch docker_venv externals/pytorch/build
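The cleanup command that closes the test plan can be wrapped in a small helper so it is harder to run from the wrong directory. This is only a minimal sketch and not part of the PR: the function name `tm_clean_artifacts` and its optional checkout-root argument are hypothetical, and only the directory names come from the commit message above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): remove the build artifacts
# listed in the commit message, relative to a given checkout root.
tm_clean_artifacts() {
  local root="${1:-.}"   # assumption: default to the current directory
  local dir
  for dir in build build_oot llvm-build libtorch docker_venv externals/pytorch/build; do
    if [ -e "${root}/${dir}" ]; then
      echo "removing ${root}/${dir}"
      # ${root:?} aborts instead of expanding to "/<dir>" if root is empty.
      rm -rf -- "${root:?}/${dir}"
    fi
  done
}
```

Calling `tm_clean_artifacts /path/to/checkout` removes only those six paths. It does not touch the ccache or pip cache, so the "purge ccache" cases in the test plan would still need a separate step.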
Now that #1234 has landed and anyone can run CI / Release builds locally, move GHA to use the same flow.
Sorry for jumping in late, but I just wanted to say Thanks for adding the instructions to docs/development.md! Having never quite figured out how to get Docker working, I was concerned that I'd never be able to figure out how to make use of this change, but the setup instructions are very helpful.
# Location to store Release wheels
TM_OUTPUT_DIR="${TM_OUTPUT_DIR:-${this_dir}/wheelhouse}"
# What "packages to build"
TM_PACKAGES="${TM_PACKAGES:-torch-mlir out-of-tree in-tree}"
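The `${VAR:-default}` expansions above keep any value already set in the environment and fall back to the default otherwise, which is why the test plan can select builds with a `TM_PACKAGES="in-tree"` prefix. Below is a minimal sketch of how a script could apply such defaults and then dispatch on each requested package. The `build_*` and `dispatch_packages` names and the `OFF` default for `TM_SKIP_TESTS` are assumptions for illustration, not the PR's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: the function names below are hypothetical stand-ins.

# Same default-if-unset pattern as the snippet above.
TM_PACKAGES="${TM_PACKAGES:-torch-mlir out-of-tree in-tree}"
TM_SKIP_TESTS="${TM_SKIP_TESTS:-OFF}"   # assumed default

build_release()     { echo "release wheel build"; }
build_out_of_tree() { echo "out-of-tree CI build"; }
build_in_tree()     { echo "in-tree CI build (skip tests: ${TM_SKIP_TESTS})"; }

dispatch_packages() {
  local package
  # Intentionally unquoted: TM_PACKAGES is a space-separated list.
  for package in ${TM_PACKAGES}; do
    case "${package}" in
      torch-mlir)  build_release ;;
      out-of-tree) build_out_of_tree ;;
      in-tree)     build_in_tree ;;
      *) echo "unknown package: ${package}" >&2; return 1 ;;
    esac
  done
}
```

With `TM_PACKAGES="in-tree"` set, `dispatch_packages` runs only the in-tree branch. With nothing set, all three builds run in order, which is the default the review comment below is discussing.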
I think this change (building all three packages) is causing a timeout in the Release build [https://github.com/llvm/torch-mlir/runs/8105971333?check_suite_focus=true].
Yeah, I noticed a timeout yesterday too, and a rerun ran faster. There was no functional change for the release build, but if you noticed anything that got added that could affect it, please let me know.
Actually, looking at the code some more, we did change the docker settings to use
Opened #1322 to investigate.
* Move CIs to use docker builds
  Now that #1234 has landed and anyone can run CI / Release builds locally, move GHA to use the same flow.
* update names
* Update comments
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Gets both CI and Release builds integrated in one workflow.
Tested the callout with in-tree / out-of-tree and torch-mlir
release packages
TODO: add the correct CMake commands in the functions.
Mount ccache and pip cache as required
Out to build all the CI and Release builds in one go:
If we want to use Ubuntu 22.04 for the CI: