Simplify matrix configuration for CI workflows #1213

sjain-stanford · 2022-08-11T14:50:46Z

Addresses #1207.

Provisioned jobs:

# ubuntu - x86_64 - llvm in-tree     - pytorch binary - build+test    # most used dev flow and fastest signal
# ubuntu - x86_64 - llvm out-of-tree - pytorch source - build+test    # most elaborate build
# macos  - arm64  - llvm in-tree     - pytorch source - build only    # cross compile, can't test arm64

Main changes

Spawn macos builds from a separate matrix (in the same workflow). It made sense to do this as they are fairly different from ubuntu (cross compile, use a different cmake configuration). This simplifies the matrix configuration and exclusions quite a bit, and makes the workflow a bit more tractable and maintenance friendly.
Remove the submodule md5sum step for ccache config. This has been broken for a while.
Remove unused matrix options: os, targetarch, python-version, llvmtype.
Address ZSTD comment on @powderluv's cross compile PR.
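
For readers unfamiliar with the layout described above, here is a minimal sketch of two independent matrices in one workflow. It is not the actual buildAndTest.yml from this PR: the job names, the matrix keys (`llvm-build`, `pytorch`), the runner labels, and the `echo` placeholders are all assumptions made for illustration.

```yaml
# Hypothetical sketch only; names and keys are illustrative, not taken from the PR.
name: Build and Test
on:
  pull_request:
  push:
    branches: [main]

jobs:
  # ubuntu x86_64: build and test, one job per listed combination
  ubuntu-build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - llvm-build: in-tree
            pytorch: binary
          - llvm-build: out-of-tree
            pytorch: source
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
      - name: Build and test
        run: echo "build+test (${{ matrix.llvm-build }}, pytorch ${{ matrix.pytorch }})"

  # macos arm64: cross compile, build only; kept in its own matrix
  macos-build:
    runs-on: macos-latest
    strategy:
      matrix:
        include:
          - llvm-build: in-tree
            pytorch: source
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
      - name: Cross compile (build only)
        run: echo "build only (${{ matrix.llvm-build }}, pytorch ${{ matrix.pytorch }})"
```

Listing combinations explicitly under `include` avoids the `exclude` entries that a full cross-product matrix would need, which is the kind of simplification the description refers to.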

Further improvements (to be addressed in follow-on):

ubuntu-x86_64 out-of-tree integration tests fail (error); only run unit tests for now (tests are excluded in current CI too)

Passing workflow:

https://github.com/sjain-stanford/torch-mlir/actions/runs/2840676309

powderluv · 2022-08-11T17:31:47Z

This is looking great. The arm64 Pytorch src build should work if you remove this one line

torch-mlir/build_tools/build_libtorch.sh

Line 134 in 51bfe25

pip uninstall torch

It is trying to uninstall the systemwide torch.


sjain-stanford · 2022-08-11T18:07:51Z

The arm64 Pytorch src build should work if you remove this one line

Thanks, patched. Let's wait to see if the arm64 pytorch source workflow goes through with the fix. If there are more errors, I can revert to pytorch binary and land it for now (to avoid cache evictions the longer this is open). I'll wait for an "all green" CI before landing, but if this looks good otherwise, please feel free to ✅ this.

powderluv

LGTM. Feel free to switch arm64 pytorch source in a follow on.

powderluv · 2022-08-11T23:43:00Z

nicely done. The silly cache gets generated again when it merges so we got to wait for it again

sjain-stanford · 2022-08-11T23:53:02Z

The silly cache gets generated again when it merges so we got to wait for it again

Ah I was wondering why it didn't restore from cache after landing because the keys didn't change. Good to know this is normal. Maybe it treats GHA runs on PRs differently than runs on push to main. Oh well...

sjain-stanford · 2022-08-12T00:02:28Z

... and thank you for the help in reviewing it!

My earlier PR (#1213) had, among other things, decoupled the ubuntu and macos builds into separate matrix runs. This is not working well: the limited number of macOS GHA VMs causes long queue times and a backlog. Two things contribute to the backlog:

1. macos arm64 builds with pytorch source are getting erratically cancelled due to resource / network constraints. This is addressed in #1215.

> "macos-arm64 (in-tree, OFF) The hosted runner: GitHub Actions 3 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."

2. macos runs don't fail fast when ubuntu runs fail, because they are in separate matrix setups. This PR couples them again.
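
The second point follows from how `strategy.fail-fast` is scoped: it cancels in-progress and queued jobs only within the same matrix. A rough sketch of a single coupled matrix is shown below. It is not the workflow from the follow-up PR; the job name, the matrix keys (`os`, `arch`, `llvm-build`, `pytorch`), and the `echo` placeholders are assumptions for illustration.

```yaml
# Hypothetical sketch only; keys and values are illustrative.
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      # One matrix for all platforms, so a failure in any job
      # cancels the remaining jobs, including the macos ones.
      fail-fast: true
      matrix:
        include:
          - os: ubuntu-latest
            arch: x86_64
            llvm-build: in-tree
            pytorch: binary
          - os: ubuntu-latest
            arch: x86_64
            llvm-build: out-of-tree
            pytorch: source
          - os: macos-latest
            arch: arm64
            llvm-build: in-tree
            pytorch: source
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
      - name: Build
        run: echo "build (${{ matrix.arch }}, ${{ matrix.llvm-build }}, pytorch ${{ matrix.pytorch }})"
      - name: Test
        # arm64 is cross compiled and cannot be tested on the runner
        if: matrix.arch == 'x86_64'
        run: echo "test (${{ matrix.llvm-build }})"
```

The trade-off is that platform differences move into per-step conditions instead of separate jobs.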

sjain-stanford requested review from powderluv, asaadaldien and silvasean August 11, 2022 14:50

sjain-stanford linked an issue Aug 11, 2022 that may be closed by this pull request

clean up llvmtype / buildtype in Github workflows #1207

Closed

optimize CI workflow

b11f37c

sjain-stanford force-pushed the sambhav/build_reconfig branch from a8b8ac5 to b11f37c Compare August 11, 2022 16:46

powderluv reviewed Aug 11, 2022

.github/workflows/buildAndTest.yml

Remove pip uninstall pytorch from build_libtorch.sh and attempt pytorch source build for arm64

501edcd

powderluv approved these changes Aug 11, 2022

sjain-stanford merged commit f00ca91 into llvm:main Aug 11, 2022

sjain-stanford mentioned this pull request Aug 12, 2022

Merge matrix runs to fail fast globally #1216

Merged

tanyokwok mentioned this pull request Sep 21, 2022

features/bladedisc rebase 20220830 pai-disc/torch-mlir#20

Closed

qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022

Make checkConstantOutputs and populateAffineAndKrnlToLLVMConversion sharable for other drivers (llvm#1213) Signed-off-by: Tung D. Le <tung@jp.ibm.com>

77fac29

sjain-stanford deleted the sambhav/build_reconfig branch November 10, 2022 19:53
