Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Improve windows machine CI test time #8730

Merged
merged 10 commits into from
Jan 3, 2024
Merged

[CI] Improve windows machine CI test time #8730

merged 10 commits into from
Jan 3, 2024

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Jan 3, 2024

Which issue does this PR close?

Closes #8696.

Rationale for this change

CI tests for windows machines are extremely slow and running 3+ hours now, whereas other architectures are able to finish within 30 mins.

This PR adds more granular build options for each architecture.
With this change windows build time is 60 minutes, which is 2x slower than others OS but still 3x better than current windows run

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@comphead
Copy link
Contributor Author

comphead commented Jan 3, 2024

@Jefffrey I think I did some progress to improve win64 test performance.

@comphead comphead changed the title Test WIN64 CI [CI] Improve windows machine CI test time Jan 3, 2024
@comphead comphead marked this pull request as ready for review January 3, 2024 18:33
@comphead
Copy link
Contributor Author

comphead commented Jan 3, 2024

@Jefffrey @andygrove @alamb please review

Thanks to @Jefffrey investigations in #8696 (comment) it looks like the issue with newer rustc and the next step should probably be a ticket to a Rust team.

During this PR I found that increasing opt-level for rustc majorly mitigates the problem but not fixes completely.

@alamb
Copy link
Contributor

alamb commented Jan 3, 2024

@Jefffrey @andygrove @alamb please review

Thanks to @Jefffrey investigations in #8696 (comment) it looks like the issue with newer rustc and the next step should probably be a ticket to a Rust team.

During this PR I found that increasing opt-level for rustc majorly mitigates the problem but not fixes completely.

Do we need to increase opt-level for all OSs or just windows?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the investigation @comphead and @Jefffrey

.github/workflows/rust.yml Show resolved Hide resolved
.github/workflows/rust.yml Show resolved Hide resolved
@comphead
Copy link
Contributor Author

comphead commented Jan 3, 2024

Thanks @alamb for the feedback. Addressed all the comments

@comphead comphead requested a review from alamb January 3, 2024 20:49
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improvements to Windows CI runtime is amazing 👍

I've raised an issue on Rust repo: rust-lang/rust#119560

But until that can be futher investigated/triaged, this will help reduce the CI runtime (current 3 hours is ages...)

For Linux and Mac runners, do those changes have any significant effect on test runtime? Otherwise maybe won't set those options to try keep things simple and as close to local environment as we could?

@@ -290,6 +298,7 @@ jobs:
# with a OS-dependent path.
- name: Setup Rust toolchain
run: |
rustup update stable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the update here strictly necessary? Or should already be covered as part of the toolchain install below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll check it, maybe its not needed.

Comment on lines +359 to +360
# avoid rust stack overflows on tpc-ds tests
RUST_MINSTACK: "3000000"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something we should fix in tpc-ds tests, instead of accounting for it after the fact (I know I've had some trouble running those tests locally due to stack issue, so could happen to others)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is true, during testing I saw spontaneous stack overflow issue on physical q54 TPC-DS test, that was the reason to bump the stack size. But I agree its good to create a follow up ticket and figure out why q54 is not the same as others, maybe the bigger problem is behind it.

# do not produce debug symbols to keep memory usage down
# hardcoding other profile params to avoid profile override values
# More on Cargo profiles https://doc.rust-lang.org/cargo/reference/profiles.html?profile-settings#profile-settings
RUSTFLAGS: "-C debuginfo=0 -C opt-level=0 -C incremental=false -C codegen-units=256"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/arrow-datafusion/blob/1179a76567892b259c88f08243ee01f05c4c3d5c/.github/actions/setup-builder/action.yaml#L45-L50

I think some of these flags are already enabled, technically? Just for Linux though, not Mac/Windows

(Well debuginfo was set to 0, not sure on incremental)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats good point, I would like to have configuration in a single place which is rust.yml, I'll create a followup to remove CARGO_INCREMENTAL and backtraces and other rust conf from action.yaml

@comphead comphead requested a review from Jefffrey January 3, 2024 21:46
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @comphead -- let's get this merged and file follow on tickets / cleanups as follow on PRs

I verified that this makes windows a lot faster

Screenshot 2024-01-03 at 4 51 44 PM

@alamb
Copy link
Contributor

alamb commented Jan 3, 2024

For Linux and Mac runners, do those changes have any significant effect on test runtime? Otherwise maybe won't set those options to try keep things simple and as close to local environment as we could?

I compared the linux and mac times on the most recent commit to master

Mac took slightly longer (16m on this pr compared to 12m on main)

Linux was faster (8m on this PR, compared to 9m on main)

I think the differences are not significant and within normal variances

Thanks again @comphead and @Jefffrey

@alamb alamb merged commit 93da699 into apache:main Jan 3, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows CI test is extremely long (~3 hours)
3 participants