[ci][fix] Fix cuda_exp ci #5438

shiyu1994 · 2022-08-24T09:36:05Z

Unfortunately, for a very long time (since the merge of #4630), the CI tests for cuda_exp have not actually run.
Since the task name for both cuda and cuda_exp is cuda

LightGBM/.github/workflows/cuda.yml

Line 14 in 702db13

task: cuda

in test.sh the cuda_exp option cannot be detected, and it will always run for device cuda!

LightGBM/.ci/test.sh

Lines 198 to 199 in 702db13

    
    elif [[ $TASK == "cuda" || $TASK == "cuda_exp" ]]; then
        if [[ $TASK == "cuda" ]]; then

That's a serious mistake. And I think we need to fix it right now for our on-going development of cuda_exp.
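
To make the failure mode concrete, here is a minimal sketch of the behavior described above. It assumes only the two conditions quoted from .ci/test.sh; the function name pick_device and the echoed device names are hypothetical and are not part of the real script.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the branch structure quoted from .ci/test.sh.
# The echoed values are illustrative stand-ins for whatever each branch really does.
pick_device() {
    local TASK="$1"
    if [[ $TASK == "cuda" || $TASK == "cuda_exp" ]]; then
        if [[ $TASK == "cuda" ]]; then
            echo "cuda"
        else
            echo "cuda_exp"
        fi
    else
        echo "other"
    fi
}

# The workflow sets "task: cuda" once for all jobs, so both jobs
# reach the script with the same TASK value.
for job in cuda cuda_exp; do
    TASK="cuda"   # what the script receives, regardless of which job is running
    echo "job=${job} -> device=$(pick_device "$TASK")"
done
```

Both iterations print device=cuda, which is why the cuda_exp job never exercised the cuda_exp branch.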

shiyu1994 · 2022-08-24T09:40:01Z

See #5425 (comment)

jameslamb

Thank you very much for noticing and proposing a fix for this! But I think there's a much simpler fix that will achieve the same goal.

I think doing the following would prevent adding new complexity to .ci/test.sh:

replace treelearner: in .github/workflows/cuda.yml with task:
remove this line:

LightGBM/.github/workflows/cuda.yml

Line 14 in 702db13

task: cuda

change the following line to TASK="${{ matrix.task }}":

LightGBM/.github/workflows/cuda.yml

Line 89 in 702db13

TASK=${{ env.task }}
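
The difference between the two ways of setting TASK can be sketched in plain shell. This is only an illustration: ENV_TASK, task_before, and task_after are hypothetical names standing in for the workflow-level task: cuda value and the two expressions discussed above, and the loop stands in for the job matrix.

```shell
#!/usr/bin/env bash
# Stand-in for the single workflow-level "task: cuda" value (hypothetical name).
ENV_TASK="cuda"

# Models TASK=${{ env.task }}: ignores the matrix entry entirely.
task_before() { echo "$ENV_TASK"; }

# Models TASK="${{ matrix.task }}": each matrix entry supplies its own value.
task_after() { echo "$1"; }

# Only the two task names mentioned in this thread are used here.
for entry in cuda cuda_exp; do
    echo "matrix entry ${entry}: before=$(task_before "$entry") after=$(task_after "$entry")"
done
```

With the per-entry value, the existing conditions in .ci/test.sh can tell the two jobs apart without any new logic in the script.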

I remember we had a discussion in #4630 about not adding more complexity to test.sh: #4630 (comment).

In addition to that request...right now, both cuda_exp CI jobs are failing with the following error

Fatal Python error: Aborted

(build link)

We should never merge a PR that will result in CI being broken on master, so please do one of these:

push a fix for that issue to this PR
remove the cuda_exp CI jobs in this PR, document the work to add them back in a separate issue, merge this PR, and then push a follow-up PR fixing cuda_exp and adding back the CI jobs

I hope a fix can just be pushed here, but I don't know how complicated it will be to debug this. If it is very complicated, then we might as well eliminate those two cuda_exp jobs that are not actually testing cuda_exp, to save CI time.

And if the "make an issue and fix this later" approach is taken, then I think we should not merge any more cuda_exp PRs until those CI jobs have been re-enabled.

.ci/test.sh

jameslamb · 2022-08-25T04:41:36Z

I've added the label maintenance to this PR.

As I mentioned in #5403 (comment) and #5413 (comment), please add labels when you create pull requests here, so your PR will be correctly categorized in the release notes.

shiyu1994 · 2022-08-25T09:17:53Z

I think doing the following would prevent adding new complexity to .ci/test.sh:

Thanks for the suggestion. Already applied via 0206da8 and eb34dd5.

We should never merge a PR that will result in CI being broken on master, so please do one of these:

push a fix for that issue to this PR

remove the cuda_exp CI jobs in this PR, document the work to add them back in a separate issue, merge this PR, and then push a follow-up PR fixing cuda_exp and adding back the CI jobs

I'd prefer the first choice, and already pushed the fixes via ca7df38.

As I mentioned in #5403 (comment) and #5413 (comment), please add labels when you create pull requests here, so your PR will be correctly categorized in the release notes.

Thanks again for the reminder.

shiyu1994 · 2022-08-25T09:26:03Z

Note that this is blocking other related PRs, including #5425 and #4827.

shiyu1994 · 2022-08-26T00:05:39Z

This is ready. @jameslamb @guolinke Could you please review this again? If it looks good to you, let's merge this so that we can work on related PRs. Thanks.

StrikerRUS

LGTM for .github/workflows/cuda.yml and tests/python_package_test/test_utilities.py except for one minor simplification.

StrikerRUS · 2022-08-28T16:17:13Z

src/boosting/gbdt.cpp

+      CHECK_EQ(gradients_pointer_, gradients_.data());
+      CHECK_EQ(hessians_pointer_, hessians_.data());


Aren't these checks slow? Maybe move them under #ifdef DEBUG?

shiyu1994 · 2022-08-29

Thanks. These checks are fast.

tests/python_package_test/test_utilities.py

dismissing to prevent blocking, as I might not be able to review again today

jameslamb

I don't have any additional comments.

Please address @StrikerRUS's suggestions, and then I think @guolinke should re-review this (since you've added significant changes since his original review) prior to this being merged.

Thanks for fixing this!

shiyu1994 · 2022-08-29T02:27:30Z

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/2945515006

Status: success ✔️.

shiyu1994 · 2022-08-29T02:30:10Z

/gha run r-solaris

jameslamb · 2022-08-29T02:49:57Z

@shiyu1994 Solaris support was removed in #5226. The r-solaris check no longer exists, and is no longer necessary.

But thank you for being so thorough!

shiyu1994 · 2022-08-29T06:37:44Z

@guolinke @jameslamb @StrikerRUS Thanks for reviewing this. I'll merge this since all tests have passed and it is blocking many related PRs, including #4827, #5425 and #4266.

github-actions · 2023-08-19T03:23:52Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

fix cuda_exp ci

eb0b5c8

shiyu1994 requested a review from guolinke August 24, 2022 09:36

shiyu1994 requested review from StrikerRUS and jameslamb as code owners August 24, 2022 09:36

guolinke approved these changes Aug 24, 2022

jameslamb previously requested changes Aug 25, 2022

jameslamb added the maintenance and in progress labels Aug 25, 2022

shiyu1994 added 3 commits August 25, 2022 08:55

fix ci failures introduced by #5279

ca7df38

cleanup cuda.yml

eb34dd5

fix test.sh

0206da8

shiyu1994 added 2 commits August 25, 2022 09:22

clean up test.sh

e73262f

clean up test.sh

f79ce1b

skip lines by cuda_exp in test_register_logger

d8010f6

shiyu1994 requested a review from jmoralez as a code owner August 25, 2022 11:25

Merge branch 'master' into cuda/fix-cuda-exp-ci

085e436

shiyu1994 requested review from jameslamb and guolinke August 26, 2022 00:05

shiyu1994 added awaiting review and removed in progress labels Aug 26, 2022

Merge branch 'master' into cuda/fix-cuda-exp-ci

cbe45b0

StrikerRUS approved these changes Aug 28, 2022

jameslamb reviewed Aug 29, 2022

Update tests/python_package_test/test_utilities.py

27dd92e

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

shiyu1994 merged commit be7f321 into master Aug 29, 2022

shiyu1994 deleted the cuda/fix-cuda-exp-ci branch August 29, 2022 06:38

jameslamb removed the awaiting review label Aug 29, 2022

jameslamb mentioned this pull request Oct 7, 2022

[DO NOT MERGE] Release v3.3.3 #5525

Closed

40 tasks

github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
