
Add parallel support to nightly spark standalone tests #3264

Merged · 3 commits · Aug 24, 2021

Conversation

@pxLi (Collaborator) commented Aug 20, 2021

Signed-off-by: Peixin Li pxli@nyu.edu

Fixes #2802

To speed up our nightly Spark standalone integration tests:
Spark 3.0.x total time: ~3h 40m → ~1h 15m
Spark 3.1.x total time: ~4h → ~1h 35m (includes the extra ParquetCachedBatchSerializer cache_test)

I am still verifying other scenarios; submitting this first to collect feedback, thanks!

This needs to be enabled separately in the nightly pipelines' Jenkinsfile.

Signed-off-by: Peixin Li <pxli@nyu.edu>
@pxLi pxLi added the test Only impacts tests label Aug 20, 2021
@pxLi pxLi requested a review from GaryShen2008 as a code owner August 20, 2021 09:54
@pxLi pxLi marked this pull request as draft August 20, 2021 09:54
# integration tests
if [[ $PARALLEL_TEST == "true" ]] && [ -x "$(command -v parallel)" ]; then
# put the most time-consuming tests at the head of the queue
time_consuming_tests="join_test.py generate_expr_test.py"
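For context, here is a minimal sketch of how a task queue could be ordered so the slow files are dispatched first; join_test.py and generate_expr_test.py come from the PR, but the other file names and the queue-building loop are hypothetical stand-ins, not the PR's actual script body:

```shell
# Sketch only: move the known-slow test files to the head of the task queue.
# join_test.py and generate_expr_test.py are from the PR; the rest are stand-ins.
time_consuming_tests="join_test.py generate_expr_test.py"
all_tests="map_test.py join_test.py csv_test.py generate_expr_test.py"

queue="$time_consuming_tests"
for t in $all_tests; do
    case " $time_consuming_tests " in
        *" $t "*) ;;              # already at the head of the queue
        *) queue="$queue $t" ;;   # everything else keeps its original order
    esac
done
echo "$queue"
```

A queue ordered this way can then be handed to GNU parallel, so the longest-running files start early instead of landing near the end of the run and stretching the total wall-clock time.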
@jlowe (Member) commented Aug 20, 2021

I thought we were tagging the source to indicate which tests are slow, as in #3241. Curious why this isn't leveraging that, e.g.: time_consuming_tests=$(grep -rl pytest.mark.slow_test "$SCRIPT_PATH"/src/main/python)
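The tag-based discovery in that one-liner can be exercised in isolation; this is an illustrative demo with a throwaway temporary directory and made-up file contents, not code from the repository:

```shell
# Demo of discovering marker-tagged test files with grep -rl.
# The directory and file contents below are hypothetical.
tmpdir=$(mktemp -d)
printf '@pytest.mark.slow_test\ndef test_big_join(): pass\n' > "$tmpdir/join_test.py"
printf 'def test_quick(): pass\n' > "$tmpdir/map_test.py"

# -r: recurse into the directory, -l: print only names of matching files
slow=$(grep -rl pytest.mark.slow_test "$tmpdir")
echo "$slow"
```

Only files containing the marker are listed, so the result can be fed straight into a variable like time_consuming_tests.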

@pxLi (Collaborator, Author) commented Aug 23, 2021

The slow_test tag there was for the pre-merge parallel split (covering tests that are either time-consuming or high in memory usage), mostly to balance the test time of the two parallel test stages. The tag naming looks ambiguous; I talked to Alex, and he will help rename it.
For the nightly cases I want to pick the time-consuming tests only. Per my measurements, join_test (~3000s) and generate_expr_test (~1800s) were the only two files that consistently took over 15 minutes, so I listed them manually to avoid them landing randomly in the middle or at the tail of the task queue.

@jlowe (Member) commented

Should we split the test cases in join_test and generate_expr_test so they spread more evenly even without using the parallel hack?

@pxLi (Collaborator, Author) commented

Yes, splitting at the test-case level would be better. I tested a few rough case-level splits but did not get much benefit from them, and this could make it harder for developers to manage the test scenarios. I would like to revisit this if the current setup does not meet our efficiency requirements.

@pxLi pxLi marked this pull request as ready for review August 23, 2021 06:29
@pxLi (Collaborator, Author) commented Aug 23, 2021

build

@pxLi pxLi changed the title [REVIEW] Add parallel support to nightly spark standalone tests Add parallel support to nightly spark standalone tests Aug 23, 2021
@jlowe (Member) left a comment

Looks OK to me other than we may want a followup to split some of these expensive test cases to make it easier for pytest to make better decisions about running them in parallel on its own.

@pxLi (Collaborator, Author) commented Aug 24, 2021

> Looks OK to me other than we may want a followup to split some of these expensive test cases to make it easier for pytest to make better decisions about running them in parallel on its own.

Thanks! I filed an issue #3279 to track the followup

Labels: test (Only impacts tests)
Development

Successfully merging this pull request may close these issues.

Update jenkins/start-tests.sh to use integration_tests/run_pyspark_from_build.sh