[CI] Explicitly set eval batch size in determinism tests, introduce a new integration test group, and exclude slow tests. #3590

justinxzhao · 2023-09-07T03:37:51Z

Make Ludwig CI consistently green again.

With the changes in this PR, CI time is cut down from 1.5 hours (with timeouts) to 40 minutes.

`tests/ludwig/models/test_training_determinism.py::test_training_determinism_local_backend`:

Looked through commit history and CI history to find a commit where the test still passed and a commit where it failed.
Narrowed the culprit down to an earlier PR. That PR changed the batch size tuning mechanics for auto effective batch size, and it also made eval batch size get tuned even when batch size is fixed.
I've added a couple of docstrings to help clarify how auto batch size parameters are resolved and why batch size tuning can be non-deterministic even with fixed random seeds.
We now explicitly set the eval batch size in tests to eliminate non-determinism.
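
To illustrate why tuning can break determinism even with a fixed random seed, here is a toy sketch (not Ludwig's actual tuner). It assumes the tuner picks whichever candidate batch size shows the best measured throughput; such measurements depend on machine load, not on the seed. The function names, candidate sizes, and throughput numbers are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def tune_batch_size(
    candidates: Sequence[int],
    measure_throughput: Callable[[int], float],
) -> int:
    """Toy tuner: return the candidate with the highest measured throughput."""
    return max(candidates, key=measure_throughput)


def resolve_eval_batch_size(
    explicit: Optional[int],
    candidates: Sequence[int],
    measure_throughput: Callable[[int], float],
) -> int:
    """Use the explicit value when one is given; otherwise fall back to tuning."""
    if explicit is not None:
        return explicit
    return tune_batch_size(candidates, measure_throughput)


# Hypothetical samples/sec measured on two CI runs of the same commit.
# The timing noise is independent of any random seed.
run_1 = {64: 900.0, 128: 1000.0, 256: 990.0}
run_2 = {64: 900.0, 128: 985.0, 256: 1005.0}
candidates = [64, 128, 256]

# Left to the tuner, the two runs settle on different values.
tuned_1 = resolve_eval_batch_size(None, candidates, run_1.__getitem__)
tuned_2 = resolve_eval_batch_size(None, candidates, run_2.__getitem__)

# With an explicit value, the timing noise no longer matters.
fixed_1 = resolve_eval_batch_size(128, candidates, run_1.__getitem__)
fixed_2 = resolve_eval_batch_size(128, candidates, run_2.__getitem__)
```

In the tests, the equivalent of `explicit` is setting the eval batch size directly in the trainer config (`eval_batch_size`, as I understand the config schema) instead of leaving it to be tuned.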

CI speedup: New integration test group and marking tests as slow

Added a new integration test group E to further parallelize integration tests.
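
As a rough illustration of why an extra group helps: when groups run in parallel, wall-clock time is bounded by the slowest group. The sketch below greedily assigns test files to groups and compares four groups against five. The file names and durations are made up, and this is not how Ludwig's CI actually assigns tests to groups.

```python
import heapq
from typing import Dict, List, Tuple


def assign_groups(durations: Dict[str, float], num_groups: int) -> List[Tuple[float, List[str]]]:
    """Greedy longest-first assignment: each file goes to the currently lightest group."""
    # Heap entries are (total_minutes, group_index, files); the unique index breaks ties.
    heap = [(0.0, i, []) for i in range(num_groups)]
    heapq.heapify(heap)
    for name, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, idx, files = heapq.heappop(heap)
        files.append(name)
        heapq.heappush(heap, (total + minutes, idx, files))
    # Return groups ordered by their index.
    return [(total, files) for total, _, files in sorted(heap, key=lambda g: g[1])]


def wall_clock(groups: List[Tuple[float, List[str]]]) -> float:
    """Parallel wall-clock time is the total of the slowest group."""
    return max(total for total, _ in groups)


# Hypothetical per-file durations in minutes.
durations = {
    "test_a.py": 20, "test_b.py": 18, "test_c.py": 15, "test_d.py": 12,
    "test_e.py": 10, "test_f.py": 9, "test_g.py": 8, "test_h.py": 8,
}

four_groups = assign_groups(durations, 4)
five_groups = assign_groups(durations, 5)
```

With these made-up numbers the slowest of four groups takes 28 minutes and the slowest of five takes 23, so the gain from one more group depends on how evenly the long-running files can be spread.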

Marking tests as slow: The purpose of on-PR-CI is to give us a timely sense of whether a change is safe to land. The slowest tests (largely hyperopt+ray), in my opinion, provide limited additional signal and are not worth the on-PR-CI slowdown.

NOTE: Slow tests are still run when a PR is merged to master (PR authors are notified).
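
For reference, a minimal sketch of the marking mechanism, assuming the `slow` marker is registered in the pytest configuration and that on-PR CI deselects it with something like `pytest -m "not slow"` (the exact CI invocation is an assumption, not copied from the workflow files). The test names and bodies are placeholders, and `is_slow` is a hypothetical helper used only to inspect the marks.

```python
import pytest


@pytest.mark.slow
def test_expensive_hyperopt_sweep():
    # Placeholder body standing in for a long-running hyperopt + ray test.
    assert sum(range(10)) == 45


def test_quick_sanity_check():
    # Unmarked tests keep running on every PR.
    assert 1 + 1 == 2


def is_slow(test_fn) -> bool:
    """Return True if the function carries the `slow` mark."""
    return any(mark.name == "slow" for mark in getattr(test_fn, "pytestmark", []))
```

Running the full suite, as on merge to master, would simply omit the `-m "not slow"` filter.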

Tests marked as slow:

- test_automl.py
	- test_train_with_config_remote
	- test_train_with_config
- test_cached_preprocessing.py
	- test_onehot_encoding
	- test_hf_text_embedding
	- test_onehot_encoding_preprocessing
- test_experiment.py
	- test_experiment_model_resume_missing_file
	- test_experiment_model_resume_before_1st_epoch_distributed
	- test_tabnet_with_batch_size_1
- test_explain.py
	- test_explainer_api_ray_minimum_batch_size
- test_gbm.py
	- test_ray_gbm_binary
	- test_ray_gbm_non_number_inputs
	- test_ray_gbm_category
	- test_gbm_category_one_hot_encoding
	- test_gbm_text_tfidf
	- test_gbm_feature_name_special_characters
- test_hyperopt.py
	- test_hyperopt_run_hyperopt
	- test_hyperopt_without_config_defaults
	- test_hyperopt_with_time_budget
- test_hyperopt_ray.py
	- test_hyperopt_executor
	- test_hyperopt_executor_with_metric
	- test_hyperopt_ray_mlflow
- test_hyperopt_ray_horovod.py
	- test_hyperopt_executor_variant_generator
- test_postprocessing.py
	- test_binary_predictions
	- test_binary_predictions_with_number_dtype
- test_ray.py
	- test_ray_lazy_load_audio_error
	- test_ray_lazy_load_image_works
	- [Removed] test_ray_progress_bar
		- This test instantiates a simple training run. There are many other tests in the file that cover this.
	- test_ray_calibration
	- test_ray_distributed_predict
	- test_ray_preprocessing_placement_group
- test_remote.py
	- test_remote_training_set
- test_sequence_decoders.py
	- test_sequence_decoder_predictions
- test_augmentation_pipeline.py
	- test_ray_model_training_with_augmentation_pipeline
- test_dask.py
	- test_from_ray_dataset_empty

github-actions · 2023-09-07T07:17:03Z

Unit Test Results

6 files ±0 · 6 suites ±0 · ⏱️ 44m 5s (-36m 49s)
31 tests (-3): 26 ✔️ (-3), 5 💤 (±0), 0 ❌ (±0)
82 runs (-6): 66 ✔️ (-6), 16 💤 (±0), 0 ❌ (±0)

Results for commit c6309d3. ± Comparison against base commit 6931fe4.

This pull request removes 3 tests.

tests.integration_tests.test_experiment ‑ test_experiment_model_resume_distributed[horovod]
tests.integration_tests.test_ray ‑ test_ray_outputs[horovod-csv]
tests.integration_tests.test_ray ‑ test_ray_outputs[horovod-parquet]

♻️ This comment has been updated with latest results.

tests/integration_tests/test_hyperopt_ray.py

arnavgarg1 · 2023-09-12T16:24:03Z

tests/ludwig/data/dataframe/test_dask.py

@@ -5,6 +5,7 @@
 from tests.integration_tests.utils import generate_data_as_dataframe


+@pytest.mark.slow


I wonder what makes this test particularly slow. Perhaps we can modify it to use 1 feature and train for just 5 steps instead of an entire epoch?

arnavgarg1

Generally LGTM except for one comment!

jeffkinnison

LGTM! +1 to @arnavgarg1's comment about the hyperopt tests

justinxzhao added 3 commits September 7, 2023 03:29

Explicitly set eval batch size in determinism tests.

c5867f3

Skip test.

80202db

Pin torch nightly and create a separate integration test group E.

68222b0

justinxzhao added 2 commits September 7, 2023 13:44

Use 2.1.0 nightly.

0b64f31

Aggressively mark slow tests.

0d209cc

justinxzhao changed the title ~~[CI] Explicitly set eval batch size in determinism tests.~~ [CI] Explicitly set eval batch size in determinism tests, introduce a new integration test group, and exclude slow tests. Sep 7, 2023

justinxzhao marked this pull request as ready for review September 7, 2023 18:47

justinxzhao requested review from arnavgarg1, tgaddair and jeffkinnison September 7, 2023 18:47

justinxzhao added 5 commits September 7, 2023 19:53

Unpin torch nightly and skip the test.

f043d75

Fix skip condition.

9395238

Use version.parse.

04f2a54

from packaging.

574951f

Use base_versions to do comparison accurately.

612cc87

justinxzhao requested review from w4nderlust and jimthompson5802 September 11, 2023 17:37

justinxzhao added 5 commits September 11, 2023 17:38

Merge branch 'master' of github.com:ludwig-ai/ludwig into debug_ci

49ae33e

Unskip tabtransformer tests.

5e7f154

Undo non-change.

df434f0

Merge branch 'master' of github.com:ludwig-ai/ludwig into debug_ci

a35dc84

Set eval batch size for reproducible test.

340046f

arnavgarg1 reviewed Sep 12, 2023

tests/integration_tests/test_hyperopt_ray.py (outdated, resolved)

arnavgarg1 reviewed Sep 12, 2023

arnavgarg1 approved these changes Sep 12, 2023

jeffkinnison approved these changes Sep 12, 2023

justinxzhao added 2 commits September 13, 2023 00:59

Unmark test_hyperopt_run_hyperopt test.

029a0db

Merge branch 'master' of github.com:ludwig-ai/ludwig into debug_ci

c6309d3

justinxzhao merged commit 03a0da0 into master Sep 13, 2023
17 checks passed

justinxzhao deleted the debug_ci branch September 13, 2023 02:03
