Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Explicitly set eval batch size in determinism tests, introduce a new integration test group, and exclude slow tests. #3590

Merged
merged 17 commits into from
Sep 13, 2023

Conversation

justinxzhao
Copy link
Contributor

@justinxzhao justinxzhao commented Sep 7, 2023

Make Ludwig CI consistently green again.

With the changes in this PR, CI time is cut down from 1.5 hours (with timeouts) to 40 minutes.

tests/ludwig/models/test_training_determinism.py::test_training_determinism_local_backend:

  • Looked through commit history and CI history to determine a commit when tests still passed and a commit when tests failed.
  • Narrowed down the culprit to this change. This PR made some changes to batch size tuning mechanics for auto effective batch size. This PR made a change that also tunes eval batch size even if batch size is fixed.
  • I've added a couple of docstrings to help clarify how auto batch size parameters are resolved and why batch size tuning can be non-deterministic even with fixed random seeds.
  • We now explicitly set the eval batch size in tests to eliminate non-determinism.

CI speedup: New integration test group and marking tests as slow

Added a new integration test group E to further parallelize integration tests.

Marking tests as slow: The purpose of on-PR-CI is to give us a timely sense of whether a change is safe to land. The slowest tests (largely hyperopt+ray), in my opinion, provide more limited utility and not worth the on-PR-CI slowdown.

NOTE: Slow tests are still run when a PR is merged to master (PR authors are notified).

Tests marked as slow:

- test_automl.py:
	- test_train_with_config_remote
	- test_train_with_config
- test_cached_preprocessing.py
	- test_onehot_encoding
	- test_hf_text_embedding
	- test_onehot_encoding_preprocessing
- test_experiment.py:
	- test_experiment_model_resume_missing_file
	- test_experiment_model_resume_before_1st_epoch_distributed
	- test_tabnet_with_batch_size_1
- test_explain.py
	- test_explainer_api_ray_minimum_batch_size
- test_gbm.py
	- test_ray_gbm_binary
	- test_ray_gbm_non_number_inputs
	- test_ray_gbm_category
	- test_gbm_category_one_hot_encoding
	- test_gbm_text_tfidf
	- test_gbm_feature_name_special_characters
- test_hyperopt.py
	- test_hyperopt_run_hyperopt
	- test_hyperopt_without_config_defaults
	- test_hyperopt_with_time_budget
- test_hyperopt_ray.py
	- test_hyperopt_executor
	- test_hyperopt_executor_with_metric
	- test_hyperopt_ray_mlflow
- test_hyperopt_ray_horovod.py
	- test_hyperopt_executor_variant_generator
- test_postprocessing.py
	- test_binary_predictions
	- test_binary_predictions_with_number_dtype
- test_ray.py
	- test_ray_lazy_load_audio_error
	- test_ray_lazy_load_image_works
	- [Removed] test_ray_progress_bar
		- This test instantiates a simple training run. There are many other tests in the file that cover this.
	- test_ray_calibration
	- test_ray_distributed_predict
	- test_ray_preprocessing_placement_group
- test_remote.py
	- test_remote_training_set
- test_sequence_decoders.py
	- test_sequence_decoder_predictions
- test_augmentation_pipeline.py
	- test_ray_model_training_with_augmentation_pipeline
- test_dask.py
	- test_from_ray_dataset_empty

@github-actions
Copy link

github-actions bot commented Sep 7, 2023

Unit Test Results

  6 files  ±0    6 suites  ±0   44m 5s ⏱️ - 36m 49s
31 tests  - 3  26 ✔️  - 3    5 💤 ±0  0 ±0 
82 runs   - 6  66 ✔️  - 6  16 💤 ±0  0 ±0 

Results for commit c6309d3. ± Comparison against base commit 6931fe4.

This pull request removes 3 tests.
tests.integration_tests.test_experiment ‑ test_experiment_model_resume_distributed[horovod]
tests.integration_tests.test_ray ‑ test_ray_outputs[horovod-csv]
tests.integration_tests.test_ray ‑ test_ray_outputs[horovod-parquet]

♻️ This comment has been updated with latest results.

@justinxzhao justinxzhao changed the title [CI] Explicitly set eval batch size in determinism tests. [CI] Explicitly set eval batch size in determinism tests, introduce a new integration test group, and exclude slow tests. Sep 7, 2023
@justinxzhao justinxzhao marked this pull request as ready for review September 7, 2023 18:47
@@ -5,6 +5,7 @@
from tests.integration_tests.utils import generate_data_as_dataframe


@pytest.mark.slow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what makes this test particularly slow, perhaps we can modify to use 1 feature and just train for 5 steps instead of an entire epoch

Copy link
Contributor

@arnavgarg1 arnavgarg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM except for one comment!

Copy link
Contributor

@jeffkinnison jeffkinnison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! +1 to @arnavgarg1's comment about the hyperopt tests

@justinxzhao justinxzhao merged commit 03a0da0 into master Sep 13, 2023
17 checks passed
@justinxzhao justinxzhao deleted the debug_ci branch September 13, 2023 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants