Parallelize tests #651

Draft · wants to merge 29 commits into base: main
Conversation

@dan-mm (Contributor) commented Jan 12, 2024

Not ready yet, making PR to test in VM

…ements; updated tests to account for parallel_id in container names
…orker session; more cleanup functions editing so parallel tests can run; made test_jobs sequential (only way to ensure they are run on the same worker)
…d for ensuring test_jobs.py are in the same group; updated documentation and pytest invocation command
@ArneTR (Member) commented Jan 14, 2024

I do understand it is not ready yet. Two questions still:

  • Can you make a draft PR for these in the future? I think it makes the state of the PR clearer, both to outsiders and to us internally. It looks like this: Monitor mode #630
  • The implementation route now changes runner.py itself. Wasn't the idea to patch the usage_scenario and the variables of the runner at runtime? Or does the PR not yet reflect that discussion?

@dan-mm linked an issue Jan 15, 2024 that may be closed by this pull request
- base test/tmp directory remains, only subfolders are cleaned
- updated run-tests script with new pytest run commands
- added dummy metric providers for tests (same as the cpuprocfs provider, but skips base.py check_system as that causes collisions during parallelization)
- temporarily turn off test_volume_loading::test_volume_loading_subdirectories_subdir2 test as that is failing for unknown reasons
- added serial pytest mark for tests that cannot be parallelized (test_jobs)
- removed now unnecessary jobs table cleanup in test_jobs
- removed duplicate test in test_jobs
@dan-mm (Contributor, Author) commented Jan 18, 2024

  • Can you make a draft PR for these in the future? I think it makes the state of the PR clearer, both to outsiders and to us internally. It looks like this: Monitor mode #630

Aye, I can make drafts for the future

  • The implementation route now changes runner.py itself. Wasn't the idea to patch the usage_scenario and the variables of the runner at runtime? Or does the PR not yet reflect that discussion?

I think there was a miscommunication here; I thought we had agreed that the runner.py implementation was to be changed? Patching the usage_scenario will not be enough, as we also need to make the /tmp/green-metrics-tool folder that gets used unique per run.
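
A minimal sketch of what making that folder unique per test worker could look like (illustrative only; the `gmt_tests-{parallel_id}` layout is taken from the summary later in this thread, and the helper name and base path are assumptions, not the PR's actual code):

```python
import random
from pathlib import Path

def make_unique_tmp_folder(base: str = "/tmp") -> tuple[str, Path]:
    # Give each parallel test worker its own scratch directory so that
    # concurrent runs do not clobber a shared /tmp/green-metrics-tool.
    parallel_id = f"{random.randrange(10**12):012d}"  # random 12-digit id
    tmp_folder = Path(base) / f"gmt_tests-{parallel_id}" / "green-metrics-tool"
    tmp_folder.mkdir(parents=True, exist_ok=True)
    return parallel_id, tmp_folder
```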

The discussion regarding patching the usage_scenario was in response to a check I was doing against the container names from the flow section to match them against the parallel_id. You pointed out that patching the object at runtime would make more sense than regex matching, but I understood that to still be in the context of runner.py, which is what I've implemented.

We also had a discussion about patching the usage_scenario directly regarding this ticket: #653, but that was entirely unrelated

…ctions to helper files; test_function's setup_runner now parallelizes loaded yml files;
…h parallelization correctly; gmt tmp folders now have parallel id pre-pended, not appended; fix issue with yml dumping during parallelization where key order was not preserved; better default parameters for setup_runner; added flags for setup_runner to fine-grain how it is used
…ctory; - instead of _ for parallel id; depends_on now properly writes yaml in test parallelization; setup_runner doesn't override uri if it's passed in; yml_parsing tests serialized;
…ity officially destroyed); update reference to green-coding-solutions in test_jobs
…uld have been; setup_runner now properly runs with no_build flag by default; cleaned up calls to setup_runner; build gcb_stress_gmt_run_tmp image before test runs; revert _tmp_image_name parallelization (unneeded);

github-actions bot commented Feb 16, 2024

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 28.6846 | 1033.11 | 3.79818 | 280 |
| Measurement #1 | 29.2145 | 1033.11 | 3.79818 | 274 |

📈 Energy graph:

 
(ASCII energy graph omitted: watts over time, roughly 1.77–7.74 W.)


github-actions bot commented Feb 16, 2024

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 28.2288 | 985.703 | 3.74792 | 270 |
| Measurement #1 | 28.7783 | 985.703 | 3.74792 | 264 |

📈 Energy graph:

 
(ASCII energy graph omitted: watts over time, roughly 1.77–7.69 W.)

@dan-mm (Contributor, Author) commented Feb 16, 2024

@ArneTR - ready for review!

@ArneTR (Member) commented Feb 16, 2024

  • Can you give me a summary of the commit / a design description?
  • Also: can you tell me how many tests are parallelized in the end?
    • I see that the test time on GitHub Actions has gone down from 7 minutes to 5:30 minutes. Is that what you are seeing locally too? Is that what you expected?

@dan-mm (Contributor, Author) commented Feb 16, 2024

Summary:

  • test_functions now does the setup_runner differently: it parses the usage_scenario and docker compose files that are passed in and appends a parallel_id (usually a random 12-digit number) to any container names and networks (see the container-renaming sketch after this list)

  • after creating a Runner object, it also patches the Runner's _tmp_folder and _folder to be unique (e.g. tmp/gmt_tests-{parallel_id}/green-metrics-tool/)

  • not all tests were properly using dev_no_build as that wasn't the default in setup_runner (this was the root cause of the docker issue I was seeing). dev_no_build is now the default

  • tests that need a uri now more uniformly create (and clean up) a tmp folder in the tests subdirectory to use as their project folder

  • most tests had to be updated to properly check for the parallel_id in their expectations and call the setup_runner correctly

  • the base gcb_stress_gmt_tmp_image is now created before any tests are run in conftest.py

  • similarly, the conftest.py cleanup step is a bit more thorough (it removes anything under tests/tmp and clears the db, and runs once after all tests have finished)

  • I have added a pytest marker (pytest.mark.serial) to mark any tests that need to be run serially (see the marker sketch after this list). I've added documentation to the tests/Readme about this and about running the tests now.

  • Custom yaml loader moved from inline definition in runner.py into yml_helper.py file

  • similarly, join_paths also moved to utils.py

  • I was reusing these functions to do the usage_scenario yaml loading/parsing during setup_runner
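
A rough sketch of the container-renaming step described in the first bullet above (the function name and the exact YAML structure are illustrative assumptions, not the PR's code):

```python
import copy

def append_parallel_id(usage_scenario: dict, parallel_id: str) -> dict:
    # Suffix service names, container_name entries and networks with the
    # parallel_id so that several test workers can run the same scenario
    # side by side without Docker name clashes.
    scenario = copy.deepcopy(usage_scenario)

    services = scenario.get('services', {})
    scenario['services'] = {
        f"{name}-{parallel_id}": definition for name, definition in services.items()
    }
    for definition in scenario['services'].values():
        if isinstance(definition, dict) and 'container_name' in definition:
            definition['container_name'] = f"{definition['container_name']}-{parallel_id}"

    networks = scenario.get('networks')
    if isinstance(networks, list):
        scenario['networks'] = [f"{net}-{parallel_id}" for net in networks]
    elif isinstance(networks, dict):
        scenario['networks'] = {f"{net}-{parallel_id}": cfg for net, cfg in networks.items()}

    return scenario
```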
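
And a small illustration of the `serial` marker mentioned above; the registration shown here is standard pytest, but the exact wiring in this PR may differ:

```python
# conftest.py (illustrative)
def pytest_configure(config):
    # Register the custom marker so pytest does not warn about unknown marks.
    config.addinivalue_line("markers", "serial: test must not run in parallel")


# e.g. in tests/test_jobs.py (illustrative)
import pytest

@pytest.mark.serial
def test_jobs_run_on_one_worker():
    ...  # placeholder body; the real tests exercise the jobs queue
```

With such a marker, the parallel group would typically be run via pytest-xdist (something like `pytest -n auto -m "not serial"`) followed by a plain `pytest -m serial` pass; the actual invocation is documented in the tests/Readme, as noted above.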

As for the reduction, it is what I expect. These are the counts and times for the two kinds of tests:

Parallel: 57 tests; time: 0:01:23
Serial: 15 tests; time: 0:01:01

so of that 5:30, 2:24 minutes is test time and 3:06 is workflow overhead

From the last main branch workflow: 72 tests, 0:03:41 test time and 3:32 workflow overhead. That is almost a 50% reduction in the test time itself.

It is what I expect, as I see a similar percentage reduction locally:

Serialized: 0:07:16 total test time
With parallelization (+ the 15 serial): 0:02:11 + 0:02:09, 4:20 total test time

However... I have to take back that this PR is ready. When I was running the tests locally to benchmark these times some more, I saw some sporadic failures. They didn't show up when I tested earlier today while pushing, or in the PR tests, but they're there. As best as I can tell right now, some cleanups are happening too early, but I'm not sure yet. At least one more commit is incoming.


Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 27.9268 | 1106.18 | 3.73709 | 304 |
| Measurement #1 | 28.3768 | 1106.18 | 3.73709 | 298 |

📈 Energy graph:

 
(ASCII energy graph omitted: watts over time, roughly 1.77–7.78 W.)

@dan-mm (Contributor, Author) commented Feb 16, 2024

Ok, my local failures had to do with my system; I had some test containers still living in the background. I would still want to test more before merging, so I'll run some more local sanity checks tomorrow.

@ArneTR requested a review from ribalba on February 19, 2024
@ArneTR (Member) commented Feb 19, 2024

@ribalba Please give it a review and validate whether the approach is feasible and whether the expected gains justify the added complexity. Then I would like to plan further steps on this.

@ArneTR (Member) commented Mar 4, 2024

I will mark this PR as a draft to keep it as a concept in the backlog.

The idea and general implementation of the parallelization are nice, but the PR has some issues:

  • It currently experiences unknown race conditions
  • Many additional refactorings and small changes were added, and it is unclear whether they cause the breaking functionality or are just refactorings
  • The time gain is not as big as expected because many tests had to be downgraded to synchronous (serial) execution

@ArneTR marked this pull request as draft on March 4, 2024
@ribalba removed their request for review on March 4, 2024
Development: successfully merging this pull request may close the linked issue "Add parallel testing".