
Make CI tests faster #1246

Merged
merged 29 commits into qiskit-community:main from make-tests-faster on Sep 8, 2023
Conversation

Collaborator

@coruscating coruscating commented Aug 8, 2023

Summary

This PR does a few things to make tests run faster:

  • Only test on the lowest and highest supported Python versions
  • Set MacOS build options to be the same as the other OSes (and remove coverage)
  • Add .stestr to the cache as suggested by @mtreinish. This doesn't seem to improve the Ubuntu and Windows runtimes but significantly improves MacOS's.
  • Group tests in stestr by class name. This might avoid large parallelized tests being run on multiple workers simultaneously and slowing each test down.
  • Fail a test automatically if it takes longer than 60 seconds (hopefully this can be shortened in the future, but for now it mostly prevents a very long test from being added)
  • Shorten long tests by decreasing shots and generating smaller circuits where the size isn't relevant (such as the roundtrip serialization tests)
  • Also fixes a bug in the DRAG experiment where integer beta values caused a serialization error.

Tests currently take 20-40 minutes for Windows/Ubuntu and 50+ minutes for MacOS. With this PR, all tests go down to ~10 minutes.
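The class-name grouping mentioned above maps to stestr's group_regex option. A hedged sketch of what the .stestr.conf could look like — the test_path value is an assumption for illustration, and the regex follows the pattern documented in the stestr manual for grouping by class:

```ini
[DEFAULT]
test_path=./test
# Everything captured by the regex becomes the group key. This pattern
# keeps everything up to the last dot of the test id (module + class),
# so all tests of one class land on the same worker.
group_regex=([^\.]*\.)*
```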

@coruscating coruscating marked this pull request as ready for review September 6, 2023 21:50
Collaborator

@wshanks wshanks left a comment


This is really nice. I like all the changes to trim the tests. I had just a few small comments.

.stestr/m*
.stestr/n*
.stestr/t*
key: ${{ runner.os }}-${{ matrix.python-version }}-stestr-tests-${{ hashFiles('setup.py','requirements.txt','requirements-extras.txt','requirements-dev.txt','constraints.txt') }}
Collaborator

I don't think we want to hash the same files as for the pip cache above. I would think that we want to hash the test files. I don't see any files being hashed in the qiskit Azure pipelines equivalent here. Maybe we should check with mtreinish. I am not sure how the cache is used by stestr (like what changes invalidate it).

Collaborator Author

It looks like Qiskit is caching the whole .stestr directory. @mtreinish had told me the cache can get large, so the numbered files above 0 should be trimmed; the list of paths I listed out should be every .stestr file except the numbered files above 0. But I'm not sure if keeping a few of the latest runs would also be useful.

Contributor

The numbered files in the .stestr directory are the full result stream (in the subunit format) from the previous test runs. At the end of a test run it uses the next integer as the filename. For timing purposes there is a collection of times.dbm.* files which actually contain the timing data from the most recent run, it's just a key value store of test id keys to floats of elapsed time for the value. For scheduling/balancing the times dbm files are all that's needed.
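The key-value layout described above can be illustrated with Python's stdlib dbm module. This is a toy sketch that builds its own database rather than reading a real .stestr repository — the file name, the dbm.dumb backend, and the test ids are all assumptions for illustration:

```python
import dbm.dumb  # portable pure-Python backend, used here for the sketch

# Build a toy timing database with the shape described above:
# test id -> elapsed seconds, stored as a string.
with dbm.dumb.open("times_demo", "c") as db:
    db["test.library.test_drag.TestDrag.test_nasty_data"] = "12.5"
    db["test.library.test_spec.TestSpec.test_roundtrip"] = "0.8"

# Read it back the way a scheduler would: test id keys, float values.
with dbm.dumb.open("times_demo", "r") as db:
    timings = {k.decode(): float(db[k]) for k in db.keys()}

print(timings)
```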

For terra's CI we wipe the numbered files before the cache action collects the files, to avoid growth of the cache size: https://github.com/Qiskit/qiskit/blob/main/.azure/test-linux.yml#L137. But I think what we're doing in terra may not be working correctly, as things still appear to be running in alphabetical order in terra's CI (also, honestly, we should just change that line to `stestr history remove`; it predates the history subcommand, iirc).

Contributor

There is a brief description of the repository directory's contents in the docs here: https://stestr.readthedocs.io/en/latest/MANUAL.html#repositories

Collaborator

What does caching .stestr do that helps CI? Is it just that it saves writing some default files that happen to be slow to write in some CI systems? Should we not cache .stestr/0? Or just cache .stestr but run stestr history remove all at the end before the cache is saved?

I had thought the cache might help make test discovery faster, but I think there is not a good way to do that (since even a test file that is unchanged could import a factory function that generates tests from a dependency that was updated since the previous run).

Contributor

It's for test scheduling, if there is timing data available then stestr will partition the tests across the parallel workers using that information to try and maximize throughput. The default scheduler sorts all the tests by their duration in the timing database and will then try to pack the test workers to balance the runtime evenly and minimize runtime.
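The balancing described above is essentially greedy longest-processing-time scheduling. A simplified sketch of that idea — not stestr's actual scheduler code; the function name and timing data are made up:

```python
import heapq

def partition_tests(timings, workers):
    """Greedy longest-processing-time partitioning: place each test,
    longest first, on the currently least-loaded worker."""
    # One (load, index, bucket) entry per worker; the index breaks ties
    # so the bucket lists are never compared by the heap.
    heap = [(0.0, i, []) for i in range(workers)]
    heapq.heapify(heap)
    for test, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
        load, i, bucket = heapq.heappop(heap)
        bucket.append(test)
        heapq.heappush(heap, (load + secs, i, bucket))
    return [bucket for _, _, bucket in sorted(heap, key=lambda t: t[1])]

# Made-up timing data for illustration.
timings = {"t_slow": 30.0, "t_med": 12.0, "t_fast1": 5.0, "t_fast2": 4.0}
print(partition_tests(timings, 2))  # → [['t_slow'], ['t_med', 't_fast1', 't_fast2']]
```

Without timing data there is nothing to sort on, which is why a cold run falls back to alphabetical order.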

If you've been running stestr locally, you can experiment with this pretty easily: do `rm -rf .stestr && tox -epy && tox -epy` and observe the difference in runtime between the first and second execution of the unit tests (and also see that the order of execution should be different).

Collaborator Author

Thanks @mtreinish, I didn't scroll down enough to see the deletion of the numbered files. I tried caching only the times.* files, but I also needed to include the format file or I'd get the error "The specified repository directory ./stestr already exists. Please check if the repository already exists or select a different path". I also updated the cache name to use the run number instead of the hash of requirements files that the pip cache is using.

Collaborator

If I follow the documentation for the cache action and for run_number correctly, I don't think you need to add run_number to the cache key. The cache action documentation says that the cache is scoped to a branch and has read-only access to the cache of the default branch. The run number is incremented every time you run a workflow for a branch (not counting re-runs). So if you push a new commit, the run number on a PR will increment, but I think you would still want the cache from the previous commit on the PR. With your restore keys, you will still be able to fall back to the last commit's cache, but I don't see a benefit to keying on the number. In either case, you will use the cache from the previous commit when you push and will reuse the current commit's cache if you re-run. There is not much downside to having the run number other than that it will keep extra old runs in the cache.
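The scheme being discussed can be sketched as an actions/cache step where the exact key varies per commit and restore-keys provide the fallback to the previous commit's cache. The paths, key names, and use of github.sha here are illustrative assumptions, not the PR's final configuration:

```yaml
- uses: actions/cache@v3
  with:
    path: |
      .stestr/times.dbm*
      .stestr/format
    # Exact key per commit: a re-run hits this commit's cache directly.
    key: stestr-${{ runner.os }}-${{ matrix.python-version }}-${{ github.sha }}
    # A new commit misses the exact key and falls back to the most
    # recent cache for this OS/Python combination on the branch.
    restore-keys: |
      stestr-${{ runner.os }}-${{ matrix.python-version }}-
```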

test/library/calibration/test_drag.py
@@ -113,6 +97,7 @@ def test_nasty_data(self, freq, amp, offset, reps, betas, tol):

drag = RoughDrag([0], self.x_plus, betas=betas)
drag.set_experiment_options(reps=reps)
drag.set_run_options(shots=500)
Collaborator

I wonder if we should try to refactor MockIQBackend if reducing shots like this makes a difference. We are working with probabilities, so drawing shots from them shouldn't be too slow, but maybe the Result format is just inefficient (lots of nested dictionaries instead of numpy arrays)?
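Drawing shots from known probabilities is indeed cheap. A minimal sketch of that sampling step — the names and shapes here are illustrative, not MockIQBackend's real interface:

```python
import numpy as np

# Fixed seed so the sketch is reproducible.
rng = np.random.default_rng(seed=1234)

def sample_counts(probabilities, shots):
    # One multinomial draw gives the counts for all outcomes at once,
    # so the cost barely depends on the number of shots.
    outcomes = rng.multinomial(shots, probabilities)
    return {format(i, "b"): int(n) for i, n in enumerate(outcomes)}

counts = sample_counts([0.7, 0.3], shots=500)
print(counts, sum(counts.values()))
```

If this step is fast, any slowdown that scales with shots would point at assembling the Result structures rather than the sampling itself.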

@@ -160,7 +160,7 @@ def test_experiment_config(self):

def test_roundtrip_serializable(self):
"""Test round trip JSON serialization"""
- exp = QubitSpectroscopy([1], np.linspace(int(100e6), int(150e6), int(20e6)))
+ exp = QubitSpectroscopy([1], np.linspace(int(100e6), int(150e6), 4))
Collaborator

20e6 🤣

@@ -270,7 +270,9 @@ def test_parallel_experiment(self):
par_experiment = ParallelExperiment(
exp_list, flatten_results=False, backend=parallel_backend
)
- par_experiment.set_run_options(meas_level=MeasLevel.KERNELED, meas_return="single")
+ par_experiment.set_run_options(
Collaborator

This change is fine, but I wonder why we test parallel experiments here. Ideally, each experiment works okay on its own and then there are specific tests for ParallelExperiment. I don't see why individual experiments need to test parallel execution.

Collaborator Author

I think you're right. @ItamarGoldman do you think this test is still necessary? Seems like we can remove it since parallel experiments are tested elsewhere.

tox.ini

class QiskitExperimentsTestCase(QiskitTestCase):
"""Qiskit Experiments specific extra functionality for test cases."""

def setUp(self):
super().setUp()
self.useFixture(fixtures.Timeout(TEST_TIMEOUT, gentle=True))
Collaborator Author

I tried both gentle options. False will cause the tests to exit earlier when there's a timeout, but the message given is "The following tests exited without returning a status and likely segfaulted or crashed Python", which is cryptic. Since tests are relatively fast now, I think it's okay to use True, which shows the timeout exception on tests that take too long and doesn't stop running the remaining tests upon failure.

Contributor

Yeah, gentle=False sends SIGALRM to the process without setting a handler, which kills the process by default. If the process exits before the test worker returns an event for its final status, stestr prints an error saying it never received a status for the test that executed (which is what a segfault looks like to stestr).
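The difference can be sketched with the underlying signal machinery: gentle=True installs a SIGALRM handler that turns the alarm into an exception, while gentle=False leaves the default action, which terminates the process. This is a standalone, Unix-only illustration with a made-up exception class, not the fixtures implementation:

```python
import signal
import time

class TestTimeout(Exception):
    """Stand-in for the exception a gentle timeout raises."""

def _on_alarm(signum, frame):
    # With a handler installed (the gentle=True analogue), SIGALRM
    # becomes a normal exception with a readable traceback. Without
    # one (gentle=False), the default action kills the process.
    raise TestTimeout("test exceeded the time limit")

signal.signal(signal.SIGALRM, _on_alarm)
signal.alarm(1)          # pretend the timeout is 1 second
try:
    time.sleep(2)        # a "test" that runs too long
except TestTimeout as exc:
    result = str(exc)
finally:
    signal.alarm(0)      # cancel any pending alarm

print(result)
```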

Collaborator

@wshanks wshanks left a comment

Looks good!

@wshanks wshanks added this pull request to the merge queue Sep 8, 2023
Because tox 4 doesn't allow sharing environments, the separate env to
run stestr-clean was adding over a minute to the CI.
@coruscating coruscating removed this pull request from the merge queue due to a manual request Sep 8, 2023
@coruscating coruscating added this pull request to the merge queue Sep 8, 2023
Merged via the queue into qiskit-community:main with commit f2e5d8b Sep 8, 2023
@coruscating coruscating deleted the make-tests-faster branch September 8, 2023 22:02
wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Nov 2, 2023
This change removes the dependence on `QiskitTestCase`, replacing it
with a direct dependence on `unittest.TestCase` and
`testtools.TestCase`.

As with `QiskitTestCase`, the ability to run the tests based either on
`unittest.TestCase` or `testtools.TestCase` (a `unittest.TestCase`
subclass) is preserved. For qiskit-experiments, the ability is actually
restored because the timeout feature added in
[qiskit-community#1246](qiskit-community#1246)
had introduced a hard dependence on `testtools`.

Specific changes:

* Add `testtools` and `fixtures` to `requirements-dev.txt` as required
  test dependencies.
* Use `QE_USE_TESTTOOLS` environment variable to control whether tests
  are based on `testtools.TestCase` rather than checking if `testtools`
is installed.
* Remove some checks for test writing best practices. `QiskitTestCase`
  used extra code to ensure that `setUp` and other test class methods
always called their parents and that those methods are not called from
individual tests.  `testtools.TestCase` does these checks as well. Since
qiskit-experiments always uses `testtools` in CI, it can rely on
`testtools` for these checks and just not do them for the alternate
`unittest` execution.
* Generate `QiskitExperimentsTestCase` from a `create_base_test_case`
  function. This function allows the base test class to be generated
based on either `testtools.TestCase` or `unittest.TestCase` so that the
`unittest` variant can be tested for regressions even when the
`testtools` variant is enabled.
github-merge-queue bot pushed a commit that referenced this pull request Nov 10, 2023
Closes
[#1282](#1282).
nkanazawa1989 pushed a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 10, 2024
nkanazawa1989 pushed a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 17, 2024