
Make CI tests faster #1246

Merged
merged 29 commits into qiskit-community:main from make-tests-faster on Sep 8, 2023
Conversation

Collaborator

@coruscating coruscating commented Aug 8, 2023

Summary

This PR does a few things to make tests run faster:

  • Only test on the lowest and highest supported Python versions
  • Set MacOS build options to be the same as the other OSes (and remove coverage)
  • Add .stestr to the cache as suggested by @mtreinish. This doesn't seem to improve the Ubuntu and Windows runtimes but significantly improves MacOS's.
  • Group tests in stestr by class name. This might avoid large parallelized tests being run on multiple workers simultaneously and slowing each test down.
  • Fail a test automatically if it takes longer than 60 seconds (hopefully this can be shortened in the future, but for now it mostly prevents a very long test from being added)
  • Shorten long tests by decreasing shots and generating smaller circuits where the size isn't relevant (such as the roundtrip serialization tests)
  • Also fixes a bug in the DRAG experiment where integer beta values caused a serialization error.

Tests currently take 20-40 minutes for Windows/Ubuntu and 50+ minutes for MacOS. With this PR, all tests go down to ~10 minutes.
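The class-name grouping mentioned above maps to stestr's group_regex option. A hedged sketch of what the .stestr.conf could look like — the test_path value is an assumption for illustration, and the regex follows the pattern documented in the stestr manual for grouping by class:

```ini
[DEFAULT]
test_path=./test
# Everything captured by the regex becomes the group key. This pattern
# keeps everything up to the last dot of the test id (module + class),
# so all tests of one class land on the same worker.
group_regex=([^\.]*\.)*
```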

@coruscating coruscating marked this pull request as ready for review September 6, 2023 21:50
Collaborator

@wshanks wshanks left a comment


This is really nice. I like all the changes to trim the tests. I had just a few small comments.

.stestr/m*
.stestr/n*
.stestr/t*
key: ${{ runner.os }}-${{ matrix.python-version }}-stestr-tests-${{ hashFiles('setup.py','requirements.txt','requirements-extras.txt','requirements-dev.txt','constraints.txt') }}
Collaborator

I don't think we want to hash the same files as for the pip cache above. I would think that we want to hash the test files. I don't see any files being hashed in the qiskit Azure pipelines equivalent here. Maybe we should check with mtreinish. I am not sure how the cache is used by stestr (like what changes invalidate it).

Collaborator Author

It looks like Qiskit is caching the whole .stestr directory. @mtreinish had told me the cache can get large, so the numbered files above 0 should be trimmed; the list of paths I listed out should be every .stestr file except the numbered files above 0. But I'm not sure if keeping a few of the latest runs would also be useful.

Contributor

The numbered files in the .stestr directory are the full result stream (in the subunit format) from the previous test runs. At the end of a test run it uses the next integer as the filename. For timing purposes there is a collection of times.dbm.* files which actually contain the timing data from the most recent run, it's just a key value store of test id keys to floats of elapsed time for the value. For scheduling/balancing the times dbm files are all that's needed.
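The key-value layout described above can be illustrated with Python's stdlib dbm module. This is a toy sketch that builds its own database rather than reading a real .stestr repository — the file name, the dbm.dumb backend, and the test ids are all assumptions for illustration:

```python
import dbm.dumb  # portable pure-Python backend, used here for the sketch

# Build a toy timing database with the shape described above:
# test id -> elapsed seconds, stored as a string.
with dbm.dumb.open("times_demo", "c") as db:
    db["test.library.test_drag.TestDrag.test_nasty_data"] = "12.5"
    db["test.library.test_spec.TestSpec.test_roundtrip"] = "0.8"

# Read it back the way a scheduler would: test id keys, float values.
with dbm.dumb.open("times_demo", "r") as db:
    timings = {k.decode(): float(db[k]) for k in db.keys()}

print(timings)
```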

For terra's CI we wipe the numbered files before the cache action collects the files, to avoid growth of the cache size: https://github.com/Qiskit/qiskit/blob/main/.azure/test-linux.yml#L137. But I think what we're doing in terra may not be working correctly, as things still appear to be running in alphabetical order in terra's CI (also, honestly, we should just change that line to `stestr history remove`; it predates the history subcommand, iirc).

Contributor

There is a brief description of the repository directory's contents in the docs here: https://stestr.readthedocs.io/en/latest/MANUAL.html#repositories

Collaborator

What does caching .stestr do that helps CI? Is it just that it saves writing some default files that happen to be slow to write in some CI systems? Should we not cache .stestr/0? Or just cache .stestr but run stestr history remove all at the end before the cache is saved?

I had thought the cache might help make test discovery faster, but I think there is not a good way to do that (since even a test file that is unchanged could import a factory function that generates tests from a dependency that was updated since the previous run).

Contributor

It's for test scheduling, if there is timing data available then stestr will partition the tests across the parallel workers using that information to try and maximize throughput. The default scheduler sorts all the tests by their duration in the timing database and will then try to pack the test workers to balance the runtime evenly and minimize runtime.
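The balancing described above is essentially greedy longest-processing-time scheduling. A simplified sketch of that idea — not stestr's actual scheduler code; the function name and timing data are made up:

```python
import heapq

def partition_tests(timings, workers):
    """Greedy longest-processing-time partitioning: place each test,
    longest first, on the currently least-loaded worker."""
    # One (load, index, bucket) entry per worker; the index breaks ties
    # so the bucket lists are never compared by the heap.
    heap = [(0.0, i, []) for i in range(workers)]
    heapq.heapify(heap)
    for test, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
        load, i, bucket = heapq.heappop(heap)
        bucket.append(test)
        heapq.heappush(heap, (load + secs, i, bucket))
    return [bucket for _, _, bucket in sorted(heap, key=lambda t: t[1])]

# Made-up timing data for illustration.
timings = {"t_slow": 30.0, "t_med": 12.0, "t_fast1": 5.0, "t_fast2": 4.0}
print(partition_tests(timings, 2))  # → [['t_slow'], ['t_med', 't_fast1', 't_fast2']]
```

Without timing data there is nothing to sort on, which is why a cold run falls back to alphabetical order.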

If you've been running stestr locally, you can experiment with this pretty easily: do `rm -rf .stestr && tox -epy && tox -epy` and observe the difference in runtime between the first and second execution of the unit tests (and also see that the order of execution should be different).

Collaborator Author

Thanks @mtreinish, I didn't scroll down enough to see the deletion of the numbered files. I tried caching only the times.* files, but I also needed to include the format file or I'd get the error "The specified repository directory ./stestr already exists. Please check if the repository already exists or select a different path". I also updated the cache name to use the run number instead of the hash of requirements files that the pip cache is using.

Collaborator

If I follow the documentation for the cache action and for run_number correctly, I don't think you need to add run_number to the cache key. The cache action documentation says that the cache is scoped to a branch and has read-only access to the cache of the default branch. The run number is incremented every time you run a workflow for a branch (not counting re-runs). So if you push a new commit, the run number on a PR will increment, but I think you would still want the cache from the previous commit on the PR. With your restore keys, you will still be able to fall back to the last commit's cache, but I don't see a benefit to keying on the number. In either case, you will use the cache from the previous commit when you push and will reuse the current commit's cache if you re-run. There is not much downside to having the run number other than that it will keep extra old runs in the cache.
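The scheme being discussed can be sketched as an actions/cache step where the exact key varies per commit and restore-keys provide the fallback to the previous commit's cache. The paths, key names, and use of github.sha here are illustrative assumptions, not the PR's final configuration:

```yaml
- uses: actions/cache@v3
  with:
    path: |
      .stestr/times.dbm*
      .stestr/format
    # Exact key per commit: a re-run hits this commit's cache directly.
    key: stestr-${{ runner.os }}-${{ matrix.python-version }}-${{ github.sha }}
    # A new commit misses the exact key and falls back to the most
    # recent cache for this OS/Python combination on the branch.
    restore-keys: |
      stestr-${{ runner.os }}-${{ matrix.python-version }}-
```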

test/library/calibration/test_drag.py
@@ -113,6 +97,7 @@ def test_nasty_data(self, freq, amp, offset, reps, betas, tol):

drag = RoughDrag([0], self.x_plus, betas=betas)
drag.set_experiment_options(reps=reps)
drag.set_run_options(shots=500)
Collaborator

I wonder if we should try to refactor MockIQBackend if reducing shots like this makes a difference. We are working with probabilities, so drawing shots from them shouldn't be too slow, but maybe the Result format is just inefficient (lots of nested dictionaries instead of numpy arrays)?
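Drawing shots from known probabilities is indeed cheap. A minimal sketch of that sampling step — the names and shapes here are illustrative, not MockIQBackend's real interface:

```python
import numpy as np

# Fixed seed so the sketch is reproducible.
rng = np.random.default_rng(seed=1234)

def sample_counts(probabilities, shots):
    # One multinomial draw gives the counts for all outcomes at once,
    # so the cost barely depends on the number of shots.
    outcomes = rng.multinomial(shots, probabilities)
    return {format(i, "b"): int(n) for i, n in enumerate(outcomes)}

counts = sample_counts([0.7, 0.3], shots=500)
print(counts, sum(counts.values()))
```

If this step is fast, any slowdown that scales with shots would point at assembling the Result structures rather than the sampling itself.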

@@ -160,7 +160,7 @@ def test_experiment_config(self):

def test_roundtrip_serializable(self):
"""Test round trip JSON serialization"""
- exp = QubitSpectroscopy([1], np.linspace(int(100e6), int(150e6), int(20e6)))
+ exp = QubitSpectroscopy([1], np.linspace(int(100e6), int(150e6), 4))
Collaborator

20e6 🤣

@@ -270,7 +270,9 @@ def test_parallel_experiment(self):
par_experiment = ParallelExperiment(
exp_list, flatten_results=False, backend=parallel_backend
)
- par_experiment.set_run_options(meas_level=MeasLevel.KERNELED, meas_return="single")
+ par_experiment.set_run_options(
Collaborator

This change is fine, but I wonder why we test parallel experiments here. Ideally, each experiment works okay on its own and then there are specific tests for ParallelExperiment. I don't see why individual experiments need to test parallel execution.

Collaborator Author

I think you're right. @ItamarGoldman do you think this test is still necessary? Seems like we can remove it since parallel experiments are tested elsewhere.

tox.ini

class QiskitExperimentsTestCase(QiskitTestCase):
"""Qiskit Experiments specific extra functionality for test cases."""

def setUp(self):
super().setUp()
self.useFixture(fixtures.Timeout(TEST_TIMEOUT, gentle=True))
Collaborator Author

I tried both gentle options. False will cause the tests to exit earlier when there's a timeout, but the message given is "The following tests exited without returning a status and likely segfaulted or crashed Python", which is cryptic. Since tests are relatively fast now, I think it's okay to use True, which shows the timeout exception on tests that take too long and doesn't stop running the remaining tests upon failure.

Contributor

Yeah, gentle=False sends SIGALRM to the process without setting a handler, which kills the process by default. If the process exits before the test worker returns an event for its final status, stestr prints an error saying it never received a status for the test that executed (which is what a segfault looks like to stestr).
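The difference can be sketched with the underlying signal machinery: gentle=True installs a SIGALRM handler that turns the alarm into an exception, while gentle=False leaves the default action, which terminates the process. This is a standalone, Unix-only illustration with a made-up exception class, not the fixtures implementation:

```python
import signal
import time

class TestTimeout(Exception):
    """Stand-in for the exception a gentle timeout raises."""

def _on_alarm(signum, frame):
    # With a handler installed (the gentle=True analogue), SIGALRM
    # becomes a normal exception with a readable traceback. Without
    # one (gentle=False), the default action kills the process.
    raise TestTimeout("test exceeded the time limit")

signal.signal(signal.SIGALRM, _on_alarm)
signal.alarm(1)          # pretend the timeout is 1 second
try:
    time.sleep(2)        # a "test" that runs too long
except TestTimeout as exc:
    result = str(exc)
finally:
    signal.alarm(0)      # cancel any pending alarm

print(result)
```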

Collaborator

@wshanks wshanks left a comment

Looks good!

@wshanks wshanks added this pull request to the merge queue Sep 8, 2023
Because tox 4 doesn't allow sharing environments, the separate env to
run stestr-clean was adding over a minute to the CI.
@coruscating coruscating removed this pull request from the merge queue due to a manual request Sep 8, 2023
@coruscating coruscating added this pull request to the merge queue Sep 8, 2023
Merged via the queue into qiskit-community:main with commit f2e5d8b Sep 8, 2023
@coruscating coruscating deleted the make-tests-faster branch September 8, 2023 22:02
wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Nov 2, 2023
This change removes the dependence on `QiskitTestCase`, replacing it
with a direct dependence on `unittest.TestCase` and
`testtools.TestCase`.

As with `QiskitTestCase`, the ability to run the tests based either on
`unittest.TestCase` or `testtools.TestCase` (a `unittest.TestCase`
subclass) is preserved. For qiskit-experiments, the ability is actually
restored because the timeout feature added in
[qiskit-community#1246](qiskit-community#1246)
had introduced a hard dependence on `testtools`.

Specific changes:

* Add `testtools` and `fixtures` to `requirements-dev.txt` as required
  test dependencies.
* Use `QE_USE_TESTTOOLS` environment variable to control whether tests
  are based on `testtools.TestCase` rather than checking if `testtools`
is installed.
* Remove some checks for test writing best practices. `QiskitTestCase`
  used extra code to ensure that `setUp` and other test class methods
always called their parents and that those methods are not called from
individual tests.  `testtools.TestCase` does these checks as well. Since
qiskit-experiments always uses `testtools` in CI, it can rely on
`testtools` for these checks and just not do them for the alternate
`unittest` execution.
* Generate `QiskitExperimentsTestCase` from a `create_base_test_case`
  function. This function allows the base test class to be generated
based on either `testtools.TestCase` or `unittest.TestCase` so that the
`unittest` variant can be tested for regressions even when the
`testtools` variant is enabled.
github-merge-queue bot pushed a commit that referenced this pull request Nov 10, 2023
Closes
[#1282](#1282).
nkanazawa1989 pushed a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 10, 2024
nkanazawa1989 pushed a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 17, 2024