Make CI tests faster #1246
Conversation
This is really nice. I like all the changes to trim the tests. I had just a few small comments.
.github/workflows/main.yml (outdated):

```yaml
.stestr/m*
.stestr/n*
.stestr/t*
key: ${{ runner.os }}-${{ matrix.python-version }}-stestr-tests-${{ hashFiles('setup.py','requirements.txt','requirements-extras.txt','requirements-dev.txt','constraints.txt') }}
```
I don't think we want to hash the same files as for the pip cache above. I would think that we want to hash the test files. I don't see any files being hashed in the qiskit Azure pipelines equivalent here. Maybe we should check with mtreinish. I am not sure how the cache is used by stestr (like what changes invalidate it).
It looks like Qiskit is caching the whole `.stestr` directory. @mtreinish had told me the cache can get large, so the numbered files above 0 should be trimmed; the list of paths I listed out should cover every `.stestr` file except the numbered files above 0. But I'm not sure if keeping a few of the latest runs would also be useful.
The numbered files in the `.stestr` directory are the full result stream (in the subunit format) from the previous test runs. At the end of a test run it uses the next integer as the filename. For timing purposes there is a collection of `times.dbm.*` files which actually contain the timing data from the most recent run; it's just a key-value store of test id keys to floats of elapsed time for the value. For scheduling/balancing, the times dbm files are all that's needed.

For terra's CI we wipe the numbered files before the cache action collects the files, to avoid the growth of the cache size: https://github.com/Qiskit/qiskit/blob/main/.azure/test-linux.yml#L137. But I think what we're doing in terra may not be working correctly, as things still appear to be running in alphabetical order in terra's CI (also, honestly, we should just change that line to `stestr history remove`; it predates the history subcommand, iirc).
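The timing store described above is essentially a key-value mapping from test ids to elapsed seconds. A minimal sketch of that idea using the stdlib `dbm` module (illustrative only; the exact on-disk format stestr uses may differ, and the test ids here are made up):

```python
import dbm.dumb
import os
import tempfile

# Build a small key-value store in the spirit of stestr's times.dbm.* files:
# test id -> elapsed seconds from the most recent run.
path = os.path.join(tempfile.mkdtemp(), "times.dbm")
with dbm.dumb.open(path, "c") as db:
    db[b"tests.test_drag.TestDrag.test_nasty_data"] = b"12.5"
    db[b"tests.test_spectroscopy.TestQubitSpectroscopy.test_roundtrip"] = b"0.8"

# A scheduler only needs to read this mapping back to balance workers.
with dbm.dumb.open(path, "r") as db:
    timings = {k.decode(): float(db[k]) for k in db.keys()}

print(timings)
```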
There is a brief description of the repository directory's contents in the docs here: https://stestr.readthedocs.io/en/latest/MANUAL.html#repositories
What does caching `.stestr` do that helps CI? Is it just that it saves writing some default files that happen to be slow to write in some CI systems? Should we not cache `.stestr/0`? Or just cache `.stestr` but run `stestr history remove all` at the end before the cache is saved?
I had thought the cache might help make test discovery faster, but I think there is not a good way to do that (since even a test file that is unchanged could import a factory function that generates tests from a dependency that was updated since the previous run).
It's for test scheduling, if there is timing data available then stestr will partition the tests across the parallel workers using that information to try and maximize throughput. The default scheduler sorts all the tests by their duration in the timing database and will then try to pack the test workers to balance the runtime evenly and minimize runtime.
If you've been running stestr locally, you can experiment with this pretty easily: do `rm -rf .stestr && tox -epy && tox -epy` and you can observe the difference in the runtime between the first and second execution of the unit tests (and also see that the order of execution should be different).
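The balancing described above can be pictured as a greedy longest-processing-time partition: sort tests by recorded duration, then repeatedly hand the next-longest test to the least-loaded worker. A simplified model (stestr's actual scheduler differs in its details; test names and timings here are made up):

```python
import heapq

def partition_tests(timings, workers):
    """Greedily pack tests onto workers, longest first, to balance runtime.

    timings: dict of test id -> elapsed seconds from a previous run.
    Returns one (total_time, worker_index, test_ids) tuple per worker.
    """
    # Min-heap keyed on accumulated time, so the least-loaded worker
    # is always popped first. The index breaks ties deterministically.
    heap = [(0.0, i, []) for i in range(workers)]
    heapq.heapify(heap)
    for test, duration in sorted(timings.items(), key=lambda kv: -kv[1]):
        load, idx, bucket = heapq.heappop(heap)
        bucket.append(test)
        heapq.heappush(heap, (load + duration, idx, bucket))
    return sorted(heap, key=lambda t: t[1])

timings = {"t1": 9.0, "t2": 5.0, "t3": 4.0, "t4": 3.0, "t5": 2.0, "t6": 1.0}
schedule = partition_tests(timings, 2)
for load, _, tests in schedule:
    print(load, tests)
```

With timing data available both workers end up at 12.0 seconds here; without it, an alphabetical split could leave one worker idle while the other grinds through the slow tests.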
Thanks @mtreinish, I didn't scroll down enough to see the deletion of numbered files. I tried caching only the `times.*` files, but I also needed to include `format` or I'd get the error "The specified repository directory ./stestr already exists. Please check if the repository already exists or select a different path". I also updated the cache name to use the run number instead of the hash of requirements files that the pip cache is using.
If I follow the documentation for the cache action and for `run_number` correctly, I don't think you need to add `run_number` to the cache key. The cache action documentation says that the cache is scoped to a branch and has read-only access to the cache of the default branch. The run number is incremented every time you run a workflow for a branch (not counting re-runs). So if you push a new commit, the run number on a PR will increment, but I think you would still want the cache from the previous commit on the PR. With your restore keys, you will still be able to fall back to the last commit's cache, but I don't see a benefit to keying on the number. In either case, you will use the cache from the previous commit when you push and will reuse the current commit's cache if you re-run. There is not much downside to having the build number other than that it will keep extra old runs in the cache.
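For illustration, a cache step along these lines might look like the following (a hypothetical snippet modeled on the `actions/cache` documentation, not the PR's actual workflow; the key layout is a placeholder):

```yaml
- uses: actions/cache@v3
  with:
    path: .stestr
    # Key on the commit so each push writes a fresh cache; restore-keys
    # lets a new commit fall back to the previous commit's timing data.
    key: ${{ runner.os }}-${{ matrix.python-version }}-stestr-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-${{ matrix.python-version }}-stestr-
```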
```diff
@@ -113,6 +97,7 @@ def test_nasty_data(self, freq, amp, offset, reps, betas, tol):
         drag = RoughDrag([0], self.x_plus, betas=betas)
         drag.set_experiment_options(reps=reps)
+        drag.set_run_options(shots=500)
```
I wonder if we should try to refactor MockIQBackend if reducing shots like this makes a difference. We are working with probabilities, so drawing shots from them shouldn't be too slow, but maybe the Result format is just inefficient (lots of nested dictionaries instead of numpy arrays)?
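As a rough model of the sampling cost itself (a hypothetical sketch, not MockIQBackend's actual code): drawing counts from known outcome probabilities is a single weighted draw, which supports the suspicion that the overhead lies elsewhere, e.g. in assembling nested Result dictionaries.

```python
import random
from collections import Counter

def sample_counts(probabilities, shots, seed=1234):
    """Draw measurement counts from outcome probabilities.

    probabilities: dict of bitstring -> probability (summing to 1).
    Returns a counts dict such as {"0": 352, "1": 148}.
    """
    rng = random.Random(seed)
    outcomes = list(probabilities)
    weights = [probabilities[o] for o in outcomes]
    # One vectorized weighted draw for all shots; this is cheap even
    # for thousands of shots, so shot count alone should not dominate.
    draws = rng.choices(outcomes, weights=weights, k=shots)
    return dict(Counter(draws))

counts = sample_counts({"0": 0.7, "1": 0.3}, shots=500)
print(counts)
```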
```diff
@@ -160,7 +160,7 @@ def test_experiment_config(self):
     def test_roundtrip_serializable(self):
         """Test round trip JSON serialization"""
-        exp = QubitSpectroscopy([1], np.linspace(int(100e6), int(150e6), int(20e6)))
+        exp = QubitSpectroscopy([1], np.linspace(int(100e6), int(150e6), 4))
```
20e6 🤣
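The joke: the third argument of `np.linspace` is the number of samples, so `int(20e6)` asked for a 20-million-point frequency array where 4 points suffice for a serialization round-trip test. A quick check:

```python
import numpy as np

# The fixed call requests 4 evenly spaced points between 100 and 150 MHz.
freqs = np.linspace(int(100e6), int(150e6), 4)
print(freqs.size)  # 4

# The original num argument would have allocated this many elements:
print(int(20e6))  # 20000000
```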
```diff
@@ -270,7 +270,9 @@ def test_parallel_experiment(self):
         par_experiment = ParallelExperiment(
             exp_list, flatten_results=False, backend=parallel_backend
         )
-        par_experiment.set_run_options(meas_level=MeasLevel.KERNELED, meas_return="single")
+        par_experiment.set_run_options(
```
This change is fine, but I wonder why we test parallel experiments here. Ideally, each experiment works okay on its own and then there are specific tests for ParallelExperiment. I don't see why individual experiments need to test parallel execution.
I think you're right. @ItamarGoldman do you think this test is still necessary? Seems like we can remove it since parallel experiments are tested elsewhere.
Co-authored-by: Will Shanks <wshaos@posteo.net>
```python
class QiskitExperimentsTestCase(QiskitTestCase):
    """Qiskit Experiments specific extra functionality for test cases."""

    def setUp(self):
        super().setUp()
        self.useFixture(fixtures.Timeout(TEST_TIMEOUT, gentle=True))
```
I tried both `gentle` options. `False` will cause the tests to exit earlier when there's a timeout, but the message given is "The following tests exited without returning a status and likely segfaulted or crashed Python", which is cryptic. Since tests are relatively fast now, I think it's okay to use `True`, which will show the timeout exception on tests that take too long and not stop running tests upon failure.
Yeah, `gentle=False` sends `SIGALRM` to the process without setting a handler, which will kill the process by default. If the process exits before the test worker returns an event for its final status, stestr prints an error saying it never received a status for the test that executed (which is what a segfault looks like to stestr).
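The difference can be illustrated with the stdlib `signal` module (a Unix-only sketch, not the fixtures library's actual implementation): by default `SIGALRM` terminates the process, while installing a handler turns the timeout into an ordinary catchable exception, which is the behavior `gentle=True` relies on.

```python
import signal
import time

class TestTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    # With a handler installed, the alarm surfaces as an exception
    # instead of killing the process (the default SIGALRM disposition).
    raise TestTimeout("test exceeded its time budget")

signal.signal(signal.SIGALRM, _on_alarm)
signal.setitimer(signal.ITIMER_REAL, 0.1)  # fire after 100 ms
try:
    time.sleep(1.0)  # stand-in for a slow test
    timed_out = False
except TestTimeout:
    timed_out = True
finally:
    signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer

print(timed_out)
```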
not having next-stream seems to cause errors
…eriments into make-tests-faster
instead of manually caching a subset of files
Looks good!
Because tox 4 doesn't allow sharing environments, the separate env to run stestr-clean was adding over a minute to the CI.
This change removes the dependence on `QiskitTestCase`, replacing it with a direct dependence on `unittest.TestCase` and `testtools.TestCase`. As with `QiskitTestCase`, the ability to run the tests based either on `unittest.TestCase` or `testtools.TestCase` (a `unittest.TestCase` subclass) is preserved. For qiskit-experiments, the ability is actually restored because the timeout feature added in [#1246](#1246) had introduced a hard dependence on `testtools`.

Specific changes:

* Add `testtools` and `fixtures` to `requirements-dev.txt` as required test dependencies.
* Use the `QE_USE_TESTTOOLS` environment variable to control whether tests are based on `testtools.TestCase`, rather than checking if `testtools` is installed.
* Remove some checks for test writing best practices. `QiskitTestCase` used extra code to ensure that `setUp` and other test class methods always called their parents and that those methods are not called from individual tests. `testtools.TestCase` does these checks as well. Since qiskit-experiments always uses `testtools` in CI, it can rely on `testtools` for these checks and just not do them for the alternate `unittest` execution.
* Generate `QiskitExperimentsTestCase` from a `create_base_test_case` function. This function allows the base test class to be generated based on either `testtools.TestCase` or `unittest.TestCase`, so that the `unittest` variant can be tested for regressions even when the `testtools` variant is enabled.

Closes [#1282](#1282).
### Summary

This PR does a few things to make tests run faster:

- [x] Only test on the lowest and highest supported Python versions
- [x] Set MacOS build options to be the same as the other OSes (and remove coverage)
- [x] Add `.stestr` to the cache as suggested by @mtreinish. This doesn't seem to improve the Ubuntu and Windows runtimes but significantly improves MacOS's.
- [x] Group tests in stestr by class name. This might avoid large parallelized tests being run on multiple workers simultaneously and slowing each test down.
- [x] Fail a test automatically if it takes longer than 60 seconds (hopefully this can be shortened in the future, but for now it mostly prevents a very long test from being added)
- [x] Shorten long tests by decreasing shots and generating smaller circuits where the size isn't relevant (such as the roundtrip serialization tests)
- [x] Also fixes a bug in the DRAG experiment where integer `beta` values caused a serialization error.

Tests are currently 20-40 minutes for Windows/Ubuntu and 50+ minutes for MacOS. With this PR, all tests go down to ~10 minutes.

---------

Co-authored-by: Will Shanks <wshaos@posteo.net>