
Change random test shuffling technique #1908

Merged

Conversation

jbytheway
Contributor

Description

Previously a random test ordering was obtained by applying std::shuffle to the tests in declaration order. This has two problems:

  • It depends on the declaration order, so the order in which the tests run is platform-specific.
  • When trying to debug accidental inter-test dependencies, it is helpful to be able to find a minimal subset of tests which exhibits the issue. However, any change to the set of tests being run will completely change the test ordering, making it difficult or impossible to reduce the set of tests being run in any reasonably efficient manner.

Therefore, change the randomization approach to resolve both these issues.

Generate a random value based on the user-provided RNG seed. Convert every test case to an integer by hashing a combination of that value with the test name. Sort the test cases by this integer.

The test names and RNG are platform-independent, so this should be consistent across platforms. Also, removing one test does not change the integer value associated with the remaining tests, so they remain in the same order.

To hash, use the FNV-1a hash, except with the basis being our randomly selected value rather than the fixed basis specified by the algorithm. We cannot use std::hash, because it is important that the result be platform-independent.
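
As a rough illustration of the scheme (a minimal sketch assuming a 64-bit FNV-1a variant; `hashTestName` and `randomBasis` are illustrative names, not Catch2's actual identifiers):

```cpp
#include <cstdint>
#include <string>

// FNV-1a over the test name, seeded with a randomly chosen basis derived from
// the user's RNG seed instead of the algorithm's fixed offset basis. A given
// seed therefore always maps a given test name to the same value, regardless
// of platform, declaration order, or which other tests are registered.
std::uint64_t hashTestName(std::string const& name, std::uint64_t randomBasis) {
    std::uint64_t hash = randomBasis;  // the fixed basis would be 14695981039346656037ULL
    for (unsigned char c : name) {
        hash ^= c;
        hash *= 1099511628211ULL;      // 64-bit FNV prime
    }
    return hash;
}
```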

@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from c7cd0d5 to 293ad88 Compare April 10, 2020 14:03
@jbytheway
Contributor Author

After reading some of the commit history in this area (specifically 535da5c) I realise that this is not entirely platform independent yet, because std::uniform_int_distribution may vary. But at least the linker no longer affects it; it's a step in the right direction.

@codecov

codecov bot commented Apr 10, 2020

Codecov Report

Merging #1908 into master will increase coverage by 0.33%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1908      +/-   ##
==========================================
+ Coverage   86.49%   86.82%   +0.33%     
==========================================
  Files         131      131              
  Lines        3900     3922      +22     
==========================================
+ Hits         3373     3405      +32     
+ Misses        527      517      -10     

@jbytheway
Contributor Author

I'd be happy to add a test for this, but after a brief glance at your testing infrastructure it wasn't clear to me where a test of a command-line parameter like this belongs.

@horenmar
Member

I like the general idea, but I will have to think about the approach a bit.

I have two misgivings right now:

  • calculating two hashes on every comparison can prove slow when used for a codebase with a significant number of test cases
  • I am not sure how to check how much randomness there will be in the order of the tests.

The first one should be easy enough to fix -- we can compute each hash once, sort those and then project the order back to test cases -- or we can benchmark say 100k test cases and see what it does.
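
A rough sketch of the "hash each test once, sort, then project back" idea, building on the `hashTestName` sketch in the description above (hypothetical names and types, not the actual Catch2 code):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

std::uint64_t hashTestName(std::string const& name, std::uint64_t randomBasis); // as sketched above

struct TestCaseInfo { std::string name; /* ... */ };

// Compute each hash exactly once (O(n) hash computations), sort the
// (hash, test) pairs by hash, then read the tests back out in that order.
std::vector<TestCaseInfo> randomOrder(std::vector<TestCaseInfo> tests,
                                      std::uint64_t randomBasis) {
    std::vector<std::pair<std::uint64_t, TestCaseInfo>> hashed;
    hashed.reserve(tests.size());
    for (auto& test : tests) {
        hashed.emplace_back(hashTestName(test.name, randomBasis), std::move(test));
    }
    std::sort(hashed.begin(), hashed.end(),
              [](auto const& lhs, auto const& rhs) { return lhs.first < rhs.first; });

    std::vector<TestCaseInfo> ordered;
    ordered.reserve(hashed.size());
    for (auto& entry : hashed) {
        ordered.push_back(std::move(entry.second));
    }
    return ordered;
}
```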

The second one is obviously a bit harder and I will have to think about it first.


As to tests, generally everything possible should be tested via CTest. I do not see how to easily test this specific feature using just the facilities provided by CTest, so it will likely require a Python script that performs the actual test and reports the result back.

The way I see it, the test should work something like

  1. Invoke the SelfTest executable with --list-tests --order rand --rng-seed <magic-value>, record the order of listed tests
  2. Invoke the SelfTest executable again, but limit the listing to a subset of tests, e.g. using [approx] and [matchers] tags. Check that the listed tests are in the same relative order as in step 1.
  3. Do this once again, with just a single tag, and check once again.

The script will also need some sanity checks to avoid false negatives. The ones I can think of off the top of my head are that step 1 should return more tests than step 2, step 2 should return more tests than step 3, and every step should return at least 2 tests.

@horenmar horenmar added the Tweak label Apr 10, 2020
@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from 293ad88 to 2e7eb0b Compare April 11, 2020 00:58
@jbytheway
Contributor Author

jbytheway commented Apr 11, 2020

I've added a test and rewritten the sorting code to only compute O(n) hashes rather than O(n log(n)).

Regarding your other concern: I see no reason to believe there will be any less randomness than there was with std::shuffle. They're both derived from the same seed. The limited entropy in that seed will be the limiting factor (currently that's just 32 bits).

Should I move struct HashTest into an internal namespace of some kind?

@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch 3 times, most recently from 08515cc to 4991e0e Compare April 11, 2020 15:05
@horenmar
Member

My misgiving with this approach is that std::shuffle guarantees that every permutation of the input is equally likely (at least given true URBG and properly implemented uniform distribution), while I do not know the statistical properties of randomly-seeded FNV1a 😃

That is why I want to play with it a bit first -- generate some random seeds, check if there are suspicious results.

On a related note, to ensure stability properly, the code needs to handle hash collisions. I suggest just using test case name as the fallback comparison if hashes happen to be the same.
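
Something along these lines would cover that (a minimal sketch with hypothetical field names): compare by hash first, and fall back to the test name when the hashes collide.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

struct HashedTest {
    std::uint64_t hash;
    std::string name;
};

// Primary key: the hash. Tie-break: the test name, so two tests whose names
// happen to hash to the same value still get a deterministic relative order.
bool orderedBefore(HashedTest const& lhs, HashedTest const& rhs) {
    return std::tie(lhs.hash, lhs.name) < std::tie(rhs.hash, rhs.name);
}
```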

@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from 4991e0e to 4dd47c4 Compare April 11, 2020 17:15
@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from 4dd47c4 to f5d0139 Compare April 12, 2020 01:18
@jbytheway
Contributor Author

> My misgiving with this approach is that std::shuffle guarantees that every permutation of the input is equally likely (at least given true URBG and properly implemented uniform distribution), while I do not know the statistical properties of randomly-seeded FNV1a 😃
>
> That is why I want to play with it a bit first -- generate some random seeds, check if there are suspicious results.

FWIW, I have added a test which verifies that the seeds from 1 to 100 generate different orderings. Obviously that's not proof of much, but it does at least demonstrate that the distribution isn't utterly awful.

> On a related note, to ensure stability properly, the code needs to handle hash collisions. I suggest just using test case name as the fallback comparison if hashes happen to be the same.

Done.

@horenmar horenmar merged commit da9e3ee into catchorg:master Apr 14, 2020
@horenmar
Member

As I said before, thanks for this -- the idea is quite useful. I am going to do a little further refactoring later, but it is nothing that stops this from being merged.

@jbytheway jbytheway deleted the use_more_helpful_random_test_ordering branch April 14, 2020 12:43
horenmar added a commit that referenced this pull request Apr 21, 2020
--- Improvements ---
* Running tests in random order (`--order rand`) has been reworked significantly (#1908)
  * Given same seed, all platforms now produce the same order
  * Given same seed, the relative order of tests does not change if you select only a subset of them
* Vector matchers support custom allocators (#1909)
* `|` and `&` (bitwise or and bitwise and) are now supported in `CHECK` and `REQUIRE`
  * The resulting type must be convertible to `bool`

--- Fixes ---
* Fixed computation of benchmarking column widths in ConsoleReporter (#1885, #1886)
* Suppressed clang-tidy's `cppcoreguidelines-pro-type-vararg` in assertions (#1901)
  * It was a false positive triggered by the new warning support workaround
* Fixed bug in test specification parser handling of OR'd patterns using escaping (#1905)

--- Miscellaneous ---
* Worked around IBM XL's codegen bug (#1907)
  * It would emit code for _destructors_ of temporaries in an unevaluated context
* Improved detection of stdlib's support for `std::uncaught_exceptions` (#1911)
kevingranade added a commit to CleverRaven/Cataclysm-DDA that referenced this pull request Sep 11, 2020
This workaround was added by #27165
The need for it was removed by catchorg/Catch2#1908 and the import of the new code.