Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add 1% tolerance to visual tests (#10422)
In the recently merged #9961 the visual comparison tests were re-added to run a visual diff on the visualization output against a known reference image. These improved tests provide us missing coverage and have already found potential bugs in proposed changes, see: #10208 (comment) However, the comparison were looking for a 100% match between the reference image and the generated output. In an ideal world we'd be able to rely on this. But in practice an exact comparisons of the images are quite flaky. This is because the actual visualization output is a function of not only our python code in Qiskit, but also upstream python dependencies (such as matplotlib, seaborn, etc) and more importantly those libraries leverage many system libraries and packages. This includes the C libraries for image formats (e.g. libpng), but also things like fonts. A version bump in our CI image's package versions can cause subtle differences in the output of visualizations. The CI images are controlled by Microsoft/Github and isn't something we we can't really depend on being constant forever (even if we used docker for a base test image the upstream images we would build off of that aren't 100% fixed either and could cause similar issues). We've had numerous issues with this in the past with image comparison tests (especially when you start running them cross-platform which isn't the case here) and is why we've oscillated between having them in the test suite and not throughout the development history of Qiskit. This commit adds a 1% tolerance to the comparison ratio so instead of needing a 100% match a 99% match is good enough. While this does technically reduce the coverage and a potentially invalid image could slip in that 1% difference, the chance of that happening are fairly unlikely especially weighed against the likelihood of a system change causing a CI blockage (which happened within one day of merging #9961). The only other option is to disable these tests as voting in the CI job, which would all but remove their utility because they will likely go stale (as the testing harness before this did). We may still end up making them non-voting anyway despite the coverage gains if we can't reliably run the tests in CI.
- Loading branch information