Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BackendCannotProceed to improve integration #4092

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Zac-HD
Copy link
Member

@Zac-HD Zac-HD commented Aug 25, 2024

Sometimes, an alternative backend will just be done, and this new exception (and handlers) provides an interface to communicate that. There are several reasons this could happen, including:

  • not_realizable: e.g. hitting a timeout while attempting to .realize() some value, as in Rare PathTimeout errors in provider.realize(...) pschanely/hypothesis-crosshair#21
    • if we hit more than ten of these, we switch to the Hypothesis backend to preserve good performance.
  • exhausted: running out of useful work to do, e.g. if the test function immediately invokes compiled code then Crosshair doesn't have much to do, and can hand back to the Hypothesis backend
  • verified: for the cases where we've estabilished that the test will never fail, similar to Hypothesis' own ExitReason.finished if the datatree is exhausted (e.g. when st.booleans() has generated both True and False).
    • Since alternative backends are generally pretty new and might miss bugs, we treat this as for exhausted and continue testing on the Hypothesis backend for now.

TODO: lots of tests

  • that we handle it reasonably if a provider raises on the first interaction, second interaction, etc.
  • that it works with observability
  • that incorrect verification reports are reported
  • that if too many not_realized reports are made we switch back; also maybe improve the heuristic?

cc @pschanely; I'm probably not going to get back to this for a week or two but if you want to try it out I'd love to hear your thoughts 🙂

@pschanely
Copy link
Contributor

@Zac-HD Easy to integrate, and it makes the plugin code less fragile! 🎉 (my changes)

  • I have a few ignored-path cases that aren't technically not_realizable (e.g. timeout/unknown sat during body execution or even during the draws). Maybe we rename to something a little more generic?
  • The idea behind your exhausted reason is fascinating - I'm not producing it yet, but it would be possible to detect at least some cases where we simply aren't going to make progress.
  • Just spot checking test_large_data_will_fail_a_health_check, I am now getting a FlakyReplay (BackendCannotProceed vs overrun) - maybe we can add an explicit check for this exception there?

@Zac-HD
Copy link
Member Author

Zac-HD commented Sep 5, 2024

@Zac-HD Easy to integrate, and it makes the plugin code less fragile! 🎉 (my changes)

Nice!

  • I have a few ignored-path cases that aren't technically not_realizable (e.g. timeout/unknown sat during body execution or even during the draws). Maybe we rename to something a little more generic?

How about discard_test_case?

  • The idea behind your exhausted reason is fascinating - I'm not producing it yet, but it would be possible to detect at least some cases where we simply aren't going to make progress.

I was thinking we'd try to use verified only when we thought that the result was sound, e.g. if you ever hit an IgnoreAttempt, UnexploredPath, or suspected_proxy_intolerance_exception then you'd raise exhausted instead there.

Ideally this would allow us to support sound verification of e.g. code that just does SMT-friendly integer logic, whereas tests that hit timeouts or whatever else will fall back to Hypothesis' own backend for the remainder of the test budget. At the extremely ambitious end, we might eventually try to focus Hypothesis on the subspaces that Crosshair couldn't handle as per the last few ideas in #3914. That's definitely a far-future thing though, and I'd rather start with using Crosshair to explore specific branches that HypoFuzz can't cover...

  • Just spot checking test_large_data_will_fail_a_health_check, I am now getting a FlakyReplay (BackendCannotProceed vs overrun) - maybe we can add an explicit check for this exception there?

Hmm, I don't think BackendCannotProceed should ever be raised all the way to the user, but that just suggests that I need to handle it somewhere else 🙂

@pschanely
Copy link
Contributor

pschanely commented Sep 7, 2024

  • I have a few ignored-path cases that aren't technically not_realizable (e.g. timeout/unknown sat during body execution or even during the draws). Maybe we rename to something a little more generic?

How about discard_test_case?

That works!

  • The idea behind your exhausted reason is fascinating - I'm not producing it yet, but it would be possible to detect at least some cases where we simply aren't going to make progress.

I was thinking we'd try to use verified only when we thought that the result was sound, e.g. if you ever hit an IgnoreAttempt, UnexploredPath, or suspected_proxy_intolerance_exception then you'd raise exhausted instead there.

Ideally this would allow us to support sound verification of e.g. code that just does SMT-friendly integer logic, whereas tests that hit timeouts or whatever else will fall back to Hypothesis' own backend for the remainder of the test budget. At the extremely ambitious end, we might eventually try to focus Hypothesis on the subspaces that Crosshair couldn't handle as per the last few ideas in #3914. That's definitely a far-future thing though, and I'd rather start with using Crosshair to explore specific branches that HypoFuzz can't cover...

Over-sharing a little bit, but this is CrossHair's exception taxonomy for early path exits:

* CrossHairInternal - We hit a bug in CrossHair.
* IgnoreAttempt - This path is impossible (we made an assumption that was later invalidated)
* UnexploredPath - This path is viable, but CrossHair could not complete it, because:
  * UnknownSatisfiability - The solver could not make some decision in time.
  * PathTimeout - The path, as a whole, took too long.
  * CrosshairUnsupported - Some part of the code is not supported by CrossHair.

Notably, IgnoreAttempt doesn't mean that we're skipping some portion of the input space (though all the others do). One challenge is that a single UnexploredPath doesn't suggest much about what will happen on future paths. We probably want to track a ratio of completion statuses over a few runs, like (# of successful completions) / (# of UnexploredPaths + successful completions). It might be easiest to do this on the plugin side, and have me to raise a "I'm not making enough progress" exception at the right time.

The other case that's not covered here is when we have plenty of successful completions, but too much realization is happening. This could also potentially be detected in the plugin.

And, yes, I share your vision for expanding information sharing between the backends. There is so much we could do.

@Zac-HD
Copy link
Member Author

Zac-HD commented Sep 11, 2024

  • OK, sounds like you could still finish verified after also seeing IgnoreAttempt paths - happily the choice of finish status is taken by the plugin, so Hypothesis doesn't need to know about this taxonomy 🙂
  • Similarly, if you're seeing too much realization or overly slow progress for other reasons, you can finish with exhausted and Hypothesis will continue with our standard backend
    • I think Hypothesis should also have something here, but likely much more conservative than the if self.__failed_realize_count > 10: in the current draft. Maybe >20% of the last thousand attempts?

@Zac-HD Zac-HD force-pushed the backend-cannot-proceed branch 2 times, most recently from 5fa4dab to e9e13f4 Compare September 30, 2024 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants