proptest based differential contract testing #578
Conversation
Force-pushed from 1e25bbb to 98679d5
Force-pushed from 98679d5 to ff6540e
Codecov Report
@@ Coverage Diff @@
## master #578 +/- ##
==========================================
+ Coverage 87.03% 88.13% +1.10%
==========================================
Files 87 87
Lines 6432 6998 +566
==========================================
+ Hits 5598 6168 +570
+ Misses 834 830 -4
Force-pushed from ff6540e to a6b2a5b
This adds about three minutes to the CI run. I can explore how to run it on a GitHub Actions cron job instead of with each PR, if you like?
Should we run them based on a cron job (GitHub Actions supports that) instead of running them on every PR?
The ideal setup imo would be to have a suite of these property tests running at all times against master. It would also be nice to have some sort of notification or issue-creation action for when invalidating inputs are found. In response, we would add the input to a set of regression checks and fix the bug.
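For illustration, a minimal sketch of that regression-set idea (the property, function names, and pinned input here are hypothetical stand-ins, not code from this PR): keeping the property in a plain function lets a counterexample found by the fuzzer be pinned as an ordinary #[test]. Note that proptest also persists failing cases to proptest-regressions/ files by default, which can be committed for a similar effect.

use proptest::prelude::*;

// Hypothetical stand-in property; the real one would compare the Fe and
// Solidity contract outputs.
fn no_divergence(x: u8) {
    assert_eq!(x.wrapping_add(1).wrapping_sub(1), x);
}

proptest! {
    // The fuzzed version keeps exploring new inputs (could move to a cron job).
    #[test]
    fn no_divergence_fuzzed(x in any::<u8>()) {
        no_divergence(x);
    }
}

// A previously failing input, pinned so it runs on every PR regardless of
// where the fuzz suite itself runs.
#[test]
fn no_divergence_regression() {
    no_divergence(255);
}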
fn math_u8(val in 0u8..=255, val2 in 0u8..=255) {
    with_executor(&|mut executor| {
        let harness = DualHarness::from_fixture(&mut executor, "math_u8", "Foo", &[]);
If I understand correctly, the executor holds all of the chain state, so reusing the same executor across multiple tests could make the tests non-deterministic: equal inputs to a test could produce different results depending on which tests and inputs ran before.
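A self-contained toy model of that concern (Executor here is an invented struct, not the real EVM executor): when state is shared, results depend on call history; constructing fresh state per case restores determinism.

use proptest::prelude::*;

// Toy model: the "executor" carries mutable state that every call touches.
struct Executor {
    nonce: u64,
}

impl Executor {
    fn new() -> Self {
        Executor { nonce: 0 }
    }
    fn call(&mut self, val: u8) -> u64 {
        self.nonce += 1; // state mutated by every call
        self.nonce + u64::from(val) // result depends on prior calls
    }
}

proptest! {
    #[test]
    fn fresh_state_is_deterministic(val in any::<u8>()) {
        // With a fresh Executor per invocation, equal inputs give equal
        // results; one Executor shared across cases would not guarantee this.
        prop_assert_eq!(Executor::new().call(val), Executor::new().call(val));
    }
}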
Yeah, this would only be feasible with contracts that don't mutate state. So if we end up running a lot of tests that meet this criterion, it might be worth looking into.
It would be nice to know how long it takes to initialize the executor and harness versus how long it takes to run the tests.
Yeah, that's kind of what's happening with that …
So I understand that as: we would not have it run against each PR (if it adds too much time to that), but we would invoke the run each time that …
Ok, I'm going to merge this as-is, but as I keep playing with these tests a bit more I will also look into how to run them independently from PRs and change that in a follow-up PR.
What was wrong?
We could do more differential contract testing. The idea is to write certain contracts in both Fe and Solidity, fuzz them with random data, and compare the results to find cases where the two implementations diverge.
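As a minimal self-contained sketch of the approach (plain Rust functions stand in for the compiled Fe and Solidity contracts): both implementations receive identical proptest-generated inputs and must agree on every case.

use proptest::prelude::*;

// Two independent implementations of the same arithmetic, standing in for
// the Fe and Solidity contracts under test.
fn add_fe(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}
fn add_sol(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

proptest! {
    #[test]
    fn add_is_equivalent(a in 0u8..=255, b in 0u8..=255) {
        // Any input where the two implementations diverge fails the test
        // and is shrunk to a minimal counterexample by proptest.
        prop_assert_eq!(add_fe(a, b), add_sol(a, b));
    }
}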
How was it fixed?
Added math_u8.fe and math_u8.sol test contracts. Inputs are generated with proptest based on strategies, fed to both contracts, and we can then assert that they performed the same.
Open questions
The tests take a considerable amount of execution time. Should we run them based on a cron job (GitHub Actions supports that) instead of running them on every PR? Alternatively, we could run them on every second, third, …, or tenth PR.
I guess that with some extra work we could generate the whole test setup by inspecting the ABI of the test contracts and generating strategies suitable for each public method. But I'm not yet sure if that's really helpful in practice, or if a bit more manual curation is actually better.
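To make that idea concrete, a hypothetical sketch (AbiType and Value are invented stand-ins, not types from this PR) of mapping ABI parameter types to proptest strategies:

use proptest::prelude::*;

// Invented stand-ins for ABI metadata and runtime argument values.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    U8(u8),
    I8(i8),
    Bool(bool),
}

enum AbiType {
    U8,
    I8,
    Bool,
}

// Each ABI type maps to a matching proptest strategy.
fn strategy_for(ty: &AbiType) -> BoxedStrategy<Value> {
    match ty {
        AbiType::U8 => any::<u8>().prop_map(Value::U8).boxed(),
        AbiType::I8 => any::<i8>().prop_map(Value::I8).boxed(),
        AbiType::Bool => any::<bool>().prop_map(Value::Bool).boxed(),
    }
}

// A public method's parameter list becomes a strategy over whole argument
// vectors, which could drive the dual harness generically.
fn args_strategy(params: &[AbiType]) -> BoxedStrategy<Vec<Value>> {
    params.iter().map(strategy_for).collect::<Vec<_>>().boxed()
}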
I've started with math_u8 and math_i8 tests because the range of those datatypes is quite tiny, but the bugs caught for u8/i8 are often transferable to the bigger numeric types. The tests already caught a bug where negations (-x) were not properly checked; the fix is included in this PR.
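For concreteness, here is the class of bug described above shown on Rust's own i8 (an illustration, not the PR's actual fix): negation has exactly one unrepresentable case in a two's-complement type, the kind of corner a 256-value input domain surfaces almost immediately.

// -(-128) does not fit in i8, so an unchecked negation is wrong for
// exactly one input out of 256.
#[test]
fn negation_needs_a_check() {
    assert_eq!(i8::MIN.checked_neg(), None);
    assert_eq!((-127i8).checked_neg(), Some(127));
}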