Skip to content

Commit

Permalink
Add coverage API test (#33)
Browse files Browse the repository at this point in the history
Test aims to cover file `nlsat/nlsat_explain.cpp`, function
`normalize()`, within `if(m_atoms[b]->is_ineq_atom())` block, within
`for` loop itering until `sz`, `else` branch of `(is_const(p) ||
max_var(p) < max)` check, when `p != a->p(i)` holds.

Co-authored-by: Andrei Lascu <andrei.lascu10@imperial.ac.uk>
  • Loading branch information
AndreiL and Andrei Lascu authored Sep 10, 2021
1 parent 056135c commit 75863fa
Showing 1 changed file with 87 additions and 0 deletions.
87 changes: 87 additions & 0 deletions coverage/cpp/cov6.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
#include <vector>

This comment has been minimized.

Copy link
@NikolajBjorner

NikolajBjorner Jan 3, 2022

Contributor

This one now times out.
It takes a few seconds when I compile and run it in debug mode on my own so I am not sure what is the root cause of the regression.
@0152la

This comment has been minimized.

Copy link
@0152la

0152la Jan 4, 2022

Contributor

This is the same test that I flagged as having a greatly increased execution time in the coverage environment [1]. It might require both debug + coverage enabled to show this high execution time. Maybe recent changes pushed this execution time even higher, going over the 1800s timeout limit [2]. The naive option to have it not timeout is to further push up this limit. I'll try to run locally and find the new execution time, and see if its viable to just push up the timeout.

[1] #33 (comment)
[2] https://github.com/Z3Prover/z3test/blob/master/scripts/test_coverage_tests.py#L4

This comment has been minimized.

Copy link
@NikolajBjorner

NikolajBjorner Jan 4, 2022

Contributor

The breaking change was to update how configuration parameters are maintained.
So, the same solvers are used, but with different parameter settings. It suggests the example is brittle with respect to changes in the backend. Since the debug only version of this code is not overly time consuming it doesn't suggest directly a bug in z3 but an issue with how it can be tested in a stable way.

This comment has been minimized.

Copy link
@0152la

0152la Jan 5, 2022

Contributor

That would make sense. So maybe fixing certain parameters in the test itself would maybe help improve execution time, and hopefully be more robust to future changes. I'll play around with a few parameter combinations.

As an aside, it seems the execution time has jumped to about 33 minutes, which I would think is too high to keep as is.

This comment has been minimized.

Copy link
@NikolajBjorner

NikolajBjorner Jan 5, 2022

Contributor

Not sure if I was able to say it well: The point I am trying to make is that some inputs, especially those that use non-linear arithmetic, strings, quantifiers, can be show brittle behavior across smaller, unrelated, code changes.
The this is not necessarily because the code change is at fault, or the parameters are wrong, but because it is simply not easy to write a solver for combinatorial problems that has uniform performance.
A way to deal with this situation when it comes to regression tests is to ensure the tests are so small that a drastic change in performance is so clearly an actionable bug.
Unless I understand why cov6 takes 33 minutes under some settings, it takes 6s in debug mode on my machine, and can translate it back to a fix or improvement, the test has to change. Changing parameters is too brittle. Changing the size of the test is less brittle.

This comment has been minimized.

Copy link
@0152la

0152la Jan 7, 2022

Contributor

I agree the test should be shorter, but at the time, I was unable to reduce the test size further while maintaining coverage of the code of interest (as indicated in the commit). What I can do is see if there have been changes in achieved coverage since, and potentially adapt the test accordingly, or maybe search for another test to achieve the same coverage (and essentially replace this). But if I were to try to recreate the coverage on the same Z3 commit that this test was produced on, I expect I won't be able to minimize this particular instance further.

This comment has been minimized.

Copy link
@NikolajBjorner

NikolajBjorner Jan 7, 2022

Contributor

Do you know where it spends all the time? If we have an idea what it is that takes 33 minutes with coverage turned on vs 6s in plain debug mode it may be possible to simply adapt internals to make coverage possible.

This comment has been minimized.

Copy link
@NikolajBjorner

NikolajBjorner Jan 7, 2022

Contributor
#include "z3++.h"

void
check_expr_same_sat(z3::context& c, z3::expr e1, z3::expr e2)
{
z3::solver solver(c);
z3::expr conjecture = z3::operator==(e1, e2);
solver.add(!conjecture);
assert(solver.check() != z3::sat);
assert(c.check_error() == Z3_OK);
}

z3::expr
obfuscated_identity(z3::expr t)
{
z3::expr abs_t = z3::ite(t > t.ctx().int_val(0), t, z3::abs(t));
return z3::ite(t == abs_t, abs_t, t);
}

z3::expr
obfuscated_sub(z3::expr t1, z3::expr t2)
{
return -obfuscated_identity(-t1) - t2;
}

int main()
{
z3::context ctx;
std::vector<z3::expr> vars;
for (size_t i = 0; i < 85; ++i) {
std::string new_var_name = "x" + std::to_string(i);
z3::expr new_expr = ctx.int_const(new_var_name.c_str());
vars.push_back(new_expr);
}

z3::expr x_1 = vars.at(1);
z3::expr x_84 = vars.at(84);
z3::expr zero = ctx.int_val(0);

z3::expr t_1 = z3::ite(x_84 != 0 && x_84 != 0, ctx.int_val(z3::pw(x_84, x_84)), x_84);
z3::expr t_2 = z3::ite(t_1 != 0, t_1 / t_1, t_1);
z3::expr t_3 = t_2 * t_2;
z3::expr t_4 = t_3 + t_1;
z3::expr t_5 = t_4 - t_3;
z3::expr t_6 = z3::ite(t_2 != 0, x_84 / t_2, x_84);
z3::expr t_7 = z3::ite(t_3 != 0, t_5 / t_3, t_5);
z3::expr t_8 = z3::ite(t_7 != 0, t_6 / t_7, t_6);
z3::expr t_9 = t_5 + t_8;
z3::expr t_10 = t_9 - zero;
z3::expr t_11 = t_5 + t_2;
z3::expr t_12 = t_11 + t_4;
z3::expr t_13 = -t_12;
z3::expr t_14 = z3::ite(t_13 != 0, z3::rem(t_8, t_13), t_8);
z3::expr t_15 = z3::ite(t_14 != 0 && t_9 != 0, ctx.int_val(z3::pw(t_14, t_9)), t_14);
z3::expr t_16 = z3::ite(t_15 != 0, t_3 / t_15, t_3);
z3::expr t_17 = t_16 + t_7;
z3::expr t_18 = z3::min(t_8, t_17);
z3::expr t_19 = -t_18;
z3::expr t_20 = z3::ite(t_18 != 0, z3::rem(t_11, t_18), t_11);
z3::expr t_21 = z3::ite(t_19 != 0 && t_20 != 0, ctx.int_val(z3::pw(t_19, t_20)), t_19);
z3::expr t_22 = z3::ite(t_12 != 0 && t_21 != 0, ctx.int_val(z3::pw(t_12, t_21)), t_12);
z3::expr t_23 = z3::max(t_10, t_22);
z3::expr in_0 = z3::ite(x_1 != 0, z3::rem(t_23, x_1), t_23);

/* Template initialisation for output var 1 */
z3::expr t_24 = z3::max(x_84, zero);
z3::expr t_25 = z3::ite(t_24 != 0 && zero != 0, ctx.int_val(z3::pw(t_24, zero)), t_24);
z3::expr t_26 = z3::min(zero, t_25);
z3::expr t_28 = z3::ite(zero != 0 && zero != 0, ctx.int_val(z3::pw(zero, zero)), zero);
z3::expr t_29 = z3::ite(t_28 != 0, z3::mod(t_25, t_28), t_25);
z3::expr in_1 = t_26 + t_29;

assert(ctx.check_error() == Z3_OK);

// Meta test 0
z3::expr r_0 = z3::ite(in_1 == ctx.int_val(0), in_0, in_0 / in_1);
r_0 = obfuscated_sub(r_0, in_1);
r_0 = z3::ite(in_1 == ctx.int_val(0), r_0, r_0 / in_1);

// Meta test 1
z3::expr r_1 = z3::ite(in_1 == ctx.int_val(0), in_0, in_0 / in_1);
r_1 = r_1 - in_1;
r_1 = z3::ite(in_1 == ctx.int_val(0), r_1, r_1 / in_1);

check_expr_same_sat(ctx, r_1, r_0);
}

3 comments on commit 75863fa

@veanes
Copy link
Contributor

@veanes veanes commented on 75863fa Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that one of the regression tests that fails is 5731.smt2, that is an unsat regex test that uses very large loop bounds but relies on an optimization -- loop subsumbption check for regexes added in (Z3Prover/z3#5731)). This optimization should always be on and not something that is turned on or off by an option as far as I know, so at this point I don't know what might be causing it to diverge, needs to be looked into.

@veanes
Copy link
Contributor

@veanes veanes commented on 75863fa Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked that the 5731.smt2 regression test for (Z3Prover/z3#5731) is fine with the latest bits, not sure why it timed out above.

@NikolajBjorner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely a fluke: the pipelines also build branches. The smtmus branch was behind.

Please sign in to comment.