Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated regex derivative engine #5567

Merged
merged 16 commits into from
Oct 8, 2021
Merged

Conversation

veanes
Copy link
Collaborator

@veanes veanes commented Sep 24, 2021

Rewrote/updated several derivate related algos both for correctness and performance.
Main changes involve more efficient use of algebraic laws and avoidance of unnecessary rewrites that in many cases were causing orders of magnitude of overhead.
Several open issues should be addressed by the updated engine.
Some old code is still there for convenience to debug possible regressions, that will eventually be removed, once any new regression bugs have been flushed out.

@NikolajBjorner
Copy link
Contributor

We lost some rewrites

From re.smt2: These should have been rewritten to true:

(str.in_re "" (re.* (str.to_re "a")))
(str.in_re "" (re.* (str.to_re "a")))
(str.in_re "" (str.to_re ""))
(not (str.in_re "" (re.+ re.allchar)))
(not (str.in_re "" (re.+ re.allchar)))

@NikolajBjorner
Copy link
Contributor

re_rewrites.smt2 exposes a reference count leak.

E.g., fix by taking a reference count in callee.

 expr_ref seq_rewriter::mk_antimirov_deriv(expr* e, expr* r, expr* path) {
+    expr_ref _path(path, m()), _r(r, ());
     expr_ref result(m_op_cache.find(OP_RE_DERIVATIVE, e, r, path), m());
     if (!result) {
         mk_antimirov_deriv_rec(e, r, path, result);
@@ -3204,6 +3205,7 @@ expr_ref seq_rewriter::mk_antimirov_deriv_intersection(expr* d1, expr* d2, expr*
 
 expr_ref seq_rewriter::mk_antimirov_deriv_concat(expr* d, expr* r) {
     expr_ref result(m());
+    expr_ref _r(r, m()), _d(d, m());
     expr* c, * t, * e;
     if (m().is_ite(d, c, t, e))
         result = m().mk_ite(c, mk_antimirov_deriv_concat(t, r), mk_antimirov_deriv_concat(e, r));

@NikolajBjorner
Copy link
Contributor

    else if (re().is_of_pred(r, r1)) {
        array_util array(m());
        expr* args[2] = { r1, e };
        result = array.mk_select(2, args);
        // Use mk_der_cond to normalize
        STRACE("seq_verbose", tout << "deriv of_pred" << std::endl;);
        result = mk_der_cond(result, e, seq_sort);
    }
    else
        // stuck cases
        result = re().mk_derivative(e, r);

Need case for is_of_pred.
Some functionality is shared between mk_antimirov_derivative_rec and mk_deriv_rec.

@NikolajBjorner
Copy link
Contributor

Also need REWRITE_FULL, REWRITE1 is not enough


    if (get_head_tail(a, hd, tl)) {
        //result = re().mk_in_re(tl, re().mk_derivative(hd, b));
        //result = re().mk_in_re(tl, mk_derivative(hd, b));
        result = mk_in_antimirov(tl, mk_antimirov_deriv(hd, b, m().mk_true()));
        return BR_REWRITE_FULL;
    }

@veanes
Copy link
Collaborator Author

veanes commented Oct 7, 2021

I addressed the above comments, but I probably need to revisit where I need to use mk_der_cond as in the missing case of .is_of_pred, at some point I omitted relying on the BDD-like structure -- wanted to do this in a more principled way I wanted instead to use BDDs based on bits in the characters. But I understand this heuristic helps in general with simplifications and detecting unsat cases.

@NikolajBjorner
Copy link
Contributor

bdd "structure" shouldn't matter (much) for of-pred.

@veanes
Copy link
Collaborator Author

veanes commented Oct 7, 2021

Your suggested fixes actually took care of all the failing builds. I was expecting more failures. I guess something I don't understand (and essentially violated the "protocol") is the BR_REWRITE... status flags. We could maybe spend a meeting (say an hour or so) on these and go over the overarching logic, could also be in person at work.

rational jv;
if (str().is_extract(t, s, j, k) && m_autil.is_numeral(j, jv) && jv.is_nonneg()) {
unsigned i;
expr_ref k_min_1(str().is_len_sub(k, l, s_, i) && s == s_ ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to rewrite this

  • pre-condition jv.is_unsigned() allows jv.get_unsigned(). Pre-condition jv.is_nonneg() does not allow jv.get_uint32()
  • review for when indices are negative

if (str().is_extract(t, s, j, k) && m_autil.is_numeral(j, jv) && jv.is_nonneg()) {
unsigned i;
// k = |s_| - i, l = |s_| implies that lastpos = j + k - 1 = k + j - 1 = j + l - i - 1 = l + j - i - 1
expr_ref lastpos(str().is_len_sub(k, l, s_, i) && s == s_ ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to rewrite this

pre-condition jv.is_unsigned() allows jv.get_unsigned(). Pre-condition jv.is_nonneg() does not allow jv.get_uint32()
review for when indices are negative

expr_ref one(m_autil.mk_int(1), m());
if (str().is_extract(t, s, j, k))
result = str().mk_substr(s, j, m_autil.mk_sub(k, one));
if (str().is_extract(t, s, j, k)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something required inline simplification here. It might be too brittle to assume results of such functions are simplified.

@@ -2881,7 +2896,8 @@ br_status seq_rewriter::mk_re_reverse(expr* r, expr_ref& result) {
br_status seq_rewriter::mk_re_derivative(expr* ele, expr* r, expr_ref& result) {
result = mk_derivative(ele, r);
// TBD: we may even declare BR_DONE here and potentially miss some simplifications
return re().is_derivative(result) ? BR_DONE : BR_REWRITE_FULL;
// return re().is_derivative(result) ? BR_DONE : BR_REWRITE_FULL;
return BR_DONE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was changed, because...

expr_ref seq_rewriter::mk_derivative(expr* ele, expr* r) {
STRACE("seq_verbose", tout << "derivative: " << mk_pp(ele, m())
/*STRACE("seq_verbose", tout << "derivative: " << mk_pp(ele, m())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented out code

}

expr_ref seq_rewriter::mk_antimirov_deriv(expr* e, expr* r, expr* path) {
// Take reference count of path and r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not reference count on e?

result = re().mk_ite_simplify(m().mk_eq(e, h), re().mk_to_re(t), nothing());
}
else
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting

result = nothing();
else
// observe that the precondition |r1|>0 of mk_seq_rest is implied by c1
result = re().mk_ite_simplify(c1, re().mk_reverse(re().mk_to_re(mk_seq_butlast(r1))), nothing());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_br.mk_ite

result = mk_antimirov_deriv_union(c1, re().mk_ite_simplify(r1nullable, mk_antimirov_deriv(e, r2, path), nothing()));
}
else if (m().is_ite(r, c, r1, r2)) {
c1 = simplify_path(m().mk_and(c, path));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes called, other times using m_bv.mk_and.
Should the path expression be considered as an ADT?

result = mk_antimirov_deriv_intersection(
mk_antimirov_deriv(e, r1, path),
mk_antimirov_deriv_negate(mk_antimirov_deriv(e, r2, path)), m().mk_true());
else if (re().is_of_pred(r, r1)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hoist to self-contained function that gets used also the other place

// Take reference count of r and d
expr_ref _r(r, m()), _d(d, m());
expr* c, * t, * e;
if (m().is_ite(d, c, t, e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is cost model for this rewrite?

}

expr_ref seq_rewriter::mk_antimirov_deriv_union(expr* d1, expr* d2)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format

// s in [] <==> false, also: s in () <==> false when |s|>0
result = m().mk_false();
else if (m().is_ite(d, c, d1, d2))
result = re().mk_ite_simplify(c, mk_in_antimirov_rec(s, d1), mk_in_antimirov_rec(s, d2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_br.mk_ite

return result;
}

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems more reasonable to have a path.push method that does this instead of creating mk_and then simplifying

Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
@NikolajBjorner NikolajBjorner merged commit 146f462 into Z3Prover:master Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants