Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip SMT-LIB2 queries before parsing #401

Open
wants to merge 37 commits into
base: master
Choose a base branch
from

Conversation

daniel-raffler
Copy link
Contributor

Hello,
this MR adds some basic preprocessing for SMT-LIB2 queries before they are sent off to the solver for parsing. Most of the functionality has already been included as part of the OpenSMT/Bitwuzla backends, but it has now been improved and moved directly to AbstractFormulaManger where it can be used by all solvers. This should avoid further redundancies in the code and will help us solves the issues described in #372.

The idea behind the MR is to restrict ourselves to a simplified version of the SMT-LIB2 format that is sufficient to parse (and print) SMT-LIB2 queries. For this we remove comments and commands like (set-logic ...) and (exit) as they keep us from reusing the solver context. Operations like (push) or (get-value ..) are also removed, and we only keep declarations and assertions in our query. These simplified SMT-LIB2 scripts are then passed on to the solver where the actual parsing is done. We also use the same transformation to clean up the output of dumpFormula in a solver independent way.

BaierD and others added 21 commits April 15, 2024 10:15
…andle comments and logics) TODO: escaped chars etc.
…e will first clean up the input string in parse(), and then call parseImpl() to have it parsed by the solver.
… This will make the implementation less lazy. However, most (native) solvers just dump the String anyway and we need this to add postprocessing later.
Copy link
Member

@kfriedberger kfriedberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this PR is quite well-written and very useful. I like this approach.

I added some comments.
Main point: Could you add a few more tests for the parser?
Thanks.

BooleanFormula parsedForm = mgr.parse(BOOL_VARS_W_LOGIC_AND_COMMENT);
assertThatFormula(parsedForm).isEquivalentTo(expr);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a few more tests:

  • remove/sanitize comments from SMTLIB
  • allow/no sanitize weird names like with escaped keywords, if allowed.

return new Appenders.AbstractAppender() {

StringBuilder builder = new StringBuilder();
new Appenders.AbstractAppender() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Appender still required? Can't we just use the StringBuilder?

String raw = dumpFormulaImpl(formulaCreator.extractInfo(t));
out.append(sanitize(raw));
}
};
}
Copy link
Member

@kfriedberger kfriedberger Sep 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Lambda possible in this place? Maybe:

return out -> {
    String raw = dumpFormulaImpl(formulaCreator.extractInfo(t));
    out.append(sanitize(raw));
};

If not, looks also good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not possible. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we need an actual AbstractAppender object for the toString overload. When I changed the code it created an Appender instance, but toString would just return the Object name and not the printed String.

protected static boolean isSetLogicToken(String token) {
return token.matches("\\(\\s*set-logic.*");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the visibility required? Maybe make it "private"?

protected static boolean isAssertToken(String token) {
return token.matches("\\(\\s*assert.*");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the visibility required? Maybe make it "private"?

protected static boolean isDeclarationToken(String token) {
return token.matches("\\(\\s*(declare-const|declare-fun).*");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the visibility required? Maybe make it "private"?

Copy link
Contributor Author

@daniel-raffler daniel-raffler Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDeclarationToken is actually used in BitwuzlaFormulaManger here. But I can make the other ones private.

protected static boolean isDefinitionToken(String token) {
return token.matches("\\(\\s*define-fun.*");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the visibility required? Maybe make it "private"?

}
return builder.build();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser looks impressive, and on first look it should work well.

Are there tests for all cases:

  • valid brackets
  • invalid brackets
  • nested brackets
  • empty brackets
  • comments
  • nested comments
  • newlines for Unix \n and Windows \r or \r\n (maybe use System.lineSeparator())

@daniel-raffler
Copy link
Contributor Author

Overall, this PR is quite well-written and very useful. I like this approach.

I added some comments. Main point: Could you add a few more tests for the parser? Thanks.

Hi Karlheinz,
thanks for the feedback!
I'll be quite busy these next few days, but I can add some more test later this week.

… of the string. The original expression ".*" is problematic as "." does not include newline in Java. We worked around this by stripping newlines from the smtlib string before matching, but this won't work when the newline is escaped in a string literal or a quoted symbol.
@daniel-raffler
Copy link
Contributor Author

@kfriedberger
I've added a new test class TokenizerTest. Could you have another look and tell me if there is still something missing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Problems with common SMTLIB2 Strings for OpenSMT2 and Bitwuzla
2 participants