Strip SMT-LIB2 queries before parsing #401

Open

daniel-raffler wants to merge 37 commits into master from improve_opensmt2_parsing

Contributor

daniel-raffler commented Sep 28, 2024

Hello,
this MR adds some basic preprocessing for SMT-LIB2 queries before they are sent off to the solver for parsing. Most of the functionality was already included in the OpenSMT/Bitwuzla backends, but it has now been improved and moved directly to AbstractFormulaManager, where it can be used by all solvers. This should avoid further redundancies in the code and will help us solve the issues described in #372.

The idea behind the MR is to restrict ourselves to a simplified version of the SMT-LIB2 format that is sufficient to parse (and print) SMT-LIB2 queries. For this we remove comments and commands like (set-logic ...) and (exit), as they keep us from reusing the solver context. Operations like (push) or (get-value ..) are also removed, and we only keep declarations and assertions in our query. These simplified SMT-LIB2 scripts are then passed on to the solver, where the actual parsing is done. We also use the same transformation to clean up the output of dumpFormula in a solver-independent way.
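The filtering step described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class name `SanitizeSketch`, the `sanitize(List<String>)` signature and the exact regular expression are assumptions, and the input is assumed to be already split into top-level commands with comments removed. `define-fun` is kept as well, since a later commit adds it to the allowed commands.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not the PR's implementation.
class SanitizeSketch {
  // Commands that survive sanitization; everything else is dropped.
  // The lookahead requires whitespace or '(' after the keyword, so that
  // e.g. "(assert-soft ...)" is not mistaken for "(assert ...)".
  private static final Pattern KEEP =
      Pattern.compile(
          "\\(\\s*(declare-const|declare-fun|define-fun|assert)(?=[\\s(])[\\S\\s]*");

  /** Keeps only declarations, definitions and assertions, one command per line. */
  static String sanitize(List<String> commands) {
    return commands.stream()
        .filter(c -> KEEP.matcher(c).matches())
        .collect(Collectors.joining("\n"));
  }

  public static void main(String[] args) {
    List<String> script =
        List.of(
            "(set-logic QF_LIA)",
            "(declare-const x Int)",
            "(push 1)",
            "(assert (> x 0))",
            "(get-value (x))",
            "(exit)");
    // Only the declare-const and assert commands remain.
    System.out.println(sanitize(script));
  }
}
```

Filtering by an allow-list rather than a deny-list means that unknown or solver-specific commands are dropped by default, which matches the stated goal of keeping only declarations and assertions.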

BaierD and others added 21 commits

April 15, 2024 10:15


          Add simple stripping of SMTLIB2 queries for OpenSMT2 (as it can not handle comments and logics) TODO: escaped chars etc.

b3bb570


          Add simple tests for parsing SMTLIB2 Strings w logic selection and/or comments

69cc222


          Fix example string for logicParseTest() commentsParseTest() in SolverFormulaIOTest

9fec536


          Merge remote-tracking branch 'origin/master' into improve_opensmt2_parsing

de872fd


          Added a new, abstract method parseImpl to AbstractFormulaManager. We will first clean up the input string in parse(), and then call parseImpl() to have it parsed by the solver.

5bda67d


          Renamed the native version of dumpFormula() to dumpFormulaImpl() and made the method protected.

acbe0c3


          Moved tokenize() to the AbstractFormulaManager to make it available to all solvers

ba24590


          Added more methods to check token types.

612948f


          Clean up the input string in parse() before parseImpl() is called

b52bee9


          Add (define-fun ..) to the allowed commands

4c6068a


          Skip newline characters in tokens


          Add some comments in tokenizer()

8076c4b


          Change the return type for dumpFormulaImpl() from Appender to String. This will make the implementation less lazy. However, most (native) solvers just dump the String anyway and we need this to add postprocessing later.

10d5fc7


          Also sanitize output from dumpFormula()

22977d0


          Remove custom SMT-LIB2 sanitizer code for Bitwuzla and OpenSMT

034133b


          Split up function declaration and function definitions to fix a bug in our Bitwuzla parser code

454f3b5


          Take care of string literals and quoted symbols in the tokenizer

2f098c2


          Fix handling of line-wraps when removing comments

9621e99


          Clean up comments

12a08ac


          Fix escaped double quotes in string literals

115c0ea


          Revert changes to a ModelTest. This one was included by accident.

b706b7b

daniel-raffler linked an issue

that may be closed by this pull request

Problems with common SMTLIB2 Strings for OpenSMT2 and Bitwuzla #372

Open

kfriedberger requested changes

Member

kfriedberger left a comment

Overall, this PR is quite well-written and very useful. I like this approach.

I added some comments.
Main point: Could you add a few more tests for the parser?
Thanks.

src/org/sosy_lab/java_smt/test/SolverFormulaIOTest.java

+                  BooleanFormula parsedForm = mgr.parse(BOOL_VARS_W_LOGIC_AND_COMMENT);
+                  assertThatFormula(parsedForm).isEquivalentTo(expr);
+                }

Member

kfriedberger Sep 28, 2024

We need a few more tests:

remove/sanitize comments from SMTLIB
allow (do not sanitize) unusual names, such as escaped keywords, if they are permitted.

src/org/sosy_lab/java_smt/solvers/yices2/Yices2FormulaManager.java Outdated

-                  return new Appenders.AbstractAppender() {
+                  StringBuilder builder = new StringBuilder();
+                  new Appenders.AbstractAppender() {

Member

kfriedberger Sep 28, 2024

Is this Appender still required? Can't we just use the StringBuilder?

src/org/sosy_lab/java_smt/solvers/princess/PrincessEnvironment.java Resolved

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java

+                      String raw = dumpFormulaImpl(formulaCreator.extractInfo(t));
+                      out.append(sanitize(raw));
+                    }
+                  };
                 }

Member

kfriedberger Sep 28, 2024 (edited)

Is a lambda possible here? Maybe:

return out -> {
    String raw = dumpFormulaImpl(formulaCreator.extractInfo(t));
    out.append(sanitize(raw));
};

If not, this also looks good.

Member

kfriedberger Sep 29, 2024

Well, not possible. :-)

Contributor Author

daniel-raffler Sep 30, 2024

Yes, I think we need an actual AbstractAppender object for the toString override. When I changed the code to a lambda, it still created an Appender instance, but toString would just return the object name and not the printed String.
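The difference can be reproduced with a small stand-alone sketch. The `Appender` interface and `AbstractAppender` class below are hypothetical stand-ins written for this illustration, not the classes from the library used in the project; they only assume that the abstract base class derives `toString()` from `appendTo()`. A lambda can only implement the single abstract method of the interface, and Java does not allow an interface to override `Object.toString()` with a default method, so the lambda falls back to the default `Object.toString()`.

```java
import java.io.IOException;

// Hypothetical sketch; the nested types only imitate the shape of an Appender API.
class AppenderToStringSketch {
  /** Stand-in for an Appender-like functional interface. */
  interface Appender {
    void appendTo(Appendable out) throws IOException;
  }

  /** Stand-in for an abstract base class whose toString() prints the appended text. */
  abstract static class AbstractAppender implements Appender {
    @Override
    public String toString() {
      StringBuilder sb = new StringBuilder();
      try {
        appendTo(sb);
      } catch (IOException e) {
        throw new AssertionError("StringBuilder does not throw IOException", e);
      }
      return sb.toString();
    }
  }

  /** Lambda version: implements appendTo() but inherits Object.toString(). */
  static Appender asLambda(String text) {
    return out -> out.append(text);
  }

  /** Anonymous subclass version: inherits the toString() defined in AbstractAppender. */
  static Appender asAbstractAppender(String text) {
    return new AbstractAppender() {
      @Override
      public void appendTo(Appendable out) throws IOException {
        out.append(text);
      }
    };
  }

  public static void main(String[] args) {
    // Prints an implementation-specific object name, not the formula text.
    System.out.println(asLambda("(assert true)"));
    // Prints the formula text.
    System.out.println(asAbstractAppender("(assert true)"));
  }
}
```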

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java Outdated Resolved

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java Outdated

+                protected static boolean isSetLogicToken(String token) {
+                  return token.matches("\\(\\s*set-logic.*");
+                }

Member

kfriedberger Sep 28, 2024

Is the visibility required? Maybe make it "private"?

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java Outdated

+                protected static boolean isAssertToken(String token) {
+                  return token.matches("\\(\\s*assert.*");
+                }

Member

kfriedberger Sep 28, 2024

Is the visibility required? Maybe make it "private"?

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java Outdated

+                protected static boolean isDeclarationToken(String token) {
+                  return token.matches("\\(\\s*(declare-const|declare-fun).*");
+                }

Member

kfriedberger Sep 28, 2024

Is the visibility required? Maybe make it "private"?

Contributor Author

daniel-raffler Sep 29, 2024 (edited)

isDeclarationToken is actually used in BitwuzlaFormulaManager. But I can make the other ones private.

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java Outdated

+                protected static boolean isDefinitionToken(String token) {
+                  return token.matches("\\(\\s*define-fun.*");
+                }

Member

kfriedberger Sep 28, 2024

Is the visibility required? Maybe make it "private"?

src/org/sosy_lab/java_smt/basicimpl/AbstractFormulaManager.java Outdated

+                  }
+                  return builder.build();
+                }

Member

kfriedberger Sep 28, 2024

The parser looks impressive, and at first glance it should work well.

Are there tests for all cases:

valid brackets
invalid brackets
nested brackets
empty brackets
comments
nested comments
newlines for Unix \n, Windows \r\n, and legacy \r (maybe use System.lineSeparator())
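To make the cases in this list concrete, here is a minimal tokenizer sketch that the listed inputs could be run against. It is not the tokenizer from this PR; the class name `TokenizerSketch` and its behavior on malformed input (throwing `IllegalArgumentException`) are assumptions for illustration. It splits a script into top-level commands by tracking the bracket depth, drops `;` comments, and copies string literals and `|quoted symbols|` verbatim so that brackets or semicolons inside them are not interpreted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's implementation.
class TokenizerSketch {
  /** Splits an SMT-LIB2 script into top-level commands; comments are dropped. */
  static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    int i = 0;
    while (i < input.length()) {
      char c = input.charAt(i);
      if (c == ';') {
        // Comment: skip up to (not including) the next line break.
        while (i < input.length() && input.charAt(i) != '\n' && input.charAt(i) != '\r') {
          i++;
        }
      } else if (depth > 0 && (c == '"' || c == '|')) {
        // String literal or quoted symbol: find the closing delimiter.
        int end = input.indexOf(c, i + 1);
        // Inside a string literal, a doubled quote ("") is an escaped quote.
        while (c == '"' && end >= 0 && end + 1 < input.length()
            && input.charAt(end + 1) == '"') {
          end = input.indexOf('"', end + 2);
        }
        if (end < 0) {
          throw new IllegalArgumentException("Unterminated literal at index " + i);
        }
        current.append(input, i, end + 1);
        i = end + 1;
      } else {
        if (c == '(') {
          depth++;
          current.append(c);
        } else if (c == ')') {
          if (depth == 0) {
            throw new IllegalArgumentException("Unexpected ')' at index " + i);
          }
          depth--;
          current.append(c);
          if (depth == 0) {
            // A top-level command is complete.
            tokens.add(current.toString());
            current.setLength(0);
          }
        } else if (depth > 0) {
          current.append(c);
        } else if (!Character.isWhitespace(c)) {
          throw new IllegalArgumentException("Unexpected character at index " + i);
        }
        i++;
      }
    }
    if (depth != 0) {
      throw new IllegalArgumentException("Unbalanced brackets");
    }
    return tokens;
  }

  public static void main(String[] args) {
    String script = "(declare-const x Int) ; a comment\r\n(assert (= x 1))";
    // Yields two commands; the comment and the line break between them are dropped.
    tokenize(script).forEach(System.out::println);
  }
}
```

Line breaks inside a command are kept in this sketch, so that newlines in string literals and quoted symbols survive; only whitespace between top-level commands is discarded.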

Contributor Author

daniel-raffler commented Sep 29, 2024

> Overall, this PR is quite well-written and very useful. I like this approach.
>
> I added some comments. Main point: Could you add a few more tests for the parser? Thanks.

Hi Karlheinz,
thanks for the feedback!
I'll be quite busy these next few days, but I can add some more tests later this week.

daniel-raffler added 6 commits

September 29, 2024 19:14


          Remove Appender in dumpFormulaImpl() for Yices2 and use StringBuilder directly

41d6809


          Use a lambda in AbstractFormulaManager.dumpFormula() as suggested by @kfriedberger

8ab8c57


          Change index base in AbstractFormulaManager.sanitize

ec407e4


          Revert "Use a lambda in AbstractFormulaManager.dumpFormula() as suggested by @kfriedberger"

fc32aec

This reverts commit 8ab8c57.


          Add basic tests for the tokenizer in AbstractFormulaManager

4478a38


          Fix handling of String literals in the tokenizer

bce5003

daniel-raffler added 8 commits

October 3, 2024 02:58


          Fix parenthesesInString test

8d2988b


          Add more tests for the tokenizer

d8d94be


          Change regexp for token tests to use "[\S\s]*" to match the remainder of the string. The original expression ".*" is problematic as "." does not include newline in Java. We worked around this by stripping newlines from the smtlib string before matching, but this won't work when the newline is escaped in a string literal or a quoted symbol.

cd2ee83
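The regex issue named in this commit message can be shown with a short sketch. The class and method names are made up for illustration; the two patterns are the old form quoted in the review threads above and the new form from the commit message. By default, `.` in `java.util.regex` does not match line terminators, so `String.matches` (which must match the whole input) fails on a token that spans several lines.

```java
// Hypothetical sketch comparing the two pattern styles on a multi-line token.
class DotNewlineSketch {
  /** Old style: ".*" stops at the first line terminator. */
  static boolean matchesWithDot(String token) {
    return token.matches("\\(\\s*assert.*");
  }

  /** New style: "[\\S\\s]*" matches any character, including line terminators. */
  static boolean matchesWithAnyChar(String token) {
    return token.matches("\\(\\s*assert[\\S\\s]*");
  }

  public static void main(String[] args) {
    String token = "(assert\n  (> x 0))";
    System.out.println(matchesWithDot(token)); // false
    System.out.println(matchesWithAnyChar(token)); // true
  }
}
```

Compiling the pattern with `Pattern.DOTALL` (or prefixing it with `(?s)`) would be another way to let `.` match line terminators.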


          Don't strip newlines inside expressions in the tokenizer

038b793


          Renamed tokenTestBroken as its no longer broken

2a01a58


          Split off tokenizer into its own class to make it available to other tests

d806573


          Moved TokenizerTest to test directory

42d394c


          Use tokenizer in SolverFormulaIOTest.checkThatAssertIsInLastLine to fix broken MathSAT tests

5da4e3f

daniel-raffler requested a review from kfriedberger

October 3, 2024 03:16

Contributor Author

daniel-raffler commented Oct 3, 2024

@kfriedberger
I've added a new test class TokenizerTest. Could you have another look and tell me if there is still something missing?

daniel-raffler added 2 commits

October 3, 2024 15:16


          Added another test class for AbstractFormulaManager.sanitize()

1073d4c


          Simplified tokenizer code

3ba6a02
