Regex performance cliff when using InRe
#5648
Comments
It is a nice example of a limitation of derivatives: they fail to leverage equivalence.
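A toy, plain-Python Brzozowski-style equivalence check shows what derivatives have to do in this example. This is a sketch under our own assumptions, not Z3's implementation. The tuple-based regex AST and the helpers `cat`, `alt`, `star`, `nullable`, `deriv` and `equivalent` are hypothetical. The check explores *pairs* of derivatives of the two regexes and fails as soon as one member of a pair accepts the empty string and the other does not. It terminates because alternation is normalized up to associativity, commutativity and idempotence, which is the similarity under which Brzozowski showed the set of derivatives is finite.

```python
from collections import deque

# Toy regex AST as hashable tuples (hypothetical, for illustration only):
#   ('empty',) matches nothing, ('eps',) matches "", ('chr', c),
#   ('cat', r, s), ('alt', frozenset_of_alternatives), ('star', r)
EMPTY = ('empty',)
EPS = ('eps',)

def chr_(c):
    return ('chr', c)

def cat(r, s):
    # Smart constructor: simplify using the identities for empty and epsilon.
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ('cat', r, s)

def alt(*rs):
    # Flatten nested alternations into a frozenset, so alternation is
    # associative, commutative and idempotent (ACI); drop empty branches.
    items = set()
    for r in rs:
        if r[0] == 'alt':
            items |= r[1]
        elif r != EMPTY:
            items.add(r)
    if not items:
        return EMPTY
    if len(items) == 1:
        return next(iter(items))
    return ('alt', frozenset(items))

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return ('star', r)

def nullable(r):
    # True iff r accepts the empty string.
    tag = r[0]
    if tag in ('eps', 'star'):
        return True
    if tag == 'cat':
        return nullable(r[1]) and nullable(r[2])
    if tag == 'alt':
        return any(nullable(x) for x in r[1])
    return False  # 'empty' and 'chr'

def deriv(r, c):
    # Brzozowski derivative: the strings w such that c + w is accepted by r.
    tag = r[0]
    if tag == 'chr':
        return EPS if r[1] == c else EMPTY
    if tag == 'cat':
        d = cat(deriv(r[1], c), r[2])
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d
    if tag == 'alt':
        return alt(*(deriv(x, c) for x in r[1]))
    if tag == 'star':
        return cat(deriv(r[1], c), r)
    return EMPTY  # 'empty' and 'eps'

def equivalent(r, s, alphabet):
    # Breadth-first search over pairs of derivatives (a bisimulation check).
    seen = {(r, s)}
    todo = deque([(r, s)])
    while todo:
        p, q = todo.popleft()
        if nullable(p) != nullable(q):
            return False
        for c in alphabet:
            nxt = (deriv(p, c), deriv(q, c))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

a, b = chr_('a'), chr_('b')
r1 = cat(a, star(cat(b, a)))  # a(ba)*
r2 = cat(star(cat(a, b)), a)  # (ab)*a
print(equivalent(r1, r2, 'ab'))       # True
print(equivalent(r1, star(a), 'ab'))  # False
```

For this pair, each regex has only three distinct derivatives, so the search visits just a handful of pairs. Pairing the derivatives of both regexes in one search is what keeps the check finite and cheap.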
The workaround here is to specify all constraints on one string using the same regex rather than several regexes:

```python
from z3 import *

x = String('x')
a = Re('a')
b = Re('b')
r1 = Concat(a, Star(Concat(b, a)))  # a(ba)*
r2 = Concat(Star(Concat(a, b)), a)  # (ab)*a
# Symmetric difference: strings in exactly one of the two languages.
# It is empty exactly when r1 and r2 denote the same language.
diff = Union(
    Intersect(r1, Complement(r2)),
    Intersect(r2, Complement(r1)),
)
s = Solver()
s.add(InRe(x, diff))
print(s.check())
```

I remember we had a preprocessing step coded at some point that combines all constraints on …

IMO this is not a matter of leveraging equivalence; equivalence between regexes is not needed here. What we need is to notice that there are two constraints on x and simply combine them as a Boolean combination.
I don't know whether it matters, but I'm actually using an array of type string -> string to model a set of key/value constraints. So my code really looks more like:

```python
from z3 import *

a = Re('a')
b = Re('b')
r1 = Concat(a, Star(Concat(b, a)))  # a(ba)*
r2 = Concat(Star(Concat(a, b)), a)  # (ab)*a

s = Solver()
s.add(r1 != r2)
print(s.check())  # This is nearly instantaneous

labels = Array('labels', StringSort(), StringSort())
lr1 = InRe(Select(labels, String('key')), r1)
lr2 = InRe(Select(labels, String('key')), r2)
s = Solver()
s.add(lr1 != lr2)
print(s.check())  # This times out
```

In addition, there are multiple constraints over … I also considered modeling this as a set of 2-tuples, but the Array API is significantly easier to use. If you have alternative suggestions for how to model this, I am also open to them. We have a set of constraints analogous to:

```
rule1:
  allow:
    'env' : ['staging', 'prod']
    'os' : ['windows', 'linux']
    'key' : 'a(ba)*'
  deny:
    'env' : ['prod']
    'os' : 'linux'
rule2:
  allow:
    'env' : 'staging'
    'os' : 'windows'
    'key' : '(ab)*a'
```

Here, deny constraints take precedence over allow constraints. These two rules should be equivalent.
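The intended semantics of these rules can be made concrete with a small plain-Python sketch. It assumes (our reading, not stated in the thread) that a deny entry rejects a request if any listed attribute value matches, so the effective set for each attribute is its allow set minus its deny set; that reading is what makes the two rules equivalent. The dictionary layout and the `as_set` and `effective` helpers are hypothetical, and the `key` regexes are deliberately left out, since comparing them is the regex-equivalence question this issue is about.

```python
# Hypothetical encoding of the two rules above (single values are plain strings).
RULE1 = {
    'allow': {'env': ['staging', 'prod'], 'os': ['windows', 'linux'], 'key': 'a(ba)*'},
    'deny': {'env': ['prod'], 'os': 'linux'},
}
RULE2 = {
    'allow': {'env': 'staging', 'os': 'windows', 'key': '(ab)*a'},
}

def as_set(v):
    # Normalize a single value or a list of values to a set.
    return {v} if isinstance(v, str) else set(v)

def effective(rule, attrs=('env', 'os')):
    # Values still allowed per finite-valued attribute once deny takes precedence.
    deny = rule.get('deny', {})
    return {k: as_set(rule['allow'][k]) - as_set(deny.get(k, [])) for k in attrs}

print(effective(RULE1) == effective(RULE2))  # True
```

Under the other possible reading, where a deny entry only applies when all of its listed attributes match, `rule1` would still allow `env = prod` with `os = windows`, and the two rules would not be equivalent.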
It matters to the extent that the preprocessing approach is going to be very partial.
Maybe you don't need arrays, just uninterpreted functions or even constants. If you don't use `store`, then arrays are not really of any use and you pay the overhead of extensionality.
@NikolajBjorner good to know that rule of thumb about Z3 arrays; I've used them in a couple of other hobby projects but never used `store`. I will look into simplifying my constants. It's possible I could structure my expressions so that I compare regexes directly, bypassing …
Update: it looks like uninterpreted functions are a nice drop-in replacement for arrays when you don't use …:

```python
from z3 import *

a = Re('a')
b = Re('b')
r1 = Concat(a, Star(Concat(b, a)))  # a(ba)*
r2 = Concat(Star(Concat(a, b)), a)  # (ab)*a

s = Solver()
s.add(Distinct(r1, r2))
print(s.check())  # This is nearly instantaneous

# Uninterpreted function in place of the string -> string Array
labels = Function('labels', StringSort(), StringSort())
lr1 = InRe(labels(String('key')), r1)
lr2 = InRe(labels(String('key')), r2)
s = Solver()
s.add(Distinct(lr1, lr2))
print(s.check())  # This times out
```

Of course it still times out, but I guess the simplification will be easier.
I am working on modeling extended regexes and am taking an approach similar to Loring et al., 2018. It relies on equivalence constraints on string variables, so it cannot use the workaround described above by @cdstanford. At the moment, Z3 times out too frequently for this approach to be viable. I was wondering whether there is a plan to tackle this issue. I can't come up with a solution myself, but if there is one, I could give the implementation a try. I can also provide multiple examples of inputs with string equivalence constraints on which Z3 times out, if that helps.
The solver is only trained on what we have access to, and even then, it isn't trained on many of the benchmark sets that are available on SMT-LIB. The reason for the second part is that not all of these benchmarks are necessarily based on user needs.
@Swalkyn: I would also be interested. Do you have a minimal example or benchmark?
Apologies for taking so long to answer. Here are two minimal examples:
This models the PCRE/Java regex …
This corresponds to the PCRE/Java regex …
Both models are unsat. On both examples, Z3 returns unknown with a 10s timeout. These constraints were generated using java-smt.
@Swalkyn
I don't know how to deal with your second example or, in general, examples like it, but it sounds like a good open question. I would be curious to see whether any other current solvers can handle these.
@cdstanford, in case you're interested.
Surprised that Noodler fails! This seems to match their intended use case.
Similar reaction: it should be in scope, though Noodler is inherently incomplete for unsat.
What are these? Do they have characteristics that go beyond the basic examples?
This happens with Z3 4.8.12. The following Python code reproduces the issue:
The use case is analyzing a set of ACL constraints, some of which contain regexes. I'm not sure cases like this will actually crop up in the constraints the system has to analyze, but I thought I'd document it. Feel free to close if this is not a bug.