Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean expressions do not short circuit as expected #2444

Open
AkshayWarrier opened this issue Dec 13, 2023 · 3 comments
Open

Boolean expressions do not short circuit as expected #2444

AkshayWarrier opened this issue Dec 13, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@AkshayWarrier
Copy link
Contributor

I discovered this when I first ran

s: str
s = "    "
print(s.lstrip())

and it gave me this error

String index: 4 is out of Bounds

Looking at the implementation of _lpython_str_lstrip

@overload
def _lpython_str_lstrip(x: str) -> str:
    ind :i32
    ind = 0
    while ind < len(x) and x[ind] == ' ':
        ind += 1
    return x[ind :len(x)]

I realized that the and in while ind < len(x) and x[ind] == ' ': wasn't short circuiting as expected. It becomes more clear when running a minimal example

s: str
s = "    "
ind: i32 = 4
print((ind < 4) and (s[ind] == ' '))

This raises the index out of bounds error, because the second expression is being executed even though the first expression is False.
Similarly this raises an error as well

s: str
s = "    "
ind: i32 = 4
print((ind == 4) or (s[ind] == ' '))
@certik certik added the bug Something isn't working label Dec 13, 2023
@certik
Copy link
Contributor

certik commented Dec 13, 2023

Great catch! Thanks for figuring out the problem.

There is a design issue to figure out: in Fortran these operators do not short circuit. In C and Python they do. There are also proposals to Fortran both in terms of new operators or compiler options or even changes to the standard to make them short circuit.

In terms of performance, it would be nice to not close door to possible optimizations.

In terms of the bug above, I am guessing the ASR->LLVM evaluates all part of the expression and then uses "and" on the result. To short-circuit it, it should first evaluate the first, and only do the second if it is true.

Probably the easiest for now is to set a flag in ASR for LogicalBinOp to designate if they should short-circuit, something like:

--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -277,7 +277,7 @@ expr
     | LogicalConstant(bool value, ttype type)
     | LogicalNot(expr arg, ttype type, expr? value)
     | LogicalCompare(expr left, cmpop op, expr right, ttype type, expr? value)
-    | LogicalBinOp(expr left, logicalbinop op, expr right, ttype type, expr? value)
+    | LogicalBinOp(expr left, logicalbinop op, expr right, bool short_circuit, ttype type, expr? value)
 
     | ListConstant(expr* args, ttype type)
     | ListLen(expr arg, ttype type, expr? value)

Then in the backend it would generate the proper code.

When would we not want to short-ciruict? I think never. I think we however want to have the freedom to rearrange the expression sometimes for optimizations. There is also an issue of emitting Fortran code from ASR, if we short-circuit by default, we can't just use .and. in Fortran directly.

This page has a lot of background on this topic: https://en.wikipedia.org/wiki/Short-circuit_evaluation, it seems we most likely require both, so the above diff is probably the best.

@certik
Copy link
Contributor

certik commented Dec 13, 2023

Conceptually it looks like the following will cover all cases:

--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -277,7 +277,7 @@ expr
     | LogicalConstant(bool value, ttype type)
     | LogicalNot(expr arg, ttype type, expr? value)
     | LogicalCompare(expr left, cmpop op, expr right, ttype type, expr? value)
-    | LogicalBinOp(expr left, logicalbinop op, expr right, ttype type, expr? value)
+    | LogicalBinOp(expr left, logicalbinop op, expr right, logicalevaltype eval_type, ttype type, expr? value)
 
     | ListConstant(expr* args, ttype type)
     | ListLen(expr arg, ttype type, expr? value)
@@ -435,6 +435,8 @@ binop = Add | Sub | Mul | Div | Pow | BitAnd | BitOr | BitXor | BitLShift | BitR
 
 logicalbinop = And | Or | Xor | NEqv | Eqv
 
+logicalevaltype = EvalShortCircuit | EvalEager | EvalUnspecified
+
 cmpop = Eq | NotEq | Lt | LtE | Gt | GtE
 
 integerboz = Binary | Hex | Octal

Here the Python and C frontends will use EvalShortCircuit, the Fortran frontend will use EvalUnspecified. The EvalEager is an evaluation strategy that we currently use (we evaluate all arguments first, then apply the boolean operators), and our optimizer or the frontend can choose this if needed.

All three are different. EvalShortCircuit and EvalEager all preserve the order of arguments (the first one only evaluates as neeed, the second always evaluates all, in order), while EvalUnspecified allows any order and it might or might not evaluate arguments.

In terms of what rewriting (optimization) one can do on an expression, all three cases above I think have different semantics. It seems that EvalUnspecified can be freely converted to either EvalEager or EvalShortCircuit and it will still work semantically. But EvalEager and EvalShortCircuit can't be converted to any of the other two options, as it will change the semantics. Here is why:

  • This very issue is an example of where EvalShortCircuit should have been set, but EvalEager was used instead.
  • The EvalEager can't be in general changed to EvalShortCircuit since there could have been side effects from the full eager evaluation of all arguments, that now will be skipped, so the program can change meaning.

Finally, our optimizer can analyze an expression, and if all functions/operands are side-effects-free, then I think EvalEager can be converted to EvalShortCircuit (order specific) or EvalUnspecified (order independent).

If operands are side-effects-free and they do not depend on each other (so NOT this issue), then I think EvalShortCircuit can be converted to EvalEager or EvalUnspecified.

@Shaikh-Ubaid
Copy link
Collaborator

This issue still exists. An MRE that fails:

% cat examples/expr2.py 
from lpython import i32

a: i32 = 10
def hi() -> bool:
    global a
    a = a + 1
    return True

def main0():
    if False and hi():
        pass

    print(a)
    assert a == 10

if __name__ == "__main__":
   main0()
% python examples/expr2.py 
10
% lpython examples/expr2.py
11
AssertionError

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants