Boolean operators in expressions are ignored #667

Closed
timsaucer opened this issue May 7, 2024 · 2 comments · Fixed by #668
Labels
bug Something isn't working

Comments

@timsaucer
Contributor

Describe the bug

When attempting to create an expression using operators like `and` and `or`, no errors are reported, but the resulting expression does not behave as expected. It appears the first expression is evaluated and the others are ignored.

To Reproduce
This minimal code will reproduce the behavior:

import pyarrow as pa
from datafusion import SessionContext, col, lit

ctx = SessionContext()

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3])],
    names=["a"],
)

df = ctx.create_dataframe([[batch]])

df.with_column("b", col("a") == lit(1) or col("a") == lit(3)).show()
df.with_column("b", col("a") == lit(3) or col("a") == lit(1)).show()

This generates the following results:

DataFrame()
+---+-------+
| a | b     |
+---+-------+
| 1 | true  |
| 2 | false |
| 3 | false |
+---+-------+
DataFrame()
+---+-------+
| a | b     |
+---+-------+
| 1 | false |
| 2 | false |
| 3 | true  |
+---+-------+

Expected behavior
If these types of operations are not supported, an error should be generated. Even better would be to fully support these operations, since it would mean a great deal for adoption across the Python community.
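To illustrate the "an error should be generated" option, here is a minimal sketch of one way a Python class can refuse to be used with `and` / `or`: raising from `__bool__`. The `StrictExpr` class and its error message are hypothetical and are not DataFusion's actual `Expr` implementation; NumPy arrays take a similar approach when their truth value is ambiguous.

```python
class StrictExpr:
    """Hypothetical expression node; NOT DataFusion's actual Expr class."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        # `x | y` builds a new combined expression.
        return StrictExpr(f"({self.text} OR {other.text})")

    def __bool__(self):
        # `x or y` and `x and y` first ask for the truth value of `x`,
        # so raising here turns the silent bug into a loud error.
        raise TypeError("use | or & to combine expressions, not 'or' / 'and'")


x = StrictExpr("a = 1")
y = StrictExpr("a = 3")

try:
    x or y
    error_message = None
except TypeError as exc:
    error_message = str(exc)

combined = x | y
print(error_message)
print(combined.text)
```

With this sketch, `x or y` fails immediately instead of quietly returning `x`, while `x | y` still works.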

@timsaucer timsaucer added the bug Something isn't working label May 7, 2024
@Michael-J-Ward
Contributor

TLDR: Use the bitwise operators & and |, which get mapped to the magic methods __and__ and __or__.

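This is a Python quirk; see the Boolean operations table in the Python documentation.

  • x or y means if x is true, then x, else y
  • x and y means if x is false, then x, else y

Python does not let a class override `and` or `or`. These keywords only test the truthiness of the left operand and then return one of the two operands unchanged. An Expr object is truthy, so x or y returns x and x and y returns y. Basically, the evaluation mechanics make it impossible for x or y to create the combined expression you're looking for.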

from datafusion import column, literal

a_eq_1 = column("a") == literal(1)
a_eq_3 = column("a") == literal(3)

# `or` / `and` return one of the operands; `|` / `&` build a new expression
print("using `or`:", a_eq_1 or a_eq_3)
print("using `and`:", a_eq_1 and a_eq_3)
print("using `|`:", a_eq_1 | a_eq_3)
print("using `&`:", a_eq_1 & a_eq_3)

This prints:

using `or`: Expr(a = Int64(1))
using `and`: Expr(a = Int64(3))
using `|`: Expr(a = Int64(1) OR a = Int64(3))
using `&`: Expr(a = Int64(1) AND a = Int64(3))
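
The same behavior can be reproduced without DataFusion, which may help show that it comes from Python itself. The sketch below uses a hypothetical `ToyExpr` class (an assumption for illustration, not the library's `Expr`) that overloads `==`, `|` and `&` to build strings.

```python
class ToyExpr:
    """Hypothetical stand-in for an expression node; NOT DataFusion's Expr."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # `a == 1` builds a new expression instead of returning a bool.
        return ToyExpr(f"{self.text} = {other}")

    def __or__(self, other):
        return ToyExpr(f"{self.text} OR {other.text}")

    def __and__(self, other):
        return ToyExpr(f"{self.text} AND {other.text}")

    def __repr__(self):
        return f"ToyExpr({self.text})"


a = ToyExpr("a")
a_eq_1 = a == 1
a_eq_3 = a == 3

# No __bool__ is defined, so every ToyExpr is truthy:
short_or = a_eq_1 or a_eq_3    # returns the left operand unchanged
short_and = a_eq_1 and a_eq_3  # returns the right operand unchanged

# The bitwise operators call __or__ / __and__ and build a combined node.
built_or = (a == 1) | (a == 3)
built_and = (a == 1) & (a == 3)

print(short_or, short_and, built_or, built_and)
```

Note the parentheses: `|` and `&` bind more tightly than `==`, so each comparison has to be wrapped. Applied to the reproduction in this issue, that would be `(col("a") == lit(1)) | (col("a") == lit(3))`.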

@timsaucer
Contributor Author

Thank you! I tested and your answer works as expected. I'll put up a PR this morning to expand the documentation so others don't come with the same question. I appreciate the rapid response.
