inconsistency with .is_in when filtering string digits #11394

Closed
2 tasks done
igormintz opened this issue Sep 28, 2023 · 2 comments · Fixed by #11427
Labels: accepted (Ready for implementation), bug (Something isn't working), python (Related to Python Polars)

Comments

@igormintz

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({'a': ["1", "2"], 'b': [1, 2]})  # column "a" holds strings
print(df.filter(pl.col('a').is_in([1, 2])))      # integer values
print(df.filter(pl.col('a').is_in(["1", "2"])))  # string values
print(df.filter(pl.col('a').is_in(["1", 2])))    # mixed values

Log output

No response

Issue description

The digits in column "a" are strings, so I am not sure that filtering should work when the values passed to .is_in are ints.

Expected behavior

Because the digits in column "a" are strings, I am not sure that filtering with ints should work at all.

Installed versions

--------Version info---------
Polars:              0.19.3
Index type:          UInt32
Platform:            macOS-13.5-arm64-arm-64bit
Python:              3.10.5 (main, Nov 29 2022, 16:45:28) [Clang 14.0.0 (clang-1400.0.29.102)]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.1
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2023.6.0
gevent:              <not installed>
matplotlib:          3.8.0
numpy:               1.26.0
pandas:              2.1.1
pyarrow:             13.0.0
pydantic:            2.4.1
sqlalchemy:          <not installed>
xlsx2csv:            <not installed>
xlsxwriter:          3.1.5
igormintz added the bug (Something isn't working) and python (Related to Python Polars) labels on Sep 28, 2023
igormintz changed the title from "incinsistency with .is_in when filtering string digits" to "inconsistency with .is_in when filtering string digits" on Sep 28, 2023
@cmdlineluser
Contributor

It seems this may be specific to the left-hand side (LHS) being a string; with an integer LHS and string values, an error is raised.

df.filter(pl.col('b').is_in(['1', '2']))
# ComputeError: cannot compare Int64 to Utf8 type in 'is_in' operation
pl.select(pl.lit(1).is_in(["1", "2"]))
# ComputeError: cannot compare Int32 to Utf8 type in 'is_in' operation
pl.select(pl.lit("1").is_in([1, 2]))
# shape: (1, 1)
# ┌─────────┐
# │ literal │
# │ ---     │
# │ bool    │
# ╞═════════╡
# │ true    │
# └─────────┘

@alexander-beedie
Collaborator

alexander-beedie commented Sep 29, 2023

I can see where it's happening. It actually impacts some additional dtype pairs, not just strings (though a string LHS is where you're most likely to hit it). I'm having a think about how best to handle it, as we should be stricter up front here 🤔
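
To illustrate what "stricter up front" could mean in user code, here is a small pure-Python sketch that rejects mismatched lookup values before they reach a filter. It is not how Polars implements the check, and check_is_in_values is a hypothetical helper name invented for this example.

```python
def check_is_in_values(column_type, values):
    """Return values as a list, or raise TypeError if any value
    is not exactly of the Python type expected for the column."""
    # Use an exact type match so that, e.g., bool is not accepted as int.
    bad = [v for v in values if type(v) is not column_type]
    if bad:
        raise TypeError(
            f"cannot compare {column_type.__name__} column to values {bad!r}"
        )
    return list(values)


# All strings: accepted for a string column.
ok = check_is_in_values(str, ["1", "2"])

# Mixed strings and ints: rejected before any filtering happens.
try:
    check_is_in_values(str, ["1", 2])
    rejected = False
except TypeError:
    rejected = True
```

The validated list can then be passed to .is_in as usual; the point is only that the mismatch is reported early instead of being silently accepted.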
