Casting String lit to Categorical #18874

Closed
2 tasks done
josh opened this issue Sep 23, 2024 · 2 comments · Fixed by #18893
Labels: A-panic (Area: code that results in panic exceptions), accepted (Ready for implementation), bug (Something isn't working), P-low (Priority: low), python (Related to Python Polars)

Comments

@josh
Contributor

josh commented Sep 23, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# crashes
df = pl.DataFrame(
    {"a": [1, 2, 3]},
).with_columns(b=pl.lit("foo").cast(pl.Categorical))
print(df)

# workaround, works okay
df = (
    pl.DataFrame(
        {"a": [1, 2, 3]},
    )
    .with_columns(b=pl.lit("foo"))
    .with_columns(b=pl.col("b").cast(pl.Categorical))
)
print(df)

Log output

thread '<unnamed>' panicked at crates/polars-core/src/scalar/mod.rs:46:92:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("unexpected value while building Series of type String; found value of type Categorical(None, Physical): \"foo\""))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/workspaces/test/test.py", line 25, in <module>
    print(df)
  File "/home/codespace/.python/current/lib/python3.12/site-packages/polars/dataframe/frame.py", line 1171, in __str__
    return self._df.as_str()
           ^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("unexpected value while building Series of type String; found value of type Categorical(None, Physical): \"foo\""))

Issue description

Casting a String pl.lit directly to pl.Categorical panics (in the log above, the panic is raised when the DataFrame is printed). Creating the column first and then casting it in a separate step works around it.

Crashes on 1.8.1 and 1.8.0; works on 1.7.1 and 1.6.0.

Expected behavior

# shape: (3, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ cat │
# ╞═════╪═════╡
# │ 1   ┆ foo │
# │ 2   ┆ foo │
# │ 3   ┆ foo │
# └─────┴─────┘

Installed versions

--------Version info---------
Polars:              1.8.1
Index type:          UInt32
Platform:            Linux-6.5.0-1025-azure-x86_64-with-glibc2.31
Python:              3.12.1 (main, Aug 20 2024, 19:28:58) [GCC 9.4.0]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.9.0
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.1.1
openpyxl             <not installed>
pandas               2.2.2
pyarrow              <not installed>
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                2.4.0+cpu
xlsx2csv             <not installed>
xlsxwriter           <not installed>
josh added the bug, needs triage, and python labels Sep 23, 2024
@deanm0000
Collaborator

Instead of calling cast on the literal, you can pass the dtype directly to pl.lit:

import polars as pl

df = pl.DataFrame(
    {"a": [1, 2, 3]},
).with_columns(b=pl.lit("foo", pl.Categorical))

It's still a bug because it panics, but even once the bug is fixed, passing the dtype to pl.lit is the better way.

deanm0000 added the P-low and A-panic labels and removed the needs triage label Sep 23, 2024
@coastalwhite
Collaborator

Pretty sure this is a regression caused by #18664.

coastalwhite self-assigned this Sep 24, 2024
coastalwhite added a commit to coastalwhite/polars that referenced this issue Sep 24, 2024
c-peters added the accepted label Sep 29, 2024