Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A sequence of when().then() operations over the same set of rows applies the first when() last #10758

Closed
2 tasks done
rcap107 opened this issue Aug 28, 2023 · 5 comments · Fixed by #10759
Closed
2 tasks done
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars

Comments

@rcap107
Copy link

rcap107 commented Aug 28, 2023

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

df = pl.DataFrame(
    {
        "a": [1, 2, 3],
    }
)
m1 = df.select(pl.col("a") == 1).to_series()
m2 = df.select(pl.col("a") == 1).to_series()

df2=df.with_columns(
  pl.when(m1).then("one").when(m2).then("two").alias("c1"),
  pl.when(m1).then("two").when(m2).then("one").alias("c2")
)
df2

Result:

a    c1    c2
i64    str    str
1    "one"    "two"
2    null    null
3    null    null

Issue description

I noticed that when there is a sequence of when().then() operations one after the other with some overlap between the rows in the condition, the final value is decided by the first when().then() rather than the last.

This seems counterintuitive to me, since I would expect that the result of pl.when(m1).then("one").when(m2).then("two").alias("c1") would be two, since the block with m2 should be applied after the block with m1.

Expected behavior

Result:

a    c1    c2
i64    str    str
1    "two"    "one"
2    null    null
3    null    null

Installed versions

--------Version info---------
Polars:      0.18.4
Index type:  UInt32
Platform:    Linux-4.15.0-112-generic-x86_64-with-glibc2.27
Python:      3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]

----Optional dependencies----
numpy:       1.24.3
pandas:      1.5.3
pyarrow:     10.0.1
connectorx:  <not installed>
deltalake:   <not installed>
fsspec:      2023.5.0
matplotlib:  3.7.1
xlsx2csv:    <not installed>
xlsxwriter:  <not installed>
@rcap107 rcap107 added bug Something isn't working python Related to Python Polars labels Aug 28, 2023
@zundertj
Copy link
Collaborator

Polars when-then are the equivalent of Python if x1: ... elif x2: ..., and is documented:

Chained when, thens should be read as if, elif, … elif, not as if, if, … if, i.e. the first condition that evaluates to True will be picked. Note how in the example above for the second row in the dataframe, where foo=3 and bar=4, the first when evaluates to True, and therefore the second when, which is also True, is not evaluated.

So I would not classify this as a bug, but rather a design choice. I am trying to find previous discussions on this, as I think there were, but cannot find it. Leaving open for that reason.

@zundertj zundertj added enhancement New feature or an improvement of an existing feature and removed bug Something isn't working labels Aug 28, 2023
@cmdlineluser
Copy link
Contributor

#7725 was the last discussion I recall regarding this.

@rcap107
Copy link
Author

rcap107 commented Aug 28, 2023

Huh, I did not notice that part in the documentation, my bad for that. I can understand the design choice, but I still find it counterintuitive 🤔

Perhaps it would be worth making a note on that behavior at the beginning of the docstring? I think that would make it clearer and more noticeable.

Something like this would make it clearer IMO:

Expression similar to an if-else statement in Python. Always initiated by a pl.when().then(). Optionally followed by chaining one or more .when().then() statements. Chained when, thens should be read as if, elif, … elif, not as if, if, … if, i.e. the first condition that evaluates to True will be picked.
If none of the conditions are True, an optional .otherwise() can be appended at the end. If not appended, and none of the conditions are True, None will be returned.

@zundertj
Copy link
Collaborator

zundertj commented Aug 28, 2023

Thank you @cmdlineluser , I even added the current explanation to the examples docstring based on that feedback, not sure why I could not find that.

@rcap107 : I agree it can be put more prominent in the docstring, will move it to the upper section.

@lyngc
Copy link

lyngc commented Aug 28, 2023

It’s the same order as np.select() and makes perfect sense to me as an if elif else

@orlp orlp closed this as not planned Won't fix, can't repro, duplicate, stale Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants