Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

depr(python): Rename first qcut parameter to quantiles #10253

Merged
merged 1 commit into from
Aug 2, 2023

Conversation

stinodego
Copy link
Member

These were recently renamed to q, but that does not match our naming policy. Fortunately, it's a simple fix.

@github-actions github-actions bot added deprecation Add a deprecation warning to outdated functionality python Related to Python Polars labels Aug 2, 2023
@stinodego stinodego marked this pull request as draft August 2, 2023 13:37
@stinodego stinodego marked this pull request as ready for review August 2, 2023 13:37
@ritchie46 ritchie46 merged commit 9034735 into main Aug 2, 2023
13 checks passed
@ritchie46 ritchie46 deleted the qcut-param-rename branch August 2, 2023 13:59
@magarick
Copy link
Contributor

magarick commented Aug 4, 2023

I renamed them because it's a standard name for a parameter that does double duty. The problem is that they are emphatically NOT quantiles. I've had similar discussions about this incorrect naming before.
They were called probs before because if you pass in a list of probabilities that's what they are. However, it's standard in qcut to be able to pass in a number of quantiles you want as well. In this case, the parameter is also not quantiles. If you don't want to give it the standard name of q what would you prefer to call it that actually describes what it does?

To be clear, I'm not trying to give you a hard time and I only noticed this because it created a conflict with another change. My issue isn't with not wanting to use single letter names (which I think is the policy you're referring to?) but with using inaccurate names. While I value terseness, correctness and descriptiveness come first.

EDIT: I thought I was going crazy so I just had to check. I explicitly mentioned the name change in the PR because even though it's standard it is short so I wanted to fish for something better. That part was accepted as-is!

@stinodego
Copy link
Member Author

I didn't go through the commit history here - I just ran into this when updating the deprecation utils. I'm not trying to rock your boat.

Indeed, it's harder to adequately name parameters that do 'double duty'. In this case, I think quantiles is OK as it's the parameter that defines what quantiles are used, in both cases. q is definitely not fine. You're free to suggest a better name though.

@magarick
Copy link
Contributor

magarick commented Aug 4, 2023

I dunno, probs_or_n? cuts? I can change it in the label PR. Probably no one's using this by name anyway though 🤷 quantiles is no good because it refers to the output of a quantile function. Calling the input to the quantile function quantiles is like saying the values and indices of an array are the same. Maybe it's my statistics background in this particular case, but calling it that would just reek of a willful disregard for clarity.
You guys really categorically hate single letter names even if they're a standard huh?

@lorentzenchr
Copy link
Contributor

@magarick has a good point. This parameter really specifies the probability level of the (whished for) quantile. The base R function quantile calls it probs.

Unfortunately, there is no common naming:

@magarick
Copy link
Contributor

@stinodego Why don't you just change it to breaks so at least that way it's consistent with cut.
What gets me isn't trying to have a policy, or even wanting to be strict, but applying it inconsistently and having a double standard. There are many outward-facing polars functions with single-letter variable names, yet this is the one you pick out despite it having been explained and approved. You can see why this comes off as trying to give me a hard time. Do you not want me to contribute anymore?

@stinodego
Copy link
Member Author

stinodego commented Aug 11, 2023

The problem is one parameter doing two things. That makes it hard to name.

For sample, we have fraction and n as separate parameters. Maybe we should have something like that for qcut as well?

I agree with you that the current naming is not optimal. I just haven't heard a good alternative yet.

And this has nothing to do with a 'double standard' or anything. It's a big project, we can't fix everything at the same time. Feel free to send a PR to update other parts of the API to be more in line with our naming standards.

@magarick
Copy link
Contributor

The problem is one parameter doing two things. That makes it hard to name.

Funny thing, I agree with you here! I hate the fact that it's one parameter doing two things! But having it in one function was pretty common, so even though there's qcut_uniform on the Rust side, I kept it in one function for Python (also I hated the idea of copypasting a bunch of comments). Similarly, that's where the variable name came from. Consistency is nice and all but sometimes multiple consistencies conflict.

For sample, we have fraction and n as separate parameters. Maybe we should have something like that for qcut as well?

I think that would be a little awkward. In general, I'd prefer separate functions if it's doing something different. Here I see it as a grey area since it's close but not identical.

I agree with you that the current naming is not optimal. I just haven't heard a good alternative yet.

And this has nothing to do with a 'double standard' or anything. It's a big project, we can't fix everything at the same time. Feel free to send a PR to update other parts of the API to be more in line with our naming standards.

In that case, I really don't understand the aversion to single character names. Clarity is what's most important and what's clear is a function of context, prior experiences, and expectations. In general, I agree they're a bad idea but it's situational. Here it comes down to using something that makes sense and what people are likely to expect. I honestly don't think the other single variable names are inappropriate because they make sense in context and longer names would, if anything, hurt clarity.

As for qcut, as long as you're ok with somewhat breaking with expectations, I'd actually go with making it two functions, since that's far more clear than a variable name could be (though you'll probably not like that I think n is a perfectly appropriate parameter for qcut_uniform :-)).

@lorentzenchr
Copy link
Contributor

@magarick While I appreciate your thoughts, I‘m a bit surprised by your - sometimes - harsh language.

Meanwhile, the discussion is in the new issue, this one is closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation Add a deprecation warning to outdated functionality python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants