check for preserved index column during combine #1460
An even smaller MWE would be:
with the same result. A lesser evil (at some cost in performance) might be to do what you propose only if the columns have identical contents. If they do not, we should probably throw an error (in the future; a warning for now).
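The "drop only if identical" rule can be sketched as below. This is a language-neutral illustration in Python, not DataFrames.jl code: a per-group result is modeled as a dict of equal-length lists, and the helper name `drop_redundant_keys`, its `strict` flag, and the `_1` renaming suffix are all hypothetical. With `strict=False` it mirrors "a warning now"; with `strict=True` it mirrors "an error in the future".

```python
import warnings


def drop_redundant_keys(group_key, result, strict=False):
    """Hypothetical helper: filter one group's result before combining.

    group_key: dict mapping each grouping column name to this group's value.
    result: dict mapping column names to equal-length lists (a toy data frame).
    strict: if True, raise on a returned grouping column whose contents differ
            from the group value; otherwise warn and keep it under a new name.
    """
    out = {}
    for name, values in result.items():
        if name not in group_key:
            # Ordinary column produced by the function: keep it as-is.
            out[name] = values
        elif all(v == group_key[name] for v in values):
            # Identical contents: the column only repeats the group key, so skip it.
            continue
        else:
            msg = f"column {name!r} differs from the grouping column"
            if strict:
                raise ValueError(msg)
            warnings.warn(msg)
            # Keep the conflicting column under a renamed (hypothetical) key.
            out[name + "_1"] = values
    return out
```

For example, `drop_redundant_keys({"g": 1}, {"g": [1, 1], "x": [3, 4]})` drops the redundant `g` and keeps only `x`, while a returned `g` of `[1, 2]` triggers the warning (or the error under `strict=True`). The equality check is the performance cost mentioned above: every returned grouping column must be scanned once per group.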
@nalimilan I see two approaches here:
If we picked option 2, then we should at the same time add
By "drop them" I mean that we do not append the grouping columns to the result.
Option 2 is #1555 with a different default. I'm not completely sure which option is best. Behavior 1 sounds the most useful in practice, with the possible drawback that one could accidentally overwrite grouping columns and get incorrect results. We had discussed checking whether the columns are equal and throwing an error if they aren't (possibly with an argument to allow overwriting even if they differ).
Ah, right. We should also keep column order in mind (when we add grouping columns, they come first). Given the new split-apply-combine API (not using a [...]
Also, if we agree on this scenario, I would not overwrite the columns when the user wants to keep grouping columns and there are duplicate column names; I would keep the current behavior instead. But I am open to other opinions.
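The column-order point and the "keep or drop grouping columns" choice can be sketched together. Again this is a Python illustration on toy dict-of-lists frames, not the DataFrames.jl implementation; `combine_groups` and its `keep_keys` flag are hypothetical names, and the sketch assumes the per-group function returns the same column names for every group. For brevity it simply raises when the function returns a column named like a grouping column while grouping columns are being kept, in the spirit of the error-by-default approach discussed in this thread.

```python
def combine_groups(groups, key_names, fn, keep_keys=True):
    """Hypothetical split-apply-combine over toy data frames (dicts of lists).

    groups: list of (key_tuple, subframe) pairs, one per group.
    key_names: names of the grouping columns, in order.
    fn: per-group function returning a dict of equal-length lists.
    keep_keys: if True, grouping columns are inserted first so they lead the
               output; if False they are not added to the result at all.
    """
    out = {}
    for key, sub in groups:
        res = fn(sub)
        # Number of rows this group contributes (0 if fn returned no columns).
        nrows = len(next(iter(res.values()))) if res else 0
        if keep_keys:
            # Repeat each group value once per output row; dicts keep insertion
            # order, so grouping columns end up before the function's columns.
            for name, value in zip(key_names, key):
                out.setdefault(name, []).extend([value] * nrows)
        for name, values in res.items():
            if keep_keys and name in key_names:
                raise ValueError(f"function returned grouping column {name!r}")
            out.setdefault(name, []).extend(values)
    return out
```

Usage: with `groups = [((1,), {"x": [1, 2]}), ((2,), {"x": [5]})]` and `fn = lambda sub: {"s": [sum(sub["x"])]}`, the default call yields the columns `g` then `s`, whereas `keep_keys=False` returns only `s`.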
I'm fine with adding an argument, but I don't really like the current behavior with duplicate columns. A safer approach from the perspective of releasing 1.0 would be to throw an error by default when there are duplicate columns that aren't equal to the grouping columns, so that we can switch to any behavior later if we want (or keep that behavior). One argument in favor of this is that I don't think adding the new columns with names generated by
OK. Would you be willing to make a PR? (I guess it is better if you do it, as you know the split-apply-combine internals best.) #1555 would need heavy rebasing anyway. And it would be great if
#1938 implements the discussed solution: stop adding grouping columns with
When using `by` with a function that returns a data frame in which the grouping columns are still present, `combine` treats them as duplicate columns and renames them with a warning. It would be nicer to check whether each grouping column is present with all its elements equal to the group values and, if so, ignore it.

Current behavior:
Desired behavior: