Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roll apply nonfloat dtypes #11620

Conversation

sandhujasmine
Copy link

Fixed issue #4964 so rolling_apply works on non-numeric columns as well.

I followed instructions here but I'm not sure where to add documentation in the release notes.

Also would like to get feedback on the code and complete any code changes before adding to documentation.

Jasmine Sandhu added 3 commits November 16, 2015 15:03
In fixing GH4964, added an array_to_roll argument to roll_generic().
This will be used by rolling_apply() to work with non-float dtypes.
array_to_roll defaults to None in which case, this function rolls over
input array as it did previously.
Added coercion bool in rolling_apply and _process_data_structure()
It defaults to True in which case the _process_data_structure() converts
arg to float and things work as they did before this change (backwards
compatible). If user wishes to use rolling_apply() with string array,
then set coercion=False.

Default of coercion=True prioritizes performance.
@jreback
Copy link
Contributor

jreback commented Nov 17, 2015

this would be after #11603

but you don't actually want to modify roll_generic, rather you want to have different functions for different dtypes based on there input. So would need to modify the templates to generate object ones (which is prob useful), otherwise coerce to float.

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Nov 17, 2015
@sandhujasmine
Copy link
Author

Are the #11603 changes already in master?

I thought about creating a different rolling_* function for each dtype but the logic would be almost identical to rolling_generic() with changes in the dtype of input and a couple of dtype changes in the function.

It seemed cleaner (and DRY) to always give it a float as 'input' and give it another numpy array called array_to_roll when we have a non-float type we wish to roll over. Are you concerned about performance? I was unsure if coercion should be exposed to user and if it should default to True, but optimized for speed - anyhow, moot points if that's not the route we wish to go.

But curious if you prefer different but almost identical functions to roll_generic for each non float dtype. I can certainly give this a go in another branch, and we can see how similar the functions are then discuss further.

@jreback
Copy link
Contributor

jreback commented Nov 18, 2015

@sandhujasmine we use template generation to handle the differnt dtypes, see src/generate_code.pyx. you would end up with roll_generic_float64 and roll_generic_object. These refer to the input types (you can also use a fused type for this, look for the numeric type). but we usually have a separate routine for object as well, as it sometimes needs customization.
further its possible to have a matrixed set of input/output types (but I don't really like this), and would instead assert that the input type produces the same output type (and if it doesn't then it would raise and we would just use the roll_generic_object anyhow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants