Is numeric na #259

kmdalton · 2024-06-17T21:55:12Z

This PR speeds up dataset creation for large arrays. While working on a new stream file parser, I realized that converting numpy float32 and int32 arrays to MTZDtypes could be very time-consuming for large arrays. After some digging, I found that the culprit was a Cython function provided by pandas, pandas._libs.missing.is_numeric_na, which rs uses to build a mask of missing values.

When the input is an int32 or float32 numpy array, this is wholly unnecessary, and it is much faster to use np.isnan to build the same mask. This PR wraps is_numeric_na and adds control flow to take that fast path. It falls back to the Cython version whenever the input is not an int32 or float32 ndarray. This is a very conservative choice, and more cases could probably be added to the fast path down the line.
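A minimal sketch of the control flow described above, assuming a 1-D ndarray input. The names is_numeric_na (for the wrapper), _slow_numeric_na, _FAST_DTYPES and the fallback parameter are illustrative only, not the code merged in this PR. _slow_numeric_na is a hypothetical pure-Python stand-in for the private pandas routine pandas._libs.missing.is_numeric_na, whose exact signature and behavior are not reproduced here.

```python
import math

import numpy as np

# Hypothetical: dtypes that take the fast, vectorized path
# (mirrors the conservative int32/float32 choice described in the PR).
_FAST_DTYPES = (np.dtype("float32"), np.dtype("int32"))


def _slow_numeric_na(values):
    """Hypothetical stand-in for pandas._libs.missing.is_numeric_na.

    Visits one element at a time and flags None or floating-point NaN.
    (The real pandas routine also recognizes pd.NA and decimal NaN.)
    """
    mask = np.zeros(len(values), dtype=bool)
    for i, val in enumerate(values):
        if val is None:
            mask[i] = True
        # np.float32 is not a subclass of Python float, so check np.floating too
        elif isinstance(val, (float, np.floating)) and math.isnan(val):
            mask[i] = True
    return mask


def is_numeric_na(values, fallback=_slow_numeric_na):
    """Return a boolean mask marking missing entries of a 1-D array.

    int32/float32 ndarrays use vectorized np.isnan; anything else is
    delegated to `fallback` (a stand-in for the Cython routine).
    """
    if isinstance(values, np.ndarray) and values.dtype in _FAST_DTYPES:
        # int32 arrays cannot hold NaN, so np.isnan yields all False for them
        return np.isnan(values)
    return fallback(values)
```

Keeping the dtype check narrow means object arrays that may contain None or pd.NA still go through the general routine, matching the conservative fallback the PR describes.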

kmdalton · 2024-06-17T21:55:45Z

This should wait to merge until after #258

JBGreisman

Now that #258 is merged in, I think that some of these changes are unnecessary in this PR. Do you mind updating/rebasing so that the relevant changes aren't overwritten here?


kmdalton · 2024-06-18T14:54:20Z

Okay, I cleaned up this PR. It should be good now. I will merge after the CI runs.

kmdalton requested a review from JBGreisman June 17, 2024 21:55

JBGreisman reviewed Jun 18, 2024


speed up mask creation for numeric dtypes

78d33e5

kmdalton force-pushed the is_numeric_na branch from 3112efc to 78d33e5 June 18, 2024 14:52

[pre-commit.ci] auto fixes from pre-commit.com hooks

01acd26

for more information, see https://pre-commit.ci

kmdalton merged commit 6c9ab8e into main Jun 18, 2024
5 checks passed

kmdalton deleted the is_numeric_na branch June 18, 2024 15:04

kmdalton mentioned this pull request Jun 18, 2024

add manual trigger for build #257

Merged
