This PR speeds up dataset creation for large arrays. While working on a new stream file parser, I realized that converting NumPy float32 and int32 arrays to MTZDtypes could be very time-consuming for large arrays. After some digging, I found that the culprit was a Cython function provided by pandas,
pandas._libs.missing.is_numeric_na
which makes a mask for missing values inrs
When the input is an int32 or float32 NumPy array, this is wholly unnecessary, and it is much faster to use
np.isnan
to accomplish the same task. This PR wraps is_numeric_na and adds control flow that dispatches to np.isnan for those inputs. It falls back to the Cython version whenever the input is not an int32 or float32 ndarray. This is a very conservative choice, and more cases could probably be added to the control flow down the line.
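For reviewers, here is a minimal sketch of the dispatch described above. It is not the code in this PR: the names `fast_na_mask` and `slow_na_mask` are hypothetical, and `slow_na_mask` is a pure-Python stand-in for the pandas Cython routine so the example runs without relying on a private pandas import.

```python
import numpy as np

# dtypes for which np.isnan is assumed to be a safe, faster replacement
FAST_DTYPES = (np.dtype("int32"), np.dtype("float32"))


def slow_na_mask(values):
    """Hypothetical stand-in for pandas._libs.missing.is_numeric_na.

    Marks None and NaN entries as missing, one element at a time.
    """
    arr = np.asarray(values)
    flat = [v is None or (isinstance(v, float) and v != v) for v in arr.ravel()]
    return np.array(flat, dtype=bool).reshape(arr.shape)


def fast_na_mask(values, fallback=slow_na_mask):
    """Return a boolean mask of missing values.

    Uses np.isnan for int32/float32 ndarrays and defers to `fallback`
    for every other input (the conservative choice described above).
    """
    if isinstance(values, np.ndarray) and values.dtype in FAST_DTYPES:
        return np.isnan(values)
    return fallback(values)


floats = np.array([1.0, np.nan, 3.0], dtype=np.float32)
ints = np.array([1, 2], dtype=np.int32)
objects = np.array([1.0, None, np.nan], dtype=object)

print(fast_na_mask(floats))   # fast path
print(fast_na_mask(ints))     # fast path; integers are never NaN
print(fast_na_mask(objects))  # fallback path
```

Passing the fallback as an argument keeps the fast path easy to test in isolation. In the real wrapper the fallback would presumably be the pandas function itself.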