PERF: constructing string Series #36317

Merged
merged 2 commits into from
Sep 13, 2020

@topper-123 topper-123 commented Sep 12, 2020

Avoid needless call to lib.infer_dtype, when string dtype.

Performance example:

>>> x = np.array([str(u) for u in range(1_000_000)], dtype=object)
>>> %timeit pd.Series(x, dtype=str)
344 ms ± 59.7 ms per loop  # v1.1.0
157 ms ± 7.04 ms per loop  # after #35519
22.6 ms ± 191 µs per loop  # after #36304
11.2 ms ± 48.6 µs per loop  # after this PR

A similar speed-up is possible for pd.Series(x, dtype="string"), but it requires some refactoring of StringArray, so I'll do that in a separate PR.

xref #35519 & #36304.
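
For readers unfamiliar with the idea, here is a rough pure-Python sketch of the kind of short-circuit described above. It is not the pandas implementation: `infer_kind` and `sanitize` are hypothetical names invented for illustration, and `infer_kind` only stands in for a full inference pass such as lib.infer_dtype. The point is that once the caller has asked for a string dtype and the values have been converted, scanning every element again to find out what they are is redundant.

```python
from typing import Any, List, Optional, Sequence, Tuple


def infer_kind(values: Sequence[Any]) -> str:
    """Hypothetical stand-in for a dtype-inference pass.

    It visits every element, which is the cost the short-circuit avoids.
    """
    kinds = {type(v).__name__ for v in values}
    if not kinds:
        return "empty"
    return kinds.pop() if len(kinds) == 1 else "mixed"


def sanitize(
    values: Sequence[Any], dtype: Optional[type] = None
) -> Tuple[List[Any], str, bool]:
    """Return (converted values, kind, whether the inference scan ran).

    Illustrative only: the real pandas code path is more involved.
    """
    if dtype is str:
        # The caller requested strings, so the result kind is already known
        # after conversion and the extra inference scan can be skipped.
        return [str(v) for v in values], "str", False
    # No dtype hint: fall back to scanning the values.
    return list(values), infer_kind(values), True


if __name__ == "__main__":
    print(sanitize([1, "a", 2.5], dtype=str))
    print(sanitize(["a", "b"]))
```

With a dtype hint the function does one pass (the conversion); without it, the inference pass is still needed. The timings above suggest that avoiding the second pass matters at the million-element scale.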

@jreback jreback left a comment

great (can u add all of these issues on the whats new, same entry just keep appending)

@topper-123

Updated.

@jreback jreback added the Performance (Memory or execution speed performance) and Strings (String extension data type and string data) labels Sep 13, 2020
@jreback jreback added this to the 1.2 milestone Sep 13, 2020
@jreback jreback merged commit 2d95908 into pandas-dev:master Sep 13, 2020

jreback commented Sep 13, 2020

thanks @topper-123

@topper-123 topper-123 deleted the sanitize_array_str_perf branch September 13, 2020 13:25
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020