
Transpose operator inevitably triggers lazy dtypes metadata computation. #3725

Closed
dchigarev opened this issue Nov 22, 2021 · 1 comment · Fixed by #3748
Labels
Backport 🔙 Issues that need to be backported to previous release(s) Performance 🚀 Performance related issues and pull requests.

Comments

@dchigarev
Collaborator

Calling df.T accesses the dtypes property and therefore triggers its computation, even when that computation has been deferred by setting modin_frame._dtypes = None:

```python
# self.dtypes is read unconditionally here, which forces the dtypes computation
new_dtypes = pandas.Series(
    np.full(len(self.index), find_common_type(self.dtypes.values)),
    index=self.index,
)
```

Computing data types can take up to several seconds for wide frames, so we should avoid it when possible.

BTW, df.T is part of the reduction-operation flow (columnarization in the Series constructor), so this currently slows down every reduction method.
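A minimal sketch of the lazy alternative, written against plain pandas. LazyFrame and dtype_computations are hypothetical names used only for illustration; this is neither Modin's API nor the actual change made in #3748. The idea is that transpose checks the cached _dtypes field instead of the dtypes property, so dtypes that are still unknown stay unknown in the result. find_common_type is the same pandas-internal helper that the quoted Modin snippet uses.

```python
import numpy as np
import pandas
from pandas.core.dtypes.cast import find_common_type  # pandas-internal helper


class LazyFrame:
    """Hypothetical toy frame that caches its dtypes lazily (not Modin's class)."""

    def __init__(self, data, dtypes=None):
        self._data = data
        self._dtypes = dtypes  # None means "not computed yet"
        self.dtype_computations = 0  # how many times the expensive path ran

    @property
    def dtypes(self):
        if self._dtypes is None:
            # Stand-in for the expensive dtype computation over all partitions.
            self.dtype_computations += 1
            self._dtypes = self._data.dtypes
        return self._dtypes

    def transpose(self):
        new_dtypes = None
        # Check the cache directly instead of the property, so unknown
        # dtypes remain unknown (lazy) in the transposed result.
        if self._dtypes is not None:
            if len(self._dtypes):
                common = find_common_type(list(self._dtypes.values))
            else:
                common = np.dtype(object)  # hypothetical choice for a frame with no columns
            # Every column of the transposed frame gets the common type.
            new_dtypes = pandas.Series(
                [common] * len(self._data.index),
                index=self._data.index,
                dtype=object,
            )
        return LazyFrame(self._data.T, dtypes=new_dtypes)


df = pandas.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
frame = LazyFrame(df)
transposed = frame.transpose()
print(frame.dtype_computations)  # 0: transpose did not force dtypes
print(transposed._dtypes)        # None: the result stays lazy
```

When the source frame's dtypes have already been computed, transpose still derives them for the result from the cached Series, without forcing a new computation.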

@dchigarev dchigarev added the Performance 🚀 Performance related issues and pull requests. label Nov 22, 2021
@dchigarev dchigarev self-assigned this Nov 22, 2021
@devin-petersohn
Collaborator

devin-petersohn commented Nov 30, 2021

@dchigarev this should be a couple of lines of code to fix, and should improve performance significantly.

dchigarev added a commit to dchigarev/modin that referenced this issue Nov 30, 2021
…n transpose

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
devin-petersohn pushed a commit that referenced this issue Dec 3, 2021
…3748)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@devin-petersohn devin-petersohn added the Backport 🔙 Issues that need to be backported to previous release(s) label Dec 6, 2021
devin-petersohn pushed a commit that referenced this issue Dec 16, 2021
…3748)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>