Modin performance enhancements #4315

Closed
prutskov opened this issue Mar 11, 2022 · 1 comment
Labels
Epic P2 Minor bugs or low-priority feature requests Performance 🚀 Performance related issues and pull requests.

Comments

@prutskov
Contributor

prutskov commented Mar 11, 2022

The table below lists the performance gaps that account for the largest share of execution time in the Modin performance tests.

| Functionality section | Assignees | Issue | Comment | Priority | Status |
| --- | --- | --- | --- | --- | --- |
| merge operation | - | #4293 | left join type | - | Not assigned |
| | - | #3588 | inner join type | - | Not assigned |
| | - | #656 | right/outer join type, defaults to pandas for now | - | Not assigned |
| groupby aggregation operations | @prutskov | #4288 | A `groupby.mean` reproducer is given as an example of an aggregation operation. Applies to the flow `DataFrameGroupBy._wrap_aggregation` -> `qc.groupby_agg` | - | |
| | - | #3585 | `groupby.mean` | duplication | duplication |
| | - | #1901 | `groupby.mean` | duplication | duplication |
| getitem operations | @dchigarev | #4268 | `__getitem__` operation, `_get_dict_of_block_index()` is the hotspot | - | |
| | @dchigarev | #1903 | `iloc`, `_get_dict_of_block_index()` is the hotspot | duplication | duplication |
| binary operations | @prutskov | #4182 | operands are Modin DataFrame/Series | - | |
| | @prutskov | #3100 | operands are Modin DataFrame/Series, a lot of details (need to verify after #4391) | - | WIP |
| drop operation | @dchigarev, @prutskov | #3844 | 3 bottlenecks: in the API layer, the QC layer, and `_get_dict_of_block_index()` | - | |
| sort_values operation | @RehanSD | #3535 | defaults to pandas for now | - | @RehanSD |
| read_parquet operation | @pyrito | #4305 | Only parallelization between columns/column-partitions is supported for now | - | |
| constructing DataFrame from dict with Modin Series values | - | #4263 | flow is `from_non_pandas` | - | Not assigned |
| | - | #1572 | According to the flow description, this is a duplicate | duplication | duplication |
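
For anyone who wants to get a rough feel for these gaps locally, here is a minimal timing sketch. It is not the Modin performance test suite and not taken from any of the linked issues. The helper names (`time_op`, `build_cases`, `run_cases`), the column names, and the data sizes are all made up for illustration. It accepts any pandas-compatible module, so the same cases can be run against `pandas` and `modin.pandas`. Note that Modin may execute asynchronously depending on the engine, so these timings could under-report unless the results are materialized, which this sketch does not attempt. `read_parquet` is left out because it needs a file on disk.

```python
import time
from typing import Any, Callable, Dict


def time_op(fn: Callable[[], Any], repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def build_cases(pd: Any, n_rows: int = 100_000) -> Dict[str, Callable[[], Any]]:
    """Build one small case per table section (hypothetical data and names).

    `pd` is any pandas-compatible module, e.g. `pandas` or `modin.pandas`.
    """
    df = pd.DataFrame(
        {"key": [i % 100 for i in range(n_rows)], "val": list(range(n_rows))}
    )
    other = pd.DataFrame({"key": list(range(100)), "w": list(range(100))})
    return {
        "merge (left)": lambda: df.merge(other, on="key", how="left"),
        "merge (inner)": lambda: df.merge(other, on="key", how="inner"),
        "merge (outer)": lambda: df.merge(other, on="key", how="outer"),
        "groupby.mean": lambda: df.groupby("key").mean(),
        "__getitem__": lambda: df["val"],
        "iloc": lambda: df.iloc[:10],
        "binary op": lambda: df + df,
        "drop": lambda: df.drop(columns=["val"]),
        "sort_values": lambda: df.sort_values("val"),
        "DataFrame from dict of Series": lambda: pd.DataFrame(
            {"a": df["val"], "b": df["key"]}
        ),
    }


def run_cases(pd: Any, n_rows: int = 100_000) -> Dict[str, float]:
    """Time every case and return a mapping of case name to seconds."""
    return {name: time_op(fn) for name, fn in build_cases(pd, n_rows).items()}
```

To compare the two libraries, call `run_cases(pandas)` and `run_cases(modin.pandas)` after importing both, then look at the per-case ratios. Best-of-N timing is used here only to reduce noise; it says nothing about where inside Modin the time goes, so a profiler is still needed to confirm hotspots such as `_get_dict_of_block_index()`.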
@prutskov prutskov added Performance 🚀 Performance related issues and pull requests. Epic labels Mar 11, 2022
@vnlitvinov vnlitvinov added the P2 Minor bugs or low-priority feature requests label Aug 29, 2022
@YarShev
Collaborator

YarShev commented Jan 11, 2024

Closing this issue as outdated. We will track the unresolved issues separately.

@YarShev YarShev closed this as completed Jan 11, 2024