PERF-#7299: Avoid using synchronize_labels for combine function #7300
…nchronize_labels' for 'combine' function Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Title changed from "… lazy_metadata_decorator instead of synchronize_labels for combine function" to "… synchronize_labels for combine function"
if self._deferred_index:
    new_index = self.index
if self._deferred_column:
    new_columns = self.columns
A further improvement could be to get rid of the materialization of indexes in the main process. However, this currently also happens in the _propagate_index_objs function.
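For readers following the thread, here is a minimal sketch of the pattern the fragment above suggests: labels that are still deferred are read once and handed to the merge step, so no separate synchronization pass is needed. Everything in it (ToyFrame, its attributes, the list-based partitions) is hypothetical and stands in for Modin internals; how the real combine consumes new_index and new_columns is not shown in the fragment, so this is only one possible reading.

```python
class ToyFrame:
    """Hypothetical stand-in for a partitioned frame; not Modin's real class."""

    def __init__(self, partitions, index, deferred_index=False):
        self.partitions = partitions            # each partition is a list of row labels
        self.index = index                      # labels known to the main process
        self._deferred_index = deferred_index   # True while partitions hold stale labels
        self.sync_calls = 0                     # counts eager synchronization passes

    def synchronize_labels(self):
        """Eagerly push the known labels into every partition (the extra pass)."""
        self.sync_calls += 1
        start, new_parts = 0, []
        for part in self.partitions:
            new_parts.append(list(self.index[start:start + len(part)]))
            start += len(part)
        self.partitions = new_parts
        self._deferred_index = False

    def combine(self):
        """Merge all partitions into one, applying pending labels during the merge."""
        new_index = None
        if self._deferred_index:
            new_index = self.index
        merged = [label for part in self.partitions for label in part]
        if new_index is not None:
            # Assumption: the merge step itself relabels the combined partition.
            merged = list(new_index)
        return ToyFrame([merged], list(merged))


# Partitions still carry positional labels; the real labels are deferred.
frame = ToyFrame([[0, 1], [2, 3]], ["a", "b", "c", "d"], deferred_index=True)
combined = frame.combine()
```

In this toy version, combine never calls synchronize_labels, yet the single resulting partition carries the up-to-date labels.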
This is great @anmyachev!
I think a good medium-term goal would be to have support for a fully lazy index object, adding it as a first-class citizen to the query compiler (and adding index.py to modin/pandas).
LGTM
@dchigarev, @anmyachev, to what extent does our current ModinIndex perform this task?
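To make the "fully lazy index object" idea concrete, below is a rough sketch of what such an object could look like. LazyIndex and all of its methods are hypothetical names chosen for illustration; this is not the ModinIndex API and makes no claim about how ModinIndex behaves. It assumes labels can be produced by a callable and that the length is sometimes known without computing them.

```python
from typing import Callable, Optional, Sequence


class LazyIndex:
    """Hypothetical lazy label container; labels are computed on first access."""

    def __init__(self, compute: Callable[[], Sequence], length: Optional[int] = None):
        self._compute = compute   # callable that produces the labels when needed
        self._values = None       # cached labels, filled on first materialization
        self._length = length     # optional length hint that avoids materialization

    @property
    def is_materialized(self) -> bool:
        return self._values is not None

    def get(self) -> list:
        """Materialize the labels once and reuse the cached result afterwards."""
        if self._values is None:
            self._values = list(self._compute())
            self._length = len(self._values)
        return self._values

    def __len__(self) -> int:
        # Use the length hint if present; otherwise fall back to materializing.
        if self._length is None:
            self.get()
        return self._length

    def map(self, func) -> "LazyIndex":
        """Return a new lazy index; the parent is only computed when the child is."""
        return LazyIndex(lambda: [func(v) for v in self.get()], length=self._length)


calls = []


def expensive_labels():
    calls.append(1)  # record every real computation of the labels
    return ["a", "b", "c"]


idx = LazyIndex(expensive_labels, length=3)
upper = idx.map(str.upper)  # nothing has been computed at this point
```

Chaining map and asking for len(upper) leaves expensive_labels uncalled; only upper.get() triggers the computation, and it runs once.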
What do these changes do?
Perf gain: ~15% against the main branch, measured with Ray on 8 cores.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
synchronize_labels for combine function #7299
tests added and passing
docs/development/architecture.rst is up-to-date