FEAT-#7249: Add reload_modin feature #7280

Merged
merged 3 commits into from
Jun 15, 2024

Collaborator

@YarShev YarShev commented May 17, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves how to take down ray and put up again in local mode #7249
  • tests passing
  • module layout described at docs/development/architecture.rst is up-to-date

Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
modin/utils.py (outdated)
if an execution engine has been shut down and
is going to be started up once again.
"""
modules = sys.modules.copy()
Collaborator

Why is there a copy here?

Collaborator Author

@YarShev YarShev May 23, 2024

Got an error without a copy.

RuntimeError: dictionary keys changed during iteration
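
A minimal sketch of why the snapshot helps, using a plain dict as a stand-in for `sys.modules`. The names `make_registry`, `fake_reload`, `reload_live`, and `reload_snapshot` are hypothetical and only illustrate the mechanism: if reloading a module causes new entries to be registered, iterating the live dict fails, while iterating a copy does not.

```python
def make_registry():
    """Stand-in for sys.modules: maps module names to placeholder objects."""
    return {"pkg": object(), "pkg.a": object()}


def fake_reload(name, table):
    """Simulate a reload whose import side effects register a new module."""
    table.setdefault(name + "._lazy", object())


def reload_live(table):
    # Iterating the dict itself: the first insertion changes its size, so the
    # next iteration step raises RuntimeError.
    for name in table:
        fake_reload(name, table)


def reload_snapshot(table):
    # Iterating a shallow copy: insertions into `table` do not affect the loop.
    for name in table.copy():
        fake_reload(name, table)
    return sorted(table)
```

The copy is shallow, so the module objects themselves are shared; only the mapping being iterated is frozen.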

modules = sys.modules.copy()
for name, module in modules.items():
    if name.startswith("modin"):
        importlib.reload(module)
Collaborator

Isn't it enough to just do importlib.reload(modin)?

Collaborator Author

It's not enough since we have to re-import all previously imported Modin modules.
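
A small sketch of the point above, assuming a throwaway package written to a temporary directory (the package name `reload_demo_pkg` and the helper `demo` are hypothetical). `importlib.reload` re-executes only the module it is given; a submodule that is already in `sys.modules` is not re-executed when its parent package is reloaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "reload_demo_pkg"  # hypothetical package name, used only for this demo


def demo():
    """Return (sub unchanged after package reload, sub unchanged after sub reload)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PKG
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("from . import sub\n")
        # A fresh object() is created every time sub.py is executed.
        (pkg_dir / "sub.py").write_text("TOKEN = object()\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(PKG)
            first = pkg.sub.TOKEN
            importlib.reload(pkg)  # re-runs __init__.py only
            same_after_pkg_reload = pkg.sub.TOKEN is first
            importlib.reload(pkg.sub)  # re-runs sub.py
            same_after_sub_reload = pkg.sub.TOKEN is first
            return same_after_pkg_reload, same_after_sub_reload
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            for key in [k for k in sys.modules if k == PKG or k.startswith(PKG + ".")]:
                del sys.modules[key]
```

In this sketch `demo()` reports that the submodule's state survives the package reload and changes only once the submodule itself is reloaded.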

Collaborator

Considering the various comments at https://stackoverflow.com/questions/28101895/reloading-packages-and-their-submodules-recursively-in-python, this approach is error-prone if the modules are reloaded in the wrong order.
For example, this comment:
"Consider there are two submodules A and B. The objects in A depend on B. Suppose A is reloaded first; A would import the old B (since B has not been reloaded). This results in a not fully updated A."

So the main question here is, can we rely on the order of modules in sys.modules?

More reliable would be the engine reinitialization function, which also resets the entire cache that relates to the engine.
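
The ordering concern can be reproduced with a throwaway package. This is a sketch under stated assumptions: the package name `order_demo_pkg` and the helper `a_sees_current_b` are hypothetical, and module `a` uses `from .b import TOKEN`, which binds the object at import time.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "order_demo_pkg"  # hypothetical package name, used only for this demo


def a_sees_current_b(order):
    """Reload modules 'a' and 'b' in `order`; report whether a.TOKEN is b.TOKEN."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / PKG
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "b.py").write_text("TOKEN = object()\n")
        (pkg_dir / "a.py").write_text("from .b import TOKEN\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            mods = {
                "a": importlib.import_module(PKG + ".a"),
                "b": importlib.import_module(PKG + ".b"),
            }
            for short_name in order:
                importlib.reload(mods[short_name])
            return mods["a"].TOKEN is mods["b"].TOKEN
        finally:
            # Clean up so repeated calls start from a fresh import state.
            sys.path.remove(tmp)
            for key in [k for k in sys.modules if k == PKG or k.startswith(PKG + ".")]:
                del sys.modules[key]
```

Reloading `a` before `b` leaves `a` holding the object from the old `b`; reloading `b` first avoids that in this two-module case.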

Collaborator Author

> So the main question here is, can we rely on the order of modules in sys.modules?

Judging by print(sys.modules), the modules are listed in the order they were imported. I think we should reload the modules in exactly this order.

> More reliable would be the engine reinitialization function, which also resets the entire cache that relates to the engine.

IIUC, in that case we would have to manually track every single file related to an engine, which is cumbersome. Also, we may have our own cache(s), which would have to be reloaded too.

Having said this, I think we can go with the current approach. If you still think the approach is not reliable, we can reject the changes. Of course, if there are no other ideas ;)

Collaborator

> IIUC, in that case we will have to manually track each single file related to an engine, what is cumbersome. Also, we may have our own cache(s), which have to be reloaded too.

I'm considering an observer option, roughly as done here (instead of reimporting modules). When a Ray restart is detected, it goes through all known places and updates the caches.

Yes, it is cumbersome, but at least we will know and understand what to expect. When reloading all modules, it is very difficult to predict what may happen.

On the other hand, I don't want to be a blocker, so I'll approve, and then the decision is up to you.

@YarShev YarShev merged commit 2d46ab3 into modin-project:main Jun 15, 2024
36 of 37 checks passed
@Liquidmasl

This seems to break methods that use the @multimethod decorator, which chooses the method based on the dataframe type.

multimethod.DispatchError: ('update_columns: 0 methods found', (<class 'ModinRayPointqloud'>, <class 'modin.pandas.dataframe.DataFrame'>), set())
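
One plausible cause, not confirmed in this thread, is that reloading a module re-executes its `class` statements and so creates new class objects, while any dispatch table built earlier is still keyed by the old ones. The sketch below shows that effect with a plain dict standing in for a type-based dispatcher; the module name `frame_demo_mod`, the class `Frame`, and the helper are hypothetical, and the `multimethod` library itself is not used.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MOD = "frame_demo_mod"  # hypothetical module name, used only for this demo


def dispatch_survives_reload():
    """Return (handler found for new instance, class identity preserved)."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{MOD}.py").write_text("class Frame:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            mod = importlib.import_module(MOD)
            old_cls = mod.Frame
            # Type-keyed table, registered before the reload.
            handlers = {old_cls: "update_columns"}
            importlib.reload(mod)  # re-executes `class Frame`, making a new class
            new_obj = mod.Frame()
            return type(new_obj) in handlers, mod.Frame is old_cls
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MOD, None)
```

If this is what happens, handlers registered against the pre-reload class would no longer match objects created after the reload, even though both classes print with the same name.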
