Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix load_session() and restrict loading a session in a different module #507

Merged
merged 14 commits into from
Jul 10, 2022

Conversation

leogama
Copy link
Contributor

@leogama leogama commented Jun 10, 2022

Resolves #498. Ongoing work, I'm adding a fix/feature per commit. The plan:

Operations allowed by load_session():

  • Save an importable module and restore to the same module
  • Save a module created at runtime (with ModuleType()) and restore to another runtime module with the same name
  • Save an importable module and restore to a runtime created module with the same name (allow unpickling multiple sessions, for comparing versions of objects for example).

Operations disallowed by load_session():

  • Save any module and restore it to a module with a different name

Other changes:

  • The main parameter is now also optional for simply loading modules other than __main__
  • The main parameter accepts a runtime module, or a name for load_session() to create it with ModuleType(main)
  • If the main module is other than __main__, it is returned:
module = dill.load_session('module_session.pkl')  # don't need to "import module"

@leogama leogama changed the title Fix load_session() and restrict loading a session in a different module (fix #498) Fix load_session() and restrict loading a session in a different module Jun 10, 2022
@mmckerns mmckerns added this to the dill-0.3.6 milestone Jun 10, 2022
@leogama
Copy link
Contributor Author

leogama commented Jun 13, 2022

@mmckerns, I'm progressing with this issue, but I can't find a way to allow a user to restore a common session into a module created with ModuleType() without also restoring it to the original module or without breaking the API...

@leogama leogama marked this pull request as ready for review June 13, 2022 01:06
@mmckerns
Copy link
Member

I can't find a way to allow a user to restore a common session into a module created with ModuleType() without also restoring it to the original module or without breaking the API.

That's ok. It should be a pretty small corner case. Some options are: (1) don't allow it, (2) allow it but also restore the original module where (a) the original module is imported then removed, (b) the original module is imported with an alias... or (c) something else. I'd look at the importlib for some guidance on the behavior people might expect, and how that behavior is implemented.

@mmckerns mmckerns self-requested a review June 13, 2022 09:31
@leogama
Copy link
Contributor Author

leogama commented Jun 14, 2022

I can't find a way to allow a user to restore a common session into a module created with ModuleType() without also restoring it to the original module or without breaking the API.

That's ok. It should be a pretty small corner case. Some options are: (1) don't allow it, (2) allow it but also restore the original module where (a) the original module is imported then removed, (b) the original module is imported with an alias... or (c) something else. I'd look at the importlib for some guidance on the behavior people might expect, and how that behavior is implemented.

What I ended doing is to create a separate function load_session_copy() that loads the original module, makes a backup copy of its state, calls load_session(), copies the loaded dictionary to a new module and restores the original module.

The problem with the options 2.a and 2.b that you suggested is that the CPython 3.7 import system seems to be somewhat broken. As I mentioned in this comment (#463 (comment)) and a couple ones before it, the sequence of commands import moduledel sys.modules['module']import module doesn't always put the module back into sys.modules (and there's also the problem of submodules imported indirectly).

The importilb docs didn't give me much light, but I used it as reference to clear out the attributes __loader__, __spec__ and __path__ in the created runtime module.

@leogama
Copy link
Contributor Author

leogama commented Jun 28, 2022

I've found at least one project that would benefit from this PR: https://github.com/SAME-Project/same-project/blob/main/sameproject/ops/execution.py (introduced a week ago by SAME-Project/same-project#176)

dill/_dill.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the review comments.

@mmckerns mmckerns added the bugfix label Jul 4, 2022
@leogama
Copy link
Contributor Author

leogama commented Jul 5, 2022

I did various changes in the commit "adjustments":

  • Wrote documentation for the public functions and added some Sphinx options to docs/source/conf.py that shouldn't affect already existing documentation (at least I didn't notice any changes using the RTD theme).
  • Renamed private function names and load_session_copy() to load_vars(), which now returns only the module's dictionary, as it shouldn't be used as a restored module because of the problems we discussed earlier. See if you like it this way or if you still prefer it merged with load_session(). I added an option to use the current state of the module being restored as a base namespace, before loading objects from the file.
  • Converted the old _open_pickable function into a wrapper class _PeekableReader, which I think makes the code purpose more transparent.

dill/_dill.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
dill/_dill.py Outdated Show resolved Hide resolved
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code really LGTM. However, before we merge... let's iterate once or twice more purely on the names of the functions I've identified.

@leogama
Copy link
Contributor Author

leogama commented Jul 8, 2022

I changed the modmap object to use types.SimpleNamespace instead of that AttrDict class. But I still think that a custom dict subclass for settings['module'] would be very handful. It would allow:

  • access of items via attributes
  • lookup default value in parent dictionary (if set)
  • automatic creation of empty parent for submodules settings

The settings API I'm aiming at is somewhat as this:

>>> from dill import FilterRules, settings
>>> mod_settings = settings['module']
>>> mod_settings['imported_byref']  # item access
False
>>> mod_settings.imported_byref = True  # attribute access
>>> mod_settings.__main__ = {'imported_byref': False}  # a module specific session setting
>>> mod_settings.__main__.imported_byref  # instead of mod_settings['__main__']['imported_byref']
False
>>> mod_settings['my.submodule'] = {'rules': FilterRules()}  # create a 'submodule' entry under a 'my' entry
>>> mod_settings.my.submodule.rules.exclude.add('somevar')

@mmckerns
Copy link
Member

mmckerns commented Jul 8, 2022

I see... so you want to (for example) have a settings "tree" as opposed to a flat settings dict. This might take a bit of design discussion. How about we tackle that in a separate PR?

@leogama
Copy link
Contributor Author

leogama commented Jul 8, 2022

I see... so you want to (for example) have a settings "tree" as opposed to a flat settings dict. This might take a bit of design discussion. How about we tackle that in a separate PR?

I'm refining the idea. Indeed, a settings "tree" as I was imagining may have an implementation too complex. I'm thinking now of having just the filter rules for dump_module() as a tree, by subclassing FilterRules (or ExcludeRules) and some mapping type.

However, I'm still thinking of a format for a config file, including the filter rules for specific modules. Maybe we should use configparser (the format and the reader) as a base:

[dill]
byref = true
protocol = 3

[dump_module]
exclude.types = [FunctionType, ModuleType, TypeType]  # or [function, module, type]

[dump_module.__main__]
exclude.regexes = "_.*"

@leogama
Copy link
Contributor Author

leogama commented Jul 9, 2022

Oops! My two last messages were intended to go to #475...
I finished renaming, documenting and fixing a couple things here.

@leogama leogama requested a review from mmckerns July 9, 2022 17:27
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation could use going over once more to clean it up a bit, but otherwise LGTM. If you feel like you don't want to make another pass at the documentation, I'll pull it and then tweak the docs a bit myself.

@leogama
Copy link
Contributor Author

leogama commented Jul 9, 2022

The documentation could use going over once more to clean it up a bit

I'm sorry? Didn't understand a word. I'm happy by now. Merge it, please, so you can tweak it as you like, and I can merge it with the filter PR and finish that. I'm going to complement the code annotation and the private functions' documentation there if necessary.

Or... Can't you add suggested changes directly here, in a way that I could just accept and include in a new commit? I think I saw a button in the desktop web interface, in the comment box.

@mmckerns mmckerns merged commit 0fa524a into uqfoundation:master Jul 10, 2022
@leogama leogama deleted the fix-load-session branch July 10, 2022 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reloading a session into the same module keeps the previous session
2 participants