Add Pickle support #372
@andlaus I tried your suggestion, which is to copy only the dataclass fields of the objects and then pickle/unpickle the object returned by “copy_db”. I therefore excluded .jar files from serialization by adding __getstate__ and __setstate__ methods to the Database class.
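For illustration, here is a minimal sketch of how `__getstate__` and `__setstate__` can leave selected entries out of a pickle. The `Database` class below is a toy stand-in, and the `auxiliary_files` attribute and its contents are assumptions made for the example, not the actual odxtools code.

```python
import pickle


class Database:
    """Toy stand-in for illustration only; not the odxtools class."""

    def __init__(self):
        self.model_version = "2.2.0"
        # Hypothetical attribute: file name -> raw bytes
        self.auxiliary_files = {"lib.jar": b"\x00\x01", "readme.txt": b"hi"}

    def __getstate__(self):
        # Copy the instance dict so the live object is left untouched
        state = self.__dict__.copy()
        # Drop .jar payloads from the pickled state
        state["auxiliary_files"] = {
            name: data
            for name, data in state["auxiliary_files"].items()
            if not name.lower().endswith(".jar")
        }
        return state

    def __setstate__(self, state):
        # Restore whatever was pickled; .jar entries are simply absent
        self.__dict__.update(state)


db = Database()
restored = pickle.loads(pickle.dumps(db))
```

The original object keeps its `.jar` entry; only the pickled copy omits it.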
@kayoub5 what do you think? Any suggestion?
The approach taken by the PR looks valid at first glance. A unit test will be needed to ensure future versions don't break pickling support. As for the suggestion from @andlaus, pickling should in theory be faster than re-reading the PDX, and it should work fine with circular references between normal classes.
@nada-ben-ali do you have numbers on how much faster pickling is on a large PDX?
I guess it's better to overload the (as a general approach, a
we tried that, it doesn't work on a subclass of the builtin type list. dataclasses are in general picklable, TableRow was the exception that we had to fix.
sure, but you probably only want to serialize the fields of the dataclass, not also the resolved references, etc?!
actually, we do, it saves time on computing them again
@kayoub5 yes, pickling has saved us a lot of time. For example, parsing one PDX file without pickling took 14 sec 296 ms, while loading it from the pickle took just 846 ms. Similarly, with another PDX file, the time dropped from 2 sec 791 ms to 406 ms.
I agree, a unit test is crucial to guarantee that pickling support remains intact in future versions. I'll add one.
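A regression test for this could look roughly like the sketch below. It uses a small hypothetical `Node` dataclass with a parent/child cycle rather than a real PDX file, so it only shows the shape of such a test, not the one added in this PR.

```python
import pickle
import unittest
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical dataclass with a back-reference to its parent."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


class TestPickleRoundTrip(unittest.TestCase):
    def test_circular_references_survive(self):
        root = Node("root")
        child = Node("child", parent=root)
        root.children.append(child)

        restored = pickle.loads(pickle.dumps(root))

        # Compare by name and identity; == on a cyclic dataclass would recurse
        self.assertEqual(restored.name, "root")
        self.assertEqual(restored.children[0].name, "child")
        self.assertIs(restored.children[0].parent, restored)
```

A test against the real library would presumably load a small PDX file, pickle the resulting database, and compare a few resolved references on the restored copy.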
@andlaus we tried that, but for list subclasses, the __setstate__() method is not invoked.
I like this...
@andlaus could you please review the changes now that I've updated them? If you approve them, we can merge.
@nada-ben-ali: press the green button once you're happy with it...
I am working on caching the PDX config parsing result to avoid reparsing it each time, using Pickle for serialization.
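The caching idea can be sketched as follows. The helper name `load_cached`, the `.pickle` suffix, and the modification-time check are assumptions for illustration; `parse` stands for whatever function actually reads the PDX file.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_cached(source: Path, parse: Callable[[Path], Any]) -> Any:
    """Return parse(source), reusing a pickle next to the file if it is fresh."""
    cache = source.with_name(source.name + ".pickle")

    # Reuse the cache only if it is at least as new as the source file
    if cache.exists() and cache.stat().st_mtime >= source.stat().st_mtime:
        with cache.open("rb") as f:
            return pickle.load(f)

    # Otherwise parse from scratch and refresh the cache
    result = parse(source)
    with cache.open("wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return result
```

Since unpickling can execute arbitrary code, a cache like this should only be read from locations that are trusted.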
During the pickling process, no issues were encountered. However, during the unpickling step, the following exception occurred: "maximum recursion depth exceeded" which originated in the NamedItemList class.
The recursion was caused by infinite calls to the NamedItemList.__getattr__ method. Specifically, the code attempted to access an attribute that was still uninitialized at that point, which caused __getattr__ to be invoked repeatedly during the reconstruction of the object.
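The mechanism can be reproduced with a small toy class. `NamedList`, `Item` and the `_by_name` attribute below are hypothetical stand-ins, not the odxtools code. By default, pickle recreates a list subclass without calling `__init__` and feeds the items back through `extend()`/`append()` before the instance state is restored, so `_by_name` does not exist yet when it is first needed.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Item:
    short_name: str


class NamedList(list):
    """Toy list that also indexes its items by short_name."""

    def __init__(self, items=()):
        super().__init__()
        self._by_name = {}
        for item in items:
            self.append(item)

    def append(self, item):
        self._by_name[item.short_name] = item
        super().append(item)

    def extend(self, items):
        for item in items:
            self.append(item)

    def __getattr__(self, name):
        # Only reached when normal lookup fails. If _by_name itself is
        # missing, the next line looks it up again and recurses without end.
        if name in self._by_name:
            return self._by_name[name]
        raise AttributeError(name)


data = pickle.dumps(NamedList([Item("a"), Item("b")]))
try:
    pickle.loads(data)
    outcome = "restored"
except RecursionError:
    outcome = "RecursionError"
```

Pickling succeeds because the object is fully initialized at that point; only the reconstruction fails.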
To address this, I added the __reduce__ method which explicitly defines how an object is serialized and reconstructed during pickling and unpickling. After adding this method, the recursion issue was resolved.
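One possible shape for such a `__reduce__` is sketched below on a hypothetical `NamedList` class; the implementation in this PR may differ. The returned tuple asks pickle to call the class itself, so `__init__` runs first, and then to append the items afterwards.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Item:
    short_name: str


class NamedList(list):
    """Toy list that also indexes its items by short_name."""

    def __init__(self, items=()):
        super().__init__()
        self._by_name = {}
        for item in items:
            self.append(item)

    def append(self, item):
        self._by_name[item.short_name] = item
        super().append(item)

    def extend(self, items):
        for item in items:
            self.append(item)

    def __getattr__(self, name):
        if name in self._by_name:
            return self._by_name[name]
        raise AttributeError(name)

    def __reduce__(self):
        # (callable, args, state, list items): NamedList() is called first,
        # which creates _by_name, then pickle re-adds the items one by one
        # through extend()/append().
        return (self.__class__, (), None, iter(self))


original = NamedList([Item("a"), Item("b")])
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, both the list contents and the name index are rebuilt, so `restored.a` resolves again.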
After fixing the recursion issue, I got the following exception with some PDX files during the unpickling:
This exception arises because NamedItemList assumes that all its elements conform to the OdxNamed interface, including having a short_name attribute. The PDX files had been parsed correctly in the first place, but during unpickling the TableRow objects were reconstructed in a state where their short_name attribute was not properly restored or initialized.
Thus, the solution was to add the __reduce__ method for the TableRow class, ensuring its attributes are correctly restored during unpickling.
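One way such a half-restored object can come about is a reference cycle: pickle creates an empty object, starts restoring its state, and meets a list that already contains that same object. The sketch below reproduces this with hypothetical `PlainRow`, `Row` and `NamedList` classes. Whether this is exactly what happened with TableRow is an assumption, and the `__reduce__` shown is only one possible fix.

```python
import pickle


class NamedList(list):
    """Toy list that indexes its items by short_name (hypothetical)."""

    def __init__(self):
        super().__init__()
        self._by_name = {}

    def append(self, item):
        self._by_name[item.short_name] = item
        super().append(item)

    def extend(self, items):
        for item in items:
            self.append(item)

    def __reduce__(self):
        return (self.__class__, (), None, iter(self))


class PlainRow:
    """Row pickled the default way: created empty, state applied last."""

    def __init__(self, short_name):
        self.short_name = short_name
        self.siblings = None  # back-reference to the containing list


class Row(PlainRow):
    def __reduce__(self):
        # Pass short_name to the constructor so it exists before any
        # container sees the object; the cyclic part travels as state.
        return (self.__class__, (self.short_name,), {"siblings": self.siblings})


def round_trip(row_cls):
    rows = NamedList()
    row = row_cls("r1")
    rows.append(row)
    row.siblings = rows  # cycle: row -> rows -> row
    return pickle.loads(pickle.dumps(row))


try:
    round_trip(PlainRow)
    plain_outcome = "restored"
except AttributeError as exc:
    plain_outcome = f"AttributeError: {exc}"

fixed = round_trip(Row)
```

With `PlainRow`, the list tries to index a row whose attributes have not been set yet; with `Row`, the name is already in place when the list is rebuilt.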
odxtools: 8.3.4