Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pickle support #372

Merged
merged 7 commits into from
Dec 19, 2024

Conversation

nada-ben-ali
Copy link
Collaborator

@nada-ben-ali nada-ben-ali commented Dec 9, 2024

I am working on caching the PDX config parsing result to avoid reparsing it each time, using Pickle for serialization.

During the pickling process, no issues were encountered. However, during the unpickling step, the following exception occurred: "maximum recursion depth exceeded" which originated in the NamedItemList class.
The recursion occurred due to infinite calls to the NamedItemList.__getattr__ method. Specifically, the code attempted to access a property that was uninitialized at the time, causing the __getattr__ to be invoked repeatedly during the reconstruction of the object.
To address this, I added the __reduce__ method which explicitly defines how an object is serialized and reconstructed during pickling and unpickling. After adding this method, the recursion issue was resolved.

After fixing the recursion issue, I got the following exception with some PDX files during the unpickling:

image

This issue arises because the NamedItemList assumes all its elements conform to the OdxNamed interface, including having a short_name attribute. While the PDX files were correctly parsed initially, the issue arose because TableRow objects were reconstructed in a state where their short_name attribute was not properly restored or initialized.
Thus, the solution was to add the __reduce__ method for the TableRow class, ensuring its attributes are correctly restored during unpickling.

odxtools: 8.3.4

@nada-ben-ali
Copy link
Collaborator Author

@andlaus I tried your suggestion, which is to copy only the dataclass fields of the objects and then pickle/unpickle the object returned by “copy_db”.
I got an exception saying that “cannot pickle ‘zlib.Decompress’ object”:

image

I therefore excluded .jar files from serialization by adding __getstate__ and __setstate__ methods to the Database class.
Then, I got the same exception regarding the recursion:

image

@nada-ben-ali nada-ben-ali mentioned this pull request Dec 10, 2024
@nada-ben-ali
Copy link
Collaborator Author

@kayoub5 what do you think? Any suggestion?

@kayoub5
Copy link
Collaborator

kayoub5 commented Dec 12, 2024

@kayoub5 what do you think? Any suggestion?

The approach taken by the PR looks valid at first glance.

A unit test will be needed to ensure future versions don't break pickling support.

For the suggestion from @andlaus, pickling should be in theory a faster than re-reading the pdx, and work fine on circular references of normal classes.

  • Trying to get the whole database to work with pickling is not the target here.
  • We are not pickling the whole pdx, just the request/response we are interested in (so unused inherited dops and structures are pruned from serialization)
  • the NamedItemList overriding the builtin list type is causing the side effects on pickling we are trying to fix here.

@nada-ben-ali do you have numbers on how much pickling is faster on a large PDX?

@andlaus
Copy link
Collaborator

andlaus commented Dec 12, 2024

(so unused inherited dops and structures are pruned from serialization)

I guess it's better to overload the __getstate__() and __setstate__() methods in classes that should be pickleable: https://docs.python.org/3/library/pickle.html#object.__getstate__

(as a general approach, a OdxDataObject class could be introduced which could handle serialization of dataclass fields. then, it must just be ensured that all dataclasses that correspond to ODX XML tags are derived from that class.)

@kayoub5
Copy link
Collaborator

kayoub5 commented Dec 12, 2024

(so unused inherited dops and structures are pruned from serialization)

I guess it's better to overload the __getstate__() and __setstate__() methods in classes that should be pickleable: https://docs.python.org/3/library/pickle.html#object.__getstate__

(as a general approach, a OdxDataObject class could be introduced which could handle serialization of dataclass fields. then, it must just be ensured that all dataclasses that correspond to ODX XML tags are derived from that class.)

we tried that, doesn't work on a subclass of the builtin type list.

dataclasses are in general pickable, tablerow was the exception that we had to fix.

@andlaus
Copy link
Collaborator

andlaus commented Dec 12, 2024

dataclasses are in general pickable

sure, but you probably only want to serialize the fields of the dataclass, not also the resolved references, etc?!

@kayoub5
Copy link
Collaborator

kayoub5 commented Dec 12, 2024

dataclasses are in general pickable

sure, but you probably only want to serialize the fields of the dataclass, not also the resolved references, etc?!

actually, we do, saves time on computing them again

@nada-ben-ali
Copy link
Collaborator Author

nada-ben-ali commented Dec 13, 2024

@nada-ben-ali do you have numbers on how much pickling is faster on a large PDX?

@kayoub5 yes, pickling has saved us a lot of time. For example, parsing a PDX file without pickling took 14 sec 296 ms, while using the pickling reduced it to just 846 ms. Similarly, with another PDX file, the time dropped from 2 sec 791 ms to 406 ms.

A unit test will be needed to ensure future versions don't break pickling support.

I agree, a unit test is crucial to guarantee that pickling support remains intact in future versions. I'll add one.

I guess it's better to overload the getstate() and setstate() methods in classes that should be pickleable: https://docs.python.org/3/library/pickle.html#object.__getstate__

@andlaus we tried that, but for list subclasses, the __setstate__() method is not invoked.

Copy link
Collaborator

@andlaus andlaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this...

odxtools/nameditemlist.py Show resolved Hide resolved
odxtools/nameditemlist.py Outdated Show resolved Hide resolved
odxtools/tablerow.py Outdated Show resolved Hide resolved
@nada-ben-ali nada-ben-ali requested a review from andlaus December 15, 2024 05:01
tests/test_odxtools.py Outdated Show resolved Hide resolved
@nada-ben-ali
Copy link
Collaborator Author

@andlaus could you please review the changes after the updates I've made and if you approve them, then we can merge them?

Copy link
Collaborator

@andlaus andlaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nada-ben-ali: press the green button once you're happy with it...

odxtools/tablerow.py Show resolved Hide resolved
@nada-ben-ali nada-ben-ali merged commit 03a096f into mercedes-benz:main Dec 19, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants