#39530 migrated the provider to the v2 facets of the OpenLineage client. v2 brings quite a lot of improvements, but it also had an unnoticed side effect: with the change from @attr.s to @attrs.define, the default value of slots changed from False to True.
That PR switched the parent classes of e.g. AirflowJobFacet to their v2 versions, but the child classes still have slots set to False. This leads to unwanted behaviour when pickling instances of these classes. Taking AirflowJobFacet as an example, its __slots__ contains only _deleted (coming from the parent class), so pickling fails on the attributes of the child class.
The example below illustrates it:
In [1]: import pickle, attrs

In [2]: @attrs.define(slots=False)
   ...: class A():
   ...:     a: str
   ...:

In [3]: @attrs.define(slots=True)
   ...: class B():
   ...:     b: str
   ...:

In [4]: @attrs.define(slots=False)
   ...: class C(A):
   ...:     c: str
   ...:

In [5]: @attrs.define(slots=False)
   ...: class D(A):
   ...:     d: str
   ...:

In [6]: @attrs.define(slots=True)
   ...: class E(B):
   ...:     e: str
   ...:

In [7]: @attrs.define(slots=False)
   ...: class F(B):
   ...:     f: str
   ...:

In [8]: def test(klazz):
   ...:     try:
   ...:         instance = pickle.loads(pickle.dumps(klazz(**{a.name: a.name for a in attrs.fields(klazz)})))
   ...:         for field in attrs.fields(klazz):
   ...:             getattr(instance, field.name)
   ...:     except AttributeError:
   ...:         print(f"{klazz} failed to unpickle")
   ...:

In [9]: test(A), test(B), test(C), test(D), test(E), test(F)
<class '__main__.F'> failed to unpickle
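The same failure mode can be sketched with the standard library only, which may make the mechanism easier to see. This is a minimal sketch, not attrs' actual generated code: it assumes the slotted parent carries __getstate__/__setstate__ methods that only know the parent's own slot names, and that a non-slotted child inherits them unchanged. SlottedParent, DictChild and missing_after_roundtrip are hypothetical names used only for illustration.

```python
import pickle


class SlottedParent:
    """Stand-in for a slotted parent class (hypothetical, not an OpenLineage class)."""

    __slots__ = ("b",)

    def __init__(self, b):
        self.b = b

    # Assumption: the state helpers cover only the parent's slot names,
    # similar in spirit to what a slotted attrs class provides.
    def __getstate__(self):
        return {name: getattr(self, name) for name in SlottedParent.__slots__}

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)


class DictChild(SlottedParent):
    """Child without __slots__: its own attribute lives in the instance __dict__."""

    def __init__(self, b, f):
        super().__init__(b)
        self.f = f


def missing_after_roundtrip(obj, names):
    """Return the attribute names lost by pickle.loads(pickle.dumps(obj))."""
    clone = pickle.loads(pickle.dumps(obj))
    return [name for name in names if not hasattr(clone, name)]


if __name__ == "__main__":
    print(missing_after_roundtrip(SlottedParent("x"), ["b"]))
    print(missing_after_roundtrip(DictChild("x", "y"), ["b", "f"]))
```

In this sketch the parent round-trips intact, while the child comes back without f: the inherited __getstate__ never looks at the child's __dict__, so the value is dropped at pickling time and only surfaces as an AttributeError when the attribute is read later.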
This wasn't caught by unit tests because it only shows up when ProcessPoolExecutor is used from within OpenLineageListener. When objects are passed between processes, Python pickles them.
What you think should happen instead
For two reasons:
we avoid having to migrate to yet another set of facets in the OL client that would change slots from True back to False
keeping slots on facets does not seem to have a big impact on performance
I suggest we simply change the slots argument to True for all facets used in the DAG run state listener hooks.
How to reproduce
Run breeze with --integration openlineage and the OL provider installed from a wheel. Run an example DAG and check the scheduler logs for an error indicating a pickling failure.
Apache Airflow Provider(s)
openlineage
Versions of Apache Airflow Providers
No response
Apache Airflow version
main branch
Operating System
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)" VERSION_CODENAME=bookworm ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/"
Deployment
Other
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct