-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix integration tests with Python 3.11 #2275
Fix integration tests with Python 3.11 #2275
Conversation
f277cd8
to
b92b718
Compare
integration_tests/CMakeLists.txt
Outdated
@@ -780,3 +780,4 @@ COMPILE(NAME import_order_01 LABELS cpython llvm c) # any | |||
|
|||
# LPython emulation mode | |||
RUN(NAME lpython_emulation_01 LABELS cpython NOMOD) | |||
RUN(NAME lpython_emulation_02 LABELS cpython) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add llvm
test as well. And rename the test, it should not be called emulation --- the emulation test is only for CPython only tests, which we have just one, to show how to hook your own implementations in CPython.
I think this is a "struct" test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this test doesn't run in LPython yet, in which case keep this test cpython only and let's also add a second test that runs in LPython and still breaks in CPython 3.11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also add a second test that runs in LPython and still breaks in CPython 3.11.
I think we do not yet have a CI test for CPython 3.11, but I guess we might eventually have one. Please, could you share what we should do.
src/runtime/lpython/lpython.py
Outdated
if isinstance(orig_val, (int, float, str, bool, tuple)): | ||
setattr(arg, class_mem, py_field(default=orig_val)) | ||
else: | ||
setattr(arg, class_mem, py_field(default_factory=lambda val=orig_val: val)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do exactly? I am trying to understand this code.
class X:
a: i32 = 123
b: bool = True
c: i32[:] = array([1, 2, 3])
d: i32[:] = array([4, 5, 6])
Is class_mems_init
equal to {a: 123, b: True, c: array([1,2,3), d: array([4,5,6])}
?
What does this do: py_field(default_factory=lambda val=orig_val: val)
? I am trying to parse it. It is assigning to the default_factory
argument, but it is assigning lambda val=orig_val: val
, what does that mean? Is it a lambda function with a default argument?
In general, this seems to be adding a layer of indirection, away from the default CPython behavior, see here: #2274.
What exactly is the problem with dataclass
and numpy arrays? You can't store those there anymore?
Maybe there is no other way to fix this, but I am hoping we can just use CPython exactly as is, without adding more things on top. Even the behavior in the "main" branch seems modified by:
def __class_getitem__(key):
return Array(arg, key)
Why is that needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly is the problem with dataclass and numpy arrays? You can't store those there anymore?
Yes, We can't store them directly with Python 3.11. They made this change in https://docs.python.org/3.11/whatsnew/3.11.html#dataclasses (also shared in the description of this issue). Only types that are immutable (or atleast hashable) are allowed as default values for @dataclass
members. Since, python 3.11, for immutable types, we need to use field(default_factory=a_call_back_function_here_which_returns_the_default_value_for_the_parameter)
as the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am completely confused. I thought you can store mutable things in dataclasses, I thought that is the whole point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is no other way to fix this, but I am hoping we can just use CPython exactly as is, without adding more things on top. Even the behavior in the "main" branch seems modified by:
def __class_getitem__(key):
return Array(arg, key)
I think that is needed for the following case to work with CPython:
$ cat examples/expr2.py
from lpython import i32, dataclass
@dataclass
class A:
n: i32
def some_func(x: A[:]):
print(x)
$ lpython examples/expr2.py
$ python examples/expr2.py
Traceback (most recent call last):
File "/Users/ubaid/Desktop/OpenSource/lpython/examples/expr2.py", line 7, in <module>
def some_func(x: A[:]):
TypeError: 'type' object is not subscriptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do: py_field(default_factory=lambda val=orig_val: val) ? I am trying to parse it. It is assigning to the default_factory argument, but it is assigning lambda val=orig_val: val, what does that mean? Is it a lambda function with a default argument?
Yes, it is a lambda function with a default argument. I initially tried several attempts using py_field(default_factory=lambda: orig_val)
(returning the value of orig_val
directly, without any default argument). But it does not do the correct initialization. For example, the following fails (also added as CPython emulation test):
$ cat examples/expr2.py
from lpython import i32, dataclass
from numpy import array
@dataclass
class X:
c: i32[:] = array([1, 2, 3])
d: i32[:] = array([4, 5, 6])
def main0():
x: X = X()
print(x)
assert x.c.sum() == 6
assert x.d.sum() == 15
main0()
$ python examples/expr2.py
X(c=array([4, 5, 6]), d=array([4, 5, 6]))
Traceback (most recent call last):
File "/Users/ubaid/Desktop/OpenSource/lpython/examples/expr2.py", line 15, in <module>
main0()
File "/Users/ubaid/Desktop/OpenSource/lpython/examples/expr2.py", line 12, in main0
assert x.c.sum() == 6
AssertionError
both c
and d
got assigned to the same value of [4, 5, 6]
where as c
was supposed to be assigned [1, 2, 3]
. On investigating further, I realised that it happens because the return value of the lambda function is being evaluated when it is called and not when it is defined. To fix it we used the default argument which are evaluated at definition time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question:
- I am surprised that this change has not caused a firestorm of pushback from the Python community. Immutable objects (instances of classes) are exactly wrong for object-oriented programming (OOP). The entire point of OOP is mutable state :)
- I would imagine this change in 3.11 is going to break lots and lots of released code in the field, no ?
- I would imagine someone is going to "fix" this by introducing a legacy dataclass. Perhaps we need to do likewise: just implement our own legacy dataclass.
Observation:
Immutable objects are interesting and useful. Clojure, Haskell, other functional-programming languages (FPLs) have them. Efficient implementation is non-trivial (see everything that cites Chris Okasaki -- finger trees, Bloom filters, all kinds of sophisticated jazz).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Immutable objects are observably different from mutable objects. In Clojure, all mutable objects are syntactically different from immutable objects. Immutable objects are the default. Mutable objects are stored in atoms, agents, vars, a couple more, and must be accessed via AT-SIGN, kind of a thread-safe indirection operator (actually called Software Transactional Memory (STM)). Without STM, mutable objects must be manually synchronized -- a gigantic tarpit of problems for real-world programmers.
Maybe this is the reason 3.11 went immutable -- they're going multithreaded, no ? With multithreading, concurrent access is a must, and immutability makes it easy.
However, perhaps they (Python) forgot that mutable state is actually NECESSARY, and perhaps they forgot that they need language mechanisms for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding A[:]
, you should not subscript A
directly, but rather have a NumPy array of A, then everything works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clojure addressed this trifecta: {immutable-by-default, threadsafe-by-default, mutable-by-opt-in} in 2009, and it fixed the terrible problems introduced, on the one hand, by Haskell, which has NO mutability, and Java, on the other hand, which has no immutability but has threads. Java inflicted pain and bugs on a generation of programmers (see books by Brian Goetz). Haskell doomed itself to impracticality (except for finTech). Clojure found The Golden Mean and is, in my opinion, the best solution to Squaring the Circle of mutability and threads. I hope Python people are smart enough to learn from Clojure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I figured it out: #2276
@Shaikh-Ubaid I would say let's implement the solution from #2276, and not add any new layers into |
This is needed to make integration_tests compatible with CPython
b92b718
to
e000102
Compare
This PR needs a rebase. The old |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, thanks!
This PR does not seem to reflect the rebase. I merged it using the |
fixes #2267
Python 3.11 has a breaking change for
@dataclass
. With python3.11
, the default values for@dataclass
members need to beimmutable
(orhashable
). More info: https://docs.python.org/3.11/whatsnew/3.11.html#dataclasses, huggingface/datasets#5230The PR also adds a CI check for integrations tests with python
3.11.3
, so that it does not break in future.