Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix integration tests with Python 3.11 #2275

Merged
merged 6 commits into from
Aug 17, 2023

Conversation

Shaikh-Ubaid
Copy link
Collaborator

@Shaikh-Ubaid Shaikh-Ubaid commented Aug 14, 2023

fixes #2267

Python 3.11 has a breaking change for @dataclass. With python 3.11, the default values for @dataclass members need to be immutable (or hashable). More info: https://docs.python.org/3.11/whatsnew/3.11.html#dataclasses, huggingface/datasets#5230

The PR also adds a CI check for integrations tests with python 3.11.3, so that it does not break in future.

@@ -780,3 +780,4 @@ COMPILE(NAME import_order_01 LABELS cpython llvm c) # any

# LPython emulation mode
RUN(NAME lpython_emulation_01 LABELS cpython NOMOD)
RUN(NAME lpython_emulation_02 LABELS cpython)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add llvm test as well. And rename the test, it should not be called emulation --- the emulation test is only for CPython only tests, which we have just one, to show how to hook your own implementations in CPython.

I think this is a "struct" test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this test doesn't run in LPython yet, in which case keep this test cpython only and let's also add a second test that runs in LPython and still breaks in CPython 3.11.

Copy link
Collaborator Author

@Shaikh-Ubaid Shaikh-Ubaid Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also add a second test that runs in LPython and still breaks in CPython 3.11.

I think we do not yet have a CI test for CPython 3.11, but I guess we might eventually have one. Please, could you share what we should do.

if isinstance(orig_val, (int, float, str, bool, tuple)):
setattr(arg, class_mem, py_field(default=orig_val))
else:
setattr(arg, class_mem, py_field(default_factory=lambda val=orig_val: val))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do exactly? I am trying to understand this code.

class X:
    a: i32 = 123
    b: bool = True
    c: i32[:] = array([1, 2, 3])
    d: i32[:] = array([4, 5, 6])

Is class_mems_init equal to {a: 123, b: True, c: array([1,2,3), d: array([4,5,6])}?

What does this do: py_field(default_factory=lambda val=orig_val: val) ? I am trying to parse it. It is assigning to the default_factory argument, but it is assigning lambda val=orig_val: val, what does that mean? Is it a lambda function with a default argument?

In general, this seems to be adding a layer of indirection, away from the default CPython behavior, see here: #2274.

What exactly is the problem with dataclass and numpy arrays? You can't store those there anymore?

Maybe there is no other way to fix this, but I am hoping we can just use CPython exactly as is, without adding more things on top. Even the behavior in the "main" branch seems modified by:

    def __class_getitem__(key):
        return Array(arg, key)

Why is that needed?

Copy link
Collaborator Author

@Shaikh-Ubaid Shaikh-Ubaid Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the problem with dataclass and numpy arrays? You can't store those there anymore?

Yes, We can't store them directly with Python 3.11. They made this change in https://docs.python.org/3.11/whatsnew/3.11.html#dataclasses (also shared in the description of this issue). Only types that are immutable (or atleast hashable) are allowed as default values for @dataclass members. Since, python 3.11, for immutable types, we need to use field(default_factory=a_call_back_function_here_which_returns_the_default_value_for_the_parameter) as the default value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am completely confused. I thought you can store mutable things in dataclasses, I thought that is the whole point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is no other way to fix this, but I am hoping we can just use CPython exactly as is, without adding more things on top. Even the behavior in the "main" branch seems modified by:

    def __class_getitem__(key):
        return Array(arg, key)

I think that is needed for the following case to work with CPython:

$ cat examples/expr2.py    
from lpython import i32, dataclass

@dataclass
class A:
    n: i32

def some_func(x: A[:]):
    print(x)
$ lpython examples/expr2.py
$ python examples/expr2.py 
Traceback (most recent call last):
  File "/Users/ubaid/Desktop/OpenSource/lpython/examples/expr2.py", line 7, in <module>
    def some_func(x: A[:]):
TypeError: 'type' object is not subscriptable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do: py_field(default_factory=lambda val=orig_val: val) ? I am trying to parse it. It is assigning to the default_factory argument, but it is assigning lambda val=orig_val: val, what does that mean? Is it a lambda function with a default argument?

Yes, it is a lambda function with a default argument. I initially tried several attempts using py_field(default_factory=lambda: orig_val) (returning the value of orig_val directly, without any default argument). But it does not do the correct initialization. For example, the following fails (also added as CPython emulation test):

$ cat examples/expr2.py 
from lpython import i32, dataclass
from numpy import array

@dataclass
class X:
    c: i32[:] = array([1, 2, 3])
    d: i32[:] = array([4, 5, 6])

def main0():
    x: X = X()
    print(x)
    assert x.c.sum() == 6
    assert x.d.sum() == 15

main0()
$ python examples/expr2.py
X(c=array([4, 5, 6]), d=array([4, 5, 6]))
Traceback (most recent call last):
  File "/Users/ubaid/Desktop/OpenSource/lpython/examples/expr2.py", line 15, in <module>
    main0()
  File "/Users/ubaid/Desktop/OpenSource/lpython/examples/expr2.py", line 12, in main0
    assert x.c.sum() == 6
AssertionError

both c and d got assigned to the same value of [4, 5, 6] where as c was supposed to be assigned [1, 2, 3]. On investigating further, I realised that it happens because the return value of the lambda function is being evaluated when it is called and not when it is defined. To fix it we used the default argument which are evaluated at definition time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

  1. I am surprised that this change has not caused a firestorm of pushback from the Python community. Immutable objects (instances of classes) are exactly wrong for object-oriented programming (OOP). The entire point of OOP is mutable state :)
  2. I would imagine this change in 3.11 is going to break lots and lots of released code in the field, no ?
  3. I would imagine someone is going to "fix" this by introducing a legacy dataclass. Perhaps we need to do likewise: just implement our own legacy dataclass.

Observation:

Immutable objects are interesting and useful. Clojure, Haskell, other functional-programming languages (FPLs) have them. Efficient implementation is non-trivial (see everything that cites Chris Okasaki -- finger trees, Bloom filters, all kinds of sophisticated jazz).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Immutable objects are observably different from mutable objects. In Clojure, all mutable objects are syntactically different from immutable objects. Immutable objects are the default. Mutable objects are stored in atoms, agents, vars, a couple more, and must be accessed via AT-SIGN, kind of a thread-safe indirection operator (actually called Software Transactional Memory (STM)). Without STM, mutable objects must be manually synchronized -- a gigantic tarpit of problems for real-world programmers.

Maybe this is the reason 3.11 went immutable -- they're going multithreaded, no ? With multithreading, concurrent access is a must, and immutability makes it easy.

However, perhaps they (Python) forgot that mutable state is actually NECESSARY, and perhaps they forgot that they need language mechanisms for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding A[:], you should not subscript A directly, but rather have a NumPy array of A, then everything works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clojure addressed this trifecta: {immutable-by-default, threadsafe-by-default, mutable-by-opt-in} in 2009, and it fixed the terrible problems introduced, on the one hand, by Haskell, which has NO mutability, and Java, on the other hand, which has no immutability but has threads. Java inflicted pain and bugs on a generation of programmers (see books by Brian Goetz). Haskell doomed itself to impracticality (except for finTech). Clojure found The Golden Mean and is, in my opinion, the best solution to Squaring the Circle of mutability and threads. I hope Python people are smart enough to learn from Clojure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I figured it out: #2276

@certik
Copy link
Contributor

certik commented Aug 14, 2023

@Shaikh-Ubaid I would say let's implement the solution from #2276, and not add any new layers into lpython.py, and later actually remove the custom wrapper of dataclass completely that we currently have there.

@certik certik marked this pull request as draft August 14, 2023 15:50
@Shaikh-Ubaid
Copy link
Collaborator Author

This PR needs a rebase. The old 2 commits are part of #2280.

@Shaikh-Ubaid Shaikh-Ubaid marked this pull request as ready for review August 16, 2023 23:54
Copy link
Contributor

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, thanks!

@Shaikh-Ubaid Shaikh-Ubaid merged commit 91207c9 into lcompilers:main Aug 17, 2023
10 checks passed
@Shaikh-Ubaid Shaikh-Ubaid deleted the python_3.11_integ_tests branch August 17, 2023 06:05
@Shaikh-Ubaid Shaikh-Ubaid restored the python_3.11_integ_tests branch August 17, 2023 06:07
@Shaikh-Ubaid
Copy link
Collaborator Author

This PR does not seem to reflect the rebase. I merged it using the merge with rebase option at the web GUI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

integration tests fail on Python 3.11
3 participants