Expression quoting #6
(First, there's a typo in
I do like the symmetry here. With this addition, we would have 3 calling schemes in Python:
Call-by-reference. The usual.
Call-by-index. Likely not normally thought of as such, but it does apply eager evaluation to the desired slice args, then made available as a slice object to
>>> import numpy as np
>>> x = np.array([1., -1., -2., 3, 4, -4])
>>> x[x > 2]
results in array([3., 4.]) given that
Unfortunately, one cannot then do some obvious things like
because that's an implicit use of Call-by-name. So with the above, NumPy could instead define
So at first glance, this concept works for me! It would be interesting to think through a recursive qdef, and whether that's useful or not - recursive generators certainly are. |
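The "call-by-index" point above can be seen without NumPy. A minimal sketch, assuming a hypothetical `ShowKey` class (not part of the discussion) that simply returns whatever key it receives: the slice arguments are evaluated eagerly, and only the finished slice object reaches `__getitem__`.

```python
class ShowKey:
    """Hypothetical helper: returns the key passed to __getitem__ unchanged."""

    def __getitem__(self, key):
        return key


calls = []


def noisy(n):
    # Record the call so we can see that evaluation happens before indexing.
    calls.append(n)
    return n


s = ShowKey()
key = s[noisy(1) + 1 : noisy(10) : 2]
print(key)    # slice(2, 10, 2)
print(calls)  # [1, 10]
```

By the time `__getitem__` runs, the original expressions are gone; only their values remain, which is the gap a quoted call would try to close.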
Yeah, that's what I was going for. The big question is if having the free variable of the quoted expression be implicit will work. I'm thinking about how to define
def apply___qcall__(qqfunc: Quote, qarg: Quote) -> Any:
    qfunc = qqfunc()  # Quote('x+1', lambda: x+1)
    var = qarg.raw  # 'x'
    f = ??  # extract free variables from func and construct a new lambda with one argument named x
    arg = qarg()  # 0
    return f(arg)  # 1
I don't think I can do the step Anyway the idea would be to combine this with something like PEP 501's i-string objects, but with each of the interpolations presented as a Translation-marked strings would have to reconstruct the raw string by combining the fixed strings, the Thinking about it more, the compiler may have to produce something slightly more complicated than a parameter-less |
Sounds good in terms of next steps exploring this idea - So let's see what is actually needed. |
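As a starting point for "what is actually needed", here is a minimal sketch of how the free variables of a lambda can be inspected in stock CPython. The helper name `free_variables` is hypothetical; it relies only on the documented `__code__.co_freevars` and `__closure__` attributes, and it reads the cell values at the time of the call.

```python
def free_variables(func):
    """Map each free variable name of `func` to its current cell value."""
    code = func.__code__
    closure = func.__closure__ or ()
    # co_freevars and __closure__ are parallel: one cell per free variable.
    return {name: cell.cell_contents
            for name, cell in zip(code.co_freevars, closure)}


def demo():
    x = 0
    return lambda: x + 1


thunk = demo()
print(thunk())                # 1
print(free_variables(thunk))  # {'x': 0}
```

This recovers names and values, but not a way to rebind `x` as a parameter, which is the hard step marked `??` above.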
If I understand it correctly,
import types
import functools
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quote:
    raw: str
    function: Callable

    def __call__(self, *args, **kwargs):
        return self.function(*args, **kwargs)

    # NOTE https://en.wikipedia.org/wiki/Lambda_lifting seems to be the prior
    # art here. However, "lift" instead of "lambda lift" also has a somewhat
    # different meaning in functional programming. By analogy to
    # functools.partial, which has similar concerns if we were to survey FP,
    # we will just call it "lift".
    #
    # Another name might be "uncurry"
    def lift(self, *varnames: str) -> "Quote":
        """Lambda lifts varnames to the embedded lambda function, return a new Quote

        Any varname that is not used in the freevar is ignored, as it would be
        in any usual code.

        Because of the compilation, this is a relatively expensive operation in
        pure Python, so it should be cached (outside of this specific object of
        course). However, it should be possible to optimize this in a specific
        scenario, at least for CPython and Jython.
        """

        def param_list(names):
            return ", ".join(names)

        code = self.function.__code__
        # FIXME do more generic patching - it should be possible to do the
        # following for some function `f`:
        #
        # 1. extract co_freevars from f.__code__
        # 2. make a simple lambda that references these freevars
        # 3. use like below
        # 4. then patch back in the original code object
        #
        # Such a function could then be `functools.lift`, and then just used
        # here for the specific implementation in Quote.
        wrapped = f"""
def outer({param_list(code.co_freevars)}):
    def lifted({param_list(varnames)}):
        return (lambda: {self.raw})()
"""
        capture = {}
        exec(wrapped, self.function.__globals__, capture)
        # Essential for tracing out what is going on :)
        # import dis; dis.dis(capture["outer"])
        lifted = types.FunctionType(
            capture["outer"].__code__.co_consts[1],
            self.function.__globals__)
        functools.update_wrapper(self.function, lifted)
        return Quote(self.raw, lifted)


def test_scope():
    x = 47
    q = Quote("x+1", lambda: x + 1)
    print(f"{q()=}")
    q_x = q.lift("x")  # lift x out of the free var
    print(f"{q_x(42)=}")
    q_x2 = q_x.lift("x")  # ignore extra x in lifting again
    print(f"{q_x2(42)=}")
    try:
        # but we cannot lift the same variable twice in one lift, likely we want
        # some wrapper exception here
        q_xx = q_x.lift("x", "x")
    except SyntaxError:
        pass  # SyntaxError: duplicate argument 'x' in function definition
    q_xy = q_x.lift("x", "y")  # y is not used in the body, but that can be true of any function
    print(f"{q_xy(42, 99999)=}")


test_scope()
|
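For comparison, a simpler variant of lifting can be sketched with `eval` instead of patching code objects. This is not the approach used above: the hypothetical `lift_simple` below snapshots the closure values and globals at lift time, so later rebinding of those variables is not seen by the lifted function.

```python
def lift_simple(raw, function, *varnames):
    """Return a function of `varnames` that evaluates the source text `raw`.

    Assumption: `raw` is the source of `function`'s body. Free variables that
    are not lifted are copied by value from the closure (a snapshot).
    """
    env = dict(function.__globals__)  # copy, so the real globals stay untouched
    closure = function.__closure__ or ()
    for name, cell in zip(function.__code__.co_freevars, closure):
        env[name] = cell.cell_contents
    # Lifted names become parameters and shadow any snapshot of the same name.
    return eval(f"lambda {', '.join(varnames)}: {raw}", env)


def make_quote():
    x, y = 47, 1
    return "x + y", (lambda: x + y)


raw, thunk = make_quote()
lifted = lift_simple(raw, thunk, "x")
print(thunk())     # 48
print(lifted(42))  # 43
```

The trade-off is that this gives early capture of `y`, whereas the code-object approach keeps working with the compiler's own scope analysis.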
Still wrapping my head around this. But don't you have the arguments to functools.update_wrapper() reversed? |
Quite possibly, they don't actually change anything I believe in this more
limited example!
I added this wrapping support, thinking I might attempt the more generic
lift function, then decided to 1. Show intermediate work; 2. Take a long
walk. :)
|
Okay, you definitely reversed those arguments. I pushed a slightly altered version that also fixes this, as lifting.py. Now I have a real question. Suppose we have I feel this is important once we start doing things like |
In the current example, we don't consider any globals (including builtins) as free variables. I believe we should keep that distinction. We could potentially add something like
I see your point. I had assumed that Right now, I think the only real solution here is to be explicit and state the freevar(s) to lift:
qmap{x, n**x, a}
If we added some further syntactic sugar (here
qmap{x: n**x, a}
In either case,
def qmap{qparams, qexpr, qarg}:
    arg = qarg()
    for x in arg:
        yield apply{qparams, qexpr, x}
Alternatively we could always use
print(list(qmap{_+1, range(10)}))
Personally, I'm not so sure if that's such a good approach. |
With respect to my earlier comment, this can be more straightforward and without extra syntactic sugaring, but with the same user syntax: foo{x: x+1} is the equivalent of
With this modest modification, scope analysis works as expected:
What I like about this approach is that it maintains the original idea of #4 of using existing Python scope analysis, and then using the resulting symbol table captured in the function (and corresponding code object). It is also still possible for someone to write
>>> f = lambda: n**x
>>> dis.dis(f)
  1           0 LOAD_GLOBAL              0 (n)
              2 LOAD_GLOBAL              1 (x)
              4 BINARY_POWER
              6 RETURN_VALUE
But calling |
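The distinction between globals and free variables shown in the disassembly can also be checked directly on the code objects. A minimal sketch (the names `f`, `g`, and `make` are only for illustration): the same expression compiles to global lookups at module level and to closure lookups inside a function.

```python
# Defined where n and x are not local to any enclosing function:
# the compiler treats them as globals, so there are no free variables.
f = lambda: n ** x


def make():
    n, x = 2, 3
    # Here n and x belong to the enclosing function: they become free variables.
    return lambda: n ** x


g = make()

print(f.__code__.co_freevars, f.__code__.co_names)  # () ('n', 'x')
print(sorted(g.__code__.co_freevars))               # ['n', 'x']
print(g())                                          # 8
```

So a lifting scheme based on `co_freevars` would not see `n` or `x` in `f` at all, which matches the decision above not to treat globals as free variables.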
With respect to tagged strings, it would be still desirable I think to have |
I agree we should also have tagged strings -- as a special case of quoted expressions. I don't like The rest has to wait until my tendonitis calms down. |
+1 - I plan to go through tagged strings/numbers again, and re-express accordingly in terms of quoted exprs.
We could use arrow notation, somewhat similar to JavaScript, Scala, and other languages. So One advantage is that this notation can also support type annotations: Such arrows could then be a more elegant way of typing anonymous functions (python/mypy#4226) than what is necessary now, as we see with this workaround: https://stackoverflow.com/questions/33833881/is-it-possible-to-type-hint-a-lambda-function. In particular, such usage is not compatible with direct usage of the lambda, since it requires an assignment!
Hope you feel better! I'm experiencing something similar in my right foot for the last 3 months. Slow, slow recovery. |
That stackoverflow item is ridiculous. The deleted/hidden answer actually
got it right: the solution to typing a lambda is to not use a lambda but a
def. The upvoted answers only work if you assign the lambda to a variable,
making it not anonymous. IOW I don't think we should try to add types to
anonymous functions.
|
Hah, that is true.
Right.
Also, a more powerful type inferencer could do this work for the programmer, instead of requiring type annotations, given the fact that there is available context. (It's true of certain FP languages, but they are also very different than Python!) So let's put aside type annotations for anonymous functions, and just focus on finding a possible better syntax for lambdas. Also it's possible I didn't solve it here, since do we need to consider noargs or arrows more generally outside of quoted usage? (If quoted, this can be ignored, it's just So let's assume we have arrows in general. Then we could write the following, ignoring equivalent construction with a list comprehension:
list(map(x -> x+1, range(20)))
which is the same as
list(map(lambda x: x + 1, range(20)))
But for noargs arrows, would we write it like so?
f(-> some_freevar+1)
Or the following, somewhat similar to JS?
f(() -> some_freevar+1)
(My preferred variant.) Or just require lambda for this case?
f(lambda: some_freevar+1) |
Noargs functions seem somewhat important (in a sense every interpolation in
an fl-string would be a noargs function).
I think we can make plain f(-> x+1) work, although it does look odd.
Assuming a twoarg is written (x, y) -> x+y, maybe () -> 42 would be better.
But we seem to be getting off-topic. Why do we need a new syntax for
lambdas again?
|
We can avoid a new syntax as follows:
For the above there are two other concerns, which should be addressed by separate issues:
|
I worry we've gone off the rails a bit. Let's look at use cases again and try to see the minimal feature set we'll need for them. I'm aware of these use cases for tagged strings:
Now for quoted expressions the only use case so far that I've seen is a slightly more concise form of lambda, possibly with an implicit free variable. That seems hardly worth new syntax. But what makes this attractive would be if it was introspectable so that e.g. a numpy library could compile things like Apart from the tagged numbers (which are cool but also probably don't have many serious applications besides Decimal?), what problem are we trying to solve here? Is there really a unifying principle for tagged strings, tagged numbers and quoted expressions? The solutions we're looking at all look rather complicated and I'm not so keen to have a new anonymous function syntax that's just a few letters shorter than lambda. Did I miss something exciting? |
+1
+1
+1, or for simplifying computations in general. It can be challenging to avoid eagerness while still providing a nice API, and that can be costly if the intermediate computations are unneeded or can be simplified. If we can get it right, we can provide such libraries a straightforward and standard scheme to get laziness, while providing nice ergonomics. What I like about this is that Python is widely used as a coordination language for scientific computing, data science, and ML. Quoted expressions can improve this support for coordination. Again, if we get it right :)
So two concerns here:
I looked at the following popular libraries that work with complex expressions in some interesting way, generally by parsing and/or building a computation graph. I did this by looking for usage of
So if quoted expressions are going to be useful, the best targets would seem to be to look at Numpy and/or Pandas, ideally in some simplified means so that it can be discussed without saying "understand the source code of this large codebase."
FWIW - the only usage in Python stdlib for decimal literals is in tests. But surely they are used elsewhere? Interestingly, when I looked at the top 4000 PyPI packages plus related libraries (strictly looking at package names, since metadata is not provided), I found numpy-financial, stripe-python (payment processing), yfinance (Yahoo finance API), and starkbank. Only numpy-financial supported decimals at all, and decimal literals were only used in tests. Decimals should be supported by such packages, but it might not be the highest priority either. Most importantly - the
Let's not do arrows with this work, since they are not necessary.
Not yet, more work needs to be done here! I think it might be worthwhile to write the core of a simplify qfunc for array indexing, just to see if that would be useful or not with the proposed syntax and dunder protocol. |
Oooh, I realize I'm totally out of my depth here, never having looked at any of those libraries (well, a brief failed encounter with pandas excluded). If you want me to participate you're going to have to pick examples that don't use any of those libraries -- surely the same principles can be applied to other domains? I'm imagining the places where quoted expressions are more useful than the typical approach that builds an expression tree using operator overloading in situations where operator overloading doesn't exist, e.g. Hm, I know that my first draft of PS. I did look at the getframe usage in numpy you linked to and it actually looks like it's only used when the array is indexed with a string (see docs). This is implemented using |
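To keep the comparison library-free, here is a minimal sketch of "the typical approach that builds an expression tree using operator overloading". The `Expr` class is hypothetical and only records text. It also shows one well-known limit of the approach in Python: `and` cannot be overloaded, so part of the expression is silently dropped.

```python
class Expr:
    """Hypothetical symbolic expression that records what was computed."""

    def __init__(self, text):
        self.text = text

    @staticmethod
    def _text(value):
        return value.text if isinstance(value, Expr) else repr(value)

    def _binary(self, op, other):
        return Expr(f"({self.text} {op} {Expr._text(other)})")

    def __add__(self, other):
        return self._binary("+", other)

    def __mul__(self, other):
        return self._binary("*", other)

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)


x = Expr("x")
tree = (x + 1) * 2 > 3
print(tree.text)  # (((x + 1) * 2) > 3)

# `and` evaluates bool(left) eagerly; an Expr is truthy, so only the right
# operand survives and the left comparison is lost.
lost = (x > 1) and (x < 5)
print(lost.text)  # (x < 5)
```

A quoted expression would instead hand the callee the whole source text plus a thunk, so nothing is lost to eager evaluation.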
Another thought: look at C# (expression trees, trying to find the best docs) and Julia (https://docs.julialang.org/en/v1/manual/metaprogramming/), which both have robust support for metaprogramming to do interesting things. This keeps it closer to the language level, vs diving into complex libraries. For me personally, I'm just an occasional user of some of these libraries myself, not a developer of them!
That would be the optimized way to do it...
... but this is how it is actually done in Numpy, with eager evaluation. So it's less than ideal, and results in the behavior we see of requiring writing expressions like
Maybe, I need to wrap my head around this. In general, I think we should assume a minimal
Right, Pandas has some amazing capabilities, but it feels quite challenging to work through. |
So a few more thoughts here:
So this might look like:
quoted def qf(a, b):
    return a*b + 1
So far this is just like saying
def g(left, right):
    quoted def qf():
        return {left} + {right}
    return qf
Now the thought here is that maybe I could call
def g(left, right):
    quoted def qf():
        return h{left} + h{right}
    return qf
As an idea, it's not terribly worked out, other than to say it vaguely resembles quasiquotes in Lisp. Why wouldn't I just use exec on some f-string template, like is already done (either with examples like lifting or https://github.com/python/cpython/blob/master/Lib/dataclasses.py#L377). Maybe I need some sort of looping or recursion to build the template? But I feel like there's something here. TBD. |
I'm sorry, I lost track of what you mean with The resemblance with Lisp quasiquotes is probably intentional -- it's been forever since I looked at Lisp or Scheme but I recall something that passes an s-expression unchanged rather than treating it as an expression to evaluate. Maybe I can't find much about Julia or C# (Julia seems to be focused on defining functions that run while the compiler is compiling -- IIRC we (i.e., I :-) rejected that idea earlier.) |
Okay, in order to make this more concrete, I started a branch where we can work on implementing https://github.com/gvanrossum/cpython/tree/quoting So far it only supports calls with exactly one argument, and it translates that argument into the stringified expression (so no Quote object yet). Proof it works:
The next challenge will actually be changing it so that instead of calling I figure extending this to support multiple positional arguments will be straightforward. I'm not sure what to do with |
Yikes, mixing in sets obviously wouldn't work so well! :) I was a bit too close to thinking about f-strings when I was writing down the example. And without
Addition of two expressions. So let's just assume it's like so (again, just trying out ideas here):
x = 2
y = 3
def make_sum(left_expr: Quote, right_expr: Quote) -> Callable:
    quoted def qf(factor: int) -> int:
        return (left + right) * factor
    # As used here, Quote.sub inserts in the expr so that it's like
    # ... left() + right() ..., as opposed to requiring
    # it to be called. There are probably some interesting alternatives
    # that could support recursion. TBD.
    return qf.sub(left=left_expr, right=right_expr)
f = make_sum{x+1, x*y}
print(f(2))
This is roughly equivalent to writing
from typing import Callable
x = 2
y = 3
def make_sum(left: str, right: str) -> Callable:
    code = f'def qf(factor):\n return (({left}) + ({right})) * factor'
    ns = {}
    exec(code, globals(), ns)
    return ns['qf']
f = make_sum("x+1", "x*y")
print(f(2))
except the following hold:
I need to work out how this can potentially help with some of the metaprogramming seen in the stdlib.
A quasiquote mixes evaluation in by using `(+ 1 2 ,(* 3 4)) is equivalent to '(+ 1 2 12) So the quasiquote like quality here is not as much as I initially intended, because we don't have any special syntax to help here - we need the explicit substitution provided by
:) Hah, I'm absolutely fine with that restriction. Julia and C# can give us some ideas, but neither are a dynamic language like Python. |
Nice, I will try it out!
+1
Yeah, got to get a few things figured out before then! |
New version (same branch). This translates the quoted arg into a tuple
UPDATE: Now supporting multiple arguments:
|
This leaves out the It also leaves out turning I think however that we can now build working prototypes, e.g. |
So this works, sort of:
However it doesn't work if the first arg uses any locals from Maybe the lambda I synthesize in the parser (see the branch, not this little example) should actually get arguments whose defaults capture the values of the corresponding variables? E.g. But if we do this we lose the ability to use undefined variables, since defaults capture the values early. We could solve that problem by turning the defaults into lambdas themselves, so that That honestly looks horrible, so I think maybe we should be okay with early capture. What do you think? |
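The trade-off between early and late capture can be seen with ordinary lambdas. A minimal sketch (all names here are illustrative): a default argument captures the value when the lambda is created, a closure looks the variable up when the lambda is called, and only the closure form tolerates a name that is not defined yet.

```python
def make():
    x = 1
    late = lambda: x + 1        # closure: looks x up at call time
    early = lambda x=x: x + 1   # default: captures the value of x right now
    x = 10
    return late, early


late, early = make()
print(late())   # 11
print(early())  # 2

# A closure-style lambda may mention a name that does not exist yet...
deferred = lambda: not_defined_yet + 1

# ...but a default is evaluated immediately, so this fails at definition time.
try:
    broken = lambda value=not_defined_yet: value
except NameError as exc:
    error = exc
    print(type(exc).__name__)  # NameError
```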
I have to think about early capture. But in general, I think of the lambda as a very convenient symbol table, where we don't have to modify anything deep about compilation to get this quoting to work. So I'm less concerned about what it looks like (this is just more nested) so long as it's serving a purpose. I quickly tried out a variant of selection for the Numpy scenario, in this case where Python does all the actual work of evaluating the select expression, logical short-circuiting included and nested scopes compatible. It's pleasingly simple, but I need to play with some more rewriting now that this is possible.
from functools import total_ordering

@total_ordering
class obj:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if hasattr(other, 'value'):
            return self.value == other.value
        else:
            return self.value == other

    def __le__(self, other):
        if hasattr(other, 'value'):
            return self.value <= other.value
        else:
            return self.value <= other

    def __repr__(self):
        return f'<obj {self.value}>'

    def __call__(self, expr):
        print(f'{self=!r}, {expr=}')
        selected = expr[1]()
        return self if selected else None

def f(value):
    x = obj(value)
    def g(a, b):
        return x{a <= x < b}
    return g

g = f(10)
print(g(1, 11))
|
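The example above needs the prototype branch because of the `x{...}` call. A minimal sketch of the same idea in stock Python, writing by hand what the qcall is assumed to expand to: a tuple whose second item is a thunk (the code above calls `expr[1]()`); that the first item is the source text is an assumption. The `Selectable` class is a hypothetical stand-in for `obj`.

```python
class Selectable:
    """Hypothetical stand-in for `obj`, with comparisons spelled out."""

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _unwrap(other):
        return getattr(other, "value", other)

    def __lt__(self, other):
        return self.value < self._unwrap(other)

    def __le__(self, other):
        return self.value <= self._unwrap(other)

    def __gt__(self, other):
        return self.value > self._unwrap(other)

    def __ge__(self, other):
        return self.value >= self._unwrap(other)

    def __repr__(self):
        return f"<Selectable {self.value}>"

    def __call__(self, quoted):
        raw, thunk = quoted  # assumed shape: (source text, thunk)
        return self if thunk() else None


def f(value):
    x = Selectable(value)

    def g(a, b):
        # Hand-written equivalent of the proposed x{a <= x < b}
        return x(("a <= x < b", lambda: a <= x < b))

    return g


g = f(10)
print(g(1, 11))   # <Selectable 10>
print(g(11, 20))  # None
```

Because the chained comparison runs inside the thunk, Python's own short-circuiting and nested-scope rules apply unchanged.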
Nice! One thing that now worries me: curlies look too much like plain old parentheses, and I had a heck of a time finding the qcall in your example. :-( |
There's other possible syntax, but likely we will be able to better see the curlies once we are used to them being used. My thinking is that otherwise, it would be hard to distinguish I'm too tired today from being on-call this week, but a couple of things:
|
Warning: f-strings are a challenge even for the PEG parser. Tagged strings could be prototyped as ‘NAME STRING+’ in the grammar (allowing a space between, but that’s fine in a prototype) but you’d still have to refactor the f-string code (a bit) to parse the interpolations. |
Heads up: I continue to be swamped with work, but I completed some big chunks as of today. I also have an InDay I can dedicate to the quoting work on Friday and look at expression rewriting and PEG parsing per your note above. I will then be taking vacation the following week, and then hopefully a better cadence going forward. |
No worries, I am working on other things as well. Enjoy your vacation!
|
Another radical idea.
Define a Quote class that represents a quoted expression. It gets passed the string representation of the expression and a lambda that evaluates it in the original context (though perhaps the walrus assigns to a variable in the containing scope). Like in #4, the lambda can be used to recover the cellvars referenced by the expression (and the globals).
Now if we write foo{a+1} this constructs q = Quote('a+1', lambda: a+1) and then calls foo.__qcall__(q). If we write foo{x+1, y-1} it would construct two Quotes (for x+1 and for y-1) and then call foo.__qcall__(q, r) with those two. Etc.
Next we could allow an alternate function definition, written as
This would just be a shorthand for defining a class foo with a __qcall__ method whose argument list is (self, x, y) and whose body is exactly the body of foo above.
Now we can write clever functions like
We need a builtin apply() that takes an expression and argument (both quoted) and somehow calls the expression on the argument. This would use the co_cellvars trick again. The expression would have one free variable that should correspond to the name of the argument. (So the x in for x in arg must correspond to the x in x+1.)
This would satisfy some of the desire to have functions that see their arguments as quoted expressions -- the problem has always been that there's no way that the parser can know that a function needs its arguments quoted, which we solve here by using {} instead of () for the call syntax. (Luckily Python uses a[...] and a(...) but not yet a{...}. :-)
That's as far as I got before family interrupted today.
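The proposed desugaring can be tried by hand in current Python. A minimal sketch, assuming a `Quote` with just the two fields described above and a hypothetical `Foo` class; the `__qcall__` calls are written out explicitly, since the `foo{...}` syntax itself does not exist.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Quote:
    raw: str                     # source text of the quoted expression
    function: Callable[[], Any]  # thunk evaluating it in the original scope

    def __call__(self):
        return self.function()


class Foo:
    """Hypothetical callee: reports each quoted argument and its value."""

    def __qcall__(self, *quotes):
        return [(q.raw, q()) for q in quotes]


foo = Foo()
x, y = 1, 5

# Hand-written form of what foo{x+1, y-1} is proposed to mean:
result = foo.__qcall__(Quote("x+1", lambda: x + 1),
                       Quote("y-1", lambda: y - 1))
print(result)  # [('x+1', 2), ('y-1', 4)]
```

The callee decides whether and when to evaluate each thunk, which is what distinguishes this from an ordinary call.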