RFC: Curried getproperty syntax #53946

Keno · 2024-04-04T01:51:43Z

This PR adds support for parsing .a as x->x.a. This kind of thing has come up multiple times in the past, but I'm currently finding myself doing a lot of work on nested structs where this operation is very common.

In general, we've had the position that this kind of thing should be a special case of the short-currying syntax (e.g. #38713), but I actually think that might be a false constraint. In particular, .a is a bit of a worst case for the curry syntax. If there is no requirement for .a to be excessively short in an eventual underscore curry syntax, I think that could open more options.

That said, any syntax proposal of course needs to stand on its own, so let me motivate the cases where I think this plays:

Motivation

Curried getfield

I think this is probably the most obvious and often requested. The syntax here is very useful for situations where higher order functions operate on collections of records:

map(.a, vec) and reductions for getting the fields of an object - also includes things like sum(.price, items)
Predicates like sort(vecs, by=.x) or filter(!(.deleted), entries)
In pipelines vecs |> .x |> sqrt |> sum

I think that's mostly what people are thinking of, but the use case for this syntax is more general.

A syntax for lenses

Packages like Accessors.jl provide lens-like abstractions. Currently these are written as lens = @optic _.a. An example use of Accessors.jl is (from their documentation)

julia> modify(lowercase, (;a="AA", b="BB"), @optic _.a)
T("aa", "BB")

This PR can be thought of as providing lenses first class syntax, as in:

julia> modify(lowercase, (;a="AA", b="BB"), .a)
T("aa", "BB")

Symbol index generalization to hierarchical structures

We have a lot of packages in the ecosystem that support named axes of various forms (Canonical examples might be DataFrames and NamedArrays, but there's probably two dozen of these). Generally the way that this syntax works is that people use quoted symbols for indexing:

df[5, :col]

However, this breaks down when there is hierachical composition involved. For example, for simulation models, you often build parameter sets and solutions out of hierarchies of simpler models.

There's a couple of solutions that people have come up with for this problem:

Some packages parse out hierachy from symbol names: sol[:var"my.nested.hierachy.state"]
Other packages have a global root object: sol[○.my.nested.hierarchy.state] 2a. A variant of this is using the object as its own root sol[sol.my.nested.hierarchy.state] 2b. Yet another variant is having the root object be context specific sol[sys.my.nested.hierarchy.state]
Yet other packages put symbolic names into the global namespaces sol[my.nested.hierarchy.state]

These solutions are all lacking. 1 requires string manipulation for composition, the various variants of 2 are ok, but there is no agreement among packages what the root object looks like or is spelled, and even so, it's an extra export and 3 pollutes the global namespaces.

By using the same mechanism here, we essentially standardize the solution 2, but make the root object implicit.`

Additional considerations

As mentioned above, this was previously argued to be a special case of underscore currying. Esp if used for the other use cases above (which don't necessarily use this applicatively), I don't think that's necessarily dispositive, but as always it's better to add one syntax rather than two.
This is of course also related to the long-standing (almost 10 years - oh no) issue of "mutating immutables" (ref Julep: More support for working with immutables #11902). I think this makes sense independent of any proposal there, but it's worth thinking about.
Every once in a while people request broadcasted getproperty along the lines of vec..a. This of course provides that with (.a).(vec). I had briefly explored not requiring the parentheses, along the lines of .a.(vec), but it didn't feel right when I played with it, so this PR explicitly reserves that syntax.

!!! note
This only provides an flisp implementation for people to play around with. As such, this needs JULIA_USE_FLISP_PARSER=true. I will provide a JuliaSyntax implementation once there's consensus on semantics.

@optic

This PR adds support for parsing `.a` as `x->x.a`. This kind of thing has come up multiple times in the past, but I'm currently finding myself doing a lot of work on nested structs where this operation is very common. In general, we've had the position that this kind of thing should be a special case of the short-currying syntax (e.g. #38713), but I actually think that might be a false constraint. In particular, `.a` is a bit of a worst case for the curry syntax. If there is no requirement for `.a` to be excessively short in an eventual underscore curry syntax, I think that could open more options. That said, any syntax proposal of course needs to stand on its own, so let me motivate the cases where I think this plays: A. Curried getfield I think this is probably the most obvious and often requested. The syntax here is very useful for situations where higher order functions operate on collections of records: 1. `map(.a, vec)` and reductions for getting the fields of an object - also includes things like `sum(.price, items)` 2. Predicates like `sort(vecs, by=.x)` or `filter(!.deleted, entries)` 3. In pipelines `vecs |> .x |> sqrt |> sum` I think that's mostly what people are thinking of, but the use case for this syntax is more general. B. A syntax for lenses Packages like Accessors.jl provide lens-like abstractions. Currently these are written as `lens = @optic _.a`. An example use of Accessors.jl is (from their documentation) ``` julia> modify(lowercase, (;a="AA", b="BB"), @optic _.a) T("aa", "BB") ``` This PR can be thought of as providing lenses first class syntax, as in: ``` julia> modify(lowercase, (;a="AA", b="BB"), .a) T("aa", "BB") ``` C. Symbol index generalization to hierachical structures We have a lot of packages in the ecosystem that support named axes of various forms (Canonical examples might be DataFrames and NamedArrays, but there's probably two dozen of these). Generally the way that this syntax works is that people use quoted symbols for indexing: ``` df[5, :col] ``` However, this breaks down when there is hierachical composition involved. For example, for simulation models, you often build parameter sets and solutions out of hierarchies of simpler models. There's a couple of solutions that people have come up with for this problem: 1. Some packages parse out hierachy from symbol names: `sol[:var"my.nested.hierachy.state"]` 2. Other packages have a global root object: `sol[○.my.nested.hierarchy.state]` 2a. A variant of this is using the object as its own root `sol[sol.my.nested.hierarchy.state]` 2b. Yet another variant is having the root object be context specific `sol[sys.my.nested.hierarchy.state]` 3. Yet other packages put symbolic names into the global namespaces `sol[my.nested.hierarchy.state]` These solutions are all lacking. 1 requires string manipulation for composition, the various variants of 2 are ok, but there is no agreement among packages what the root object looks like or is spelled, and even so, it's an extra export and 3 pollutes the global namespaces. By using the same mechanism here, we essentially standardize the solution `2`, but make the root object implicit.`

Seelengrab · 2024-04-04T06:37:30Z

In the past there's also been discussion around using .foo as syntax for broadcasting of functions, e.g. to map a broadcasted function over a collection of collections like map(.foo, [ [1 2], [3 4] ]). This would also be consistent with what .+ currently means, which has that usecase in mind:

julia> .+
Base.Broadcast.BroadcastFunction(+)

So at a glance, this PR/RFC would clash with that, since + is a valid identifier for properties, no?

topolarity · 2024-04-04T11:05:16Z

The !.deleted jumps out to me as a likely source of confusion, such that I'm not sure it's worth supporting.

(!).(foo) versus !.foo versus (!.foo).(vec) makes me start to feel like I'm parsing a regex

Keno · 2024-04-04T13:41:15Z

The !.deleted jumps out to me as a likely source of confusion, such that I'm not sure it's worth supporting.

That's an operator precedence question and falls out of the current definitions. I'm happy to make parantheses mandatory though, so that you'd have to write !(.deleted).

Keno · 2024-04-04T19:57:02Z

Updated to require parentheses after unary operators.

aplavin · 2024-04-09T09:17:46Z

Not sure how big of an improvement this addition would be in actual workflows, given a range of package-based alternatives (examples below). Seems a little bit strange to special-case property access (_.a.b.c) here, compared to indexing (_[:a][1]) and functions (year(_.date)) are just as reasonable to use and supported by existing alternatives.

But don't see anything fundamentally bad with including this either.
Accessors.jl will be able to reuse this GetPropertyLens object instead of its current PropertyLens, and provide all the same functionality as it is now.

A few specific questions

Judging from the name, GetPropertyLens, is it correct to say that Julia Base has plans to include set()/modify() functionality from Accessors as well?
If yes, would be curious to hear about. If not, naming it a "lens" seems strange.

As for implementation of that object: I wonder what's the reason to include syms::Tuple{Vararg{Symbol}} instead of just prop::T? There are two parts to this question:

Property names can be arbitrary types, not just Symbols. Common ones include Ints and Strings.
Putting a single property level would be enough: composition can be handled by using ComposedFunction in a straightforward way. That's how it also is done in Accessors.

Also, would be important to ensure and test that using this object to access properties is zero-cost. Definitely not the case now, but I guess this is a PoC.

Examples

Can be written both using packages and with this proposal:

sum(.price.dollars, items)  # proposed here
sum((@o _.price.dollars), items)  # Accessors
@p sum(_.price.dollars, items)  # DataPipes

sol[.my.nested.hierarchy.state]  # proposed here
sol[@o _.my.nested.hierarchy.state]  # Accessors

Cannot be written with this proposal:

sol[@o _.my.nested[1].hierarchy.state]  # Accessors
sum((@o _.prices[1]), items)  # Accessors
sum((@o to_dollars(_.price)), items)  # Accessors
@p items |> filter(year(_.date) > 2000) |> map(first(_.things)) |> sum(_.prices[1])  # DataPipes

topolarity · 2024-04-09T11:47:32Z

Seems a little bit strange to special-case property access (.a.b.c) here, compared to indexing ([:a][1]) and functions (year(_.date)) are just as reasonable to use and supported by existing alternatives.

One advantage that jumps out to me from the restriction is that this lens is "literal-like", so that it behaves very predictably and is eligible for the largest number of compiler optimizations.

In contrast _[x] would (presumably) capture x mutably, making it vulnerable to #15276 and "unexpected" mutation that might lead to bugs, such as keying into a Dict with the value and then mutating it or accidentally mutating the value before the closure is used.

I think it's an advantage of this proposal that such things are not possible - The constructed lens is practically "inert".

Keno · 2024-04-09T19:48:29Z

Judging from the name, GetPropertyLens, is it correct to say that Julia Base has plans to include set()/modify() functionality from Accessors as well?

No plans for this

As for implementation of that object

Implementation is a placeholder

_.my.nested[1]

I thinking indexing can and should be reserved. Arbitrary functions, I don't want to do anything with syntactically. That said, you can of course dispatch on the lens type and make any particular function work, which I think is good enough. For anything syntactically captured, you'd want to go all the way to something like #38713.

Keno · 2024-04-09T19:57:18Z

So at a glance, this PR/RFC would clash with that, since + is a valid identifier for properties, no?

Yes, .foo would no longer be available for broadcasting, but I'm not convinced there's a strong case for that to be syntax as opposed to a higher order function. As for using operators as field names, that's already an issue:

julia> struct Foo
       +
       end
       
julia> Foo(+).+


ERROR: ParseError:
# Error @ REPL[9]:2:2
Foo(+).+

#└ ── premature end of input
Stacktrace:
 [1] top-level scope
   @ none:1

julia> Foo(+).(+)
ERROR: MethodError: objects of type Foo are not callable
Stacktrace:
 [1] _broadcast_getindex_evalf
   @ Base.Broadcast ./broadcast.jl:677 [inlined]
 [2] _broadcast_getindex
   @ Base.Broadcast ./broadcast.jl:650 [inlined]
 [3] getindex
   @ Base.Broadcast ./broadcast.jl:604 [inlined]
 [4] copy
   @ Base.Broadcast ./broadcast.jl:886 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, Foo, Tuple{Base.RefValue{typeof(+)}}})
   @ Base.Broadcast ./broadcast.jl:871
 [6] top-level scope
   @ REPL[10]:1

julia> Foo(+).:(+)
+ (generic function with 152 methods)

This proposal follows the same rules with respect to that. .:(+) is the field accessor, plain .+ is the operator. It's a little messy yes, but having operators as field names is unusual, so I don't think it's that bad.

jariji · 2024-04-09T20:13:50Z

The original underscore issue #24990 has ~three different proposals which are all pretty good and more general than .a while only taking one extra character. If .a is merged, people will still want 1+_; if 1+_ is merged, the argument for .a will be much weakened.

Keno · 2024-04-09T20:18:57Z

I'd be ok not having this if _.a ends up existing, but I think ->_.a would be too much. My read of previous discussions on that was that _.a is unlikely to be feasible standalone.

Keno · 2024-04-09T20:39:00Z

To be clear, my preference would be to do both of

This PR
A version of underscore curry with explicit curry scope marker.

But I'm not super invested in the exact syntax of the underscore curry. The only additional point that I will make is that e.g. things like a[.b] in this proposal very deliberately indexes a with .b, while a[_.b] should arguably be x->a[x.b], which is not the same and a special case of the binding scope issue we haven't been able to resolve for the past 7 years ;).

stevengj · 2024-04-30T15:28:10Z

base/experimental.jl

@@ -375,4 +375,18 @@ adding them to the global method table.
 """
 :@MethodTable

+struct GetPropertyLens
+    syms::Tuple{Vararg{Symbol}}
+end


Couldn't we use Fix1{getproperty} for this instead of defining a new type, if getproperty is extended to take a tuple of symbols as the second argument?

Yes, we can. The original idea here was to have something more first class, but you can always turn Fix1{getproperty} into that by evaluating it on an appropriate tracing object. Since that's a more minimal change, I'll use that instead.

I am somewhat surprised we can const-prop that reliably enough (esp to something like broadcast which tries not to inline this much) without putting the symbols in the type parameters of the Lens object. Should we put it there instead?

stevengj · 2024-04-30T15:30:11Z

Note that the "tight-binding" currying proposal included the possibility of lowering f(_, a) to Fix1{f}(a), which would allow getindex to specialize on Fix1{getproperty} too.

sadish-d · 2024-06-20T13:10:18Z

In case it helps the discussion, the following gets us a syntax like ppt.name(obj), 3 characters more than .name(obj). I might prefer the former if I am forced to use parentheses like (.name)(obj) in practice.

struct Ppt end
Base.getproperty(::Ppt, s::Symbol) = (x)->getproperty(x, s)
const ppt = Ppt()
r = Ref(true)
@assert r.x == ppt.x(r)
@assert !r.x == !ppt.x(r) # negation is not ambiguous
@assert 1r.x == 1ppt.x(r) # implicit multiplication works

You could also change the name to Get and get to do get.name(obj), but I might prefer ppt. Get might be more disruptive since I can imaging it's a commonly used function name.

Keno added parser Language parsing and surface syntax compiler:lowering Syntax lowering (compiler front end, 2nd stage) labels Apr 4, 2024

Keno added 3 commits April 4, 2024 18:55

WIP

2519832

Allow :. to continue working (but consider it deprecated)

781bb32

Disallow . after unary operator

9e0f2a9

Keno added the triage This should be discussed on a triage call label Apr 9, 2024

stevengj reviewed Apr 30, 2024

View reviewed changes

This was referenced May 1, 2024

headless anonymous function (->) syntax #38713

Open

RFC: curry underscore arguments to create anonymous functions #24990

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Curried getproperty syntax #53946

RFC: Curried getproperty syntax #53946

Keno commented Apr 4, 2024 •

edited

Loading

Seelengrab commented Apr 4, 2024 •

edited

Loading

topolarity commented Apr 4, 2024

Keno commented Apr 4, 2024

Keno commented Apr 4, 2024

aplavin commented Apr 9, 2024 •

edited

Loading

topolarity commented Apr 9, 2024 •

edited

Loading

Keno commented Apr 9, 2024

Keno commented Apr 9, 2024

jariji commented Apr 9, 2024 •

edited

Loading

Keno commented Apr 9, 2024

Keno commented Apr 9, 2024

stevengj Apr 30, 2024

Keno May 8, 2024

vtjnash May 8, 2024

stevengj commented Apr 30, 2024

sadish-d commented Jun 20, 2024 •

edited

Loading

RFC: Curried getproperty syntax #53946

Are you sure you want to change the base?

RFC: Curried getproperty syntax #53946

Conversation

Keno commented Apr 4, 2024 • edited Loading

Motivation

Curried getfield

A syntax for lenses

Symbol index generalization to hierarchical structures

Additional considerations

Seelengrab commented Apr 4, 2024 • edited Loading

topolarity commented Apr 4, 2024

Keno commented Apr 4, 2024

Keno commented Apr 4, 2024

aplavin commented Apr 9, 2024 • edited Loading

A few specific questions

Examples

topolarity commented Apr 9, 2024 • edited Loading

Keno commented Apr 9, 2024

Keno commented Apr 9, 2024

jariji commented Apr 9, 2024 • edited Loading

Keno commented Apr 9, 2024

Keno commented Apr 9, 2024

stevengj Apr 30, 2024

Choose a reason for hiding this comment

Keno May 8, 2024

Choose a reason for hiding this comment

vtjnash May 8, 2024

Choose a reason for hiding this comment

stevengj commented Apr 30, 2024

sadish-d commented Jun 20, 2024 • edited Loading

Keno commented Apr 4, 2024 •

edited

Loading

Seelengrab commented Apr 4, 2024 •

edited

Loading

aplavin commented Apr 9, 2024 •

edited

Loading

topolarity commented Apr 9, 2024 •

edited

Loading

jariji commented Apr 9, 2024 •

edited

Loading

sadish-d commented Jun 20, 2024 •

edited

Loading