Make .a syntactic sugar for i->i.a #22710

Closed
davidanthoff opened this issue Jul 8, 2017 · 36 comments
Labels
design (Design of APIs or of the language itself), speculative (Whether the change will be implemented is speculative)

Comments

@davidanthoff
Contributor

davidanthoff commented Jul 8, 2017

I think this has been suggested in various places before (i.e. I deserve no credit for this idea), but I couldn't find an issue for it, so here it is.

The motivation for something like this is the @group .. into statement in Query.jl. With those one often gets an array of named tuples, and a super typical next step is that one wants to run some aggregation function over one specific field of the named tuple. Say A is an array of named tuples, then I might want to write something like mean(map(i->i.b,A)) to take the mean of column b.

Idea 1 would be to simply make A..b syntactic sugar for map(i->i.b,A). The aggregation expression would then be written as mean(A..b).

Idea 2 is based on an observation by @JeffBezanson in #21875:

> Some languages use .a as short for x -> x.a, which is kind of nice.

Which is probably somehow related to this issue, but I'm not entirely sure.

I think maybe idea 2a might be something like .b.(A) instead of A..b? Not sure, more putting this out here for discussion. The aggregation would then be written as mean(.b.(A)). I find that a bit confusing, though.

Maybe idea 2b could be to still have .b mean i->i.b, and then make sure that all aggregation functions like mean etc. take an anonymous function as their first argument, so that one could always write these aggregations as say mean(.b, A).

queryverse/Query.jl#121 in Query.jl currently implements A..b within queries, but I'm a bit hesitant to add too much special syntax in Query.jl, especially around things where we might end up with some other solution in base.

UPDATE: It seems pretty clear that idea 1 is not a good one, so I changed the title of this issue to refer to idea 2b, which seems the most plausible one.
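
For concreteness, here is a minimal sketch of the pattern under discussion, written for current Julia (1.x), where named tuples are built in. The data in `A` is made up for illustration.

```julia
using Statistics

# Hypothetical data: an array of named tuples, as a @group query might produce.
A = [(a = 1, b = 2.0), (a = 2, b = 4.0), (a = 3, b = 6.0)]

# The verbose form from the issue text: extract column b, then aggregate.
m1 = mean(map(i -> i.b, A))

# The "function as first argument" form that idea 2b relies on;
# mean(f, itr) is provided by the Statistics standard library.
m2 = mean(i -> i.b, A)
```

Under idea 2b, the second form would shorten to `mean(.b, A)`.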

@ararslan added the design and speculative labels on Jul 8, 2017
@ararslan
Member

ararslan commented Jul 8, 2017

.. is widely used in math packages to mean an interval, so this would be quite breaking for packages. I also find the syntax .b.(A) quite odd. An abbreviated syntax for this kind of map already exists as getfield.(A, :b), which is equivalent to broadcast(i->i.b, A).

@davidanthoff
Contributor Author

> .. is widely used in math packages to mean an interval, so this would be quite breaking for packages.

Ah, that wouldn't be good. Just out of curiosity, what is an example package like that?

> An abbreviated syntax for this kind of map already exists as getfield.(A, :b)

That doesn't seem type stable, whereas both a broadcast and map version are type stable. It also is a tad too verbose for my taste.

Given the .. conflict with other packages, I think my current preference would be idea 2b in that case.

@ararslan
Member

ararslan commented Jul 8, 2017

> what is an example package like that?

IntervalSets

> That doesn't seem type stable

I'm confused, why is getfield.(A, :b) not type stable but map(i->i.b, A) is? The former lowers to the same code as broadcast(i->i.b, A).

> I think my current preference would be idea 2b

Of those proposed I do prefer 2b as well, though I'm still not really a fan of it. i->i.b, while more verbose, is IMO clearer than .b, since we use prefix . for dot-broadcasted infix operators. Explicitly providing the i in i.b makes it clear that it's a getfield rather than a broadcasted operator of some kind.

@davidanthoff
Contributor Author

> I'm confused, why is getfield.(A, :b) not type stable

I have no idea; I just looked at the output from @code_warntype for all three variants, and the getfield. version was the one that looked type-unstable.

@JeffBezanson
Sponsor Member

Agree that we should keep .. as an operator for intervals, and it's also useful for range queries. I'm fine with the syntax .a for x->x.a though.

When you look at code_warntype for getfield.(A, :b), it applies typeof to all the arguments first, so you'll see code for type Symbol as the final argument. But at a particular call site the constant :b will be taken into account.
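
A small sketch of the three spellings compared above, assuming current Julia (1.x) with built-in named tuples and made-up data. Wrapping each spelling in a function makes `:b` a constant at that call site, which is one way to inspect them with `@code_warntype`; the sketch only checks that the results agree and makes no claim about what the inference output looks like on a given Julia version.

```julia
# Hypothetical data for illustration.
A = [(a = 1, b = 2.0), (a = 2, b = 4.0)]

# Each variant is wrapped in a function so that :b is a literal at the call site.
via_getfield(A)  = getfield.(A, :b)
via_broadcast(A) = broadcast(i -> i.b, A)
via_map(A)       = map(i -> i.b, A)

r1 = via_getfield(A)
r2 = via_broadcast(A)
r3 = via_map(A)

# To inspect inference interactively, one could run e.g.:
#   @code_warntype via_getfield(A)
```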

@davidanthoff changed the title from "Make A..b syntactic sugar for map(i->i.b, A) (or some alternative design)" to "Make .a syntactic sugar for i->i.a" on Jul 8, 2017
@TotalVerb
Contributor

This is a little sketchy to me. It's sort of introducing a global namespace of field names. What kind of accessor is .name, and what kinds of properties do you expect of this operation? I don't think you can really say, and so these things can't be used in generic code.

@JeffBezanson
Sponsor Member

We already have a global namespace of field names, as does every other object-oriented language. In any case, those issues apply equally to a.b and getfield(a, :b); .b is just syntax for the same thing.

@ararslan
Member

ararslan commented Jul 8, 2017

> We already have a global namespace of field names

Those are called without a leading . though.

It still seems really weird and confusing to me to be omitting the object from which you're getting the field. What's wrong with i->i.b?

@malmaud
Contributor

malmaud commented Jul 9, 2017

-1 from me. . can already be a daunting, seemingly-magical concept to newcomers because of the broadcast lowering. The last thing we need is for it to have more magical properties.

@davidanthoff
Contributor Author

> What's wrong with i->i.b?

For my Query.jl use case it is just too verbose (e.g. see this comment).

> . can already be a daunting, seemingly-magical concept to newcomers because of the broadcast lowering. The last thing we need is for it to have more magical properties.

I hear you, that worries me too. I'm not particularly wedded to this syntax, but so far I couldn't think of anything better, and (at least from my perspective) the benefits of having something for this use-case outweigh the costs, even if we end up using the .b notation.

@stevengj
Member

stevengj commented Jul 10, 2017

If we adopt @JeffBezanson's suggestion for dot overloading, then Field{:b}(x) could be defined as x.b.

In my mind, the main use for this is for things like map and broadcast/dot calls. For example: map(Field{:b}, x) or sqrt.(Field{:b}.(foo.(x))). Or, in @davidanthoff's example, @select {g.key.metric, m = myfun(Field{:score}.(g), Field{:track_id}.(g)) }.

Field{:b} is reasonably terse while remaining fairly readable and explicit. (And if it is not terse enough, we could use dot overloading to make this equivalent to Field.b.)
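
A rough sketch of the `Field` idea in current Julia (1.x). `Field` is a hypothetical type, not part of Base. Note one deliberate deviation from the comment above: the sketch uses a callable instance, `Field{:b}()`, rather than making the type itself callable, to sidestep any assumptions Base may make about constructors returning their own type.

```julia
# Hypothetical type: the field name is carried in the type parameter.
struct Field{name} end

# Calling an instance extracts the named property from its argument.
(::Field{name})(x) where {name} = getproperty(x, name)

# Made-up data for illustration.
A = [(a = 1, b = 4.0), (a = 2, b = 9.0)]

getb  = Field{:b}()
col   = map(getb, A)        # plays the role of map(Field{:b}, x) above
roots = sqrt.(getb.(A))     # plays the role of sqrt.(Field{:b}.(x)) above
```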

(Is there a problem that dot overloading doesn't solve? 😉 )

@stevengj
Member

stevengj commented Jul 10, 2017

Another possibility would be to use $.b as sugar for x -> x.b and $[i] as sugar for x -> x[i], but $ is pretty overloaded already.

Or _.b and _[i], since we're already turning _ into a quasi-magical placeholder symbol (#9343)?

@malmaud
Contributor

malmaud commented Jul 10, 2017

I definitely feel your pain about verboseness though, @davidanthoff. Perhaps we just have to resort to having a macro that goes in front of a query rather than relying on changes to Julia syntax, though.

@JeffBezanson
Sponsor Member

> resort to having a macro that goes in front of a query rather than relying on changes to Julia syntax

I think it's highly valuable to try to think of generally-usable syntax that makes macros less necessary.

@malmaud
Contributor

malmaud commented Jul 10, 2017

OK sure, I am all for bending Julia's syntax to be more accommodating to data analysis :) I was just trying to be sensitive to the valid complaints that Julia syntax should not become the symbol soup of Mathematica etc.

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Jul 10, 2017

I would still like to have a terse function syntax based on _ so that _[i] and _.b work as @stevengj mentions above, but it's not a feature we need for 1.0 and since _ is already disallowed as an r-value, we're in the clear to give it some new meaning in the future.

@stevengj
Member

stevengj commented Jul 10, 2017

Basically, _ could become an implicit single-argument currying syntax when used as an r-value. f(_, y) would be sugar for x -> f(x, y), and _.b and _[i] would just be special cases of this for getfield and getindex. People have also suggested using ~ for this. (See also #5571 and #554.)

@JeffBezanson
Sponsor Member

_.b is definitely an appealing option here. The syntax rule could be that the anonymous function contains the single function call directly containing the _. (Similar to how T{<:S} puts where outside one set of curly braces.)
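
The `_` sugar is only a proposal at this point in the thread, so the sketch below spells out by hand what the expansions described above would be. The function `f` and the data are placeholders for illustration.

```julia
# Made-up data and a placeholder two-argument function.
A = [(a = 1, b = 2.0), (a = 2, b = 4.0)]
f(x, y) = x + y

# Proposed `_.b` would stand for x -> x.b
col = map(x -> x.b, A)

# Proposed `_[1]` would stand for x -> x[1]
firsts = map(x -> x[1], [(10, 20), (30, 40)])

# Proposed `f(_, 1)` would stand for x -> f(x, 1). Under the suggested rule,
# only the call that directly contains `_` is wrapped, so `map(f(_, 1), xs)`
# would mean map(x -> f(x, 1), xs), not x -> map(f(x, 1), xs).
shifted = map(x -> f(x, 1), [1, 2, 3])
```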

@yurivish
Contributor

yurivish commented Jul 10, 2017

Here's a previous discussion with a bunch of good examples to check against: #5571 (comment)

@stevengj
Member

Note that @davidanthoff can already use the _.b syntax in Query.jl, since it parses just fine.

@davidanthoff
Contributor Author

I really like the _.b idea, and especially that I can use it now :)

For my use-case it does kind of rely on reducer functions having a combined map-reduce method that accepts a map function as an argument. Currently many reducer functions don't have such a method. Over in #20402 @StefanKarpinski has one item "Reducers APIs. Make sure reducers have consistent behaviors – all take a map function before reduction; congruent dimension arguments, etc." I guess if that happens for 1.0 all is good and we would have a pretty elegant solution for the Query.jl use-case (and many others). Thanks all for the great ideas :)

@bramtayl
Contributor

bramtayl commented Sep 9, 2017

How about .b being Field{:b}? Then .b.a would be broadcast(Field{:b}, a).

@bramtayl
Contributor

bramtayl commented Sep 9, 2017

Eh maybe inconsistent if .field is essentially a function with special suffix syntax. Still, plain old .field would be very useful, because once the compiler knows field is a type parameter, not a value, all sorts of operations can be shifted from run time to compile time.

@davidanthoff
Contributor Author

I thought a bit more about this, and I think I could actually solve the original issue in Query.jl that motivated this issue in a much more elegant way if we had dot overloading a la #1974. So from my point of view we could close this issue and just add one more cheer for #1974.

Essentially, I could then extend the Grouping container that holds results from a @group operation in Query.jl so that g.a would extract column a from the group g if g happens to be a collection of NamedTuples. That would be much more consistent with some future table type where df.a would extract a column from a table type, something that would also be enabled by #1974.
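
A minimal sketch of what such a container could look like, assuming Julia 0.7 or later, where dot access can be customized by extending `Base.getproperty`. The `Grouping` type below is a hypothetical stand-in written for this illustration; it is not Query.jl's actual implementation.

```julia
# Hypothetical container: a group key plus the rows that belong to the group.
struct Grouping{K,T} <: AbstractVector{T}
    key::K
    elements::Vector{T}
end

# Minimal AbstractVector interface. getfield is used internally because
# getproperty is overloaded below.
Base.size(g::Grouping) = size(getfield(g, :elements))
Base.getindex(g::Grouping, i::Int) = getfield(g, :elements)[i]

# g.key returns the group key; any other name extracts that column from the rows.
function Base.getproperty(g::Grouping, name::Symbol)
    name === :key && return getfield(g, :key)
    return map(x -> getproperty(x, name), getfield(g, :elements))
end

# Made-up data for illustration.
g = Grouping("x", [(a = 1, b = 2.0), (a = 3, b = 4.0)])
col_b = g.b
```

One design consequence worth noting: once `getproperty` is overloaded this way, the struct's own storage is only reachable through `getfield`.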

@davidanthoff
Contributor Author

I'm going to close this issue because I can essentially solve this in a really good way for Query.jl with the new dot-overloading.

Having said that, one crazy idea might be to add such a dot-overloaded method to any AbstractArray. A modest version would be for any AbstractArray that holds named tuples, the radical option would be for just any AbstractArray. In that world, if a is an AbstractArray, a.b would always end up extracting a collection of the b properties of the individual elements of a.

@JeffBezanson
Sponsor Member

Wouldn't that be implicit vectorization of the kind we've moved away from?

@davidanthoff
Contributor Author

Hm, I'm not sure? It would unify the user API for arrays-of-struct and struct-of-array containers in the table world. I assume DataFrame at some point will get df.a as a shortcut for df[:a], and then a DataFrame and an array of named tuples would both provide x.a as a way to get the a column. I'm not sure that is good, but it could be done ;) I guess another question is what else a.b could mean...

But in any case, clearly not 1.0 stuff.

@JeffBezanson
Sponsor Member

To me, a table-like thing is semantically always an array or collection of structs. It might be stored as a struct of arrays, but should have the same API. So for example map(i->i.a, table) can be O(1) and non-copying if the table is stored as a struct of arrays. That needs better syntax, but you get the idea.
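
One way to picture this, as a sketch only: a hypothetical `ColumnTable` stores a named tuple of column vectors but presents rows as named tuples. Because an anonymous function like `i->i.a` cannot be dispatched on, the sketch assumes the accessor is written as `Base.Fix2(getproperty, :a)`, which behaves like `x -> getproperty(x, :a)` but has a type that a `map` method can specialize on.

```julia
# Hypothetical table type: stored as a struct of arrays.
struct ColumnTable{C<:NamedTuple}
    columns::C   # e.g. (a = [1, 2, 3], b = [2.0, 4.0, 6.0])
end

Base.length(t::ColumnTable) = length(first(t.columns))

# Row i is assembled on demand as a named tuple.
Base.getindex(t::ColumnTable, i::Int) = map(col -> col[i], t.columns)

# Generic fallback: apply f to every row.
Base.map(f, t::ColumnTable) = [f(t[i]) for i in 1:length(t)]

# Field accessor: return the stored column itself, without visiting rows or copying.
Base.map(f::Base.Fix2{typeof(getproperty),Symbol}, t::ColumnTable) =
    getproperty(t.columns, f.x)

# Made-up data for illustration.
t = ColumnTable((a = [1, 2, 3], b = [2.0, 4.0, 6.0]))
col_a = map(Base.Fix2(getproperty, :a), t)
sums = map(r -> r.a + r.b, t)
```

Returning the stored vector means callers share memory with the table, which is the trade-off implied by "non-copying".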

@bramtayl
Contributor

bramtayl commented May 3, 2018

It would be convenient to make . syntax available to package authors, lowered to something like

.variable => dot(:variable), where dot isn't defined in Base. I'd like to be able to use dot to create custom keys. Currently the syntax to get symbols into the type domain is somewhat ugly; alternatives like Dot(:variable) and dot"variable" are definitely not as pretty as .variable.
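
For readers unfamiliar with the two alternatives named above, here is a rough sketch of how a package might define them in current Julia. `Key` is a hypothetical type invented for this illustration, and the definitions of `Dot` and the `dot"..."` string macro are assumptions about what such a package would write, not existing API.

```julia
# Hypothetical key type: the symbol is stored as a type parameter.
struct Key{name} end

# Function form: relies on the compiler propagating the constant symbol.
Dot(name::Symbol) = Key{name}()

# String-macro form: the symbol is fixed at macro-expansion time,
# so the result is already in the type domain when the code is lowered.
macro dot_str(s)
    return :(Key{$(QuoteNode(Symbol(s)))}())
end

k1 = Dot(:variable)
k2 = dot"variable"
```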

@mbauman
Sponsor Member

mbauman commented May 3, 2018

Why do you need it in the type domain? IPO on 0.7 will propagate symbols as constants to any inlined functions. From there you can lift them to the type domain yourself if you really wish.

@bramtayl
Contributor

bramtayl commented May 3, 2018

Yeah, I've worked pretty heavily trying to get constant propagation to work. Even with the changes in 0.7, constant propagation is finicky. It's disabled during recursion, and it doesn't work through slurps, making lispy tuple programming very difficult (though not impossible with the judicious use of @pure). Unless constant propagation becomes a semantic guarantee, it's a lot more reliable just to keep everything in the type domain as early as possible. Making dot available to package authors could potentially satisfy both my needs and David's.

@bramtayl
Contributor

bramtayl commented May 3, 2018

@stevengj
Member

stevengj commented May 3, 2018

@bramtayl, in #24990, _.variable already gets lowered to a Fix2{typeof(getproperty),...} object that you could dispatch on.
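
`Base.Fix2` exists in released Julia (0.7 and later) independently of the `_` proposal, so the dispatch idea can be sketched by constructing the object by hand. `accessed_name` is a hypothetical helper written for this illustration.

```julia
# Fix2(getproperty, :variable) behaves like x -> getproperty(x, :variable).
getvar = Base.Fix2(getproperty, :variable)

# Hypothetical helper: recover the property name when given such an accessor.
accessed_name(f::Base.Fix2{typeof(getproperty)}) = f.x
accessed_name(f) = nothing   # anything else carries no property name

# Made-up data for illustration.
row = (variable = 42, other = "x")
value = getvar(row)
name = accessed_name(getvar)
```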

@mbauman
Sponsor Member

mbauman commented May 3, 2018

And #26826 seeks to address constant propagation through varargs.

@bramtayl
Contributor

bramtayl commented May 3, 2018

Another issue is that constant propagation doesn't survive keyword arguments and named tuples.

@bramtayl
Contributor

bramtayl commented May 3, 2018

Oh and here's another option: define a custom type with overloaded dots, something like

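struct Key{s} end  # hypothetical type-domain key; not defined in the original comment

struct K end

# Overloading dot access requires extending Base.getproperty.
@inline Base.getproperty(k::K, s::Symbol) = Key{s}()

const k = K()

k.a  # Key{:a}()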


10 participants