-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing ufunc-broadcasting across record fields #457
Comments
For the record I did propose also to only disable ufuncs on arrays where |
Also, I want to keep these rules of "what applies when" as simple as possible, to minimize surprising behavior. |
I'm not going to change the policy on this: the behavior will stay the same. (Although NumPy ufuncs won't be allowed on strings, as in the above example: #504.) |
@lgray FYI |
So yeah, my 2 cents here is that the probability of the user doing something they don't intend is incredibly large. Nick ran into an interesting case where NanoEvents jets (which are four vectors beneath) can be element-wise multiplied by each other. This doesn't have a defined behavior from a physics standpoint and would also be very easy to accidentally do or do intentionally without knowing what it meant. In either case it can result in technically valid but wrong behavior when writing analysis - which is a much less preferred outcome than throwing an error. |
The way it works now is more prone to error for the interpretation of records as objects. "Extremely" is subjective. Heck, in crafting any kind of response to this, I'm only convincing myself that it needs to be done. It is the case that examples of this antifeature are actually in some of my slides, so "fixing" it will make my slides wrong. But it is an important point. I think this issue could be implemented as an inhibitor that prevents broadcasting through any records unless a customization is defined. Note that that even means this: >>> array = ak.Array([{"only": 1}, {"only": 2}, {"only": 3}])
>>> array + 10
<Array [{only: 11}, {only: 12}, {only: 13}] type='3 * {"only": int64}'> If/when people complain that what used to work no longer works, I'll just have to explain that. (In the above example, they'd have to do |
I'm calling it a bug because I've just defined the current behavior as wrong. (Even though I've presented it in talks.) (A motivator for being short with these things is that the list of issues is a lot longer than I thought, and I have only until December 1 to make this |
Yeah - I agree that a good policy is to not allow broadcasting ufuncs through records unless that ufunc has been defined for that record type. This has the added benefit that people have to agree on the ufuncs/interface for whatever domain-specific set of records they want to use. |
@henryiii I'm working on this now: broadcasting will be prevented from passing through records unless an overload has been implemented. For example, It is a change in interface, and I've even presented examples of this interface in public, so I have to Do The Right Thing and give it a deprecation cycle. It's removing a feature, which is good because changing a feature doesn't have a way to provide both old and new behaviors for an overlapping time period. What will happen is this: if any function attempts to broadcast through a record, it will raise a warning now, saying that this will become an error in version 1.0.0 on Dec 1, 2020, unless that broadcast is an overloaded ufunc. Then I'm going to push this change into a released version of Awkward Array right away, so that about a month of releases will get the message. I'm sure there are users who update less frequently than that, and unfortunately, they won't get the message. Post-1.0.0, I'll have to start publishing regular release schedules, so that deprecation warnings can refer to both a version number and a date. |
In Awkward Array 0.4.1, which I want to release later today, the warning that you get from one = awkward1.Array([{"x": 1}, {"x": 2}, {"x": 3}])
two = awkward1.Array([{"x": 1.1}, {"x": 2.2}, {"x": 3.3}])
one + two is
But def overload_add(left, right):
return awkward1.Array({"x": left.x + right.x})
behavior = {}
behavior[numpy.add, "Overload", "Overload"] = overload_add
one = awkward1.Array([{"x": 1}, {"x": 2}, {"x": 3}], with_name="Overload", behavior=behavior)
two = awkward1.Array([{"x": 1.1}, {"x": 2.2}, {"x": 3.3}], with_name="Overload", behavior=behavior)
assert (one + two).tolist() == [{"x": 2.1}, {"x": 4.2}, {"x": 6.3}] raises no warnings or errors. |
Looks like the correct behavior to me. :-) |
Currently, all ufuncs are broadcasted across all fields of a record:
This is causing some confusion because the fields of a record have qualitatively different meanings. Some are trigger booleans, some are momenta, some are ML-derived isolation variables, some are strings...
Furthermore, when @henryiii is writing vector, he has to distinguish between
LorentzVector + LorentzVector
accidentally working because they're Cartesian (but not preserving their Lorentzness) and getting the wrong answer because they're not Cartesian. Even though the+
behavior is defined, due to the fact that they are records, he has to be sure to override every case.I think there would be fewer surprises for both users and developers if broadcasting a ufunc through a record were an error (withe a nice error message). Custom behaviors for specialized records, like LorentzVectors, would still be possible to define, as they are now, but instead of replacing wrong behavior, they'd be replacing no behavior.
Note that NumPy does not define such an operation on structured arrays:
Although it does work for Pandas:
it is not our intention to generalize from Pandas, only NumPy.
This would also affect ufuncs that return booleans, like comparison operators. For these, the argument isn't as strong. Maybe we want
to work, maybe we don't.
I'm considering removing ufuncs-through-records for all ufuncs, without affecting the custom ufunc behavior that can be assigned to any record with a name. (I'm not considering ufuncs-on-strings right now, though that's something to think about.) Does anyone have a strong argument about that?
(I suppose this needs a deprecation cycle, though it would be a little difficult getting a warning into the middle of the broadcast-and-apply. I'm tempted to remove it all at once, like a band-aid...)
The text was updated successfully, but these errors were encountered: