-
-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorize observe
method and _correct_for_light_travel_time
#526
base: master
Are you sure you want to change the base?
Conversation
mpc.comet_orbit
, mpc.mpc_orbit
, etc.mpc.comet_orbit
, mpc.mpc_orbit
, etc.
It's hard to say without diving into each case. Could you create a little repro for each of them? Each could be turned into a little test case and we could solve those two problems before landing this PR. |
I added some tests for multiple comets and multiple minor planets that trip up |
Thanks! I'll plan to take a look at them early next week, if I don't get to it this weekend. |
This is the script I've been using with a debugger:
|
I found it helpful to write down my reasoning for what should happen step-by-step in The call to the observe method from the above script looks like this:
Once inside the function,
So far so good. Now we get to this line:
Edited to add: Inside of the loop, the argument used to call |
OK, I think that latest commit successfully vectorizes |
The changes necessary to do this raise some interesting questions:
we could just have this:
|
Though I suppose if it's going to be possible to observe multiple objects from a vectorized skyfield object it would have to be possible for the resulting |
OK, that last commit wast the last patch necessary to allow the above script to run without errors. |
Thanks for looking into this problem! Here are my initial thoughts:
Stepping back, I like the questions you are asking about vectors in general, that's a good starting point. Here's a first try at some responses:
Yes, which sounds symmetric with the current design: the
I would prefer that we not impose restrictions on the internals of the vector sum. While I suspect it's true that the vector sums we build in the near future will indeed have the internal property you outline here, I'm not sure that we'd want to ratchet down a requirement on them.
Agreed, I don't think this needs to be enforced. Vector sums should be treated by other code like simple vector functions that have Finally: I think your questions about dimensions really go best to the heart of the problem; we might need to define more useful constraints on the dimensions users supply. To get that question underway, let's start with a test case for light-travel time. I'll go try writing one up, and then I'll post it as a comment here for us to discuss. Then, once we agree on a good shape for the problem, we can look at getting the light-travel routine ready for this problem! |
mpc.comet_orbit
, mpc.mpc_orbit
, etc.observe
method and _correct_for_light_travel_time
I moved the changes that were for vectorizing |
Thanks for splitting this into two PRs! The idea of a “vector of comets” (or minor planets, etc) introduces a novel semantic question for Skyfield. Imagine a vector function But what if instead,
I suspect that the second rule will be less surprising to users than the first rule. But, more importantly, if we adopt the first rule, then I think we make it impossible to apply If someone really wanted |
I am having trouble thinking of a use case for option 2 outside of Independent of whether we choose option 1 or 2, the
Right now this PR attempts to vectorize |
I believe that this would also be a pretty common use case for this functionality. At least coming from observation scheduling and results processing background this makes a lot of sense. |
A busy January has prevented me so far from following up with a knock-down drag-out essay here that would weigh every possible issue and reach a conclusion, so I'm instead going to try making a short and simple comment every day or two so that y’all are not kept waiting forever, and hope thereby to gradually answer the interesting points that you each have raised. Today’s single point: I do not envision Skyfield growing to contain any So, yeah. Single small points are definitely easier to write up; that one wasn't bad. I'll plan to continue to return to this issue a few times a week, and hopefully the comments will lead somewhere! Thanks for y'all's patience as I've not yet been forthcoming with a conclusion pronouncement :) |
A second thought is that I am not sure it is only comets that would need a second secret way to generate positions, if we break away from NumPy’s behavior and decree that Unless comets are a special case for a reason that I am not yet seeing, then all of the Skyfield vector functions — including those that might be written in the future — would need the secret method added that broadcasts like NumPy does when an array of n objects is asked for their position at n distinct times. Which seems like a steep price in code and tests and future maintenance. |
Ahh, I think up until now I've been a little confused when you talked about generating nxn positions as "break[ing] away from NumPy’s behavior". The way I've been thinking about it is that if we have
Yes, that is what I was suggesting before, but I've since learned of a way to avoid this: suppose you have an array with shape
I think that's correct in that either option 1 or 2 should be the way vectorization works everywhere it's implemented. But I don't think we would have to implement vectorization for every object type. If we do go ahead with option 2 I think we should evaluate whether the increase in complexity would be worthwhile. For the use cases I can think of I'm not sure it is, but there might be others I haven't thought of. |
One thing I want to mention is I hope I don't come across as adversarial in this discussion. I'm sure that whatever path you choose will turn out well and either way I'll be happy to help make it happen! |
Don’t worry, your comments have been helpful! And thanks for your patience — it will be another week or two before I am finally able to return to a normal schedule. |
No worries! I've got lots to stay busy with :) |
I note that the new issue #539 also seems to ask for Skyfield to support arbitrary n×m vectorization combinations. |
Perhaps these PR's having to do with vectorization could go into their own branch. Knowing more about what changes are necessary might make it easier to decide between the different kinds of vectorization. |
This might be an area where I’m not familiar enough with how folks use GitHub to know the advantages — are there operations you would like to perform that become possible, or easier, if instead of living on your own branch, they live on a branch in the main repository? I did, by the way, spend time in late January looking at this, and as before ran into the fact that NumPy is simply backwards from how I would like a numeric library to work. So there are two possible approaches:
While traveling a couple of weeks ago (quick drive to visit parents, whom I hadn't seen in a year), I did get far enough into But as I've just finished up, and published yesterday, the Ptolemy medieval cosmos model that I promised long ago in a PyCon Ireland talk: https://rhodesmill.org/ptolemy/ — and as I just reached the chapter in re-reading The Lord of the Rings where Aragorn has to decide whether to go east or west from Rauros, it’s time to return to Actually, here’s what I should do: I should not choose, but instead implement a vectorized I'll post back here with the results! |
@JoshPaterson — All right: it looks like the couple of weeks of thinking I've done since last attempting to upgrade
And as you'll see if you try running the new script (see the commit linked just above this comment), you'll see that fixing those subtractions produced a Of course the design script now needs to be taken farther. What if there are But for now I'm going to stop and let you read over the code first and respond to the ideas there. Try running it, and check out the output of the |
The light travel time routine can survive almost unchanged if instead of expecting NumPy broadcasting to make up for extra missing dimensions, we add an extra dimension to the observer’s position before calling, to avoid NumPy needing to do its own back-to-front dimension matching.
@JoshPaterson — More good news: after working through the examples described above, it occurred to me that maybe we should just avoid having mismatched dimensions in the first place. If our calls to NumPy subtraction, for example, never asks NumPy to fix a mismatch between 2 dimensions in one array and 3 in the other, then Skyfield’s code will never even be sensitive to whether NumPy matches front-to-back or back-to-front. So maybe we just need checks in front of big operations that would stop and ask the user to make their dimensions match, before we proceed with vector operations? You can check out the commit I've just made (see the link above) to see where things stand, as I'll now go look at why the JPL ephemeris logic can't happily accept a time with 2 dimensions instead of 1. |
The problem I'm trying to solve is that this PR depends on #532, so that makes it hard to keep experimenting on this PR before that one is merged. The only advantage I had in mind for having the branch here versus in my fork was that it would be easier to discuss it here if the branch was also here. There might be a better way around this problem? Maybe I could keep the branch in my fork and then start a discussion in the discussions tab if becomes necessary.
I'm glad you mentioned that here; it's super interesting, especially the scripts for generating the animation! I'm taking a look at the different ways of vectorizing |
I changed the version of To summarize the methods for vectorization so far:
I think any of these could work, but the last one seems to have the most potential advantages:
Do you have any ideas about what user code that adds dimensions could look like? |
Good question — and, no. Not yet, because as I've worked through those light-time examples I've been working my way through the possibilities for the first time myself. My guess is that we'll want to make whatever (hopefully small) tweaks are necessary to get all position routines working in the n×m×p case (where there are arbitrary user-defined dimensions involved), and then add big informative messages when someone tries something like I agree with you entirely that explicit dimension adjustments in user code is preferable to magic invisible dimension adjustments in Skyfield code, not only for keeping Skyfield simpler, but so user code explains itself as best it can. I am glad you also see what might be a way forward here without for loops! I was hopeful that NumPy could be corralled into doing the right thing. I plan to work on my design example more this week. Depending on how much free time I have, we might have a way forward ready by the end of the week! |
One thing to look out for is that functions that contain branches or loops that cause different parts of arrays to go through different paths can't be vectorized with so few changes as The changes in
I looked ahead at the functions that need to be vectorized for the |
This PR vectorizes
mpc.comet_orbit
andmpc.mpc_orbit
while also allowing them to continue working the way they did previously for scalars.I changed
keplerlib.propagate
so that returned arrays have shape (xyz, t, n) in light of the discussion in #520.As far as I know no further changes are necessary in
mpc.py
orkeplerlib.py
. Here are the problems I currently know about:comets.at(t).xyz_position(ecliptic_frame)
creates a broadcasting error infunctions.mxv
earth.at(t).observe(comets)
creates a broadcasting error invectorlib._correct_for_light_travel_time
Should I try making changes directly where those problems are happening, or is there perhaps a different approach I'm not aware of?