This issue was moved to a discussion.
You can continue the conversation there.
pm.Model(coords=coords) should preserve coordinate type if it is a pandas Index #5994
Comments
I'm not convinced. What valuable information exactly is lost? Can you give an example? Also, with the current API (tuples) one can do In addition, remember that the If you can give examples of useful information that should be captured for a
Fair warning up front: I feel strongly about this, I hope this doesn't get too preachy, and if it does I'm sorry ;-) Just to get the About the lost info:
I guess the reason I feel so strongly is that I usually put quite a bit of thought into what kind of Index I use for each dimension. If all of that is then just turned into a tuple that doesn't know anything about what it represents, that effort is wasted.
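A small illustration of the kind of information at stake, not taken from the thread: the dimension name `time` and the dates are made up, and the attributes shown are just a few that a pandas Index carries and a plain tuple does not.

```python
import pandas as pd

# Hypothetical coordinate: three daily timestamps for a "time" dimension.
idx = pd.date_range("2020-01-01", periods=3, freq="D", name="time")

# Converting to a tuple keeps the individual values but drops the container.
as_tuple = tuple(idx)

# The Index knows its name, its dtype and (for regular dates) its frequency.
print(idx.name, idx.dtype, idx.freqstr)

# The tuple has none of these attributes.
print(hasattr(as_tuple, "name"), hasattr(as_tuple, "dtype"))
```

The values themselves survive the conversion; what goes missing is the metadata attached to the index as a whole.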
Sorry for the delayed response, I was pretty busy this week... No doubt that First of all, I'm investing a lot of thought into standardizing the metainformation around models/MCMCs in a way that is PPL and language-agnostic. This has many benefits including things like live-streaming traces, RVs with varying shapes and much more. I'd be happy to elaborate in a call or something. Is the The current implementation just does
What does this mean for the And most people don't know about this feature, which makes it super confusing that
Ok, I guess I'm now starting to understand why you prefer to just have tuples, and I can see that in the context of mcbackend that might make a few things easier. It seems to me we already have something like a standard for traces, namely arviz. It sure isn't perfect (e.g. I'd love to have hierarchical variables, for which I guess we'd first need something like pydata/xarray#4118), but overall I'm pretty happy with it. If you think you can improve on it and write a different one, that's great, but it'll take a lot of great features to make me use something based on protocol buffers when I can have xarray...
Again, I don't think I want to go from pandas and xarray to protocol buffers. Pandas and xarray sure have their flaws, but for pretty much everything I might want to do, except maybe streaming traces, pandas and xarray seem like a way better fit. And I don't care a lot about streaming traces, to be honest.
Hm, maybe that was a bit harsh... I actually think mcbackend looks pretty nice, it's just that I'm not so sure it should be the default, and I don't like to get worse interoperability with pandas and xarray because of it. If you think live preview of the trace while it is sampling is important, I think we can find a much less involved solution for that. I'll play around with nutpie in that regard a little, I think the infrastructure there should make it pretty easy to experiment. If we want to send the trace to a remote machine, then using protobuf to send the updates might be a good solution, but I don't think this needs to be the default. For that use case we could also always just pickle the index object if that makes it easier.
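The pickling idea could look roughly like the sketch below. It only shows a round trip through bytes with the standard `pickle` module; the index values and the name `group` are made up, and how the bytes would actually be transported is left open.

```python
import pickle

import pandas as pd

# Hypothetical coordinate for a "group" dimension.
idx = pd.Index(["a", "b", "c"], name="group")

# Serialize to bytes, e.g. to ship alongside trace updates.
payload = pickle.dumps(idx)

# On the receiving side the Index comes back with its metadata.
restored = pickle.loads(payload)
print(restored.equals(idx), restored.name)
```

Pickle is Python-specific and should only be loaded from trusted sources, so this would not help a language-agnostic consumer.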
This is already possible even without changes to PyMC, but we can reduce complexity in McBackend is essentially a cleaned-up refactoring of
In McBackend the order is:
So for the initialization of the storage backend, McBackend takes the good things from And then there's conversion to
Oh and just to be clear, I think our intentions are well aligned:
We currently convert all coordinates to tuples, but this removes valuable information if the coordinates are already a pandas Index.
I think we should only convert the input values if they are not already a pandas Index, and if they are not, we should wrap them in a pd.Index instead of a tuple. This is also immutable, but preserves the meaning of a coordinate.
Relevant PR that changed this: #5061
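A minimal sketch of the proposed behaviour, written as a standalone helper rather than as PyMC's actual code. The function name `normalize_coord` and the example coordinates are hypothetical.

```python
import pandas as pd


def normalize_coord(values):
    """Hypothetical helper: return a pandas Index for one dimension."""
    if isinstance(values, pd.Index):
        # Already an Index (DatetimeIndex, MultiIndex, ...): keep it as-is.
        return values
    # Anything else (list, tuple, array, ...) is wrapped in a pd.Index.
    return pd.Index(values)


coords = {
    "time": pd.date_range("2020-01-01", periods=3, freq="D"),
    "city": ["Berlin", "Vienna"],
}
normalized = {name: normalize_coord(values) for name, values in coords.items()}

for name, index in normalized.items():
    print(name, type(index).__name__)
```

Like a tuple, a `pd.Index` rejects item assignment (`index[0] = ...` raises `TypeError`), so the immutability that the tuple conversion provides is kept.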