-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement shape and dims for v4 #4552
Comments
Would specifying shape just be a redundant way of specifying size (+ some informative message if the shape didn't make sense)? |
Because One idea that circulated in our chat discussion was to slightly change what If distribution parameters are symbolic, it's not immediately clear how the RV looks like:
Specifying x = MvNormal(cov=cov_prior, shape=(3,)) # can no longer be broadcasted into a different size It's the same as x = MvNormal(cov=cov_prior)
aesara.tensor.specify_shape(x, (3,)) # this Op needs to become a function output to be effective! We could implement |
I like the idea of distinguishing batch from event dimensions. This is currently one of the main pain points (for me) with ArviZ converters. The variables in posterior predictive, observed data and log likelihood have the same shape for univariate distributions, but the log likelihood has a subset of the shape for multivariate. ArviZ currently does a quick and dirty check when retrieving the log likelihood (for all converters, as AFAIK none of the libraries properly supported distinguish on this) which is dropping the last dimensions when a shape mismatch is found to see if this solves things. The example above would therefore break ArviZ when using custom dims because the event dim is not the last.
would work. If the variable I don't really know if this would help (I guess we can internally make sure the event dims are the last if using |
Shape was definitely one of those "invisible" features that took me some time to figure out. Looking at questions in the discourse, it seems to crop up frequently. The other that seems to confuse people (although it's just theano borrowing from the numpy API) is fancy indexing: indexes = np.array([0, 1, 2, 3, 0, 1, 2, 3])
a = pm.Normal('a', 0, 1, shape=4)
like = pm.Normaly('like', a[indexes], 1, observed=data) The new coords seems even more enigmatic to me. Would be nice to have a doc notebook just walking through the different shape/size and indexing possibilities in pymc3. I know we have some examples, but I think (without proof) that they tend to be difficult to find. Edit: This was brought up here #4007 |
In Also, I don't see any real advantages to porting the More importantly, the As @michaelosthege mentioned, if one wants to constrain the shape of a variable, they can use The problem I see with advertising a This |
I suppose I would need to be sold with examples where size is superior. If
we can market this as a big improvement it's a different story.
…On Sat, Mar 20, 2021, 00:07 Brandon T. Willard ***@***.***> wrote:
Shapes are some of the hardest things to get correctly and I'm very
concerned that users might shy away from v4 because of that.
In v4 users are no longer required to determine the resulting shapes of
the objects they create. In v3 a user is forced to do this themselves,
and, as we all know, it's confusing and leads to numerous errors. *Those*
are actual reasons for users to shy away from PyMC. I don't understand how
asking a user to do *less* is off-putting.
Also, I don't see any real *advantages* to porting the shape
keyword—aside from backward compatibility, but, since we're explicitly
creating a *new version*, backward compatibility breaks like this are
completely justified.
More importantly, the shape keyword has a few very real *disadvantages*,
and we've been dealing with them for quite a while now (e.g. it's
confusing, error prone, limits the flexibility and reuse of our models,
etc.), so I'm very much against propagating this key issue into the next
version.
As @michaelosthege <https://github.com/michaelosthege> mentioned, if one
wants to constrain the shape of a variable, they can use specify_shape,
and we *could* implement a shape keyword option using that; however, a
user that very specifically wants to restrict the shape of a random
variable can just as easily apply specify_shape after creating said
variable.
The problem I see with advertising a shape keyword in v4 is that v3 users
who have learned this bad habit will likely end up doing it in v4, giving
rise to all those old issues and limitations. This is also likely to happen
when new users attempt to recreate models written for v3.
This shape keyword is both the interface to—and cause of—one of the
biggest problems we're fixing in v4, so it's the last thing we want to
unnecessarily shoehorn back into it.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4552 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFETGEI3VAK6X3WJ7CDODTTEPKMZANCNFSM4ZKXGZTQ>
.
|
Here's a very simple example that works in my local #4551 branch of import numpy as np
import pymc3 as pm
import aesara
import aesara.tensor as aet
data = np.random.normal(size=(1, 3))
X_at = aesara.shared(data)
with pm.Model() as m:
beta = pm.MvNormal("beta", mu=aet.zeros((X_at.shape[1],)), cov=aet.eye(X_at.shape[1]))
Y = pm.MvNormal("Y", mu=X_at.dot(beta), cov=aet.eye(X_at.shape[0]), size=(2,))
with m:
priorp_samples = pm.sample_prior_predictive(samples=10) >>> priorp_samples["beta"].shape
(10, 3)
>>> priorp_samples["Y"].shape
(10, 2, 1) new_data = np.random.normal(size=(2, 4))
X_at.set_value(new_data)
with m:
priorp_samples = pm.sample_prior_predictive(samples=10) >>> priorp_samples["beta"].shape
(10, 4)
>>> priorp_samples["Y"].shape
(10, 2, 2) This would not even be remotely possible if the shapes of Otherwise, I invite everyone to imagine how this implementation would change if symbolic values were to be used in the |
OK that's awesome, I get it now. Shapes are automatically propagated
through.
…On Sat, Mar 20, 2021, 01:24 Brandon T. Willard ***@***.***> wrote:
I suppose I would need to be sold with examples where size is superior. If
Here's a very simple example that works in my local #4551
<#4551> branch of v4:
import numpy as npimport pymc3 as pm
import aesaraimport aesara.tensor as aet
data = np.random.normal(size=(1, 3))X_at = aesara.shared(data)
with pm.Model() as m:
beta = pm.MvNormal("beta", mu=aet.zeros((X_at.shape[1],)), cov=aet.eye(X_at.shape[1]))
Y = pm.MvNormal("Y", mu=X_at.dot(beta), cov=aet.eye(X_at.shape[0]), size=(2,))
with m:
priorp_samples = pm.sample_prior_predictive(samples=10)
>>> priorp_samples["beta"].shape
(10, 3)>>> priorp_samples["Y"].shape
(10, 2, 2)
new_data = np.random.normal(size=(2, 4))X_at.set_value(new_data)
with m:
priorp_samples = pm.sample_prior_predictive(samples=10)
>>> priorp_samples["beta"].shape
(10, 4)>>> priorp_samples["Y"].shape
(10, 2, 2)
This would not even be remotely possible if the shapes of beta and/or Y
were fixed via a shape keyword.
Otherwise, I invite everyone to imagine how this implementation would
change if symbolic values were to be used in the shape value. I guarantee
that the change wouldn't be an improvement in any way.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4552 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFETGCBMNK2SSAX33TAAKDTEPTNFANCNFSM4ZKXGZTQ>
.
|
The fact that this use case is now possible is awesome! In 95 % of the models I write, I'm more concerned with the inference on the data I have, not the out-of-sample prediction using shared variables. |
Don't forget that, because we can do this shared variable thing, it also means that people who run the same model on many different datasets will no longer need to recreate their |
Sure, that's great. Let's figure out how to make
|
From a naive/conceptual point of view, I see Whenever I have to do cross-validation, I often use It would be great to pair named dimensions with the size argument and then define the coordinate values at sampling/conversion time which is what feels more natural. In this crossvalidation/out of sample sampling, the dimension id generally is part of the model and I think it would be useful to allow having this info still there like we do now with dims. The coordinate values however, are not part of the model and I think they should be better off not being tied to a model but to data/sampling realizations |
The benefit I see is the shape propagation, not necessarily that we can switch things for prediction. With import numpy as np
import pymc3 as pm
import aesara
import aesara.tensor as aet
n_doses = 3
n_subjects = 5
data = np.random.normal(size=(n_doses, n_subjects))
X_at = aesara.shared(data)
with pm.Model() as m:
beta = pm.MvNormal(
"beta",
mu=aet.zeros((n_doses,)),
cov=aet.eye(n_doses))
Y = pm.MvNormal(
"Y",
mu=X_at.dot(beta),
cov=aet.eye(n_states),
dims=("experiments")) Without this, the likelihood would have to look like: Y = pm.MvNormal(
"Y",
mu=X_at.dot(beta),
cov=aet.eye(n_states),
dims=("experiments", "doses", "subjects")) Maybe we want to rename |
I don't see how this reduces confusion. On the contrary, I find There are tons of cases where you need to know the actual
...just to name a few. This whole shape propagation/broadcasting is nice if you need it and if you know how it works. For the 90 % other situations it's immensely important that model code is expressive and doesn't require you to evaluate in-line mathematical expressions such as
I think this RV is waay easier to work with. This way I won't be surprised that
|
Having |
It sounds like you're confusing shape with dimensions. In Theano/Aesara the number and (operation-induced) order of dimensions cannot be symbolic, so, when it comes to dimensions nothing changes. If one wants to provide labels for the dimensions, they can; the use of a
These operations do not require knowledge of the objects' exact shapes, and this is evidenced by the example I gave above, which uses these exact operations (both implicitly and explicitly).
Anything that involves plotting is something that exists outside of the PPL, and—thus—outside of the scope/relevance of symbolic shape considerations. In other words, if one is at the point of plotting something, then they necessarily have a trace with concrete non-symbolic shapes, so this
Again, if one has a trace, then all the shapes are already worked out and known, so this discussion isn't relevant.
The point of the change from Worse yet, the shape propagation and broadcasting is always needed and used. The difference is that under
What is this 90%? Why would one need to evaluate an expression to figure out the dimensions? If one can specify an exact shape—for the Also, there's nothing expressive about the use of a
Once again, these statements are conflating the DSL/PPL aspects of PyMC3 with the data it produces. If one wants to add dimension names to a random variable, that can be done at any point, and is not affected by manually vs. automatically computed shapes. As I stated before, the number of dimensions cannot be changed in Theano/Aesara; the only thing that is automatically computed is the length/size of each dimension. If one can specify a valid At a point, arguing for this |
We have obviously very different opinions about I am personally not particularly attached to
Knowing the The key point being: Verbosity is often a good thing! Now here's maybe a way out of this dilemma. What if we start using pm.MvNormal(mu=[1,2,3], cov=np.eye(3), shape=(2, 3))
# is the same as
pm.MvNormal(mu=[1,2,3], cov=np.eye(3), shape=(2, ...))
# is the same as
pm.MvNormal(mu=[1,2,3], cov=np.eye(3), size=2) The same with coords = {
year=[2020, 2021],
drink=["Water", "Orange Juice", "Apfelschorle"],
}
pm.MvNormal(mu=[1,2,3], cov=np.eye(3), dims=("year", "drink")) # explicit, verbose and shape-flexible!
# is the same as
pm.MvNormal(mu=[1,2,3], cov=np.eye(3), dims=("year", ...)) # implicit and ndim-flexible |
If that helps, we could do it. Remember, we can also strip those "base" dimensions from a user-specified |
I agree with the first point, but not so much with the second. We have an opportunity to make a clean slate and break virtually anything, and I think we should make the most of of this for two reasons. First is that it is less work for us, starting the new major version with multiple alternatives of doing things and having to handle deprecations from the very beginning seems like extra and unnecessary work to me. Second is that I think users will rather have things break once and then last for multiple versions than having to be constantly paying attention to the deprecations and api changes between minor versions.
This would be amazing, I like the ellipsis option with dims, however I don't think it would "keep" the dimensions from the variable who is actually defining them when broadcasting, so we'll have to take that into account when converting to ArviZ, otherwise arviz will see arrays with n dims and only m names for its dims and error out. |
You're right. What I meant is more along the lines of doing
Yes, with explicit One option (also looking at #4565) could be to add a |
This could (with the could being quite the supposition) be solved by using xarray as a backend, otherwise I fear we will end up making xarray from scratch on aesara. cc @ColCarroll @canyon289 as they are listed as possible mentors on that project and may have better ideas on this end. If Aesara were a dynamic computation library, it would be quite easy to get xarray to work with it, xarray doesn't really do any computation, only handles dims, coords and aligns the underlying arrays, everything else works as long as the underlying data structure implements the For static computation I think it won't be as straighforward, but it could be possible anyway, a la having References
|
Seems we all agree that
|
Re Right now I am 51% prefer |
I don't think there is agreement here at all that size is better than
shape. @aseyboldt, Michael and I prefer the explicit shape.
Personally I like the approach with Ellipses best where we can have
optional shape propagation but the default stays explicit, backwards
compatible, and compatible with dims.
…On Sat, Mar 27, 2021, 08:38 Junpeng Lao ***@***.***> wrote:
Re size naming, I am extremely torn between "Explicit is better than
implicit" (we should name it like batch_shape or plate_shape) and "Simple
is better than complex" (size is simple and consistent with numpy)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4552 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFETGAU32FATQCDIXGK6LLTFWDPHANCNFSM4ZKXGZTQ>
.
|
I hope it's clear to everyone that the ellipses approach is just a longer version of |
Yes, that's what I like about it. In most cases I think users will find
shape/dims easier. Advanced users can use Ellipses. That way we maintain
backwards compatibility but have a cool new feature rather than explaining
users a subtle difference in how we now handle shapes.
…On Sat, Mar 27, 2021, 09:58 Brandon T. Willard ***@***.***> wrote:
I hope it's clear to everyone that the ellipses approach is just a longer
version of size (e.g. shape=(2, 3, Ellipsis) is equivalent to size=(2, 3))
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4552 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFETGBLDQORQNJ7QS7PNLTTFWM3XANCNFSM4ZKXGZTQ>
.
|
This is what we agreed on after discussing our last lab meeting:
|
I thought that was |
Don't we translate |
I think I got lost during the meeting, nevermind. I am probably too xarray biased for my own good. So to clarify:
|
Here's how I see it.
I think that's the best option, just to remove confusion and too many options for doing the same thing, although we'd probably be artificially removing that capability. It's a good question.
I think yes.
No,
I assume that we currently look the shapes up during model definition, right? Ideally we can make that dynamic then but not sure how easy that would be. Ideally we move the whole names business into aesara eventually: pymc-devs/pytensor#352
That could be tricky, ideally we somehow infer this during model spec but not sure how easy that would be. I'm not sure how it can be solved :-/. |
All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` Closes pymc-devs#4552.
All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` Closes pymc-devs#4552.
All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes pymc-devs#4552.
All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes pymc-devs#4552.
Always treats `size` as being in addition to dimensions implied by RV parameters. All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes pymc-devs#4552.
Always treats `size` as being in addition to dimensions implied by RV parameters. All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes pymc-devs#4552.
Always treats `size` as being in addition to dimensions implied by RV parameters. All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes #4552.
Always treats `size` as being in addition to dimensions implied by RV parameters. All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes pymc-devs#4552.
closed by #4696 |
Always treats `size` as being in addition to dimensions implied by RV parameters. All resizing beyond parameter-implied dimensionality is done from: - `shape` or `size` in `Distribution.dist()` - `dims` or `observed` in `Distribution.__new__` and only in those two places. Closes #4552.
@brandonwillard is proposing to remove the current
shape
kwarg and replace it withsize
which works more similar to the waynumpy
handles it. The difference is thatsize
only specifies the batch dimension, while the event dimension is taken from the distribution. For univariate distributions, there is no difference, but for multivariate, there would be.All three would lead to the same shaped
(2, 3)
MvNormal.From @brandonwillard: "in v4, all the broadcasting and shape calculations for the arguments are already worked out so we always know what the shape of a single sample is under the given parameters. size just says that you want size-many more of those samples. as a matter of fact, size can be symbolic in v4!"
The big downside is that we're breaking backwards compatibility in a major way. Shapes are some of the hardest things to get correctly and I'm very concerned that users might shy away from
v4
because of that.For now, @brandonwillard @michaelosthege and I agree that definitely we should start with keeping
shape
for compatibility. The question is whether we should ever remove it completely, which is where I'm having concerns. After talking @junpenglao who also shared these concerns.The text was updated successfully, but these errors were encountered: