Specify dim tags for layers that create new axes #597
Comments
Even for simple layers like
We should be more systematic here and specify the list of layers that should be extended or modified by such a feature. This should be edited directly into the main issue description. I will start on this, but please extend it. If you are unsure about some layer, just ask in a comment.
Btw, about Sisyphus, or in general simpler ways to specify new dim tags, and also on the dim tag description: I'm not sure on any of these. About Sisyphus: Whatever technical limitation there might be in Sisyphus, this could be fixed. E.g. if the issue is that On using Note my recently added comment on that in
I don't like relying on a string as a means of identification. This can lead to very strange bugs in rare cases, which are probably hard to debug. Also, the automatic Also, the automatic With automatic But anyway, I want to get away from relying on See also #634 for some related discussion. On simpler ways to write that in the config: I was also thinking about this. This might be a valid point. I think with the proposed new way to define the network via returnn-common, this probably already becomes much simpler. Maybe it is also just about the
Also related: enforcing that dim tags are unique (#632).
One alternative to this is to use the E.g. when you used
Note on your initial example using What you actually want is the
This would use
Btw, this is not really either-or. We can have both such I think we need to discuss each layer (or most of them) individually. E.g.
Some further ideas along the line of
Note that we would in any case enforce that the matching is unique. If it is not unique, it would throw an error. For the layers we would probably use the
Another idea I had (also for returnn-common) was that the user would explicitly specify all output dim tags. So basically replacing what the user writes as a comment anyway (like This could be used not just for verification; we could actually infer some of the options of the layer (e.g. There are some cases where the order is relevant though, such as Or we extend For
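The verification part of this idea can be sketched in a few lines. This is only an illustration under assumptions: `verify_out_dims` is a hypothetical helper, not a RETURNN or returnn-common function, and plain strings stand in for dim tag objects to keep the snippet self-contained. The dims are compared as a set, so it does not cover the cases where the order is relevant.

```python
from typing import Iterable, Set


def verify_out_dims(actual: Iterable[str], declared: Iterable[str]) -> None:
    """Check that a layer output has exactly the declared dims.

    Dims are compared as sets, so the axis order is ignored.
    Raises ValueError listing missing and unexpected dims.
    """
    actual_set: Set[str] = set(actual)
    declared_set: Set[str] = set(declared)
    if actual_set != declared_set:
        raise ValueError(
            "output dims mismatch: missing %s, unexpected %s"
            % (sorted(declared_set - actual_set),
               sorted(actual_set - declared_set)))


# The declaration replaces a comment like "# (batch, time, feature)".
verify_out_dims(actual=["time", "batch", "feature"],
                declared=["batch", "time", "feature"])  # passes, order ignored
```

A mismatch, for example a declared `"heads"` dim that the layer does not produce, would raise a `ValueError` naming the missing dim.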
For
E.g. this one seems dangerous, because static dims might accidentally match up when the user chooses different hyperparameters.
Also, yet another related thing: It might be helpful to also explicitly specify the input dims or part of them. This is standard in PyTorch, e.g. This would be partly covered by such
No, in those cases there can only be one dynamic dim tag (if at all), and all others must be static. And then it can be derived. So it should always be possible to derive it for
Currently people often use things like
But as said, this doesn't have to be either-or, we can do both, and also discuss them independently.
But then it would throw an error, so it cannot accidentally match more than one. Also, this would probably not happen very often.
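A minimal sketch of the behavior discussed here, matching an axis by its static size and refusing ambiguous matches. The helper `find_unique_axis` and the example shapes are made up for illustration and are not part of RETURNN.

```python
from typing import List, Optional, Sequence


def find_unique_axis(shape: Sequence[Optional[int]], size: int) -> int:
    """Return the index of the single axis whose static size equals `size`.

    `None` entries stand for dynamic axes and never match.
    Raises ValueError when no axis or more than one axis matches.
    """
    matches: List[int] = [i for i, dim in enumerate(shape) if dim == size]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one axis of size %i in %r, found %i"
            % (size, tuple(shape), len(matches)))
    return matches[0]


# Batch and time are dynamic (None); the feature axis has size 512.
print(find_unique_axis([None, None, 512], 512))  # unique match at index 2

# With other hyperparameters two static axes have the same size,
# so matching by size is rejected instead of silently picking one.
try:
    find_unique_axis([None, 8, 8], 8)
except ValueError as exc:
    print("error:", exc)
```

The second call shows the hyperparameter concern: the config that worked with distinct sizes fails loudly once two sizes coincide, rather than picking the wrong axis.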
Yes true. For me, for most layers such But then there are many layers for which this might be useful, e.g. I would like to use this for
This would work then. So you would e.g. say
(I know there is no But that's not so nice that here you have to specify the input axis twice. And in the case of
better.
In case of
Btw,
I don't see the point in this generic solution then. If in the case of Having a parameter
No. We should not mix up multiple operations, like the linear transform + assigning new dim tags.
I thought it might be nice to have such a general interface for all layers that create new axes (which we discuss here in this issue). Maybe it does not fit for all layers, but for many, it would. E.g. for It would not fit for e.g.
Ah okay okay, I understand now, sorry.
I really like this idea. Maybe one could also use Python type annotations for this; this would make the syntax nicer (not sure if that's so robust though, or whether you can access them from code easily).
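On whether annotations can be accessed from code: with the standard library this works via `typing.Annotated` and `typing.get_type_hints(..., include_extras=True)` (Python 3.9 or newer). The sketch below only demonstrates that mechanism. The `Dims` class, the function name, and the axis names are hypothetical, and this is not a claim about how returnn-common would do it.

```python
from typing import Annotated, List, get_type_hints


class Dims:
    """Hypothetical annotation payload that lists axis names."""

    def __init__(self, *names: str):
        self.names = names


def attention_weights(
        query: Annotated[List[float], Dims("batch", "time", "feature")],
) -> Annotated[List[float], Dims("batch", "time", "heads")]:
    ...


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(attention_weights, include_extras=True)
out_dims = hints["return"].__metadata__[0].names
print(out_dims)  # ('batch', 'time', 'heads')
```

Static type checkers ignore the extra `Annotated` metadata, so any consistency check on the dims would still have to happen at runtime.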
What do you mean by this last point? How would e.g.
So you say because Btw,
For
So this would only be a returnn-common feature then? Because RETURNN itself would still use the same net / layer dict as before, right? And you mean e.g. like deepmind/tensor_annotations or PEP 646? Btw, related is also tsalib, xarray, TensorFlow Mesh.
It would expect that there is exactly one feature dim (of kind feature).
It does not generalize to every layer of course. But does it have to?
I did not propose to use it for any cases where it would be ambiguous or require any sort of magic. But anyway, this inferring of options (e.g.
I mean, it was just a proposal, to also have a uniform and consistent interface. What are the remaining cases (layers) where this would not work? And do they invalidate the proposal? E.g.
Those are examples where
Ok, all layers should be covered now. So I'm closing this.
E.g. `CumConcatLayer` from #589 allows (even enforces) you to specify `dim_tag: DimensionTag` for the output axis.
I propose something similar for all other layers that introduce new dims. Basically, this could be useful for all layers introducing a new dynamic seq length.
This is already possible to do using an additional `ReinterpretDataLayer` (or so, something that lets you specify new dim tags. I don't think this exists yet, but it is easy to add). Then you would do:
Here, `split_dims0` and so on are the tag descriptions that `SplitDimsLayer` internally sets.
Not only is this solution pretty verbose, it also relies on these internal tag descriptions. As a user, you would need to look them up frequently, and we would need to make sure that they don't change internally.
I would like to avoid this, proposing to add a `dim_tags: List[DimensionTag]` option directly to `SplitDimsLayer`.
With other layers, it's the same idea.
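To make the proposal concrete, here is a minimal sketch of a net dict in which the new axes are named directly on the layer. It rests on several assumptions: `DimTag` is a hypothetical stand-in for RETURNN's `DimensionTag` so that the snippet runs without RETURNN, the layer names and sizes are made up, and the `dim_tags` key is the option proposed in this issue, not necessarily what was merged (the list in this issue records that `SplitDimsLayer` ended up accepting dimension tags through `dims`, #786).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimTag:
    """Hypothetical stand-in for RETURNN's DimensionTag (illustration only)."""
    description: str
    dimension: Optional[int] = None  # None means a dynamic size


# Name the two axes that the split creates, instead of relying on
# internal descriptions such as "split_dims0".
heads_dim = DimTag("heads", 8)
head_feat_dim = DimTag("head-feature", 64)

network = {
    "split_heads": {
        "class": "split_dims",
        "from": "data",
        "axis": "F",
        "dims": (8, 64),
        # Proposed option from this issue (assumed name and semantics):
        "dim_tags": [heads_dim, head_feat_dim],
    },
}

# The tags and the static split sizes must agree.
layer = network["split_heads"]
assert tuple(tag.dimension for tag in layer["dim_tags"]) == layer["dims"]
```

Later layers could then refer to `heads_dim` or `head_feat_dim` as objects, without looking up any internally generated description.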
This also promotes the best practice that you should always name your axes properly.
Maybe, for some layers, we would even enforce the use of this. Especially for layers that really introduce an entirely new dim (like `SplitDimsLayer`). But also for others. When enforced, this would need a new behavior version (#508).
(Related to that, I would also allow specifying just a string with the description as `dim_tag`, to make this easier to use. Calling the `DimensionTag` constructor is not that nice in Sisyphus configs.)
List of layers which need this
Also see: Operations on dimension tags
Layers (potentially) introducing a new dynamic seq length
List
- `ConvLayer` (done, `out_spatial_dims`, `out_dim`, `in_spatial_dims`, `in_dim`, via "ConvLayer and PoolLayer, in_dim, in_spatial_dims, out_dim, out_spatial_dims" #789)
- `PoolLayer` (done, `out_spatial_dims`, `out_dim`, `in_spatial_dims`, `in_dim`, via "ConvLayer and PoolLayer, in_dim, in_spatial_dims, out_dim, out_spatial_dims" #789)
- `TransposedConvLayer` (done, `out_spatial_dims`, `out_dim`, `in_spatial_dims`, `in_dim`, via "TransposedConvLayer in_dim, out_dim, in_spatial_dims, out_spatial_dims" #791)
- `WindowLayer` (done, `out_spatial_dim` and `window_dim`, via "WindowLayer, window_dim and out_spatial_dim" #776)
- `PadLayer` (done, `out_dims` and `axes`, via "PadLayer, out_spatial_dims option" #778 and "PadLayer, rename out_spatial_dims to out_dims" #779)
- `ResizeLayer` (done, `out_dim` and `axis`, via "ResizeLayer out_dim option" #797)
- `PrefixInTimeLayer` (done, `out_dim` and `axis`, via "PrefixInTimeLayer, PostfixInTimeLayer, axis and out_dim options" #799)
- `PostfixInTimeLayer` (done, `out_dim` and `axis`, via "PrefixInTimeLayer, PostfixInTimeLayer, axis and out_dim options" #799)
- `SliceLayer` (done, `out_dim`, via "SliceLayer, out_dim option" #772)
- `SliceNdLayer` (done, `out_spatial_dim` or `size`, via "SliceNdLayer, size can be DimensionTag" #771, "SliceNdLayer, out_spatial_dim option" #838)
- `MergeDimsLayer` (done, `out_dim` and `axes`, via "MergeDimsLayer, handle out_dim option" #785)
- `SplitDimsLayer` (done, `dims` and `axis`, via "SplitDimsLayer, dims can be list of DimensionTag" #786)
- `UnflattenNdLayer` (done, `out_dims`, `in_dim`, via "UnflattenNdLayer, in_dim, out_dims options" #802)
- `RepeatLayer` (done, `out_dim` and `axis`, via "RepeatLayer, handle out_dim option" #803)
- `TimeChunkingLayer` (done, `out_dim` and `axis`, via "TimeChunkingLayer axis option, TimeUnChunkingLayer, dim tags" #805)
- `RemoveLayer` (done, `out_dim` and `axis`, via "RemoveLayer, out_dim and axis options" #806)
- `RecLayer` with search (done, `axis`, via "RecUnstackLayer, declare_rec_time option, axis optional" #751)
- `EditDistanceTableLayer` (done, `out_dim`, via "EditDistanceTableLayer, handle out_dim" #807)
- `MaskedComputationLayer` (done, `out_spatial_dim`, via "MaskedComputationLayer, out_spatial_dim option" #811; further discussion: "MaskedComputationLayer is violating the principle that the user should not need to think about rec automatic optimization" #769)
- `CumConcatLayer` (done, `out_spatial_dim`, via "CumConcatLayer" #589 and "CumConcatLayer, out_spatial_dim instead of new_dim option" #760)
- `RangeFromLengthLayer` (done, `out_spatial_dim`, via "allow_broadcast_all_sources, changed default, new behavior version" #759)
- `RangeLayer` (done, `out_spatial_dim`, via "allow_broadcast_all_sources, changed default, new behavior version" #759)
- `ScatterNdLayer` (done, `out_spatial_dim`, via "ScatterNdLayer, out_spatial_dim option" #770)
Layers involving some linear transformation on the features
- `LinearLayer` (done, `out_dim` and `in_dim`, via "Generic LayerBase in_dim and out_dim options" #765 and "test_LinearLayer_in_dim_spatial, allow in_dim not feature dim" #783)
- `ConvLayer` (done, `out_spatial_dims`, `out_dim`, `in_spatial_dims`, `in_dim`, via "ConvLayer and PoolLayer, in_dim, in_spatial_dims, out_dim, out_spatial_dims" #789)
- `RecLayer` (done, `axis`, via "RecUnstackLayer, declare_rec_time option, axis optional" #751)
Layers with fixed output shape
- `ConstantLayer` (done, `shape`, via "Explicit output dim tags for RandIntLayer, VariableLayer, ConstantLayer" #762)
- `VariableLayer` (done, `shape`, via "Explicit output dim tags for RandIntLayer, VariableLayer, ConstantLayer" #762)
- `RandIntLayer` (done, `shape`, via "Explicit output dim tags for RandIntLayer, VariableLayer, ConstantLayer" #762)
Others
- `ReduceOutLayer` (done, `out_dim`, via "ReduceOutLayer, handle out_dim" #808)
- `StackLayer` (done, `out_spatial_dim`, via "StackLayer, out_spatial_dim option" #809)
- `SplitLayer` (done, `out_dims`, via "SplitLayer out_dims option" #761)
- `TileLayer` (done, `out_dims`, via "TileLayer, out_dims option" #810)
- `ConcatLayer` (done, `out_dim`, via "ConcatLayer, add out_dim" #824)
- `GatingLayer` (done, `out_dim`, via "GatingLayer, handle out_dim" #839)