Design/Handling of dimension tags #17

Closed · 7 tasks done
albertz opened this issue Aug 5, 2021 · 31 comments
@albertz (Member) commented Aug 5, 2021

Like batch dim, spatial dims (with dynamic lengths), or static dims (named, or also unnamed).

Dim (earlier DimensionTag) in RETURNN. Directly use that, or wrap it somehow?

Should this (the batch dim) include beam information, or be separate from this?

Relevant for all layers which define some shape or dimension (e.g. Const, Variable).

Should this be enforced, i.e. no simple int allowed in n_out or so but always a Dim object?
And maybe better use out_dim instead of n_out (consistent with rwth-i6/returnn#597).
Edit: It was decided to make nn.Dim mandatory, and use out_dim instead of n_out.

Very related is this issue on the RETURNN side about explicit dim tags: rwth-i6/returnn#597

Also related: whether we want unique dim tags (#48, rwth-i6/returnn#632).


This issue covers multiple aspects:


@albertz (Member, Author) commented Nov 5, 2021

We should clarify this because it is very fundamental to how all our code will look.

The main question is to decide on these two options:

  • No special handling here in returnn-common for dim tags; basically just as optional as in RETURNN.
  • Make them mandatory. This applies whenever axis, axes or new_dim is involved, but also for out_dim in Linear, LSTM, Conv and some others. Edit: we decided to do this.

I think when we require them, we should also require them to be unique.

I tend to prefer having them mandatory. This might make some parts a bit more verbose at first glance, but it should resolve any possible ambiguity on axes, and make the code very explicit and clear.

We might however introduce some shorter aliases, or maybe restructure things a bit. For example:

  • Dim = DimensionTag. Edit: Done. Actually the other way around: it was renamed to Dim, and DimensionTag is kept as an alias for older code (mostly RETURNN, not relevant for returnn_common).
  • Have one global B or Batch or batch_dim for Dim(kind=DimensionTag.Types.Batch). Edit: We have a global batch_dim now.
  • F(dim, name=None, **kwargs) for Dim(kind=DimensionTag.Types.Feature, dimension=dim, description=name, **kwargs). Edit: We have FeatureDim(...) now.
  • S(name=None, **kwargs) for Dim(kind=DimensionTag.Types.Spatial, description=name, **kwargs). Edit: We have SpatialDim(...) now.

So in your config, it could look like:

input_dim = nn.FeatureDim("mfcc", 40)
classes_dim = nn.FeatureDim("cart", 5000)
time_dim = nn.SpatialDim("time")

extern_data = {
  "data": {"dim_tags": [nn.batch_dim, time_dim, input_dim]},
  "classes": {"dim_tags": [nn.batch_dim, time_dim], "sparse_dim": classes_dim}
}

class Model(nn.Module):
  def __init__(self, in_dim: nn.Dim, out_dim: nn.Dim):
    super().__init__()
    self.linear = nn.Linear(out_dim=out_dim, in_dim=in_dim)
  
  def forward(self, input: nn.LayerRef) -> nn.Layer:
    return self.linear(input)

model = Model(in_dim=input_dim, out_dim=classes_dim)

...

(Just a draft.)

albertz added a commit that referenced this issue on Nov 6, 2021 (#52):

  Although this is still work in progress:
  - The test is not really testing anything.
  - Not using consistent dim tags (#17)
  - Missing is normal attention
  - Missing is auto-regressive self-attention

@albertz (Member, Author) commented Nov 8, 2021

Would this imply that "F" or "T" are also not allowed anymore?

@JackTemaki (comment minimized)

@albertz (Member, Author) commented Nov 8, 2021

> Would this imply that "F" or "T" are also not allowed anymore?

> I do not have enough knowledge to understand the consequences of this.

Sorry, maybe I was confusing here. This is a design decision we should take. We can either still allow "F" and "T" or not. Very related is also a similar question on RETURNN side: rwth-i6/returnn#586

Or maybe we should just change the behavior (on the RETURNN side, or here in returnn-common): whenever "F" or "T" is unambiguous (according to some sensible rules), it is allowed, otherwise not. The rules could be:

  • F == static dimension (only valid if there is exactly one static dimension)
  • T == dynamic dimension (only valid if there is exactly one dynamic dimension; excluding batch dim of course)
  • B == batch dim (only valid if there is exactly one batch dim)
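
A minimal sketch of such a rule (a hypothetical helper, not the actual RETURNN get_axis_from_description logic; it assumes Dim-like objects with is_batch_dim() and is_dynamic() predicates):

from typing import Sequence

def resolve_axis_shortcut(shortcut: str, dim_tags: Sequence) -> int:
  """Resolve "B"/"T"/"F" to a unique axis index, or raise if ambiguous/missing."""
  if shortcut == "B":
    candidates = [i for i, d in enumerate(dim_tags) if d.is_batch_dim()]
  elif shortcut == "T":
    candidates = [i for i, d in enumerate(dim_tags)
                  if d.is_dynamic() and not d.is_batch_dim()]
  elif shortcut == "F":
    candidates = [i for i, d in enumerate(dim_tags)
                  if not d.is_dynamic() and not d.is_batch_dim()]
  else:
    raise ValueError(f"unknown axis shortcut {shortcut!r}")
  if len(candidates) != 1:
    raise ValueError(f"{shortcut!r} is ambiguous or missing here, candidates: {candidates}")
  return candidates[0]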

@Zettelkasten (Member) commented:

> • All dims which are configurable and supposed to be used by the module for some output dimension would be passed as arguments as well (out_dim, out_spatial_dim, whatever).

Also, we would need to think about how out_shape should work.
Usually, you have some set of common dims (like the batch dim, but maybe more) that the module will not operate on.
We would also need to pass a set of these to each layer in some way. Or maybe we could also supply some explicit out_shape to the module, and then infer it from there.

@albertz (Member, Author) commented Dec 17, 2021

> • All dims which are configurable and supposed to be used by the module for some output dimension would be passed as arguments as well (out_dim, out_spatial_dim, whatever).

> Also, we would need to think about how out_shape should work. Usually, you have some set of common dims (like the batch dim, but maybe more) that the module will not operate on. We would also need to pass a set of these to each layer in some way. Or maybe we could also supply some explicit out_shape to the module, and then infer it from there.

Ah yes, good point. Maybe for __call__, there could be another in_shape argument, and then the module can derive it and add such checks on intermediate outputs. Or maybe better: via #47, we have the shape of any inputs already anyway, so there is no need for an extra in_shape. The user could have checked in advance that the inputs are as expected.

There could be an out_shape argument for the module __call__. Although some modules might return multiple things, and then this is a bit ambiguous. Also, we cannot really make sure that all modules have this argument, so this would be more of a convention than a strict rule. But this is ok.

Or, when we do #47, the user could explicitly do it afterwards like:

out = model(..., out_dim=feature_dim)
out.assert_shape({batch_dim, time_dim, feature_dim})

Maybe that is cleaner?

@Zettelkasten (Member) commented:

> Or maybe better: via #47, we have the shape of any inputs already anyway, so there is no need for an extra in_shape. The user could have checked in advance that the inputs are as expected.

That would be even better, yes. Then every module checking shapes (which ideally every module should do, I think) would have something like this on top:

common_dims = inp.get_shape() - {in_dim}  # all dims the module does not operate on
out = ...
out.assert_shape(common_dims | {out_dim})

> Or, when we do #47, the user could explicitly do it afterwards like:
>
> out = model(..., out_dim=feature_dim)
> out.assert_shape({batch_dim, time_dim, feature_dim})
>
> Maybe that is cleaner?

This wouldn't work for intermediate states, right? (Also, when nesting modules, the outer module would also need to know the common dims to assert the output of the inner module.)
I think more complicated modules should check their output type themselves, too. Just like you would specify the return type of a function, this is very important documentation.

@albertz (Member, Author) commented Dec 17, 2021

The out_shape or assert_shape is anyway mostly for the user. A user might not care what happens inside the model, only that he gets some specific output in the end.

However, if you are the author of the model, or you want any user to be able to easily read the code of your model, then sure, you would also have out_shape or assert_shape usages inside the model, for any intermediate outputs, and maybe also for the final output.

We should be careful to not make it too verbose or too complicated. When the code looks complex at first glance, this is also bad. out_shape might be a bit better w.r.t. this aspect because it avoids a separate assert_shape call. But we could support both anyway.

Another aspect: probably many modules will be lazy w.r.t. the input dim, just like it is standard in RETURNN and other frameworks. E.g. the user would specify out_dim but not in_dim. So in_dim is only known on the first call to the module, and then automatically inferred from the input (in.in_dim or so, which refers to the feature dim). I think enforcing in_dim at module creation would be too verbose in many cases, so we would have to support such a lazy in_dim, and also use the feature-dim logic.
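
For example, the intended usage would be something like this (a sketch, matching the Linear draft in the next comment; x stands for some input layer ref):

# out_dim is given explicitly; in_dim is left out and inferred lazily
linear = nn.Linear(out_dim=nn.FeatureDim("hidden", 512))
y = linear(x)  # on this first call, in_dim is taken from x's feature dim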

@albertz (Member, Author) commented Dec 19, 2021

So, I currently have this Linear module implementation:

class Linear(nn.Module):
  """
  Linear transformation.
  """

  def __init__(self, out_dim: nn.Dim, *, in_dim: Optional[nn.Dim] = None):
    super().__init__()
    self.out_dim = out_dim
    self.out_dim_inner = out_dim
    self.in_dim = in_dim
    self.weight = None  # type: Optional[nn.Parameter]
    self.bias = None  # type: Optional[nn.Parameter]
    if in_dim:
      self._lazy_init(in_dim)

  def _lazy_init(self, in_dim: nn.Dim):
    if self.in_dim:
      assert self.in_dim == in_dim
    else:
      self.in_dim = in_dim
      if in_dim == self.out_dim:
        # special case in_dim == out_dim: use a distinct copy as the inner out dim,
        # otherwise the two dim tags of the parameter would be indistinguishable
        self.out_dim_inner = self.out_dim.copy(same_as_self=False, description=f"{self}:out-dim-inner")
      self.weight = nn.Parameter((self.in_dim, self.out_dim_inner))
      self.bias = nn.Parameter((self.out_dim_inner,))

  @nn.scoped
  def __call__(self, source: nn.LayerRef) -> nn.Layer:
    self._lazy_init(source.dim)
    out = nn.dot(source, self.weight, reduce=self.in_dim) + self.bias
    if self.out_dim_inner != self.out_dim:
      # map the internal out dim back to the out_dim requested by the user
      out = nn.reinterpret_data(out, set_dim_tags={self.out_dim_inner: self.out_dim})
    return out

You see that I needed to introduce out_dim_inner here. The problem occurs when we have in_dim == out_dim: the dim tags in the parameter are then not distinguishable, and reduce=in_dim in dot is ambiguous.

I'm not really sure about the best way to solve this. I'm not really happy with this solution now.

@Zettelkasten any suggestions or ideas?

@albertz (Member, Author) commented Dec 19, 2021

While further thinking about this, some thoughts and options:

  • I think no matter what we do, we should not make an exception that the order of axes matters here now in this case. The order should never matter.
  • So, in dim_tags = (in_dim, out_dim) with out_dim = in_dim, we need some way to distinguish those dims. But this cannot be just the order.
  • When nothing is done, I think the current behavior of RETURNN is actually weird or wrong anyway. It looks like it reduces both axes and then fails with some other strange error? There should be some further testing and fixing on the RETURNN side to catch this with a better error message. Already get_axis_from_description should fail due to the ambiguity.
  • One option to distinguish them is in my example code above. I.e. in the case in_dim == out_dim, it does out_dim_ = out_dim.copy(same_as_self=False, ...) and then using shape (in_dim, out_dim_).
    • I don't really like this too much because other code might expect the shape (in_dim, out_dim). Or all other code must also have this special logic for this special case.
    • We probably have this issue also in other instances, e.g. for LSTM or others, when there is some matrix multiplication involved. It should be simple to always make use of the same logic for this case.
    • It's too complicated, too ugly.
  • When reduce=in_dim is searching for matching dims, we need some way that it uniquely finds the right axis.
  • Maybe we can have some matching_priority flag or so, set on dim tags. In get_axis_from_description, when a dim tag matches multiple existing dim tags, but one of the matching tags has a higher matching priority, then it would still be allowed, and it would use that one. So in the example, in_dim == out_dim (but in_dim is not out_dim), and the in_dim used for the parameter would be a copy with higher matching priority. The code would maybe look like:
    weight = nn.Parameter((in_dim.copy(matching_priority=1), out_dim))
    Edit: I implemented this now here: rwth-i6/returnn#871 (Solution for ambiguous dim tags)

albertz added a commit to rwth-i6/returnn that referenced this issue on Dec 19, 2021
albertz added a commit to rwth-i6/returnn that referenced this issue on Dec 20, 2021:

  Via: rwth-i6/returnn_common#17 (comment)
  Also: Data.get_axes_from_description, only unique matching for dim tags

albertz added a commit to rwth-i6/returnn that referenced this issue on Dec 21, 2021
albertz added a commit that referenced this issue on Dec 22, 2021
@albertz (Member, Author) commented Jan 3, 2022

Another thing: I wonder whether passing out_spatial_dim as an argument is maybe unintuitive. In many cases, it is a new dimension. So (as the name also implies) it is an output of the function or module. The current code often looks like this:

out_spatial_dim = nn.SpatialDim("whatever")
out = self.conv(..., out_spatial_dim=out_spatial_dim)

This is for dynamic sizes. For static dims, it is even more complicated, as you need to calculate the dim by hand (although maybe you could argue it is a good thing to have it explicit).

Maybe the module should instead just create such a dim itself, and then return it?

One question is how it should be returned. Just Tuple[nn.Layer, nn.Dim]? Or a namedtuple? Or something else?

For example, it would then look like:

out, out_spatial_dim = self.conv(...)

I think this is nicer.

The case where it is known beforehand (or expected to match some other existing dim) is maybe valid, but the user could easily still do that by calling declare_same_as afterwards.
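
For example, a minimal sketch (declare_same_as exists on RETURNN dim tags; existing_spatial_dim is just a placeholder name here):

out, out_spatial_dim = self.conv(...)
# if the new spatial dim must match an already existing dim:
out_spatial_dim.declare_same_as(existing_spatial_dim)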

out_dim is different though. out_dim would stay an argument, because it is usually a feature dim and can usually be arbitrarily defined by the user, like for Linear or Conv. Although, for the cases where it is not arbitrary, like merge_dims for example, maybe it should be returned as well?

Or consider the ConformerConvSubsample.__call__:

@nn.scoped
def __call__(self, inp: nn.LayerRef, *, in_spatial_dim: nn.Dim, out_spatial_dim: nn.Dim) -> nn.LayerRef:
  """forward"""
  in_spatial_dims = [in_spatial_dim, inp.feature_dim]
  in_dim = nn.FeatureDim("dummy-input-feature-dim", 1)
  x = nn.expand_dim(inp, dim=in_dim)
  for i, conv_layer in enumerate(self.conv_layers):
    out_spatial_dims = [nn.SpatialDim(f"conv-{i}-1"), nn.SpatialDim(f"conv-{i}-2")]
    x = conv_layer(x, in_dim=in_dim, in_spatial_dims=in_spatial_dims, out_spatial_dims=out_spatial_dims)
    in_spatial_dims = out_spatial_dims
    in_dim = conv_layer.out_dim
    x = self.activation(x)
    if self.pool_sizes and i < len(self.pool_sizes):
      x = nn.pool(
        x, in_dim=in_dim, in_spatial_dims=in_spatial_dims,
        pool_size=self.pool_sizes[i], padding='same', mode='max')
    if self.dropout:
      x = nn.dropout(x, axis=in_dim, dropout=self.dropout)
  out = nn.merge_dims(x, axes=in_spatial_dims, out_dim=out_spatial_dim)
  return out

With the suggestion, it would look like:

@nn.scoped
def __call__(self, inp: nn.LayerRef, *, in_spatial_dim: nn.Dim) -> Tuple[nn.LayerRef, nn.Dim]:
  """forward"""
  in_spatial_dims = [in_spatial_dim, inp.feature_dim]
  in_dim = nn.FeatureDim("dummy-input-feature-dim", 1)
  x = nn.expand_dim(inp, dim=in_dim)
  for i, conv_layer in enumerate(self.conv_layers):
    x, in_spatial_dims = conv_layer(x, in_dim=in_dim, in_spatial_dims=in_spatial_dims)
    in_dim = conv_layer.out_dim
    x = self.activation(x)
    if self.pool_sizes and i < len(self.pool_sizes):
      x = nn.pool(
        x, in_dim=in_dim, in_spatial_dims=in_spatial_dims,
        pool_size=self.pool_sizes[i], padding='same', mode='max')
    if self.dropout:
      x = nn.dropout(x, axis=in_dim, dropout=self.dropout)
  out, out_spatial_dim = nn.merge_dims(x, axes=in_spatial_dims)
  return out, out_spatial_dim

@Zettelkasten opinions?

Edit: This is implemented like this now.

albertz added a commit that referenced this issue on Jan 5, 2022:

  This was wrapping reinterpret_data with set_axes.
  There should be no layer in returnn-common which would depend on this, so this should never be needed.
  If there are cases where this is needed, we probably could and should fix them such that it is not needed.
  (#17)
@albertz (Member, Author) commented Jan 7, 2022

The last suggestion is implemented now: functions and modules return a potentially new dim tag, and the argument (e.g. out_spatial_dim) is optional; when it is None, a new dim is auto-created.
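
As a minimal sketch of that convention (a hypothetical function, not the real returnn-common signature; x stands for some input layer ref):

def some_op(source: nn.LayerRef, *, out_spatial_dim: Optional[nn.Dim] = None):
  if out_spatial_dim is None:
    # auto-create a new dim; the caller gets it back in the return value
    out_spatial_dim = nn.SpatialDim("some-op:out-spatial")
  out = ...  # construct the actual layer here, using out_spatial_dim
  return out, out_spatial_dim

out, out_spatial_dim = some_op(x)  # no need to create the dim beforehand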

@albertz (Member, Author) commented Jan 7, 2022

Remaining is the issue with out_dim_inner. There is still the suggested solution here: rwth-i6/returnn#871, which also has some further discussion.
In any case, out_dim_inner should be removed; the parameter itself should be enough.

@albertz (Member, Author) commented Jan 19, 2022

Another external data point is Mesh TensorFlow (MTF), and the gpt-neo implementation. Examples:

  • dim_kv = mtf.Dimension("features_per_head", params["n_embd"] // params["n_head"])
    mtfparams = mtf.transformer.attention.attention_params_simple(
        x.mesh,
        io_dim=dim_embd,
        kv_dim=dim_kv,
        heads_dim=dim_heads,
        variable_dtype=variable_dtype
    )
    (via)
  • k = mtf.replace_dimensions(k, k.shape[1], memory_length_dim)
    v = mtf.replace_dimensions(v, v.shape[1], memory_length_dim)
    
    a = mtf_transformer.attention.attention(
        q, k, v,
        memory_length_dim=memory_length_dim,
        key_dim=dim_kv,
        value_dim=dim_kv,
        bias=broadcasted_bias,
        dropout_rate=attn_dropout_rate
    )
  • def causal_linear_attention(q, k, v, eps = 1e-6):
        batch_dim, seq_dim, head_dim, dim_out = (v.shape[0], v.shape[1], v.shape[2], v.shape[3])
        q = mtf.rename_dimension(q, "features_per_head", "features_per_head_in")
        k = mtf.rename_dimension(k, "features_per_head", "features_per_head_in")
    
        dim_in = k.shape[-1]
    
        q = mtf.softmax(q, dim_in)
        k = mtf.exp(k)
    
        cumulative_k = mtf.cumsum(k, seq_dim) + eps
        D_inv = 1. / mtf.einsum([q, cumulative_k], output_shape=[batch_dim, seq_dim, head_dim])
    
        context = mtf.einsum([k, v], output_shape=[batch_dim, seq_dim, head_dim, dim_in, dim_out])
        cumulative_context = mtf.cumsum(context, seq_dim)
    
        attn = mtf.einsum([q, cumulative_context, D_inv], output_shape=[batch_dim, seq_dim, head_dim, dim_out])
        return attn
    (via)
  • def einsum(xs, output_shape=None, reduced_dims=None, name=None):
      ...
    (here)

Note that an mtf.Dimension is very simple. It's just (def):

Dimension = collections.namedtuple("Dimension", ["name", "size"])

where name: str and size: int.

MTF also requires unique dimensions in a shape (see here).


albertz added a commit to rwth-i6/returnn that referenced this issue on Feb 11, 2022:

  Solution for ambiguous dim tags, e.g. in VariableLayer for a square matrix.
  Via: rwth-i6/returnn_common#17 (comment)
@albertz (Member, Author) commented Feb 11, 2022

Dim.match_priority support (rwth-i6/returnn#871) is merged now, due to the lack of any better solution for the ambiguous dim tags.

Now we need to adapt the code here accordingly (get rid of out_dim_inner etc.).
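
A sketch of how Linear could then look without out_dim_inner (assuming the Dim.copy(match_priority=...) API from rwth-i6/returnn#871):

  # inside Linear, replacing the _lazy_init from above:
  def _lazy_init(self, in_dim: nn.Dim):
    if self.in_dim:
      assert self.in_dim == in_dim
    else:
      self.in_dim = in_dim
      # the in_dim copy gets a higher match priority, so reduce=self.in_dim in dot
      # uniquely picks this axis even when in_dim == out_dim
      self.weight = nn.Parameter((in_dim.copy(match_priority=1), self.out_dim))
      self.bias = nn.Parameter((self.out_dim,))

The reinterpret_data step in __call__ would then not be needed anymore.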

@albertz (Member, Author) commented Feb 11, 2022

This is all implemented now.
