Inconsistency for axis or in_spatial_dim #120

Open
albertz opened this issue Mar 1, 2022 · 2 comments

albertz commented Mar 1, 2022

We see both variants, axis and in_spatial_dim, across various functions and modules.

axis is often used when the argument does not, in principle, need to be a spatial dim. For much of the low-level API, like split_dims, reduce, etc., this makes sense.

in_spatial_dim is used when it usually would be a spatial dim, as in conv, pool, etc.
It also makes a clear distinction from the feature dim in_dim.

Sometimes, this distinction becomes blurry, and I keep forgetting which argument it is.

Sometimes there are also other names.

Examples:

  • TransformerEncoder, TransformerEncoderLayer: axis
  • TransformerDecoder, TransformerDecoderLayer: memory_spatial_axis
  • Transformer: source_spatial_axis
  • ConformerConvSubsample: in_spatial_dim
  • ConformerConvBlock: axis
  • ConformerEncoderLayer: axis
  • ConformerEncoder: in_spatial_dim
  • Conv1d, pool1d: in_spatial_dim

Especially the Conformer is inconsistent here.
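To make the cost of this concrete, here is a minimal hypothetical sketch (toy stand-ins, NOT the actual returnn_common signatures or behavior): a caller who remembers the keyword from one Conformer module gets a TypeError in the next, because the same kind of argument goes under a different name.

```python
# Hypothetical minimal sketch of the naming problem (not the real
# returnn_common API): two modules that take the same kind of argument,
# the spatial dim to operate on, under different keyword names.

class ConformerConvSubsample:
    def __call__(self, source, *, in_spatial_dim):
        return source, in_spatial_dim  # placeholder body

class ConformerEncoderLayer:
    def __call__(self, source, *, axis):
        return source  # placeholder body

subsample = ConformerConvSubsample()
layer = ConformerEncoderLayer()

x, time_dim = "x", "time"
x, time_dim = subsample(x, in_spatial_dim=time_dim)  # accepted here
try:
    layer(x, in_spatial_dim=time_dim)  # same concept, wrong keyword here
except TypeError as err:
    print("TypeError:", err)
out = layer(x, axis=time_dim)  # this module wants `axis` instead
```

The mismatch only shows up at call time, which is exactly why it is easy to keep forgetting which name each module uses.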

@JackTemaki

This is definitely an issue, especially given the limitation that the auto-generated layers always use axis.

But even among those there are mismatches, e.g. the merge layer uses axes for the input dims but out_dim for the output...

Not sure how to solve this...


albertz commented Nov 6, 2022

One difference between spatial_dim and axis: an axis is spatial when the order is relevant and neighboring frames are probably more related. Thus LSTM, convolution, etc. operate on spatial dims, while self-attention or cross-attention can operate on any axis.
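This distinction can be made concrete with a small runnable sketch (plain NumPy toy ops, not the returnn_common modules): an unmasked self-attention with no positional encoding is permutation-equivariant over its axis, so frame order does not matter to it, while a convolution over a spatial dim mixes neighboring frames and is sensitive to their order.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 6, 4                       # time (spatial) and feature dims
x = rng.normal(size=(T, F))       # one sequence, [time, feature]
perm = np.roll(np.arange(T), -1)  # reorder the frames

# Toy 1D convolution over the spatial axis (kernel size 3, zero padding):
# the output at frame t mixes neighboring frames, so frame order matters.
w = rng.normal(size=(3, F, F))
def conv1d(x):
    xp = np.pad(x, ((1, 1), (0, 0)))
    return np.stack([sum(xp[t + k] @ w[k] for k in range(3)) for t in range(T)])

# Toy unmasked self-attention without positional encoding: pairwise scores
# plus a row-wise softmax, so permuting the frames just permutes the output.
def self_att(x):
    scores = x @ x.T / np.sqrt(F)
    att = np.exp(scores - scores.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)
    return att @ x

# Self-attention commutes with the frame permutation; the convolution does not.
assert np.allclose(self_att(x[perm]), self_att(x)[perm])
assert not np.allclose(conv1d(x[perm]), conv1d(x)[perm])
```

In this view, calling the argument in_spatial_dim is a claim that the op depends on frame order, while axis makes no such claim.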

albertz added a commit that referenced this issue Nov 6, 2022
albertz added a commit that referenced this issue Nov 17, 2022