
WIP: Custom encoder/decoder layer sequence. #470

Closed
wants to merge 18 commits

Conversation

Contributor

@tdomhan tdomhan commented Jul 6, 2018

  • Encoder/Decoder consisting of custom layers

Also:

  • Janet RNN
  • QRNN
  • Highway layers

Still WIP!
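For context, the JANET cell from https://arxiv.org/pdf/1804.04849.pdf is essentially an LSTM reduced to its forget gate: a single gate blends the previous state with a candidate state. A minimal sketch of the (simplified, beta-free) recurrence, illustrative only and not this PR's code:

import numpy as np

def janet_step(x_t, h_prev, W_f, U_f, b_f, W_c, U_c, b_c):
    # forget gate decides how much of the old state to keep
    f_t = 1.0 / (1.0 + np.exp(-(x_t @ W_f + h_prev @ U_f + b_f)))
    # candidate state computed from the current input and previous state
    c_tilde = np.tanh(x_t @ W_c + h_prev @ U_c + b_c)
    # the single gate blends old state and candidate; the output equals the state
    return f_t * h_prev + (1.0 - f_t) * c_tilde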

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • Were system tests modified? If so, did you run these at least 5 times to account for the variation across runs?
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Contributor

@fhieber fhieber left a comment

just some tiny initial comments. I will give it a closer read soon.

# target: (batch_size, num_hidden) -> (batch_size, 1, num_hidden)
target = mx.sym.expand_dims(target, axis=1)

# Incompatible input shape: expected [80,0], got [80,1,32]
Contributor

leftover comment

Contributor Author

removed

@@ -161,21 +161,26 @@ def sym_gen(source_seq_len: int):
source_words = source.split(num_outputs=self.num_source_factors, axis=2, squeeze_axis=True)[0]
source_length = utils.compute_lengths(source_words)


Contributor

spurious newline

Contributor Author

removed

@@ -174,13 +174,13 @@ def __call__(self,
# self-attention
target_self_att = self.self_attention(inputs=self.pre_self_attention(target, None),
Contributor

I'd prefer unpacking like target_self_att, _ = self.self_attention(...

Contributor Author

sounds good, will do.

target = self.post_self_attention(target_self_att, target)

# encoder attention
target_enc_att = self.enc_attention(queries=self.pre_enc_attention(target, None),
memory=source,
bias=source_bias)
bias=source_bias)[0]
Contributor

same here

Contributor Author

changed.

@tdomhan
Contributor Author

tdomhan commented Jul 10, 2018

thanks for taking a first look. There are still several cleanups necessary though (just as a warning).

Contributor

@fhieber fhieber left a comment

Haven't looked at rnn.py yet.
Bear with me, this is a lot of new code :)

raise NotImplementedError("Pooling only available on the encoder side.")


class QRNNBlock:
Contributor

some docstring might be helpful to describe what this implements.

Contributor Author

sure. I'll add some documentation here.
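For reference, while the docstring is still missing here: a QRNN block applies convolutions over the time axis to produce candidate, forget, and output gates, followed by a cheap element-wise recurrence. A minimal sketch of fo-pooling as in Bradbury et al. (2016), illustrative only and not this PR's code:

import numpy as np

def qrnn_fo_pool(z, f, o):
    # z, f, o: (seq_len, batch, hidden); z already tanh'd, f and o already sigmoided,
    # all produced by (masked) convolutions over the input sequence
    c = np.zeros_like(z[0])
    outputs = []
    for z_t, f_t, o_t in zip(z, f, o):
        c = f_t * c + (1.0 - f_t) * z_t   # gated running state
        outputs.append(o_t * c)           # output gate applied per step
    return np.stack(outputs)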

layer = meta_layer / parallel_layer / repeat_layer / subsample_layer / standard_layer
open = "("
close = ")"
empty_paran = open close
Contributor

paran -> paren

Contributor Author

done

repeat_layer = "repeat" open int comma layer_chain close
subsample_layer = "subsample" open optional_params layer_chain_sep layer_chain close

standard_layer = standard_layer_name optional_paranthesis_params
Contributor

parenthesis

Contributor Author

done

separated_layer_chain = layer_chain_sep layer_chain
more_layer_chains = separated_layer_chain*

optional_paranthesis_params = paranthesis_params?
Contributor

parenthesis

Contributor Author

done

def __init__(self):
super().__init__()

def visit_paranthesis_params(self, node, rest):
Contributor

parenthesis

Contributor Author

done
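(Aside: the visit_* method signature above looks like a parsimonious NodeVisitor. Purely as an illustration of how such a layer-spec string could be parsed, here is a toy grammar in the same spirit; the library choice, rule names, separator, and layer names are all assumptions, not taken from this PR.)

from parsimonious.grammar import Grammar

# Hypothetical toy grammar; the resulting tree would be walked with a parsimonious
# NodeVisitor (visit_* methods) to build the layer objects.
grammar = Grammar(r'''
    layer_chain  = layer ("->" layer)*
    layer        = repeat_layer / name
    repeat_layer = "repeat(" int "," layer_chain ")"
    int          = ~"[0-9]+"
    name         = ~"[a-z_]+"
''')

tree = grammar.parse("repeat(2,ff->self_att)")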


# TODO: make sure the number of hidden units does not change!
class ResidualEncoderLayer(EncoderLayer):
def __init__(self, layers: List[EncoderLayer]):
Contributor

-> None

Contributor Author

fixed

# TODO: potentially add a projection layer (for when the shapes don't match up). Alternative: check that the input num hidden matches the output num_hidden (maybe add a get_input_num_hidden())
class ResidualDecoderLayer(NestedDecoderLayer):

def __init__(self, layers: List[DecoderLayer]):
Contributor

-> None



# TODO: potentially add a projection layer (for when the shapes don't match up). Alternative: check that the input num hidden matches the output num_hidden (maybe add a get_input_num_hidden())
class ResidualDecoderLayer(NestedDecoderLayer):
Contributor

why can't we have a ResidualLayer that inherits from SharedEncoderDecoderLayer and implements all 3 methods (encode_sequence, decode_sequence, decode_step)?

Contributor Author

you mean inheriting from both NestedDecoderLayer and SharedEncoderDecoderLayer? yes, we potentially could. I'll add a TODO.

return ResidualDecoderLayer(layers)


# TODO: make this a block!?
Contributor

+1 :)

Contributor Author

:)

num_embed=num_embed_source)

# TODO: how to set this correctly!?
encoder_num_hidden = None
Contributor

can you elaborate on that TODO?

@tdomhan tdomhan mentioned this pull request Aug 9, 2018
dtype: str = C.DTYPE_FP32,
prefix: str = ''):
"""
Create a single rnn cell.
Contributor

missing newline

"""Janet cell, as described in:
https://arxiv.org/pdf/1804.04849.pdf

Parameters
Contributor

could we keep docstring styles consistent?

return len(self.rnn_cell.state_info)

def state_variables(self, step: int) -> Sequence[mx.sym.Symbol]:
return [mx.sym.Variable("%rnn_state_%d" % (self.prefix, i))
Contributor

do we have a constant for this string?

forget_bias=forget_bias)

def create_encoder_layer(self, input_num_hidden: int, prefix: str) -> layers.EncoderLayer:
return RecurrentEncoderLayer(rnn_config=self.rnn_config, prefix=prefix + "rnn_")
Contributor

constant available for string?

return RecurrentEncoderLayer(rnn_config=self.rnn_config, prefix=prefix + "rnn_")

def create_decoder_layer(self, input_num_hidden: int, prefix: str) -> layers.DecoderLayer:
return RecurrentDecoderLayer(rnn_config=self.rnn_config, prefix=prefix + "rnn_")
Contributor

constant available for string?

@fhieber
Contributor

fhieber commented Mar 20, 2019

What would it take to avoid having an int parameter representing the size of the length dimension in the calling methods of layer implementations? This could be a major step towards converting to hybrid Gluon blocks.
Currently we use seq_len as an argument to all top-level classes (encoder, decoder, attention etc.).
Back when we first wrote it, MXNet didn't have many operators and we were forced to know the sequence length in order to build the symbolic graphs. Since then, however, many things have changed, and there are now operators such as slice_like, broadcast_like etc. that allow performing operations based on the size/axes of the input data.

I took a quick pass over encoder.py and decoder.py to see what blocks us from avoiding the int argument:

  • Transformers can be implemented fully without knowing the sequence length. We currently use it for the custom ops to create variable length biases, but that's easily avoidable if we know the max_seq_len, which we do at construction time of the classes.
  • RNNs: for encoders, we can follow this tutorial on control flow operators to implement RNN unrolling with a control flow operator; this still needs some attention.
  • RNN attention/coverage types: some attention types (LocationAttention) require knowing the sequence length, but I think we can avoid that. GRUCoverage can also be implemented using control flow ops if still necessary.
  • CNN models: this is where I don't know if we can really do it. My impression is that it should be possible knowing max_seq_len and using ops such as slice_like.

This has implications on this PR, as it may change the signature of your basic Layer classes. What do you think?
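For illustration, a minimal sketch of the slice_like idea (variable names and the max_seq_len value are assumptions, not this PR's code):

import mxnet as mx

# Bias precomputed for max_seq_len at construction time, then trimmed to the
# actual time dimension of the input, so seq_len never has to be passed around.
source_words = mx.sym.Variable("source_words")   # (batch_size, seq_len)
full_bias = mx.sym.zeros(shape=(1, 512))         # assumed max_seq_len = 512
bias = mx.sym.slice_like(full_bias, source_words, axes=(1,))  # -> (1, seq_len)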

@tdomhan
Contributor Author

tdomhan commented Mar 25, 2019

Yeah, I think the main blockers were the RNNs. I'll check regarding CNNs. In general it would be really nice if we could get rid of the int parameter though.
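For reference, a minimal sketch of RNN unrolling with a control flow operator (mx.sym.contrib.foreach), as suggested above; illustrative only, with hypothetical names, not this PR's code:

import mxnet as mx

cell = mx.rnn.GRUCell(num_hidden=32, prefix="gru_")

def step(data, states):
    # one time step: data is (batch_size, num_embed), states is the cell's state list
    output, new_states = cell(data, states)
    return output, new_states

data = mx.sym.Variable("data")              # (seq_len, batch_size, num_embed)
begin_state = [mx.sym.Variable("state")]    # (batch_size, 32)
# foreach steps the cell along axis 0 of data; the graph never needs to know seq_len
outputs, final_state = mx.sym.contrib.foreach(step, data, begin_state)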

@fhieber
Contributor

fhieber commented Oct 15, 2020

Closing for now, as this needs more work to be applied to Sockeye 2.

@fhieber fhieber closed this Oct 15, 2020