WIP: Custom encoder/decoder layer sequence. #470
Conversation
just some tiny initial comments. I will give it a closer read soon.
sockeye/convolution.py
Outdated
# target: (batch_size, num_hidden) -> (batch_size, 1, num_hidden)
target = mx.sym.expand_dims(target, axis=1)

# Incompatible input shape: expected [80,0], got [80,1,32]
leftover comment
removed
sockeye/inference.py
Outdated
@@ -161,21 +161,26 @@ def sym_gen(source_seq_len: int):
    source_words = source.split(num_outputs=self.num_source_factors, axis=2, squeeze_axis=True)[0]
    source_length = utils.compute_lengths(source_words)

spurious newline
removed
sockeye/transformer.py
Outdated
@@ -174,13 +174,13 @@ def __call__(self,
    # self-attention
    target_self_att = self.self_attention(inputs=self.pre_self_attention(target, None),
I'd prefer unpacking like target_self_att, _ = self.self_attention(...
sounds good, will do.
sockeye/transformer.py
Outdated
target = self.post_self_attention(target_self_att, target)

# encoder attention
target_enc_att = self.enc_attention(queries=self.pre_enc_attention(target, None),
                                    memory=source,
-                                   bias=source_bias)
+                                   bias=source_bias)[0]
same here
changed.
thanks for taking a first look. There are still several cleanups necessary though (just as a warning).
Haven't looked at rnn.py yet.
Bear with me, this is a lot of new code :)
raise NotImplementedError("Pooling only available on the encoder side.")


class QRNNBlock:
some docstring might be helpful to describe what this implements.
sure. I'll add some documentation here.
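For context while that documentation is pending: a QRNN block (Bradbury et al., 2016, https://arxiv.org/abs/1611.01576) applies convolutions over time to produce candidate and gate sequences, then mixes them with a lightweight elementwise recurrence ("fo-pooling"). A minimal NumPy sketch of one pooling step, with illustrative names rather than the actual Sockeye implementation:

```python
import numpy as np

def qrnn_pool_step(z_t: np.ndarray, f_t: np.ndarray, o_t: np.ndarray,
                   c_prev: np.ndarray):
    """fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t,  h_t = o_t * c_t.

    z_t (candidate), f_t (forget gate) and o_t (output gate) are all produced
    by convolutions over the input sequence, so only this cheap elementwise
    recurrence is sequential.
    """
    c_t = f_t * c_prev + (1.0 - f_t) * z_t
    return o_t * c_t, c_t
```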
sockeye/custom_seq_parser.py
Outdated
layer = meta_layer / parallel_layer / repeat_layer / subsample_layer / standard_layer
open = "("
close = ")"
empty_paran = open close
paran -> paren
done
sockeye/custom_seq_parser.py
Outdated
repeat_layer = "repeat" open int comma layer_chain close | ||
subsample_layer = "subsample" open optional_params layer_chain_sep layer_chain close | ||
|
||
standard_layer = standard_layer_name optional_paranthesis_params |
parenthesis
done
sockeye/custom_seq_parser.py
Outdated
separated_layer_chain = layer_chain_sep layer_chain
more_layer_chains = separated_layer_chain*

optional_paranthesis_params = paranthesis_params?
parenthesis
done
sockeye/custom_seq_parser.py
Outdated
def __init__(self):
    super().__init__()

def visit_paranthesis_params(self, node, rest):
parenthesis
done
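The visitor signature `visit_paranthesis_params(self, node, rest)` suggests this parser is built on the parsimonious PEG library. A toy subset of such a grammar, purely illustrative — the rule names, the "->" separator, and the example string below are guesses, not the actual grammar in sockeye/custom_seq_parser.py:

```python
from parsimonious.grammar import Grammar

toy_grammar = Grammar(r"""
    layer_chain = layer more_layers
    more_layers = (sep layer)*
    layer       = name params?
    params      = open int close
    name        = ~"[a-z_]+"
    int         = ~"[0-9]+"
    sep         = "->"
    open        = "("
    close       = ")"
""")

# Parse a hypothetical layer-chain description into a tree that a
# parsimonious NodeVisitor subclass can then walk.
tree = toy_grammar.parse("rnn(512)->dropout->ff(2048)")
print(tree)
```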
sockeye/layers.py
Outdated
# TODO: make sure the number of hidden units does not change!
class ResidualEncoderLayer(EncoderLayer):
    def __init__(self, layers: List[EncoderLayer]):
-> None
fixed
sockeye/layers.py
Outdated
# TODO: potentially add a projection layer (for when the shapes don't match up). Alternative: check that the input num hidden matches the output num_hidden (maybe add a get_input_num_hidden())
class ResidualDecoderLayer(NestedDecoderLayer):

    def __init__(self, layers: List[DecoderLayer]):
-> None
# TODO: potentially add a projection layer (for when the shapes don't match up). Alternative: check that the input num hidden matches the output num_hidden (maybe add a get_input_num_hidden())
class ResidualDecoderLayer(NestedDecoderLayer):
why can't we have a ResidualLayer that inherits from SharedEncoderDecoderLayer and implements all 3 methods (encode_sequence, decode_sequence, decode_step)?
you mean inheriting from both NestedDecoderLayer and SharedEncoderDecoderLayer? yes, we potentially could. I'll add a TODO.
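A hedged sketch of that suggestion, assuming a SharedEncoderDecoderLayer base and the three method names mentioned above — the signatures here are guesses read off this diff, not the actual API:

```python
from typing import List

class ResidualLayer(SharedEncoderDecoderLayer):  # assumed base class
    def __init__(self, layers: List) -> None:
        super().__init__()
        self.layers = layers

    def encode_sequence(self, data, lengths):
        hidden = data
        for layer in self.layers:
            hidden = layer.encode_sequence(hidden, lengths)
        return hidden + data  # residual connection around the nested chain

    def decode_sequence(self, data, lengths):
        hidden = data
        for layer in self.layers:
            hidden = layer.decode_sequence(hidden, lengths)
        return hidden + data

    def decode_step(self, data, states):
        hidden = data
        for layer in self.layers:
            hidden, states = layer.decode_step(hidden, states)
        return hidden + data, states
```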
return ResidualDecoderLayer(layers)


# TODO: make this a block!?
+1 :)
:)
num_embed=num_embed_source)

# TODO: how to set this correctly!?
encoder_num_hidden = None
can you elaborate on that TODO?
dtype: str = C.DTYPE_FP32,
prefix: str = ''):
    """
    Create a single rnn cell.
missing newline
"""Janet cell, as described in: | ||
https://arxiv.org/pdf/1804.04849.pdf | ||
|
||
Parameters |
could we keep docstring styles consistent?
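For readers unfamiliar with the cell being documented here: JANET (van der Westhuizen & Lasenby, 2018, the paper linked above) is an LSTM stripped down to a single forget gate, which also (inversely) admits the new candidate state. A minimal NumPy sketch of one step, with illustrative weight names rather than the Sockeye implementation — note the forget_bias term, which matches the forget_bias parameter visible in the diff below:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def janet_step(x_t, h_prev, W_f, U_f, b_f, W_c, U_c, b_c, forget_bias=1.0):
    """One JANET step: h_t = f_t * h_{t-1} + (1 - f_t) * c_tilde."""
    # Forget gate; forget_bias shifts it towards remembering at init time.
    f_t = sigmoid(x_t @ W_f + h_prev @ U_f + b_f + forget_bias)
    # Candidate state.
    c_tilde = np.tanh(x_t @ W_c + h_prev @ U_c + b_c)
    return f_t * h_prev + (1.0 - f_t) * c_tilde
```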
return len(self.rnn_cell.state_info)

def state_variables(self, step: int) -> Sequence[mx.sym.Symbol]:
    return [mx.sym.Variable("%srnn_state_%d" % (self.prefix, i))
do we have a constant for this string?
forget_bias=forget_bias)

def create_encoder_layer(self, input_num_hidden: int, prefix: str) -> layers.EncoderLayer:
    return RecurrentEncoderLayer(rnn_config=self.rnn_config, prefix=prefix + "rnn_")
constant available for string?
return RecurrentEncoderLayer(rnn_config=self.rnn_config, prefix=prefix + "rnn_")

def create_decoder_layer(self, input_num_hidden: int, prefix: str) -> layers.DecoderLayer:
    return RecurrentDecoderLayer(rnn_config=self.rnn_config, prefix=prefix + "rnn_")
constant available for string?
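One conventional way to resolve these repeated comments would be a module-level constant in sockeye/constants.py, which this codebase already imports as C (cf. C.DTYPE_FP32 above). The name RNN_PREFIX below is hypothetical, not an existing constant:

```python
# Hypothetical addition to sockeye/constants.py.
RNN_PREFIX = "rnn_"
```

Call sites would then build prefixes as `prefix + C.RNN_PREFIX` instead of repeating the string literal.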
What would it take to avoid having an int parameter representing the size of the length dimension in the calling methods of layer implementations? This could be a major step towards converting to hybrid Gluon blocks. I took a quick pass over encoder.py and decoder.py to see what blocks us from avoiding the int argument.
This has implications for this PR, as it may change the signature of your basic Layer classes. What do you think?
Yeah, I think the main blockers were the RNNs. I'll check regarding CNNs. In general, it would be really nice if we could get rid of the int parameter, though.
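A hedged illustration of the point (not Sockeye code; the class name is made up): a Gluon HybridBlock hybridizes cleanly as long as its forward pass never needs the sequence length as a Python int, i.e. all shape handling stays inside shape-agnostic operators.

```python
import mxnet as mx

class LengthAgnosticLayer(mx.gluon.HybridBlock):
    """Toy layer whose forward pass never consumes seq_len as a Python int."""

    def hybrid_forward(self, F, data):
        # F is mx.nd or mx.sym depending on hybridization; the mean over the
        # time axis works for any sequence length, so no int argument is
        # required and the block can be hybridized.
        return F.broadcast_add(data, F.mean(data, axis=1, keepdims=True))
```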
Closing for now, as this needs more work to be applied to Sockeye 2.
Also: Still WIP!
Pull Request Checklist

- Changes are complete (if posting work-in-progress code, prefix your pull request title with 'WIP:' until you can check this box)
- Unit tests pass (pytest)
- System tests pass (pytest test/system)
- Passed code style checking (./style-check.sh)
- Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.