Conversation
trains / evals successfully on convai2, including ranking
"""Add command-line arguments specifically for this agent.""" | ||
agent = argparser.add_argument_group('Seq2Seq Arguments') | ||
agent.add_argument('--init-model', type=str, default=None, | ||
help='load dict/features/weights/opts from this file') | ||
help='load dict/model/opts from this path') |
I really like it when we have the automatic default in the help string via %(default)s; prolly we can fix that in seq2seq now.
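For reference, a minimal sketch (illustrative flag name, not from this PR) of how argparse substitutes %(default)s into the help text:

```python
import argparse

parser = argparse.ArgumentParser()
agent = parser.add_argument_group('Seq2Seq Arguments')
# hypothetical argument, just to demonstrate %(default)s substitution
agent.add_argument('--hiddensize', type=int, default=128,
                   help='size of the hidden layers (default: %(default)s)')
parser.print_help()  # renders "... (default: 128)" without hard-coding the value
```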
yes, let's do a separate diff for those; I also want to reformat the whitespace on all the args
No need to block, but here are my opinions
```python
:param dim: (default 0) dimension to pad

:returns: padded tensor if the tensor is shorter than length
"""
if tensor.size(dim) < length:
    return torch.cat(
        [tensor, tensor.new(*tensor.size()[:dim],
```
Just out of pure curiosity, did you benchmark this against other approaches (namely, initializing a full zero matrix and then assigning the tensor into it)? Not that there's a need to.
Also, should this actually fill_(null_idx)?
didn't benchmark, although 1) this is copied from the pytorch source, with the dim param added by me, and 2) it's only used for candidate score padding, so I'm not super concerned about the perf :) oh, I'll update it to fill with null_idx
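For context, a hedged sketch of the padding helper being discussed (assumed signature and names, not the PR's exact code), filling the new positions with null_idx:

```python
import torch

def pad(tensor, length, dim=0, null_idx=0):
    """Pad `tensor` along `dim` up to `length`, filling with null_idx."""
    if tensor.size(dim) >= length:
        return tensor
    pad_size = list(tensor.size())
    pad_size[dim] = length - tensor.size(dim)
    # allocate only the filler and concatenate, rather than building a full
    # zero matrix and copying the original tensor into it
    filler = tensor.new_full(pad_size, null_idx)
    return torch.cat([tensor, filler], dim=dim)
```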
```python
    the last forward pass to skip recalculating the same encoder output
rank_during_training -- (default False) if set, ranks any available
    cands during training as well
:param xs: (bsz x seqlen) LongTensor input to the encoder
```
<3
```python
    return self.START.detach().expand(bsz, 1)

def _decode_forced(self, ys, encoder_states):
    """Decode with teacher forcing."""
```
I don't understand what this means
_decode_forced feeds tokens from the target as the next input, to estimate the target's score during training or during eval PPL computation
"teacher-forcing" always inputs the true token at each time step, vs using the model's predicted
```python
class Encoder(nn.Module):
    def __init__(self, num_features, padding_idx=0, rnn_class='lstm',
                 emb_size=128, hidden_size=128, num_layers=2, dropout=0.1,
```

```python
scores = self._decode(encoder_states, maxlen or self.longest_label)
```
All much cleaner now, great!
```python
super().__init__()

self.dropout = nn.Dropout(p=dropout)
self.layers = num_layers
```
Why change all the names?
just for consistency; there was a mix of underscores and non-underscores, and things didn't match the opt args
parlai/agents/seq2seq/modules.py
```diff
-    return encoder_output, hidden
+    return hidden, encoder_output, attn_mask
```
intentional swap?
nit: TBH I think encoder_output, hidden, attn_mask is better.
swap was intentional because later:

```python
hidden = encoder_states[0]
attn_params = (encoder_states[1], encoder_states[2])
```

but I could change that to [1] and ([0], [2]); it does match the rnn output more closely
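To make the trade-off concrete, a hedged sketch (illustrative names, not the PR's code) of unpacking encoder_states under each ordering:

```python
def unpack_current(encoder_states):
    # current ordering: (hidden, encoder_output, attn_mask)
    hidden = encoder_states[0]
    attn_params = (encoder_states[1], encoder_states[2])
    return hidden, attn_params

def unpack_rnn_style(encoder_states):
    # rnn-style ordering: (encoder_output, hidden, attn_mask)
    hidden = encoder_states[1]
    attn_params = (encoder_states[0], encoder_states[2])
    return hidden, attn_params
```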
parlai/agents/seq2seq/modules.py
```python
if attn_mask is not None:
    # remove activation from NULL symbols
    # TODO: is this the best operation?
    attn_w_premask -= (1 - attn_mask) * 1e20
```
Could do th.masked_fill_(attn_w_premask, -1e20); that's what I've seen in fairseq. Maybe make that 1e20 a private constant, like _ALMOST_INFINITY? Don't think it matters.
self._YUGE?
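For comparison, a hedged sketch (illustrative tensors, not the PR's code) of the two masking approaches being discussed:

```python
import torch
import torch.nn.functional as F

attn_w_premask = torch.randn(2, 5)                 # raw attention scores
attn_mask = torch.tensor([[1, 1, 1, 0, 0],
                          [1, 1, 0, 0, 0]])        # 1 = real token, 0 = NULL

# approach in the diff: subtract a huge constant at NULL positions
masked_via_sub = attn_w_premask - (1 - attn_mask).float() * 1e20

# suggested alternative: in-place masked_fill_ on the NULL positions
masked_via_fill = attn_w_premask.clone()
masked_via_fill.masked_fill_(attn_mask == 0, -1e20)

# either way, softmax drives the NULL positions to ~0 probability
attn_weights = F.softmax(masked_via_fill, dim=1)
```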
parlai/agents/seq2seq/seq2seq.py
```diff
 try:
     out = self.model(batch.text_vec, batch.label_vec)

     # generated response
-    preds, scores = out[0], out[1]
+    scores = out[0]
+    preds = scores.max(2)[1]
```
nit: I prefer _, preds = scores.max(2)
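(For anyone skimming: max over a dimension returns a (values, indices) tuple, so the two spellings are equivalent; a tiny illustrative example, not from the diff.)

```python
import torch

scores = torch.randn(4, 7, 100)   # bsz x seqlen x vocab

preds_a = scores.max(2)[1]        # index into the returned tuple
_, preds_b = scores.max(2)        # unpack values and indices explicitly

assert torch.equal(preds_a, preds_b)
```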
parlai/agents/seq2seq/seq2seq.py
```diff
     batch.candidates)

-    text = [self._v2t(p) for p in preds]
+    text = [self._v2t(p) for p in scores.max(2)[1].cpu()]
```
IMHO hiding this max() call in here is a bit nefarious.

```python
_, preds = scores.max(2)
text = [self._v2t(p) for p in preds.cpu()]
```
actually this introduced a bug too, because I accidentally used the teacher-forced scores from the if block; I'll update
parlai/core/torch_agent.py
```diff
@@ -775,7 +775,7 @@ def _init_cuda_buffer(self, model, criterion, batchsize, maxlen):
     try:
         print('preinitializing pytorch cuda buffer')
         dummy = torch.ones(batchsize, maxlen).long().cuda()
-        sc = model(dummy, dummy)[1]
+        sc = model(dummy, dummy)[0]
```
this needs an explanation. It's assuming model has a particular interface and that interface isn't clear to me right now.
(Assuming this is related to the hidden, encoder_out, attn_mask from earlier)
agreed, I need to find a better way to extract the scores here or move it back to s2s
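For context, a hedged sketch of the buffer pre-initialization pattern being discussed (it assumes the model's first output is the vocab scores and a cross-entropy-style criterion; not the exact torch_agent code): one maximum-size dummy batch is run through the model and criterion up front so CUDA allocates its largest buffers before real training starts.

```python
import torch

def init_cuda_buffer(model, criterion, batchsize, maxlen):
    """Run one max-size dummy forward/backward to pre-allocate CUDA memory."""
    try:
        print('preinitializing pytorch cuda buffer')
        dummy = torch.ones(batchsize, maxlen).long().cuda()
        # assumption: scores come first in the model's output tuple
        sc = model(dummy, dummy)[0]
        # assumption: criterion is cross-entropy over flattened (tokens, vocab)
        loss = criterion(sc.view(-1, sc.size(-1)), dummy.view(-1))
        loss.backward()
        model.zero_grad()
    except RuntimeError as e:
        if 'out of memory' in str(e):
            raise RuntimeError('CUDA OOM: try a smaller batchsize or truncation')
        raise
```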
Note: I squashed old pretrained seq2seq module weights into the new modules and got exactly the same f1, ppl, hits on convai2.
Looks great
primary change is massive updates to further modularize and clean out modules.py
successfully trains & evals & ranks on babi:task10k:1
successfully trains & evals & ranks on convai2