
new version of seq2seq #1119

Merged (12 commits, Aug 29, 2018)
Conversation

@alexholdenmiller (Member) commented Aug 28, 2018

  • primary change is a major update to further modularize and clean up modules.py

  • successfully trains & evals & ranks on babi:task10k:1

  • successfully trains & evals & ranks on convai2

@alexholdenmiller changed the title from "[wip] new version of seq2seq" to "new version of seq2seq" on Aug 29, 2018
@alexholdenmiller (Member Author):
trains / evals successfully on convai2, including ranking

"""Add command-line arguments specifically for this agent."""
agent = argparser.add_argument_group('Seq2Seq Arguments')
agent.add_argument('--init-model', type=str, default=None,
help='load dict/features/weights/opts from this file')
help='load dict/model/opts from this path')
Contributor:
I really like it when the help text shows the default automatically via %(default)s; we could probably fix that in seq2seq now.

Member Author:
yes, let's do a separate diff for those; I also want to reformat the whitespace on all the args
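For illustration, a minimal argparse sketch of the %(default)s substitution being suggested here; the flag and values are just an example, not the PR's actual arguments:

```python
import argparse

parser = argparse.ArgumentParser()
agent = parser.add_argument_group('Seq2Seq Arguments')
# argparse replaces %(default)s with the argument's default when rendering help
agent.add_argument('--hiddensize', type=int, default=128,
                   help='size of the hidden layers (default: %(default)s)')
parser.print_help()
# ... --hiddensize HIDDENSIZE  size of the hidden layers (default: 128)
```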

@stephenroller (Contributor) left a comment:
No need to block, but here are my opinions

    :param dim: (default 0) dimension to pad

    :returns: padded tensor if the tensor is shorter than length
    """
    if tensor.size(dim) < length:
        return torch.cat(
            [tensor, tensor.new(*tensor.size()[:dim],
Contributor:
Just out of pure curiosity, did you benchmark this against other approaches (namely, initializing the full zero matrix and then assigning the tensor into it)? Not that there's any need to.

Contributor:
Also, should this actually fill_(null_idx)?

Member Author:
didn't benchmark, although 1) this is copied from the pytorch source, with the dim param added by me, and 2) it's only used for candidate score padding, so I'm not super concerned about the perf :) oh, I'll update it to use null_idx
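For reference, a hedged sketch of the two padding approaches discussed above: concatenating a filler block versus preallocating and copying in. The helper names and signature are illustrative, not the PR's exact code, and the fill value defaults to null_idx as suggested:

```python
import torch

def pad(tensor, length, dim=0, null_idx=0):
    """Pad tensor to `length` along `dim` by concatenating a null-filled block."""
    if tensor.size(dim) >= length:
        return tensor
    pad_shape = list(tensor.size())
    pad_shape[dim] = length - tensor.size(dim)
    return torch.cat([tensor, tensor.new_full(pad_shape, null_idx)], dim=dim)

def pad_prealloc(tensor, length, dim=0, null_idx=0):
    """Alternative: allocate the full null-filled tensor, then copy the data in."""
    if tensor.size(dim) >= length:
        return tensor
    out_shape = list(tensor.size())
    out_shape[dim] = length
    out = tensor.new_full(out_shape, null_idx)
    out.narrow(dim, 0, tensor.size(dim)).copy_(tensor)
    return out

x = torch.tensor([1, 2, 3])
print(pad(x, 5))           # tensor([1, 2, 3, 0, 0])
print(pad_prealloc(x, 5))  # tensor([1, 2, 3, 0, 0])
```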

the last forward pass to skip recalculating the same encoder output
rank_during_training -- (default False) if set, ranks any available
cands during training as well
:param xs: (bsz x seqlen) LongTensor input to the encoder
Contributor:
<3

        return self.START.detach().expand(bsz, 1)

    def _decode_forced(self, ys, encoder_states):
        """Decode with teacher forcing."""
Contributor:
I don't understand what this means

Contributor:
_decode_forced feeds tokens from the target as the next input, to estimate the target's score during training or during eval PPL computation

Member Author:
"teacher-forcing" always inputs the true token at each time step, vs using the model's predicted

class Encoder(nn.Module):
    def __init__(self, num_features, padding_idx=0, rnn_class='lstm',
                 emb_size=128, hidden_size=128, num_layers=2, dropout=0.1,

        scores = self._decode(encoder_states, maxlen or self.longest_label)
Contributor:
All much cleaner now, great!

        super().__init__()

        self.dropout = nn.Dropout(p=dropout)
        self.layers = num_layers
Contributor:
Why change all the names?

Member Author:
just for consistency; there was a mix of underscores and non-underscores, and things didn't match the opt args


-        return encoder_output, hidden
+        return hidden, encoder_output, attn_mask
Contributor:
intentional swap?

Contributor:
nit: TBH I think encoder_output, hidden, attn_mask is better.

Member Author:
the swap was intentional because later we have

hidden = encoder_states[0]
attn_params = (encoder_states[1], encoder_states[2])

but I could change that to [1] and ([0], [2])

it does match the rnn output more closely

        if attn_mask is not None:
            # remove activation from NULL symbols
            # TODO: is this the best operation?
            attn_w_premask -= (1 - attn_mask) * 1e20
Contributor:
Could do th.masked_fill_(attn_w_premask, -1e20). That's what I've seen in fairseq.

Maybe make that 1e20 a private constant, like _ALMOST_INFINITY? Don't think it matters.

Member Author:
self._YUGE?
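For reference, a small sketch comparing the two masking styles under discussion; the tensors and the NEAR_INF name are illustrative, not the PR's code:

```python
import torch

attn_w_premask = torch.randn(2, 5)               # raw attention logits
attn_mask = torch.tensor([[1, 1, 1, 0, 0],
                          [1, 1, 1, 1, 0]])      # 1 = real token, 0 = null/pad
NEAR_INF = 1e20

# current approach: subtract a huge constant from the masked positions
masked_sub = attn_w_premask - (1 - attn_mask).float() * NEAR_INF

# suggested alternative: masked_fill_ where the mask marks null positions
masked_fill = attn_w_premask.clone().masked_fill_(attn_mask == 0, -NEAR_INF)

# either way, the masked logits are pushed toward -inf so softmax gives them ~0
print(torch.softmax(masked_sub, dim=1))
print(torch.softmax(masked_fill, dim=1))
```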

        try:
            out = self.model(batch.text_vec, batch.label_vec)

            # generated response
-            preds, scores = out[0], out[1]
+            scores = out[0]
+            preds = scores.max(2)[1]
Contributor:
nit: I prefer _, preds = scores.max(2)

batch.candidates)

-            text = [self._v2t(p) for p in preds]
+            text = [self._v2t(p) for p in scores.max(2)[1].cpu()]
Contributor:
IMHO hiding this max() call in here is a bit nefarious.

_, preds = scores.max(2)
text = [self._v2t(p) for p in preds.cpu()]

Member Author:
this actually introduced a bug, too, because I accidentally used the teacher-forced scores from the if block; I'll update

@@ -775,7 +775,7 @@ def _init_cuda_buffer(self, model, criterion, batchsize, maxlen):
        try:
            print('preinitializing pytorch cuda buffer')
            dummy = torch.ones(batchsize, maxlen).long().cuda()
-            sc = model(dummy, dummy)[1]
+            sc = model(dummy, dummy)[0]
Contributor:
this needs an explanation. It's assuming model has a particular interface and that interface isn't clear to me right now.

Contributor:
(Assuming this is related to the hidden, encoder_out, attn_mask from earlier)

Member Author:
agreed, I need to find a better way to extract the scores here or move it back to s2s
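For context, a rough sketch of what the buffer preinitialization is doing under the assumed interface, i.e. that the model's forward returns scores as its first output; the function and variable names here are illustrative:

```python
import torch

def init_cuda_buffer(model, criterion, batchsize, maxlen):
    """Run one maximum-size forward/backward pass so pytorch allocates its
    CUDA buffers at the largest shape we expect, avoiding later reallocation."""
    dummy = torch.ones(batchsize, maxlen).long().cuda()
    # assumption from this diff: forward returns (scores, ...) with scores first
    scores = model(dummy, dummy)[0]                   # (bsz, maxlen, vocab)
    loss = criterion(scores.view(-1, scores.size(-1)), dummy.view(-1))
    loss.backward()
    model.zero_grad()
```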

@alexholdenmiller (Member Author) commented Aug 29, 2018

Note: I squashed old pretrained seq2seq module weights into the new modules and got exactly the same f1, ppl, hits on convai2.

@uralik (Contributor) left a comment:
Looks great
