
Added constrained decoding (#1536) #2402

Closed · 27 commits

Conversation

@mjpost (Contributor) commented Jul 31, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This PR implements constrained decoding (Hokamp & Liu, 2017; Post & Vilar, 2018) with vectorization for batching (Hu et al., 2019). In addition, it adds ordered constraints, where the constraints must be generated on the target side in the given order, with zero or more unconstrained tokens in between (e.g., with ordered constraints "hard" and "influence", "hard" must appear in the output before "influence"). This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).

Usage and quick start

It works with fairseq-interactive via a new command-line option, fairseq-interactive --constraints [ordered,unordered], defaulting to ordered if no value is provided. When active, each line read from STDIN is split on \t: the first field is the source sentence, and each subsequent tab-separated field is a constraint. For example (after downloading the Fairseq WMT19 German-English model):

echo -e "Die maschinelle Übersetzung ist schwer zu kontrollieren.\thard\tinfluence" \
  | [normalize.py](https://gist.github.com/mjpost/4c54446b7030d7c64b57461d27090650) \
  | [tok.py](https://gist.github.com/mjpost/ed7456f6a987c533102fc121678ed302) \
  | PYTHONPATH=$HOME/code/fairseq-constraints fairseq-interactive $modeldir \
  --bpe fastbpe \
  --bpe-codes $modeldir/bpecodes \
  --constraints \
  --constraints-both
  -s de -t en \
  --path $modeldir/model1.pt \
  --max-tokens 1000 \
  --beam 5 \

Adding the --constraints-both option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German-English model, the following results are produced (here run on a CPU, so don't be alarmed by the times!):

S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
C-0     hard
C-0     influence
H-0     -1.5333266258239746     Mach@@ ine trans@@ lation is hard to influence .
D-0     -1.5333266258239746     Machine translation is hard to influence .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.8031 -0.1701 -11.7727 -0.1815 -0.1511
S-0     Die masch@@ in@@ elle Über@@ setzung ist schwer zu kontrollieren .
W-0     1.844   seconds
H-0     -0.3731671869754791     Mach@@ ine trans@@ lation is difficult to control .
D-0     -0.3731671869754791     Machine translation is difficult to control .
P-0     -0.5434 -0.1423 -0.1930 -0.1415 -0.2346 -1.1430 -0.1665 -0.8482 -0.1678 -0.1514
2020-07-31 12:17:55 | INFO | fairseq_cli.interactive | Total time: 12.803 seconds; translation time: 3.688

Note the new tags present in the output:

  • C-# records active constraints (after applying preprocessing) for a sentence
  • W-# reports the sentence-level translation time (a useful, if unrelated, feature I hope you'll accept)

Some unit tests are written (fairseq/test_constraints.py) but not yet integrated. Advice on where to place these is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

Implementation notes

This is largely self-contained, implemented in a new LexicallyConstrainedBeamSearch class in search.py. It does require a few minimal hooks from _generate() in sequence_generator.py, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates). Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.

Addresses #1536.
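For reference, the hook pattern looks roughly like this. This is a minimal sketch with illustrative bodies (flat list-based states, a simplified reordering), not the actual implementation in fairseq/search.py:

class Search:
    # Base search strategy: the constraint hooks are no-ops by default,
    # which is why unconstrained batches pay no time penalty.
    def init_constraints(self, batch_constraints, beam_size):
        pass  # nothing to set up

    def update_constraints(self, active_hypos):
        pass  # nothing to track


class LexicallyConstrainedBeamSearch(Search):
    # Maintains one constraint state per beam item.
    def init_constraints(self, batch_constraints, beam_size):
        # one copy of each sentence's constraints for every beam slot
        self.states = [[list(cons) for _ in range(beam_size)]
                       for cons in batch_constraints]

    def update_constraints(self, active_hypos):
        # reorder the per-beam states to follow the hypotheses that
        # survived this decoding step
        self.states = [[sent_states[i] for i in hypos]
                       for sent_states, hypos in zip(self.states, active_hypos)]

_generate() calls init_constraints() once before decoding and update_constraints() at each timestep; since the base-class versions are no-ops, other search strategies are unaffected.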

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mjpost mjpost marked this pull request as draft August 4, 2020 12:37
@mjpost (Contributor, Author) commented Aug 4, 2020

I'm marking this as a draft until I can straighten out all the test cases, which are giving me some trouble, in part because I get different results when running them locally.

@mjpost (Contributor, Author) commented Aug 5, 2020

Okay, the code has been modified and improved so that all tests pass.

@mjpost mjpost marked this pull request as ready for review August 5, 2020 15:59
@jhcross (Contributor) left a comment

This looks great to me, but I'd like one of the Fairseq maintainers to take a look before merging.

@mjpost (Contributor, Author) commented Aug 7, 2020

Oh, on the hooks: yes, now I see; this makes sense. I only added the stubs throwing NotImplementedError at the end, to make the test cases pass, but now that they're there, you're right that it makes sense to just call them as no-ops instead of using an if statement. That simplifies a lot.

@alexeib (Contributor) commented Aug 7, 2020

re: "I agree the current approach is a bit cumbersome. I added all of those in order to get the test cases to pass. One counterargument is that constrained decoding has also been implemented for the Levenshtein Transformer, and there's no reason it couldn't be made to work with the multilingual decoders, too. We could add them just to these subsets, but it might be equally messy. I'll follow your suggestion here, of course."

If it's already applicable to more than one task, then that's fine; we can keep the flags in options, but maybe we can add the error-throwing somewhere else. For example, you could add a property "supports_constraints" on the base task class that returns False, and override it for tasks that do support them. Then, in a single place, check that property and throw if constraints are set but the task does not support them?

@mjpost (Contributor, Author) commented Aug 13, 2020

I did add this supports_constraints variable, defaulting to False in the base class and currently set to True only for LexicallyConstrainedBeamSearch. The check is done in generate(), since constraints are defined at the batch level.
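As a minimal sketch of that pattern (illustrative function and message; the actual check lives in the sequence generator):

class Search:
    supports_constraints = False  # default: strategy cannot honor constraints


class LexicallyConstrainedBeamSearch(Search):
    supports_constraints = True


def generate(search_strategy, constraints=None):
    # single, early check: fail fast if constraints were passed to a
    # search strategy that cannot enforce them
    if constraints is not None and not search_strategy.supports_constraints:
        raise NotImplementedError(
            "Target-side constraints were provided, but the search "
            "strategy does not support them"
        )
    # ... decoding proceeds here ...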

@mjpost (Contributor, Author) commented Aug 17, 2020

I just added an example of how to use constrained decoding under examples/, and then listed it in the top-level README. I think this is all set.

@alexeib (Contributor) left a comment

Looks good; see a few small comments inline. @myleott may also want to take a look.

@myleott (Contributor) left a comment

This looks great to me! Made a few comments below. I'm also going to "import" this to run any internal downstream unit/integration tests.

@facebook-github-bot (Contributor) left a comment
@myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@myleott merged this pull request in bd1b35d.

@PolKul commented Apr 5, 2021

Hi @mjpost, I'm trying to adapt your LexicallyConstrainedBeamSearch to a transformer model with the bytelevelbpe tokenizer (subword-level tokens), but it doesn't seem to work well with it: it copies the constraints to the beginning of the decoded sequence and then emits the end-of-sequence token, regardless of the input.

It does work well with bpe (the word-level tokenizer), though, producing rich outputs with the constraints in the correct positions.

What do you think could be the problem with bytelevelbpe?

Thanks

@PolKul commented Apr 5, 2021

My bad, I had just added eos to the constraints :) Problem solved.

@mjpost (Contributor, Author) commented Apr 6, 2021

Hmm, EOS to the constraints? Doesn't that force them to be applied at the end of the sentence? I'd be curious to understand this better, but I'm glad you have it working.

(BTW, I have found a bug in the constraint tracking. If a constraint is interrupted, instead of restarting the tracking at the beginning of that constraint, it starts over entirely. This only has an effect if you have multiple constraints. I'll have a fix for this in soon.)
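To make the bug concrete, here is a toy sketch of per-constraint progress tracking; the names and the flat-list representation are illustrative, not the PR's actual state-tracking code:

# state = (constraint_index, position_within_constraint); constraints is an
# ordered list of token sequences, e.g. [["hard"], ["to", "influence"]]
def advance(state, token, constraints):
    idx, pos = state
    if idx >= len(constraints):
        return state  # all constraints already satisfied
    if token == constraints[idx][pos]:
        pos += 1
        if pos == len(constraints[idx]):
            return (idx + 1, 0)  # constraint completed; move to the next one
        return (idx, pos)
    # Interrupted mid-constraint: the fix restarts only the current
    # constraint, i.e. (idx, 0); the bug restarted tracking entirely,
    # i.e. (0, 0), forgetting earlier completed constraints. (A full
    # implementation would also re-check whether the interrupting token
    # itself begins the current constraint.)
    return (idx, 0)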

@PolKul commented Apr 6, 2021

No, if I add eos to the constraints it just outputs the constraints, and setting min_length doesn't help. But I should say that I'm using it with my custom transformer model, so maybe something else is affecting it. Do you see different behavior with the eos?

@mjpost (Contributor, Author) commented Apr 6, 2021

I'm not quite sure what you mean by "the eos". I assumed you meant that you were appending it to the constraints. It's hard to answer without knowing exactly what your input and command invocation are.

@PolKul commented Apr 6, 2021

I mean that adding the eos token to the end of the constraints (I have just one constraint) makes the decoder output only that constraint. Changing the beam size or min_length doesn't help. I can only produce rich sentences without eos in the constraints. Let me know if you can reproduce the same...

@mjpost (Contributor, Author) commented Apr 6, 2021

If you post a minimal working example (input and command), I can take a look.

@PolKul commented Apr 6, 2021

Well, it's a bit tricky to provide a working example because, as I said, I'm using it in my custom transformer (from the ParlAI library), so I'd have to strip out a lot of code. But basically, if you initialize it like so:


# tokenize the constraint and the input
seed = parse('a feeling of attraction')
input = parse('what is love?')
# initialize constrained beam search
self.search.init_constraints([seed], beam_size)
# run the rest of the decoder code below...
# ...
# if you add [eos] to the seed, the output is just the constraint;
# without [eos] in the seed, the output is a nice long sentence

@PolKul commented Apr 12, 2021

Hi @mjpost, I have just added my developments to the ParlAI repo. Can you please check it here: https://github.com/PolKul/ParlAI

If you could run tests/run_constrained_beam_search.py with your constraints list and see how it works, I would appreciate it. I see several problems:

  1. Currently I cannot use more than one constraint in the list.
  2. When using beam_context_block_ngram > 0, it outputs garbage in the generated text for the second and all subsequent utterances.

Thanks for your help.

@PolKul commented Apr 12, 2021

I've started a new discussion about the ParlAI implementation here: facebookresearch/ParlAI#3582

@jhkd-kevin commented

Before asking:

  • search the issues.
  • search the docs.

What is your question?

I reimplemented the constrained decoding example from examples/constrained_decoding and got it working. But when I use my own model.pt instead of the WMT model, to do Vi-En constrained translation, I run into this issue:

RuntimeError: Error(s) in loading state_dict for TransformerModel:
Unexpected key(s) in state_dict: "encoder.cons_pos_embed._float_tensor", "encoder.seg_embed.weight", "decoder.ptrnet.linear.weight", "decoder.ptrnet.linear.bias".
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([42296, 512]) from checkpoint, the shape in current model is torch.Size([42295, 512]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([42296, 512]) from checkpoint, the shape in current model is torch.Size([42295, 512]).
size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([42296, 512]) from checkpoint, the shape in current model is torch.Size([42295, 512]).
Code

echo -e "Cảm ơn bạn"
| python normalize.py | python tok.py
| fairseq-interactive /public/home/zhchynnu/perl5/ourmodel/examples/constrained_decoding/data
--path /public/home/zhchynnu/perl5/ourmodel/examples/constrained_decoding/path/ourmodel.pt
--bpe fastbpe
--bpe-codes /public/home/zhchynnu/perl5/ourmodel/examples/constrained_decoding/path/ourbpecodes
--constraints
-s vi -t en
--beam 10
What have you tried?

I suspect the problem is related to the model class stored in the .pt checkpoint.
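If it helps to narrow this down, one way to inspect what the checkpoint actually contains (assuming a standard fairseq checkpoint layout; the path is illustrative):

# compare the architecture and vocabulary sizes the checkpoint was trained
# with against the current model and data directory
import torch

state = torch.load("path/ourmodel.pt", map_location="cpu")
print(state["args"])  # training-time architecture and hyperparameters
print(state["model"]["encoder.embed_tokens.weight"].shape)  # rows = source vocab size

The unexpected keys (encoder.cons_pos_embed, decoder.ptrnet.*) suggest the checkpoint was trained with a modified TransformerModel, and the off-by-one embedding sizes (42296 vs. 42295) suggest the dictionaries in the data directory do not exactly match the ones used at training time.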
What's your environment?

fairseq version (e.g., 1.0 or master): fairseq==0.10.2
PyTorch version (e.g., 1.0): torch==1.5.0+cu101
OS (e.g., Linux): Linux
How you installed fairseq (pip, source): pip
Build command you used (if compiling from source):
Python version: 3.6
CUDA/cuDNN version: 10.1
GPU models and configuration: 2060
Any other relevant information: No

@mjpost
