Added constrained decoding (#1536) #2402
Conversation
I'm marking this as a draft until I can straighten out all the test cases, which are giving me some trouble, in part because I get different results running them locally.
Okay, the code has been modified and improved so that all tests pass.
This looks great to me, but I'd like one of the Fairseq maintainers to take a look before merging.
Oh, on the hooks: yes, now I see, this makes sense. I only just added the stubs that throw.
If it's already applicable to more than one task then that's fine; we can keep the flags in options, but maybe we can add the error throwing somewhere else. For example, you could add a property `supports_constraints` on the base task class that returns false, and override it for tasks that do support constraints. Then, in a single place, check that property and throw if constraints are set but the task does not support them?
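The suggestion above might be sketched roughly like this (a minimal illustration only; the class names, method, and error type are assumptions, not fairseq's actual API):

```python
class FairseqTask:
    """Base task (simplified): tasks do not support constraints by default."""

    @property
    def supports_constraints(self):
        return False


class TranslationTask(FairseqTask):
    """A task that implements constrained decoding opts in explicitly."""

    @property
    def supports_constraints(self):
        return True


def validate_args(task, args):
    # One central check: fail fast if constraints were requested
    # but the chosen task cannot honor them.
    if getattr(args, "constraints", None) and not task.supports_constraints:
        raise NotImplementedError(
            "Constrained decoding is not supported by this task"
        )
```

The point of the property is that the check lives in one place instead of being duplicated as stubs across every task that lacks support.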
I did add this.
I just added an example of how to use constrained decoding under examples/constrained_decoding.
Looks good; see a few small comments inline. @myleott may also want to take a look.
This looks great to me! Made a few comments below. I'm also going to "import" this to run any internal downstream unit/integration tests.
@myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hi @mjpost, I'm trying to adopt your LexicallyConstrainedBeamSearch for the transformer model with the bytelevelbpe tokenizer (sub-word-level tokens), but it doesn't seem to work well with it. It copies the constraints at the beginning of the decoded sequence, then emits the end-of-sequence token regardless of the input. It does work well with the bpe (word-level) tokenizer, though, producing rich outputs with the constraints in the correct positions. What do you think could be the problem with bytelevelbpe? Thanks
My bad, I've just added eos to the constraints :) Problem solved.
Hmm, EOS to the constraints? Doesn't that force them to be applied at the end of the sentence? I'd be curious to understand this better, but I'm glad you have it working. (BTW, I have found a bug with the constraint tracking. If a constraint is interrupted, instead of starting the tracking over at the beginning of that constraint, it starts over entirely. This only has an effect if you have multiple constraints. I'll have a fix in for this soon.)
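The interrupted-constraint behavior described above can be illustrated with a toy single-constraint matcher (a simplified sketch only, not the PR's actual data structures): when generation emits a token that breaks a partial match, tracking should restart at the beginning of that constraint rather than discard all progress.

```python
def advance(state, token, constraint):
    """Track progress through one multi-token constraint.

    `state` is the number of constraint tokens matched so far.
    Returns the new state after observing `token`.
    """
    if state < len(constraint) and token == constraint[state]:
        return state + 1  # extend the partial match
    # Interrupted: restart this constraint only. A correct restart must
    # also check whether the interrupting token itself begins the
    # constraint, or a match like ["a", "b"] against "a a b" is missed.
    return 1 if constraint and token == constraint[0] else 0
```

For example, tracking the constraint `["a", "b"]` over the token stream `x a x a b` ends with the constraint fully matched, because the interruption at the third token resets only this constraint's pointer.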
No, if I add eos to the constraints it just outputs the constraints. And setting min_length doesn't help. But I have to say that I'm using it with my custom transformer model, so maybe something else is affecting it. Do you see different behavior with the eos?
I'm not quite sure what you mean by "the eos". I assumed you meant that you were appending it to the constraints. It's hard to answer without knowing exactly what your input and command invocation are.
I mean that adding the eos token to the end of the constraints (I have just one constraint) makes the decoder output only that constraint. Changing beam size or min_length doesn't help. I can only produce rich sentences without eos in the constraints. Let me know if you can reproduce the same...
If you post a minimal working example (input and command), I can take a look. |
Well, it's a bit tricky to provide a working example because, as I say, I'm using it in my custom transformer (from the ParlAI library), so I would have to strip out a lot of it. But basically, if you initialize it like so:
Hi @mjpost, I have just added my developments to the ParlAI repo. Can you please check it here: https://github.com/PolKul/ParlAI. If you could run tests/run_constrained_beam_search.py with your constraints list and see how it works, I would appreciate it. I see several problems:
Thanks for your help.
I've started a new discussion about the ParlAI implementation here: facebookresearch/ParlAI#3582
Before asking:
What is your question? Following examples/constrained_decoding, I got the constrained decoding example working. But when I use my own model.pt instead of the WMT model.pt to do Vi-En constrained translation, I hit this error on the input `echo -e "Cảm ơn bạn"`: RuntimeError: Error(s) in loading state_dict for TransformerModel. I guess the question might be related to the class of the .pt file.
Before submitting
What does this PR do?
This PR implements constrained decoding (Hokamp & Liu, 2017; Post & Vilar, 2018) with vectorization for batching (Hu et al., 2019). In addition, it adds ordered constraints, where the constraints are generated on the target side in order, with zero or more unconstrained tokens in between. This variant allows for optimizations that increase speed and BLEU scores (when testing with random scraps from the references).
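The difference between the two variants can be illustrated with a toy satisfaction check (a sketch of the concept only, not the PR's vectorized beam-search implementation): unordered constraints only require each phrase to appear somewhere in the output, while ordered constraints require the phrases to appear in the given order, possibly with other tokens in between.

```python
def contains(tokens, phrase):
    """True if `phrase` occurs as a contiguous subsequence of `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def satisfies_unordered(tokens, constraints):
    # Each constraint phrase must appear somewhere, in any order.
    return all(contains(tokens, p) for p in constraints)

def satisfies_ordered(tokens, constraints):
    # Each constraint phrase must appear at or after the end of the
    # previous one, with zero or more unconstrained tokens in between.
    pos = 0
    for phrase in constraints:
        n = len(phrase)
        for i in range(pos, len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                pos = i + n
                break
        else:
            return False
    return True
```

An output satisfying the unordered variant may still fail the ordered one if the phrases appear out of order, which is what makes the ordered variant a stricter, more optimizable target.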
Usage and quick start
It works with `fairseq-interactive` via a new command-line option: `fairseq-interactive --constraints [ordered,unordered]`, defaulting to `ordered` if nothing is provided. When active, it will split lines from STDIN on `\t`, with separate constraints each separated by a tab. For example (after downloading the Fairseq WMT19 German--English model):

Adding the `--constraints-both` option causes it to batch-decode the input sentence both with and without the constraints. When run with the Fairseq WMT19 German--English model, the following results are produced (here run on a CPU, don't be alarmed by the times!)

Note the new tags present in the output:
- `C-#` records active constraints (after applying preprocessing) for a sentence
- `W-#` reports the sentence-level translation time (a useful unrelated feature I hope you'll accept)

Some unit tests are written (`fairseq/test_constraints.py`) but not yet integrated. Advice here on where to place this is welcome. I also have not run this through lint; if someone can tell me the command to run, I'd appreciate it.

Implementation notes
This is largely self-contained, implemented in a new `LexicallyConstrainedBeamSearch` class in `search.py`. It does require a few minimal hooks from `_generate()` in `sequence_generator.py`, to ensure that constraints are updated at each timestep. (Edit: most changes in that file are documentation clarifications, corrections, and updates.) Unconstrained sentences that are intermingled with constrained ones will not incur any time penalty, so long as they do not occur in the same batch.

Addresses #1536.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃