### Summary
I am trying to reproduce the new feature of your sentencepiece version presented in the paper. I can run it with your sentencepiece on its own, but it does not seem to work within Marian's built-in sentencepiece pipeline. The parameters appear to be accepted by Marian but are lost on the way to sentencepiece.
### Bug description
I was running Marian training with the built-in sentencepiece vocabulary.
In the training configuration, I put the following parameters into the sentencepiece options:
However, when sentencepiece is invoked, these parameters seem to be lost:
encode_case: 0
decode_case: 0
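One way to check which values actually reached the trainer is to scan the log for `key: value` lines such as the two quoted above. Below is a minimal sketch in Python, assuming the log prints one setting per line in exactly that form. The function names `read_trainer_flags` and `lost_flags` are hypothetical helpers and not part of Marian or sentencepiece.

```python
import re

# Matches lines like "encode_case: 0" (optionally indented), one setting per line.
# Assumption: no timestamp or other prefix precedes the setting name.
_SETTING = re.compile(r"^\s*([A-Za-z_]+):\s*(\S+)\s*$")


def read_trainer_flags(log_text, names):
    """Return {name: value-as-string} for each requested setting found in the log."""
    found = {}
    for line in log_text.splitlines():
        m = _SETTING.match(line)
        if m and m.group(1) in names:
            found[m.group(1)] = m.group(2)  # last occurrence wins
    return found


def lost_flags(log_text, names):
    """Names that are absent from the log or reported as 0, i.e. not switched on."""
    flags = read_trainer_flags(log_text, names)
    return sorted(n for n in names if flags.get(n, "0") == "0")


if __name__ == "__main__":
    sample = "encode_case: 0\ndecode_case: 0\n"  # the two lines quoted above
    print(lost_flags(sample, {"encode_case", "decode_case"}))
```

If the real log prefixes each line with a timestamp or logger tag, the regular expression would need to be loosened accordingly.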
I should add that I also tried passing other parameters through the sentencepiece options (such as `--character_coverage`), as well as explicit `True` values for the `--treat_whitespace_as_suffix` and `--encode_unicode_case` parameters. Finally, I tried various orderings of these parameters. Every attempt gave the same result.
I tried installing Marian's sentencepiece separately and running it directly: `spm_train --encode_unicode_case --treat_whitespace_as_suffix --input csuk_toy1M.txt --model_prefix case_encoded`
That standalone run worked, and this was also reflected in the log. Within Marian, all the parameters were detected by Marian itself (see stdout.txt), yet they were still missing when sentencepiece was invoked.
### Context
I would appreciate any help!