Use sha for interpolation replacement #391

Closed

npresco
Contributor

@npresco npresco commented Sep 7, 2021

Fixes #390

Uses a digest of each interpolation variable as its placeholder during translation (replaced before, restored after), rather than zxzxzx plus a counter for the current interpolated variable. We will be sending more characters to Google Translate with this fix, but this seemed an acceptable trade-off.
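
As a rough illustration of the replace/restore idea described above, here is a minimal sketch. The regular expression and both helper names (`hide_interpolations`, `restore_interpolations`) are hypothetical and chosen for this example only; the project's actual `INTERPOLATION_KEY_RE` and method names may differ.

```ruby
require 'digest'

# Hypothetical pattern for illustration; the project's real
# INTERPOLATION_KEY_RE may match more forms than %{...}.
INTERPOLATION_KEY_RE = /%\{[^}]+\}/

# Replace each interpolation with its SHA-1 hex digest and remember
# which digest stands for which original interpolation.
def hide_interpolations(value)
  mapping = {}
  hidden = value.gsub(INTERPOLATION_KEY_RE) do |m|
    digest = Digest::SHA1.hexdigest(m)
    mapping[digest] = m
    digest
  end
  [hidden, mapping]
end

# Put the original interpolations back into the translated text.
def restore_interpolations(translated, mapping)
  mapping.reduce(translated) do |text, (digest, original)|
    # Block form avoids backslash handling in the replacement string.
    text.gsub(digest) { original }
  end
end

hidden, mapping = hide_interpolations('Hello %{name}, you have %{count} messages')
restored = restore_interpolations(hidden, mapping)
```

Each placeholder is a 40-character hex string, which is where the extra characters sent to the translation service come from.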

i += 1
"#{UNTRANSLATABLE_STRING}#{i}"
value.gsub INTERPOLATION_KEY_RE do |m|
Digest::SHA1.hexdigest m
Owner

Missing a require for Digest::SHA1

@glebm
Owner

glebm commented Sep 9, 2021

I'm not sure how robust this is: there is a chance that a substring of the SHA-1 hash is an actual word, which may confuse Google Translate.
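
To make the concern concrete: a hex digest uses the letters a to f, so runs such as "dead" or "cafe" can appear inside it. The sketch below lists such runs; `letter_runs` is a hypothetical helper written for this example, not part of the project.

```ruby
require 'digest'

# Return every run of at least min_length consecutive letters (a-f)
# found in a hex string. Hypothetical helper for illustration only.
def letter_runs(hex, min_length = 3)
  hex.scan(/[a-f]{#{min_length},}/)
end

# Inspect the digest that would stand in for one interpolation.
digest = Digest::SHA1.hexdigest('%{name}')
puts letter_runs(digest).inspect
```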

Perhaps we can simply use a different token, something other than zxzxzx?
I've just tried X__, and it seems to work with all the languages, including Yiddish and Sesotho.
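
A minimal sketch of the token-based alternative, assuming the `X__` prefix is followed by a counter in the same way the existing zxzxzx token is. The constant names, the regular expression, and both helper methods are hypothetical and not taken from the project.

```ruby
# Hypothetical names for illustration; not the project's actual constants.
TOKEN_PREFIX = 'X__'
INTERPOLATION_RE = /%\{[^}]+\}/

# Replace each interpolation with TOKEN_PREFIX plus its position,
# keeping the originals in order so they can be restored later.
def tokenize_interpolations(value)
  originals = []
  tokenized = value.gsub(INTERPOLATION_RE) do |m|
    originals << m
    "#{TOKEN_PREFIX}#{originals.size - 1}"
  end
  [tokenized, originals]
end

# Swap each token back for the interpolation it replaced.
# Tokens with an unknown index are left untouched.
def restore_tokens(translated, originals)
  translated.gsub(/#{Regexp.escape(TOKEN_PREFIX)}(\d+)/) do |token|
    originals.fetch(Regexp.last_match(1).to_i, token)
  end
end

tokenized, originals = tokenize_interpolations('Hi %{name}, %{count} new')
restored = restore_tokens(tokenized, originals)
```

Compared with a digest, the token stays short, so fewer characters are sent for translation.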

@npresco
Contributor Author

npresco commented Sep 9, 2021

While there may be a substring inside the hash that is an actual word, the hash is sent as a single word, so I don't believe Google Translate will attempt to translate it. Then again, the entire point of this PR is that Google Translate is translating strange combinations of letters, or at least behaving strangely. So it's a good point.

Using a token sounds like a plan. Out of curiosity, why include any alphabet character at all? X___ vs __

@npresco
Contributor Author

npresco commented Sep 9, 2021

Closing in favor of #392

@npresco npresco closed this Sep 9, 2021
@glebm
Owner

glebm commented Sep 9, 2021

out of curiosity, why include any alphabet character at all?

Without a character, the __ gets omitted for some languages:

image

Successfully merging this pull request may close these issues.

Google translate is translating zxzxzx from 'en' -> 'fr'