Evaluate translation platforms #1305

Open
mgeisler opened this issue Oct 5, 2023 · 6 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@mgeisler
Collaborator

mgeisler commented Oct 5, 2023

We have so far let translators come up with their own workflows. This has worked remarkably well, despite having to juggle huge PO files with around 20k lines.

There are alternatives to editing the PO files locally: there exist several online translation platforms which offer a better interface plus automation. A quick search finds sites such as

There are many others, so this is just a selection.

Features

  • Support for PO files:
  • Free for open source projects.
  • Ability to contribute changes back to this repository via a pull request.
    • I don't think we are allowed to give the platforms direct write access, so a PR would be ideal.

I think all of the ones listed above support these features.

I would like someone to evaluate the different platforms and come up with a suggestion for how we can use one. The goal is to make life easier for our translators.

mgeisler added the good first issue and help wanted labels on Oct 5, 2023
@mgeisler
Collaborator Author

mgeisler commented Oct 5, 2023

@henrif75, we talked about this at some point in the past, but I figured we could try and crowd-source this.

mgeisler changed the title from "Set up translation platform" to "Evaluate translation platforms" on Oct 5, 2023
@henrif75
Collaborator

henrif75 commented Oct 6, 2023

I put together a simple spreadsheet to collect the information.

@mgeisler
Collaborator Author

Hi @henrif75, I talked with @GuardsmanPanda today about this topic: Bjørn has been looking into this as well and I hope we can move forward with trying out one of the platforms.

@deavid, feel free to chime in if you have input as well 😄

@deavid
Collaborator

deavid commented Feb 20, 2024

All I ask is that the PR, or whatever needs to be reviewed, can be assessed thoroughly and easily. For me, GitHub integration or anything that doesn't require many clicks or spinning up a console for review is good.
For example, with the current way of working, as long as I can review on the website, it's good enough. PRs bigger than what GitHub can handle should be no-ops, meaning they are just the result of running a tool or a reformat.

Proving that a PR is a no-op is okay even for big ones (although I need to use the console to see the diff). But if a big PR has real changes mixed in, it's impossible to review.

If we start using a tool, I'd like that both contributors and reviewers benefit from that. If the tool just sends a regular PR and the reviewer still has to go through thousands of lines in the diff, I think that would be going backwards.

Also, if a tool makes things better but different, we risk having some PRs made with the tool and some made without it. I'm not sure how inconvenient that would be.
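To make the "is this big PR a no-op?" check concrete: a minimal Python sketch that parses two versions of a PO file into msgid → msgstr maps and reports only the entries that were added, removed, or changed, so rewrapping or reformatting shows up as no change at all. The helpers `parse_po`, `diff_po`, and `is_noop` are hypothetical, and the parser is deliberately simplified (no `msgctxt`, plural forms, or flags); a real check would use a proper PO library.

```python
from __future__ import annotations


def unquote(s: str) -> str:
    """Strip the surrounding quotes of a PO string and undo common C-style escapes."""
    body = s.strip()[1:-1]
    escapes = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(escapes.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def parse_po(text: str) -> dict[str, str]:
    """Simplified PO parser: msgid/msgstr pairs plus "..." continuation lines."""
    records: list[dict[str, str]] = []
    current = None  # field that continuation lines are appended to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            records.append({"msgid": unquote(line[len("msgid "):])})
            current = "msgid"
        elif line.startswith("msgstr ") and records:
            records[-1]["msgstr"] = unquote(line[len("msgstr "):])
            current = "msgstr"
        elif line.startswith('"') and current and records:
            records[-1][current] += unquote(line)
        else:
            current = None  # blank lines and comments end a continuation
    # Skip the header entry (empty msgid) and anything without a msgstr.
    return {r["msgid"]: r["msgstr"] for r in records if r["msgid"] and "msgstr" in r}


def diff_po(old_text: str, new_text: str) -> tuple[list[str], list[str], list[str]]:
    """Return (added, removed, changed) msgids between two PO files."""
    old, new = parse_po(old_text), parse_po(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


def is_noop(old_text: str, new_text: str) -> bool:
    """No entry added, removed, or changed means the PR is a no-op for translations."""
    return diff_po(old_text, new_text) == ([], [], [])


OLD = 'msgid "Hello"\nmsgstr "Hej"\n\nmsgid "Goodbye"\nmsgstr "Farvel"\n'
REWRAPPED = 'msgid ""\n"Hello"\nmsgstr ""\n"Hej"\n\nmsgid "Goodbye"\nmsgstr "Farvel"\n'
EDITED = OLD.replace('"Farvel"', '"Farvel!"')
print(is_noop(OLD, REWRAPPED))  # True: only the line wrapping differs
print(diff_po(OLD, EDITED))     # ([], [], ['Goodbye'])
```

A reviewer-facing tool could print just the `changed` entries side by side instead of a raw line diff, which is the kind of review experience described above.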

deavid closed this as completed on Feb 20, 2024
deavid reopened this on Feb 20, 2024
@GuardsmanPanda
Contributor

@deavid, later (this week, I hope) I'll post some thoughts and a direction for further investigation and testing, but I am leaning towards suggesting that all translation work happens in the tool; it is much more convenient when sentences can be updated and approved individually.

But there is still some thinking and tooling work to be done here. For example, right now the translation keys are just the English sentences, so even minor wording or grammar changes would produce a 'new' translation key (save for some fancy matching magic). Is that a problem we need to fix?

I am imagining the flow as follows.

  1. Change to the course.
  2. Push new phrases to the translation tool.
  3. Update translations in the translation tool.
  4. CI process rebuilds the course with the latest translation, grabbed directly from the tool API.

I'm not sure whether step 2 happens automatically or manually with human oversight; I lean towards the latter since I doubt we will make many course updates that break translations.
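Step 2 in this flow boils down to finding the English paragraphs in `messages.pot` that have no translation yet in a given `xx.po`. A rough Python sketch, assuming both files have already been loaded as msgid → msgstr dicts; `pending_phrases` and the example data are hypothetical, not part of any platform's API.

```python
from __future__ import annotations


def pending_phrases(template: dict[str, str], translation: dict[str, str]) -> list[str]:
    """Return msgids from the POT template that have no non-empty msgstr in the PO file."""
    # The header entry has an empty msgid, so it is skipped explicitly.
    return sorted(msgid for msgid in template if msgid and not translation.get(msgid))


# Hypothetical example data: a POT template (all msgstr values empty) and a Danish PO file.
template = {"": "", "Hello": "", "Goodbye": "", "Welcome!": ""}
danish = {"Hello": "Hej", "Goodbye": ""}

print(pending_phrases(template, danish))  # ['Goodbye', 'Welcome!']
```

With human oversight, this list is what someone would glance at before pushing new phrases to the platform.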

A major point is that the translation tool will (or should) be "round trip safe" so that we can always reverse course if something doesn't work.
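One way to make "round trip safe" testable: parse the entries, write them back out, parse again, and check that nothing changed. Escaped quotes, newlines, and tabs are the usual places where this breaks. A minimal Python sketch of that property using hypothetical `parse_po`/`write_po` helpers (simplified: single msgid/msgstr pairs only, no `msgctxt`, plurals, or comments); against a real platform, one would instead compare the PO file it exports with the one that was uploaded.

```python
from __future__ import annotations


def unquote(s: str) -> str:
    """Strip the surrounding quotes of a PO string and undo common C-style escapes."""
    body = s.strip()[1:-1]
    escapes = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(escapes.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def quote(s: str) -> str:
    """Quote a string for a PO file, escaping backslashes, quotes, newlines, and tabs."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\t", "\\t")
    return f'"{escaped}"'


def parse_po(text: str) -> dict[str, str]:
    """Simplified PO parser: msgid/msgstr pairs plus "..." continuation lines."""
    records: list[dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            records.append({"msgid": unquote(line[len("msgid "):])})
            current = "msgid"
        elif line.startswith("msgstr ") and records:
            records[-1]["msgstr"] = unquote(line[len("msgstr "):])
            current = "msgstr"
        elif line.startswith('"') and current and records:
            records[-1][current] += unquote(line)
        else:
            current = None
    return {r["msgid"]: r["msgstr"] for r in records if r["msgid"] and "msgstr" in r}


def write_po(entries: dict[str, str]) -> str:
    """Serialize msgid -> msgstr pairs as PO entries separated by blank lines."""
    return "\n".join(f"msgid {quote(k)}\nmsgstr {quote(v)}\n" for k, v in entries.items())


def is_round_trip_safe(po_text: str) -> bool:
    """True if parse -> write -> parse preserves every msgid/msgstr pair."""
    entries = parse_po(po_text)
    return parse_po(write_po(entries)) == entries


SAMPLE = 'msgid "Say \\"hi\\"\\n"\nmsgstr "Sig \\"hej\\"\\n"\n\nmsgid "Tab\\there"\nmsgstr ""\n'
print(is_round_trip_safe(SAMPLE))  # True
```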

@mgeisler
Collaborator Author

> I am leaning towards suggesting that all translation work happens in the tool, it is much more convenient when sentences can be updated / approved individually.

Yeah, me too — with a translation platform, I don't think we need to commit the PO files any longer. We thus also don't need any PRs.

> But, there is still some thinking and tooling work to be done here, for example, right now the translation keys are just the English sentence, so even minor wording or grammar changes would be a 'new' translation key (save for some fancy matching magic), is that a problem we need to fix?

This solution came about because of two things:

  • It's how Gettext works and I happened to know Gettext from the past.
  • We need something very lightweight and unobtrusive in the Markdown. Having to create and maintain keys in the Markdown would be too complicated.

So to me, this is not a problem that needs fixing.
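Because the msgid is the English paragraph itself, a reworded paragraph simply has no translation yet, and gettext falls back to the original English text instead of breaking. A tiny Python sketch of that lookup rule, with a plain dict standing in for a compiled catalog; `translate` is a hypothetical name for illustration, not the project's actual code.

```python
from __future__ import annotations


def translate(catalog: dict[str, str], paragraph: str) -> str:
    """gettext-style lookup: the English text is the key, and a missing or empty
    translation falls back to the English text itself."""
    return catalog.get(paragraph) or paragraph


catalog = {"Rust is a systems programming language.": "Rust er et systemprogrammeringssprog."}
print(translate(catalog, "Rust is a systems programming language."))  # Danish text
print(translate(catalog, "Rust is a system programming language."))   # reworded: English fallback
```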

You mention fancy matching, and in fact msgmerge does such matching. It's not fancy by any means... but it is sometimes helpful 😄 What happens is that it

  • Compares the msgid entries in an old translation (xx.po) with the authoritative msgid entries in a fresh messages.pot file.
  • The msgid entries from the POT file correspond to the current English paragraphs — the msgid entries in the PO file correspond to what was current back when the translation was last updated.
  • If msgmerge finds an old msgid entry in the PO file that looks similar to the msgid entry from the POT file, it will use the old translation for the new msgid key. It marks the whole thing as "fuzzy".
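msgmerge uses its own similarity measure, but the idea can be illustrated with Python's `difflib.get_close_matches`. A sketch, assuming the old translations are already loaded as a msgid → msgstr dict and the new msgids from the POT file as a list; `merge` is a hypothetical helper, and the 0.8 cutoff is an arbitrary choice for illustration, not msgmerge's threshold.

```python
from __future__ import annotations

import difflib


def merge(old: dict[str, str], new_msgids: list[str], cutoff: float = 0.8) -> dict[str, tuple[str, bool]]:
    """Map each new msgid to (msgstr, is_fuzzy), reusing similar old translations."""
    merged: dict[str, tuple[str, bool]] = {}
    for msgid in new_msgids:
        if msgid in old:
            # Exact match: keep the translation as-is.
            merged[msgid] = (old[msgid], False)
            continue
        close = difflib.get_close_matches(msgid, list(old), n=1, cutoff=cutoff)
        if close:
            # A similar old msgid exists: reuse its translation, but flag it as fuzzy.
            merged[msgid] = (old[close[0]], True)
        else:
            # Nothing similar: the paragraph needs a fresh translation.
            merged[msgid] = ("", False)
    return merged


old = {
    "Hello": "Hej",
    "Rust is a systems programming language.": "Rust er et systemprogrammeringssprog.",
}
new_msgids = [
    "Hello",
    "Rust is a system programming language.",
    "Ownership is checked at compile time.",
]
for msgid, (msgstr, fuzzy) in merge(old, new_msgids).items():
    print("fuzzy" if fuzzy else "ok   ", repr(msgid), "->", repr(msgstr))
```

A translator then only needs to check the fuzzy entries rather than retranslate them from scratch.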

> Not sure if step 2 happens auto magically or manually with human oversight, I lean towards the latter since I doubt we will make many course updates that break translations.

We do seem to continue to update the text of the course, so if a translation wants to stay up to date, it would need to incorporate these changes. We do have a mechanism in place whereby we freeze a translation to the time it was last updated. This means that the Danish translation won't include new changes made today — and it thus has a chance of catching up.
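One way such a freeze can work is to read the `POT-Creation-Date` header (a standard gettext header field) from the translation and build the course from the last commit before that date. The sketch below is an assumption about the mechanism, not a description of it; TRANSLATIONS.md is the authority. `freeze_date` and `freeze_commit_command` are hypothetical helpers, and the git command is only constructed, not run.

```python
from __future__ import annotations

import re
from datetime import datetime
from typing import Optional

# Matches a header line such as: "POT-Creation-Date: 2023-10-05 12:34+0200\n"
HEADER_RE = re.compile(r'^"POT-Creation-Date: ([^"\\]+)\\n"$', re.MULTILINE)


def freeze_date(po_text: str) -> Optional[datetime]:
    """Parse the POT-Creation-Date header, e.g. "2023-10-05 12:34+0200"."""
    match = HEADER_RE.search(po_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1).strip(), "%Y-%m-%d %H:%M%z")


def freeze_commit_command(po_text: str) -> Optional[list[str]]:
    """Build (but do not run) a git command listing the last commit before the freeze date."""
    date = freeze_date(po_text)
    if date is None:
        return None
    return ["git", "rev-list", "-n", "1", f"--before={date.strftime('%Y-%m-%d %H:%M:%S %z')}", "HEAD"]


HEADER = 'msgid ""\nmsgstr ""\n"Language: da\\n"\n"POT-Creation-Date: 2023-10-05 12:34+0200\\n"\n'
print(freeze_commit_command(HEADER))
```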

I hope this is described okay in TRANSLATIONS.md.
