Add commands for Gettext-based translations i18n #1864

Closed

mgeisler
Contributor

This implements the Gettext-based translation support mentioned in #5 (comment). Gettext is a widely used standard for translating software, with many tools available for translators to maintain and update the translations.

I added two new top-level commands:

  • mdbook xgettext: will extract all strings into a messages.pot file, similar to how xgettext works for source code,
  • mdbook gettext: will use an xx.po file to generate a translated source tree, similar to how gettext works.

The names don't feel great to me, since they assume that one is already familiar with the Gettext system. Perhaps it would be better to have mdbook i18n extract and mdbook i18n translate or similar?

The translated source tree can be used together with the language support from #1306 to get a multi-lingual book.

@mgeisler mgeisler marked this pull request as draft July 24, 2022 21:54
@mgeisler
Contributor Author

While this seems to work, I marked this as a draft since I'm sure we need some discussion here.

@mgeisler
Contributor Author

Hey @sebras, this is the PR I was working on with the extract and reconstruct scripts — they're no longer scripts but now top-level mdbook commands just so that I could hook into the MDBook struct and easily iterate over the book content.

Please let me know how this works for you — I'll also be testing it out here over the next few weeks.

@mgeisler
Contributor Author

mgeisler commented Aug 9, 2022

I'm marking this as non-draft since I would love to get some feedback from people on this.

@aellwein

aellwein commented Sep 8, 2022

Hi @mgeisler,
I wanted to test this pull request, but I get an error message on cargo build:

error[E0433]: failed to resolve: could not find `gettext` in `cmd`
  --> src/main.rs:38:48
   |
38 |         Some(("gettext", sub_matches)) => cmd::gettext::execute(sub_matches),
   |                                                ^^^^^^^ could not find `gettext` in `cmd`

error[E0433]: failed to resolve: could not find `gettext` in `cmd`
  --> src/main.rs:82:26
   |
82 |         .subcommand(cmd::gettext::make_subcommand())
   |                          ^^^^^^^ could not find `gettext` in `cmd`

Is there something missing?

@mgeisler mgeisler force-pushed the gettext-i18n branch 2 times, most recently from 65fb571 to 11d39d8 on September 8, 2022 15:31
@mgeisler
Contributor Author

mgeisler commented Sep 8, 2022

Is there something missing?

Oops! Yes, there is... I had not added a pub mod gettext; line to src/cmd/mod.rs. Thanks for catching that!

I've updated the branch, please give it a try again.
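For readers unfamiliar with the E0433 error above, here is a minimal sketch of why the missing declaration matters. Inline modules stand in for src/cmd/mod.rs and src/cmd/gettext.rs, and the `execute` and `dispatch` functions are hypothetical stand-ins, not the code from this PR.

```rust
// Inline modules stand in for src/cmd/mod.rs and src/cmd/gettext.rs.
mod cmd {
    // Without this `pub mod gettext` declaration, the path `cmd::gettext`
    // does not exist and rustc reports E0433
    // ("could not find `gettext` in `cmd`").
    pub mod gettext {
        // Hypothetical stand-in for the real `execute` function.
        pub fn execute(po_file: &str) -> String {
            format!("translating with {}", po_file)
        }
    }
}

// Dispatch on the subcommand name, loosely mirroring the match in src/main.rs.
fn dispatch(subcommand: &str, arg: &str) -> Option<String> {
    match subcommand {
        "gettext" => Some(cmd::gettext::execute(arg)),
        _ => None,
    }
}
```

In a real crate the module body lives in its own file, so a single `pub mod gettext;` line in src/cmd/mod.rs is what makes the file visible to the compiler.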

@mgeisler mgeisler force-pushed the gettext-i18n branch 2 times, most recently from 027d32f to 14d6f9a on September 10, 2022 22:17
@aellwein

@mgeisler, I'm sorry for the delay, it took me some time to test the PR, but first of all, thank you for your work.

I've tried to create some example content. Everything works well, but it was not quite what I expected.

The xgettext command simply converted every line of my chapter into a separate message, but this approach
appears very tiresome to me, because the whole text is split into lines and it's hard to read and follow
the context and translate afterwards.

In my opinion gettext makes sense when you expect single messages to be translated out of context (like program info boxes, error messages, buttons etc.), but when creating a book it's usually the whole text of a chapter that is to be translated (with maybe some small exceptions).

So at least in my expectation, a chapter-by-chapter approach fits better here: I could imagine writing something like chapter1.<lang>.md and chapter1.<other_lang>.md and just having a simple language switch in my generated markdown book to switch between different languages.

So I would like to know what others think about it, and whether the gettext approach is feasible for book writers.

@sebras

sebras commented Sep 13, 2022

So at least in my expectation, a chapter-by-chapter approach fits better here: I could imagine writing something like chapter1.<lang>.md and chapter1.<other_lang>.md and just having a simple language switch in my generated markdown book to switch between different languages.

In other projects where I have been translating online documentation and websites, they tend to separate out each paragraph into a gettext translatable message. That gives the translator enough context while also not being overly long, as entire chapters may be. Moreover, a paragraph per message makes it easier to identify any changes per revision; if the message is too long, it may be difficult to identify all differences. Finally, paragraphs may move around unchanged between different revisions, and then having each paragraph as a gettext message would not require retranslation (whereas an entire chapter would).

PS. These are just general observations from the position of a translator, I have not tested this proposed PR yet.

@mgeisler
Contributor Author

@mgeisler, I'm sorry for the delay, it took me some time to test the PR, but first of all, thank you for your work.

No worries at all, thanks a lot for giving it a go!

I've tried to create some example content. Everything works well, but it was not quite what I expected.

The xgettext command simply converted every line of my chapter into a separate message, but this approach appears very tiresome to me, because the whole text is split into lines and it's hard to read and follow the context and translate afterwards.

Right, I fully intended to extract paragraphs (lines of text between \n\n+) and not individual lines.

I just tried with cargo run -- xgettext inside the test_book directory of this repository. The resulting messages.pot file looks like this:

#: individual/list.md:1
msgid "# Lists"
msgstr ""

#: individual/list.md:3
msgid ""
"1. A\n"
"2. Normal\n"
"3. Ordered\n"
"4. List"
msgstr ""

#: individual/list.md:8
msgid "---"
msgstr ""

This corresponds to

# Lists

1. A
2. Normal
3. Ordered
4. List

---

I think that's what we both wanted: lines of text are kept together unless they are separated by \n\n+. Do you see something else? Could it perhaps be that you're on Windows? I wrote the code to split on \n only, but I don't see why it could not split on \r\n as well.
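To make the splitting rule concrete, here is a minimal sketch of paragraph extraction that also tolerates \r\n line endings. The function name `extract_paragraphs` is hypothetical and this is not the code from the PR; it additionally assumes that whitespace-only lines count as blank separators.

```rust
/// Split `text` into paragraphs separated by one or more blank lines.
/// Returns (1-based starting line number, paragraph text) pairs.
fn extract_paragraphs(text: &str) -> Vec<(usize, String)> {
    let mut paragraphs = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut start_line = 0;

    // `str::lines` strips both "\n" and "\r\n" line endings.
    for (idx, line) in text.lines().enumerate() {
        if line.trim().is_empty() {
            // A blank line closes the paragraph collected so far.
            if !current.is_empty() {
                paragraphs.push((start_line, current.join("\n")));
                current.clear();
            }
        } else {
            if current.is_empty() {
                start_line = idx + 1;
            }
            current.push(line);
        }
    }
    // Flush the final paragraph if the text does not end with a blank line.
    if !current.is_empty() {
        paragraphs.push((start_line, current.join("\n")));
    }
    paragraphs
}
```

Applied to the list example above, this yields three paragraphs starting on lines 1, 3 and 8, matching the `#:` references in the messages.pot excerpt. It also shows why a poem with a blank line after every verse ends up as one message per line.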

Now, this list example is perhaps a poor example: I've been wondering if it makes sense to parse the Markdown more carefully and emit individual msgids for each list item. Similarly, a heading like ## My heading could be put into the messages.pot file as simply My heading. That way the translators will have less markup to deal with (but also slightly less context).
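The idea of emitting one msgid per list item and dropping heading markers could be sketched as follows. The helper names (`strip_heading`, `strip_list_marker`, `paragraph_to_msgids`) are hypothetical and this is not code from the PR. It only handles ATX headings and single-line list items; a real implementation would more likely rely on a Markdown parser.

```rust
/// Strip an ATX heading prefix such as "## " and return the heading text.
fn strip_heading(line: &str) -> Option<&str> {
    let hashes = line.chars().take_while(|&c| c == '#').count();
    if (1..=6).contains(&hashes) {
        // '#' is one byte, so the char count is also a valid byte offset.
        line[hashes..].strip_prefix(' ').map(str::trim)
    } else {
        None
    }
}

/// Strip a bullet ("- ", "* ", "+ ") or ordered ("1. ") list marker.
fn strip_list_marker(line: &str) -> Option<&str> {
    for bullet in ["- ", "* ", "+ "] {
        if let Some(rest) = line.strip_prefix(bullet) {
            return Some(rest.trim());
        }
    }
    let digits = line.chars().take_while(|c| c.is_ascii_digit()).count();
    if digits > 0 {
        return line[digits..].strip_prefix(". ").map(str::trim);
    }
    None
}

/// Turn one paragraph into the msgids a translator would see.
fn paragraph_to_msgids(paragraph: &str) -> Vec<String> {
    let lines: Vec<&str> = paragraph.lines().map(str::trim_start).collect();
    let items: Vec<&str> = lines.iter().copied().filter_map(strip_list_marker).collect();
    // Every line is a list item: emit one message per item.
    if !lines.is_empty() && items.len() == lines.len() {
        return items.into_iter().map(String::from).collect();
    }
    // A single heading line: emit only the heading text.
    if lines.len() == 1 {
        if let Some(title) = strip_heading(lines[0]) {
            return vec![title.to_string()];
        }
    }
    // Anything else stays one message, markup included.
    vec![paragraph.to_string()]
}
```

With this sketch, "## My heading" becomes the single message "My heading", and the four-item list becomes four separate messages. The translated output would then need the markers re-attached, which is not shown here.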

@mgeisler
Contributor Author

So at least in my expectation, a chapter-by-chapter approach fits better here: I could imagine writing something like chapter1.<lang>.md and chapter1.<other_lang>.md and just having a simple language switch in my generated markdown book to switch between different languages.

My experience with this is that it becomes impossible to track changes after a little while. This is in some sense an important role of the structured files created by Gettext: they give you a way to unambiguously say these 17 paragraphs are out of date.

If you just have a stream of changes to chapter1.<lang>.md, then it suddenly becomes a management task of the translator to track where the chapter1.<other_lang>.md file is in relationship to the source. Yes, it's doable, but it would require that the translator would write something like <!-- Translated up to commit abcd1234 --> at the top of the file.

When text is added and removed from the source file, the translator will now have to apply these changes — perhaps a paragraph is added on Monday and revised on Tuesday and Wednesday. If the translator sees this Friday, then they have to manually notice that they can avoid translating the text from Monday and Tuesday and only translate the version from Wednesday.

The "buffer" in the messages.pot file helps here: the translator starts the workflow Friday morning by extracting all strings to messages.pot. This file is then merged into other_lang.po. The translator now sees exactly what needs to be translated, and they see what is "fuzzy" because only minor changes have been made to the source paragraph.
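The bookkeeping that the merge step provides can be illustrated with a small sketch. It only does exact matching on msgids and does not attempt the fuzzy matching that Gettext's merge tooling performs; the names `Status` and `merge_catalog` are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Status {
    /// The msgid is unchanged and already has a translation.
    Translated(String),
    /// The msgid is new (or was never translated).
    Untranslated,
}

/// Merge existing translations into a freshly extracted list of msgids.
/// Returns the merged entries plus the msgids that no longer exist in the
/// source (obsolete entries), sorted for stable output.
fn merge_catalog(
    new_msgids: &[&str],
    old_translations: &HashMap<String, String>,
) -> (Vec<(String, Status)>, Vec<String>) {
    let merged: Vec<(String, Status)> = new_msgids
        .iter()
        .map(|id| {
            let status = match old_translations.get(*id) {
                // An empty msgstr means "not translated" in a PO file.
                Some(t) if !t.is_empty() => Status::Translated(t.clone()),
                _ => Status::Untranslated,
            };
            (id.to_string(), status)
        })
        .collect();

    let mut obsolete: Vec<String> = old_translations
        .keys()
        .filter(|k| !new_msgids.contains(&k.as_str()))
        .cloned()
        .collect();
    obsolete.sort();

    (merged, obsolete)
}
```

However many times a paragraph was revised between Monday and Wednesday, the translator only ever sees its Friday state: either it still matches an existing entry or it shows up as untranslated.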

@aellwein

I think that's what we both wanted: lines of text are kept together unless they are separated by \n\n+. Do you see something else? Could it perhaps be that you're on Windows? I wrote the code to split on \n only, but I don't see why it could not split on \r\n as well.

No, I am not on Windows, but I added additional line breaks after the sentences for better styling (my test text was a poem). This could be the reason.

Now, this list example is perhaps a poor example: I've been wondering if it makes sense to parse the Markdown more carefully and emit individual msgids for each list item. Similarly, a heading like ## My heading could be put into the messages.pot file as simply My heading. That way the translators will have less markup to deal with (but also slightly less context).

Yes, maybe it's a good idea to have more "semantic" parsing of Markdown.

@mgeisler
Contributor Author

No, I am not on Windows, but I added additional line breaks after the sentences for better styling (my test text was a poem). This could be the reason.

I see, was the poem perhaps indented or in a block quote? That is,

> foo
> bar

will be put into a single msgid. The same happens with the two quoted paragraphs in this example:

> foo
>
> bar

I think there could be a lot of benefit from parsing away such block-level markup and putting foo and bar into their own msgids. The same goes for code blocks, headings, and list items.

If we parse a list with 3 items into 3 msgids, then there's no way for a translator to add/remove list items. Right now, it seems like that's okay since it can help prevent translation mistakes.
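A minimal sketch of parsing away block-quote markup, assuming a single level of quoting and using the hypothetical name `quoted_paragraphs` (not code from the PR):

```rust
/// Remove one level of block-quote markers and return the inner paragraphs.
fn quoted_paragraphs(block: &str) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current: Vec<&str> = Vec::new();

    for line in block.lines() {
        let trimmed = line.trim_start();
        // Drop the leading '>' and at most one following space.
        let inner = match trimmed.strip_prefix('>') {
            Some(rest) => rest.strip_prefix(' ').unwrap_or(rest),
            None => trimmed,
        };
        if inner.trim().is_empty() {
            // A bare ">" line separates paragraphs inside the quote.
            if !current.is_empty() {
                paragraphs.push(current.join("\n"));
                current.clear();
            }
        } else {
            current.push(inner);
        }
    }
    if !current.is_empty() {
        paragraphs.push(current.join("\n"));
    }
    paragraphs
}
```

For the two examples above, the first quote yields one message ("foo" and "bar" on two lines) and the second yields two messages. Nested quotes and re-adding the markers after translation are left out of this sketch.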

@trdthg

trdthg commented Sep 17, 2022

Hi, I'm trying to do some translation with your code and #1306. Here are the steps I took:

  • mdbook xgettext
  • msguniq messages.pot -o messages.pot
  • msginit -i messages.pot --local zh.po
  • mdbook gettext zh.po

Then I use mdbook from #1306 to build, but get this error:

[ERROR] (mdbook::utils): Error: Couldn't open SUMMARY.md in "/home/trdthg/myproject/flutter_rust_bridge/book/src/zh" directory

#1306 needs the translated book to have its own SUMMARY.md. So do I have to translate and copy it manually?

Btw, cloning two extra copies of mdbook is a bad experience)

This command is one half of a Gettext-based translation (i18n)
workflow. It iterates over each chapter and extracts all translatable
text into a `messages.pot` file.

The text is split on paragraph boundaries, which helps ensure less
churn in the output when the text is edited.

The other half of the workflow is a `gettext` command which will take
a source Markdown file and a `xx.po` file and output a translated
Markdown file.

Part of the solution for rust-lang#5.
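As an illustration of the extraction output, the following sketch formats one paragraph as an entry in the same shape as the messages.pot excerpt shown earlier in this thread. The names `escape_po` and `format_pot_entry` are hypothetical, and only backslashes and double quotes are escaped; other control characters such as tabs are not handled.

```rust
/// Escape a string for use inside a double-quoted PO string literal.
fn escape_po(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Format one extracted paragraph as a POT entry with an empty msgstr.
fn format_pot_entry(path: &str, line: usize, msgid: &str) -> String {
    // The "#:" comment records where the paragraph was found.
    let mut out = format!("#: {}:{}\n", path, line);
    let lines: Vec<&str> = msgid.split('\n').collect();
    if lines.len() == 1 {
        out.push_str(&format!("msgid \"{}\"\n", escape_po(lines[0])));
    } else {
        // Multi-line messages start with an empty string, followed by one
        // quoted string per source line.
        out.push_str("msgid \"\"\n");
        for (i, l) in lines.iter().enumerate() {
            let newline = if i + 1 < lines.len() { "\\n" } else { "" };
            out.push_str(&format!("\"{}{}\"\n", escape_po(l), newline));
        }
    }
    out.push_str("msgstr \"\"\n");
    out
}
```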
This command is the second part of a Gettext-based translation (i18n)
workflow. It takes an `xx.po` file with translations and uses this to
translate the chapters of the book. Paragraphs without a translation
are kept in the original language.

Part of the solution for rust-lang#5.
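The fallback behaviour described here (untranslated paragraphs stay in the original language) can be sketched like this. The function name `translate_chapter` is hypothetical, the catalog is a plain map from msgid to msgstr rather than a parsed `xx.po` file, and paragraphs are assumed to be separated by exactly one blank line.

```rust
use std::collections::HashMap;

/// Translate a chapter paragraph by paragraph. Paragraphs without a
/// (non-empty) translation are kept in the original language.
fn translate_chapter(source: &str, catalog: &HashMap<&str, &str>) -> String {
    let translated: Vec<&str> = source
        .split("\n\n")
        .map(|paragraph| match catalog.get(paragraph) {
            // An empty msgstr counts as "no translation available".
            Some(t) if !t.is_empty() => *t,
            _ => paragraph,
        })
        .collect();
    translated.join("\n\n")
}
```

Because the lookup is an exact string match, any edit to a source paragraph makes it fall back to the original text until the catalog is updated.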
@mgeisler
Contributor Author

Hi @trdthg, thanks so much for testing this out!

#1306 needs the translated book to have its own SUMMARY.md. So do I have to translate and copy it manually?

You're completely right that I missed the generation of the SUMMARY.md file. I've pushed a new version of the branch which will also translate this file.

Btw, cloning two extra copies of mdbook is a bad experience)

Yeah, I agree... Perhaps @Ruin0x11 could rebase the branch on top of the latest master so that I in turn can rebase my branch on top. I just looked at the history and I see that the commits are 1-2 years old... so this might be much more work than I had hoped.

@Ruin0x11

If I understand correctly, this adds better support for translator-focused tooling to my original code. Is that accurate? I don't mind rebasing again, but I want to make sure there are no blockers for integrating the original code like last time.

@mgeisler
Contributor Author

If I understand correctly, this adds better support for translator-focused tooling to my original code. Is that accurate?

Yes, that is precisely the idea. The new commands in this PR allow for a Gettext-based workflow for translations. The result is a tree of files which mirrors the original files: a tree which should be ready to be put under src/xx/ for the xx language.

I want to make sure there are no blockers for integrating the original code like last time.

Just to be clear, I'm not a developer on the project — I'm just using mdbook myself for training materials and I would like to be able to translate this material to other languages.

@Ruin0x11

Okay, thanks for clarifying. I'm also not a major contributor to mdBook, but I shared the same need for multilingual support at one point. I'm happy to collaborate if there's some way of getting traction on these code changes.

mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 8, 2023
This implements a translation pipeline using the industry-standard
Gettext[1] system.

I picked Gettext for the reasons described in [2] and [3]:

* It’s widely used in open source software. This means that there are
  graphical editors which will help you in editing the `.po` files.

  There are also many websites which allow you to do translation via
  an online flow. An example is Pontoon[4], which is used for the Rust
  website itself. We can consider setting up such an instance
  ourselves.

* It is a light-weight yet structured format. This means that nothing
  changes with regards to how you update the original English text. We
  can still accept fixes and PRs like normal.

  The structure means that translators can see exactly which part of
  the course they need to update after a change. This is completely
  lost if you simply copy over the original text and translate it
  in-place in the Markdown files.

The code here only adds support for translations. They are not yet
published or used for anything. Next steps will be

* Add support for switching languages via a bit of JavaScript on each
  page.
* Update the speaker notes feature to support translations (right now
  “Speaker Notes” is hard-coded into the generated HTML). I think we
  should turn it into a mdbook preprocessor instead.

[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: rust-lang/mdBook#1864
[3]: rust-lang/mdBook#5 (comment)
[4]: https://pontoon.rust-lang.org/
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 8, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 9, 2023
This implements a translation pipeline using the industry-standard
Gettext[1] system.

I picked Gettext for the reasons described in [2] and [3]:

* It’s widely used in open source software. This means that there are
  graphical editors which will help you in editing the `.po` files. An
  example is Poedit[4], which is available for all major platforms.

  There are also many online systems for doing translations. An
  example is Pontoon[5], which is used for the Rust website itself. We
  can consider setting up such an instance ourselves.

* It is a light-weight yet structured format. This means that nothing
  changes with regards to how you update the original English text. We
  can still accept fixes and PRs like normal.

  The structure means that translators can see exactly which part of
  the course they need to update after a change. This is completely
  lost if you simply copy over the original text and translate it
  in-place in the Markdown files.

The code here only adds support for translations. They are not yet
tested, published or used for anything. Next steps will be:

* Add support for switching languages via a bit of JavaScript on each
  page.

* Update the speaker notes feature to support translations (right now
  “Speaker Notes” is hard-coded into the generated HTML). I think we
  should turn it into a mdbook preprocessor instead.

* Add testing: We should test that the `.po` files are well-formed. We
  should also run `mdbook test` on each language since the
  translations can alter the embedded code.

Fixes #115.

[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: rust-lang/mdBook#1864
[3]: rust-lang/mdBook#5 (comment)
[4]: https://poedit.net/
[5]: https://pontoon.rust-lang.org/
@mgeisler
Contributor Author

mgeisler commented Jan 9, 2023

Hi all, I'll close this PR in favor of google/comprehensive-rust#130. It's the same code there, but it's refactored to not require any changes to mdbook. Instead, I use a renderer (output format) to extract the strings and a preprocessor to do the translations.

You can reuse these tools in your own projects! Please let me know if you do so that we can figure out if we should publish them on crates.io.

@mgeisler mgeisler closed this Jan 9, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 9, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 9, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 10, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 10, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 10, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 10, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 11, 2023
This implements a translation pipeline using the industry-standard
Gettext[1] system.

I picked Gettext for the reasons described in [2] and [3]:

* It’s widely used in open source software. This means that there are
  graphical editors which will help you in editing the `.po` files. An
  example is Poedit[4], which is available for all major platforms.

  There are also many online systems for doing translations. An
  example is Pontoon[5], which is used for the Rust website itself. We
  can consider setting up such an instance ourselves.

* It is a light-weight yet structured format. This means that nothing
  changes with regards to how you update the original English text. We
  can still accept fixes and PRs like normal.

  The structure means that translators can see exactly which part of
  the course they need to update after a change. This is completely
  lost if you simply copy over the original text and translate it
  in-place in the Markdown files.

The code here only adds support for translations. They are not yet
tested, published or used for anything. Next steps will be:

* Add support for switching languages via a bit of JavaScript on each
  page.

* Update the speaker notes feature to support translations (right now
  “Speaker Notes” is hard-coded into the generated HTML). I think we
  should turn it into a mdbook preprocessor instead.

* Add testing: We should test that the `.po` files are well-formed. We
  should also run `mdbook test` on each language since the
  translations can alter the embedded code.
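
As a rough idea of what a well-formedness test could look like, here is a small Python sketch. It is an assumption about one possible check, not what this commit implements: the `check_po` helper is hypothetical and only understands plain `msgid`/`msgstr` pairs, comments, and continuation strings (no `msgctxt`, no plural forms). In practice, running `msgfmt --check` from GNU Gettext on each file is likely the more thorough option.

```python
import re

# A keyword line (msgid/msgstr) or a continuation line holding a quoted string.
_LINE = re.compile(r'^(?:(msgid|msgstr)\s+)?"((?:[^"\\]|\\.)*)"$')


def check_po(text):
    """Return a list of (line_number, problem) for a simplified PO file."""
    problems = []
    expected = "msgid"  # keyword we expect to see next
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are always fine
        match = _LINE.match(line)
        if match is None:
            problems.append((number, "not a quoted string"))
            continue
        keyword = match.group(1)
        if keyword is None:
            continue  # continuation of the previous string
        if keyword != expected:
            problems.append((number, f"expected {expected}, found {keyword}"))
        expected = "msgstr" if keyword == "msgid" else "msgid"
    if expected == "msgstr":
        problems.append((len(text.splitlines()), "msgid without msgstr"))
    return problems


# Hypothetical file contents for illustration.
good = r'''
msgid ""
msgstr ""
"Language: da\n"

#: src/welcome.md:1
msgid "Welcome"
msgstr "Velkommen"
'''

bad = r'''
msgid "Welcome"
msgid "Hello"
msgstr "Hej"
'''

print(check_po(good))
print(check_po(bad))
```

A check like this only catches structural mistakes. Running `mdbook test` per language is still needed to catch translations that break the embedded code.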

Fixes #115.

[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: rust-lang/mdBook#1864
[3]: rust-lang/mdBook#5 (comment)
[4]: https://poedit.net/
[5]: https://pontoon.rust-lang.org/
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 17, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 18, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 18, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 18, 2023
mgeisler added a commit to google/comprehensive-rust that referenced this pull request Jan 18, 2023
@mgeisler
Contributor Author

Just in case someone finds this much later: the tooling has been released as a set of mdbook plugins: https://github.com/google/mdbook-i18n-helpers.

NoahDragon pushed a commit to wnghl/comprehensive-rust that referenced this pull request Jul 19, 2023
@mgeisler
Contributor Author

Hi all, the latest version of mdbook-i18n-helpers significantly improves on how the text is extracted by removing unnecessary Markdown syntax. Please try it out if you're still interested in translating your mdbook documentation!
