Mirror of https://github.com/google/comprehensive-rust.git (synced 2025-01-23 14:06:16 +02:00)
Add support for translations
This implements a translation pipeline using the industry-standard
Gettext[1] system.

I picked Gettext for the reasons described in [2] and [3]:

* It’s widely used in open source software. This means that there are
  graphical editors which will help you in editing the `.po` files. An
  example is Poedit[4], which is available for all major platforms.

  There are also many online systems for doing translations. An example
  is Pontoon[5], which is used for the Rust website itself. We can
  consider setting up such an instance ourselves.

* It is a light-weight yet structured format. This means that nothing
  changes with regards to how you update the original English text. We
  can still accept fixes and PRs like normal.

  The structure means that translators can see exactly which part of
  the course they need to update after a change. This is completely
  lost if you simply copy over the original text and translate it
  in-place in the Markdown files.

The code here only adds support for translations. They are not yet
tested, published or used for anything. Next steps will be:

* Add support for switching languages via a bit of JavaScript on each
  page.

* Update the speaker notes feature to support translations (right now
  “Speaker Notes” is hard-coded into the generated HTML). I think we
  should turn it into a mdbook preprocessor instead.

* Add testing: We should test that the `.po` files are well-formed. We
  should also run `mdbook test` on each language since the translations
  can alter the embedded code.

Fixes #115.

[1]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[2]: https://github.com/rust-lang/mdBook/pull/1864
[3]: https://github.com/rust-lang/mdBook/issues/5#issuecomment-1144887806
[4]: https://poedit.net/
[5]: https://pontoon.rust-lang.org/
Parent: 8e3941d2e6
Commit: 48ec773052
3
.gitignore
vendored
@@ -1,2 +1,3 @@
/book/
/target/
target/
po/messages.pot
116
TRANSLATIONS.md
Normal file
@@ -0,0 +1,116 @@
# Translations of Comprehensive Rust 🦀

We would love to have your help with translating the course into other
languages! We use the [Gettext] system for translations. This means that you
don't modify the Markdown files directly: instead you modify `.po` files in a
`po/` directory. The `.po` files are small text-based translation databases.

There is a `.po` file for each language. They are named after the [ISO 639]
language codes: Danish would go into `po/da.po`, Korean would go into
`po/ko.po`, etc. The `.po` files contain all the English text plus the
translations. They are initialized from a `messages.pot` file (a PO template)
which contains only the English text.
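
To make the shape of these files concrete, here is a minimal sketch that renders a single PO entry: a `#:` source reference, the English `msgid`, and the translated `msgstr`. The function names, the `src/welcome.md:1` reference, and the Danish string are illustrative assumptions and not part of this commit. Real Gettext tools also wrap long strings over several lines, which this sketch does not do.

```rust
/// Escape a string for use inside a double-quoted PO string literal.
fn escape_po(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            other => out.push(other),
        }
    }
    out
}

/// Render one catalog entry (hypothetical helper). An empty `msgstr` marks an
/// untranslated message, which is how entries in a fresh template look.
pub fn render_po_entry(source: &str, msgid: &str, msgstr: &str) -> String {
    format!(
        "#: {source}\nmsgid \"{}\"\nmsgstr \"{}\"\n",
        escape_po(msgid),
        escape_po(msgstr)
    )
}
```

Calling `render_po_entry("src/welcome.md:1", "# Welcome", "# Velkommen")` yields the three lines a translator would see for that heading in `po/da.po`.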

We will show how to update and manipulate the `.po` and `.pot` files using the
GNU Gettext utilities below.

[Gettext]: https://www.gnu.org/software/gettext/manual/html_node/index.html
[ISO 639]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

## I18n Helpers

We use two helpers for the translations:

* `mdbook-xgettext`: This program extracts the English text. It is an mdbook
  renderer.
* `mdbook-gettext`: This program translates the book into a target language. It
  is an mdbook preprocessor.

Install both helpers with the following command from the root of the course:

```shell
$ cargo install --path i18n-helpers
```

## Creating and Updating Translations

First, you need to know how to update the `.pot` and `.po` files.

As a general rule, you should never touch the auto-generated `po/messages.pot`
file. You should also not edit the `msgid` entries in a `po/xx.po` file. If you
find mistakes, you need to update the original English text instead. The fixes
to the English text will flow into the `.po` files the next time the translators
update them.

### Generating the PO Template

To extract the original English text and generate a `messages.pot` file, you run
`mdbook` with a special renderer:

```shell
$ MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot"}}' \
  mdbook build -d po
```

You will find the generated POT file as `po/messages.pot`.

### Initialize a New Translation

To start a new translation, first generate the `po/messages.pot` file. Then use
`msginit` to create a `xx.po` file for the fictional `xx` language:

```shell
$ msginit -i po/messages.pot -l xx -o po/xx.po
```

You can also simply copy `po/messages.pot` to `po/xx.po`. Then update the file
header (the first entry with `msgid ""`) to the correct language.

### Updating an Existing Translation

As the English text changes, translations gradually become outdated. To update
the `po/xx.po` file with new messages, first extract the English text into a
`po/messages.pot` template file. Then run

```shell
$ msgmerge --update po/xx.po po/messages.pot
```

Unchanged messages will stay intact, deleted messages are marked as old, and
updated messages are marked "fuzzy". A fuzzy entry will reuse the previous
translation: you should then go over it and update it as necessary before you
remove the fuzzy marker.
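
One way to see how much review work a merge has left behind is to count the fuzzy entries. The sketch below scans the text of a `.po` file for flag comments, which start with `#,` and hold a comma-separated list of flags. The function name is an assumption made for illustration. Note that a fuzzy flag on the header entry, if present, is counted too.

```rust
/// Count entries flagged as fuzzy in the text of a `.po` file
/// (hypothetical helper, not part of the i18n-helpers crate).
pub fn count_fuzzy(po_text: &str) -> usize {
    po_text
        .lines()
        // Only flag comments ("#, flag1, flag2") are of interest.
        .filter_map(|line| line.strip_prefix("#,"))
        .filter(|flags| flags.split(',').any(|flag| flag.trim() == "fuzzy"))
        .count()
}
```

Because only lines beginning with `#,` are inspected, the word "fuzzy" inside a `msgid` or `msgstr` does not affect the count.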

## Using Translations

This section shows how to use the translations to generate localized HTML
output.

### Building a Translation

To use the `po/xx.po` file for your output, run the following command:

```shell
$ MDBOOK_BOOK__LANGUAGE='xx' \
  MDBOOK_PREPROCESSOR__GETTEXT__PO_FILE='po/xx.po' \
  MDBOOK_PREPROCESSOR__GETTEXT__RENDERERS='["html"]' \
  MDBOOK_PREPROCESSOR__GETTEXT__BEFORE='["svgbob"]' \
  mdbook build -d book/xx
```

This sets the book's language to `xx`, activates the `mdbook-gettext`
preprocessor for the HTML renderer, tells it to use the `po/xx.po` file and to
run before the `svgbob` preprocessor, and finally redirects the output to
`book/xx`.

### Serving a Translation

As usual, you can use `mdbook serve` to view your translation as you work on
it. You use the same command as with `mdbook build` above, but additionally
tell `mdbook` to watch the `po/` directory for changes:

```shell
$ MDBOOK_BOOK__LANGUAGE=xx \
  MDBOOK_PREPROCESSOR__GETTEXT__PO_FILE=po/xx.po \
  MDBOOK_BUILD__EXTRA_WATCH_DIRS='["po"]' \
  mdbook serve -d book/xx
```
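
The `MDBOOK_*` variables in the commands above override keys in `book.toml`. As described in the mdBook documentation, the prefix is dropped, the rest is lowercased, a double underscore separates nested tables, and a single underscore becomes a dash. The sketch below mirrors that naming rule for illustration only. The function name is an assumption, and this is not mdBook's own implementation.

```rust
/// Map an `MDBOOK_*` environment variable name to the configuration key it
/// overrides, or return `None` for unrelated variables (hypothetical helper).
pub fn env_to_config_key(name: &str) -> Option<String> {
    let rest = name.strip_prefix("MDBOOK_")?;
    // Replace the double underscores first so that only the remaining single
    // underscores are turned into dashes.
    Some(rest.to_lowercase().replace("__", ".").replace('_', "-"))
}
```

With this rule, `MDBOOK_PREPROCESSOR__GETTEXT__PO_FILE` corresponds to the `po-file` key of the `[preprocessor.gettext]` table, which is the key the `mdbook-gettext` preprocessor reads.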
@@ -8,7 +8,14 @@ title = "Comprehensive Rust 🦀"
[rust]
edition = "2021"

[preprocessor.links]
renderers = ["html"]

[preprocessor.index]
renderers = ["html"]

[preprocessor.svgbob]
renderers = ["html"]
class = "bob"

[output.html]
1941
i18n-helpers/Cargo.lock
generated
Normal file
File diff suppressed because it is too large
14
i18n-helpers/Cargo.toml
Normal file
@@ -0,0 +1,14 @@
[package]
name = "i18n-helpers"
version = "0.1.0"
edition = "2021"
publish = false

[dependencies]
anyhow = "1.0.68"
mdbook = "0.4.25"
once_cell = "1.17.0"
polib = "0.1.0"
regex = "1.7.0"
semver = "1.0.16"
serde_json = "1.0.91"
230
i18n-helpers/src/bin/mdbook-gettext.rs
Normal file
@@ -0,0 +1,230 @@
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

//! `gettext` for `mdbook`
//!
//! This program works like `gettext`, meaning it will translate
//! strings in your book.
//!
//! The translations come from GNU Gettext `xx.po` files. You must set
//! `preprocessor.gettext.po-file` to the PO file to use. If unset, the
//! preprocessor exits with an error.
//!
//! See `TRANSLATIONS.md` in the repository root for more information.

use anyhow::{anyhow, Context};
use i18n_helpers::extract_paragraphs;
use mdbook::book::Book;
use mdbook::preprocess::{CmdPreprocessor, PreprocessorContext};
use mdbook::BookItem;
use polib::catalog::Catalog;
use polib::po_file;
use semver::{Version, VersionReq};
use std::io;
use std::path::Path;
use std::process;

fn translate(text: &str, catalog: &Catalog) -> String {
    let mut output = String::with_capacity(text.len());
    let mut target_lineno = 1;

    for (lineno, paragraph) in extract_paragraphs(text) {
        // Fill in blank lines between paragraphs. This is important
        // for code blocks where blank lines are significant.
        while target_lineno < lineno {
            output.push('\n');
            target_lineno += 1;
        }
        // Subtract 1 because the paragraph is missing a final '\n'
        // due to the splitting in `extract_paragraphs`.
        target_lineno += paragraph.lines().count() - 1;

        let translated = catalog
            .find_message(paragraph)
            .and_then(|msg| msg.get_msgstr().ok())
            .filter(|msgstr| !msgstr.is_empty())
            .map(|msgstr| msgstr.as_str())
            .unwrap_or(paragraph);
        output.push_str(translated);
    }

    let suffix = &text[text.trim_end_matches('\n').len()..];
    output.push_str(suffix);
    output
}

fn translate_book(ctx: &PreprocessorContext, mut book: Book) -> anyhow::Result<Book> {
    let cfg = ctx
        .config
        .get_preprocessor("gettext")
        .ok_or_else(|| anyhow!("Could not read preprocessor.gettext configuration"))?;
    let path = cfg
        .get("po-file")
        .ok_or_else(|| anyhow!("Missing preprocessor.gettext.po-file config value"))?
        .as_str()
        .ok_or_else(|| anyhow!("Expected a string for preprocessor.gettext.po-file"))?;
    let catalog = po_file::parse(Path::new(path))
        .map_err(|err| anyhow!("{err}"))
        .with_context(|| format!("Could not parse {path} as PO file"))?;

    book.for_each_mut(|item| match item {
        BookItem::Chapter(ch) => {
            ch.content = translate(&ch.content, &catalog);
            ch.name = translate(&ch.name, &catalog);
        }
        BookItem::Separator => {}
        BookItem::PartTitle(title) => {
            *title = translate(title, &catalog);
        }
    });

    Ok(book)
}

fn preprocess() -> anyhow::Result<()> {
    let (ctx, book) = CmdPreprocessor::parse_input(io::stdin())?;
    let book_version = Version::parse(&ctx.mdbook_version)?;
    let version_req = VersionReq::parse(mdbook::MDBOOK_VERSION)?;
    if !version_req.matches(&book_version) {
        eprintln!(
            "Warning: The gettext preprocessor was built against \
             mdbook version {}, but we're being called from version {}",
            mdbook::MDBOOK_VERSION,
            ctx.mdbook_version
        );
    }

    let translated_book = translate_book(&ctx, book)?;
    serde_json::to_writer(io::stdout(), &translated_book)?;

    Ok(())
}

fn main() -> anyhow::Result<()> {
    if std::env::args().len() == 3 {
        assert_eq!(std::env::args().nth(1).as_deref(), Some("supports"));
        // Signal that we support all renderers.
        process::exit(0);
    }

    preprocess()
}

#[cfg(test)]
mod tests {
    use super::*;
    use polib::message::Message;

    fn create_catalog(translations: &[(&str, &str)]) -> Catalog {
        let mut catalog = Catalog::new();
        for (msgid, msgstr) in translations {
            let message = Message::new_singular("", "", "", "", msgid, msgstr);
            catalog.add_message(message);
        }
        catalog
    }

    #[test]
    fn test_translate_single_line() {
        let catalog = create_catalog(&[("foo bar", "FOO BAR")]);
        assert_eq!(translate("foo bar", &catalog), "FOO BAR");
    }

    #[test]
    fn test_translate_single_paragraph() {
        let catalog = create_catalog(&[("foo bar", "FOO BAR")]);
        assert_eq!(translate("foo bar\n", &catalog), "FOO BAR\n");
    }

    #[test]
    fn test_translate_paragraph_with_leading_newlines() {
        let catalog = create_catalog(&[("foo bar", "FOO BAR")]);
        assert_eq!(translate("\n\n\nfoo bar\n", &catalog), "\n\n\nFOO BAR\n");
    }

    #[test]
    fn test_translate_paragraph_with_trailing_newlines() {
        let catalog = create_catalog(&[("foo bar", "FOO BAR")]);
        assert_eq!(translate("foo bar\n\n\n", &catalog), "FOO BAR\n\n\n");
    }

    #[test]
    fn test_translate_multiple_paragraphs() {
        let catalog = create_catalog(&[("foo bar", "FOO BAR")]);
        assert_eq!(
            translate(
                "first paragraph\n\
                 \n\
                 foo bar\n\
                 \n\
                 last paragraph\n",
                &catalog
            ),
            "first paragraph\n\
             \n\
             FOO BAR\n\
             \n\
             last paragraph\n"
        );
    }

    #[test]
    fn test_translate_multiple_paragraphs_extra_newlines() {
        // Notice how the translated paragraphs have more lines.
        let catalog = create_catalog(&[
            (
                "first\n\
                 paragraph",
                "FIRST\n\
                 TRANSLATED\n\
                 PARAGRAPH",
            ),
            (
                "last\n\
                 paragraph",
                "LAST\n\
                 TRANSLATED\n\
                 PARAGRAPH",
            ),
        ]);
        // Paragraph separation is kept intact while translating.
        assert_eq!(
            translate(
                "\n\
                 first\n\
                 paragraph\n\
                 \n\
                 \n\
                 \n\
                 last\n\
                 paragraph\n\
                 \n\
                 \n",
                &catalog
            ),
            "\n\
             FIRST\n\
             TRANSLATED\n\
             PARAGRAPH\n\
             \n\
             \n\
             \n\
             LAST\n\
             TRANSLATED\n\
             PARAGRAPH\n\
             \n\
             \n"
        );
    }
}
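
The lookup at the heart of `translate` can be illustrated without the `polib` and `mdbook` crates. The sketch below is a simplified stand-in, assuming a plain `HashMap` from msgid to msgstr instead of a `Catalog`. The function name is hypothetical. It keeps the same fallback rule: a missing entry or an empty translation leaves the English paragraph unchanged.

```rust
use std::collections::HashMap;

/// Look up `paragraph` in `catalog`, falling back to the original text when
/// the entry is missing or its translation is empty (hypothetical helper).
pub fn translate_paragraph<'a>(
    paragraph: &'a str,
    catalog: &HashMap<&str, &'a str>,
) -> &'a str {
    catalog
        .get(paragraph)
        .copied()
        // An empty msgstr means "not translated yet".
        .filter(|msgstr| !msgstr.is_empty())
        .unwrap_or(paragraph)
}
```

This fallback is what allows a partially translated `.po` file to produce a complete book: untranslated paragraphs simply stay in English.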
128
i18n-helpers/src/bin/mdbook-xgettext.rs
Normal file
@@ -0,0 +1,128 @@
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

//! `xgettext` for `mdbook`
//!
//! This program works like `xgettext`, meaning it will extract
//! translatable strings from your book. The strings are saved in a
//! GNU Gettext `messages.pot` file in your build directory (typically
//! `po/messages.pot`).
//!
//! See `TRANSLATIONS.md` in the repository root for more information.

use anyhow::{anyhow, Context};
use mdbook::renderer::RenderContext;
use mdbook::BookItem;
use polib::catalog::Catalog;
use polib::message::Message;
use std::fs;
use std::io;

fn add_message(catalog: &mut Catalog, msgid: &str, source: &str) {
    let sources = match catalog.find_message(msgid) {
        Some(msg) => format!("{}\n{}", msg.source, source),
        None => String::from(source),
    };
    let message = Message::new_singular("", &sources, "", "", msgid, "");

    // Carefully update the existing message or add a new one. It's an
    // error to create a catalog with duplicate msgids.
    match catalog.find_message_index(msgid) {
        Some(&idx) => catalog.update_message_by_index(idx, message).unwrap(),
        None => catalog.add_message(message),
    }
}

fn create_catalog(ctx: &RenderContext) -> anyhow::Result<Catalog> {
    let mut catalog = Catalog::new();
    if let Some(title) = &ctx.config.book.title {
        catalog.metadata.project_id_version = String::from(title);
    }
    if let Some(lang) = &ctx.config.book.language {
        catalog.metadata.language = String::from(lang);
    }
    catalog.metadata.mime_version = String::from("1.0");
    catalog.metadata.content_type = String::from("text/plain; charset=UTF-8");
    catalog.metadata.content_transfer_encoding = String::from("8bit");

    let summary_path = ctx.config.book.src.join("SUMMARY.md");
    let summary = std::fs::read_to_string(ctx.root.join(&summary_path))?;

    // First, add all chapter names and part titles from SUMMARY.md.
    // The book items are in order of the summary, so we can assign
    // correct line numbers for duplicate lines by tracking the index
    // of our last search.
    let mut last_idx = 0;
    for item in ctx.book.iter() {
        let line = match item {
            BookItem::Chapter(chapter) => &chapter.name,
            BookItem::PartTitle(title) => title,
            BookItem::Separator => continue,
        };

        let idx = summary[last_idx..].find(line).ok_or_else(|| {
            anyhow!(
                "Could not find {line:?} in SUMMARY.md after line {} -- \
                 please remove any formatting from SUMMARY.md",
                summary[..last_idx].lines().count()
            )
        })?;
        // Advance past the match: otherwise a title contained in the
        // previous one ("Send" after "Send and Sync") or an immediate
        // duplicate would be found at the same position again.
        last_idx += idx + line.len();
        let lineno = summary[..last_idx].lines().count();
        let source = format!("{}:{}", summary_path.display(), lineno);
        add_message(&mut catalog, line, &source);
    }

    // Next, we add the chapter contents.
    for item in ctx.book.iter() {
        if let BookItem::Chapter(chapter) = item {
            let path = match &chapter.path {
                Some(path) => ctx.config.book.src.join(path),
                None => continue,
            };
            for (lineno, paragraph) in i18n_helpers::extract_paragraphs(&chapter.content)
            {
                let source = format!("{}:{}", path.display(), lineno);
                add_message(&mut catalog, paragraph, &source);
            }
        }
    }

    Ok(catalog)
}

fn main() -> anyhow::Result<()> {
    let ctx = RenderContext::from_json(&mut io::stdin()).context("Parsing stdin")?;
    let cfg = ctx
        .config
        .get_renderer("xgettext")
        .ok_or_else(|| anyhow!("Could not read output.xgettext configuration"))?;
    let path = cfg
        .get("pot-file")
        .ok_or_else(|| anyhow!("Missing output.xgettext.pot-file config value"))?
        .as_str()
        .ok_or_else(|| anyhow!("Expected a string for output.xgettext.pot-file"))?;
    fs::create_dir_all(&ctx.destination)
        .with_context(|| format!("Could not create {}", ctx.destination.display()))?;
    let output_path = ctx.destination.join(path);
    if output_path.exists() {
        fs::remove_file(&output_path)
            .with_context(|| format!("Removing {}", output_path.display()))?
    }
    let catalog = create_catalog(&ctx).context("Extracting messages")?;
    polib::po_file::write(&catalog, &output_path)
        .with_context(|| format!("Writing messages to {}", output_path.display()))?;

    Ok(())
}
}
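
The merging done by `add_message` can be sketched with the standard library alone. The example below is a simplified model, assuming a hypothetical `Entry` struct with a list of sources in place of polib's `Message`. Its `add_message` is a stand-in with the same name as the real function. It shows the same idea: a msgid appears once in the catalog and accumulates every location where the text occurs.

```rust
/// A minimal stand-in for a catalog entry: the message and every place in
/// the book where it occurs (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub msgid: String,
    pub sources: Vec<String>,
}

/// Record that `msgid` occurs at `source`, reusing the existing entry when
/// the same text was seen before so that each msgid appears only once.
pub fn add_message(entries: &mut Vec<Entry>, msgid: &str, source: &str) {
    match entries.iter().position(|entry| entry.msgid == msgid) {
        Some(idx) => entries[idx].sources.push(source.to_string()),
        None => entries.push(Entry {
            msgid: msgid.to_string(),
            sources: vec![source.to_string()],
        }),
    }
}
```

Keeping one entry per msgid means a repeated title such as "Exercises" only has to be translated once, while the source list still tells the translator where it is used.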
123
i18n-helpers/src/lib.rs
Normal file
@@ -0,0 +1,123 @@
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use once_cell::sync::Lazy;
use regex::Regex;

static PARAGRAPH_SEPARATOR: Lazy<Regex> = Lazy::new(|| Regex::new(r"\n\n+").unwrap());

/// Extract paragraphs from text.
///
/// Paragraphs are separated by at least two newlines. Returns an
/// iterator over line numbers (starting from 1) and paragraphs.
pub fn extract_paragraphs(text: &str) -> impl Iterator<Item = (usize, &str)> {
    // TODO: This could be made more sophisticated by parsing the
    // Markdown and stripping off the markup characters.
    //
    // As an example, a header like "## My heading" could become just
    // "My heading" in the `.pot` file. Similarly, paragraphs could be
    // unfolded and list items could be translated one-by-one.

    // Skip over leading empty lines.
    let trimmed = text.trim_start_matches('\n');
    let mut matches = PARAGRAPH_SEPARATOR.find_iter(trimmed);
    let mut lineno = 1 + text.len() - trimmed.len();
    let mut last = 0;

    std::iter::from_fn(move || match matches.next() {
        Some(m) => {
            let result = (lineno, &trimmed[last..m.start()]);
            lineno += trimmed[last..m.end()].lines().count();
            last = m.end();
            Some(result)
        }
        None => {
            if last < trimmed.len() {
                let result = (lineno, trimmed[last..].trim_end_matches('\n'));
                last = trimmed.len();
                Some(result)
            } else {
                None
            }
        }
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    macro_rules! assert_iter_eq {
        ($left_iter:expr, $right:expr) => {
            assert_eq!($left_iter.collect::<Vec<_>>(), $right)
        };
    }

    #[test]
    fn test_extract_paragraphs_empty() {
        assert_iter_eq!(extract_paragraphs(""), vec![]);
    }

    #[test]
    fn test_extract_paragraphs_single_line() {
        assert_iter_eq!(
            extract_paragraphs("This is a paragraph."),
            vec![(1, "This is a paragraph.")]
        );
    }

    #[test]
    fn test_extract_paragraphs_simple() {
        assert_iter_eq!(
            extract_paragraphs(
                "This is\n\
                 the first\n\
                 paragraph.\n\
                 \n\
                 Second paragraph."
            ),
            vec![
                (1, "This is\nthe first\nparagraph."),
                (5, "Second paragraph.")
            ]
        );
    }

    #[test]
    fn test_extract_paragraphs_leading_newlines() {
        assert_iter_eq!(
            extract_paragraphs(
                "\n\
                 \n\
                 \n\
                 This is the\n\
                 first paragraph."
            ),
            vec![(4, "This is the\nfirst paragraph.")]
        );
    }

    #[test]
    fn test_extract_paragraphs_trailing_newlines() {
        assert_iter_eq!(
            extract_paragraphs(
                "This is\n\
                 a paragraph.\n\
                 \n\
                 \n"
            ),
            vec![(1, "This is\na paragraph.")]
        );
    }
}
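
For readers who want to experiment with the line-number bookkeeping without the `regex` and `once_cell` crates, here is a std-only sketch of the same idea. It is not the implementation from this commit: the function name is an assumption, it returns owned strings in a `Vec` rather than a borrowing iterator, and because it relies on `str::lines` it also strips a trailing `\r`, which the regex-based version does not.

```rust
/// Split `text` into paragraphs separated by blank lines and pair each with
/// the 1-based line number where it starts (hypothetical std-only variant).
pub fn paragraphs_with_line_numbers(text: &str) -> Vec<(usize, String)> {
    let mut paragraphs = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut start = 0;

    for (idx, line) in text.lines().enumerate() {
        if line.is_empty() {
            // A blank line ends the paragraph collected so far, if any.
            if !current.is_empty() {
                paragraphs.push((start, current.join("\n")));
                current.clear();
            }
        } else {
            if current.is_empty() {
                // First line of a new paragraph: remember where it starts.
                start = idx + 1;
            }
            current.push(line);
        }
    }
    if !current.is_empty() {
        paragraphs.push((start, current.join("\n")));
    }
    paragraphs
}
```

The inputs used in the tests above give the same line numbers and paragraph texts with this variant, so it can serve as a cross-check when changing the regex-based version.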
@@ -28,19 +28,19 @@
- [References](basic-syntax/references.md)
- [Dangling References](basic-syntax/references-dangling.md)
- [Slices](basic-syntax/slices.md)
- [`String` vs `str`](basic-syntax/string-slices.md)
- [String vs str](basic-syntax/string-slices.md)
- [Functions](basic-syntax/functions.md)
- [Methods](basic-syntax/methods.md)
- [Overloading](basic-syntax/functions-interlude.md)
- [Exercises](exercises/day-1/morning.md)
- [Implicit Conversions](exercises/day-1/implicit-conversions.md)
- [Arrays and `for` Loops](exercises/day-1/for-loops.md)
- [Arrays and for Loops](exercises/day-1/for-loops.md)

# Day 1: Afternoon

- [Variables](basic-syntax/variables.md)
- [Type Inference](basic-syntax/type-inference.md)
- [`static` & `const`](basic-syntax/static-and-const.md))
- [static & const](basic-syntax/static-and-const.md))
- [Scopes and Shadowing](basic-syntax/scopes-shadowing.md)
- [Memory Management](memory-management.md)
- [Stack vs Heap](memory-management/stack-vs-heap.md)
@@ -93,23 +93,23 @@

- [Control Flow](control-flow.md)
- [Blocks](control-flow/blocks.md)
- [`if` expressions](control-flow/if-expressions.md)
- [`if let` expressions](control-flow/if-let-expressions.md)
- [`while` expressions](control-flow/while-expressions.md)
- [`while let` expressions](control-flow/while-let-expressions.md)
- [`for` expressions](control-flow/for-expressions.md)
- [`loop` expressions](control-flow/loop-expressions.md)
- [`match` expressions](control-flow/match-expressions.md)
- [`break` & `continue`](control-flow/break-continue.md)
- [if expressions](control-flow/if-expressions.md)
- [if let expressions](control-flow/if-let-expressions.md)
- [while expressions](control-flow/while-expressions.md)
- [while let expressions](control-flow/while-let-expressions.md)
- [for expressions](control-flow/for-expressions.md)
- [loop expressions](control-flow/loop-expressions.md)
- [match expressions](control-flow/match-expressions.md)
- [break & continue](control-flow/break-continue.md)
- [Standard Library](std.md)
- [`String`](std/string.md)
- [`Option` and `Result`](std/option-result.md)
- [`Vec`](std/vec.md)
- [`HashMap`](std/hashmap.md)
- [`Box`](std/box.md)
- [`Recursive Data Types`](std/box-recursive.md)
- [`Niche Optimization`](std/box-niche.md)
- [`Rc`](std/rc.md)
- [String](std/string.md)
- [Option and Result](std/option-result.md)
- [Vec](std/vec.md)
- [HashMap](std/hashmap.md)
- [Box](std/box.md)
- [Recursive Data Types](std/box-recursive.md)
- [Niche Optimization](std/box-niche.md)
- [Rc](std/rc.md)
- [Modules](modules.md)
- [Visibility](modules/visibility.md)
- [Paths](modules/paths.md)
@@ -128,17 +128,17 @@
- [Deriving Traits](traits/deriving-traits.md)
- [Default Methods](traits/default-methods.md)
- [Important Traits](traits/important-traits.md)
- [`Iterator`](traits/iterator.md)
- [`FromIterator`](traits/from-iterator.md)
- [`From` and `Into`](traits/from-into.md)
- [`Read` and `Write`](traits/read-write.md)
- [`Add`, `Mul`, ...](traits/operators.md)
- [`Drop`](traits/drop.md)
- [Iterator](traits/iterator.md)
- [FromIterator](traits/from-iterator.md)
- [From and Into](traits/from-into.md)
- [Read and Write](traits/read-write.md)
- [Add, Mul, ...](traits/operators.md)
- [Drop](traits/drop.md)
- [Generics](generics.md)
- [Generic Data Types](generics/data-types.md)
- [Generic Methods](generics/methods.md)
- [Trait Bounds](generics/trait-bounds.md)
- [`impl Trait`](generics/impl-trait.md)
- [impl Trait](generics/impl-trait.md)
- [Closures](generics/closures.md)
- [Monomorphization](generics/monomorphization.md)
- [Trait Objects](generics/trait-objects.md)
@@ -151,7 +151,7 @@
- [Panics](error-handling/panics.md)
- [Catching Stack Unwinding](error-handling/panic-unwind.md)
- [Structured Error Handling](error-handling/result.md)
- [Propagating Errors with `?`](error-handling/try-operator.md)
- [Propagating Errors with ?](error-handling/try-operator.md)
- [Converting Error Types](error-handling/converting-error-types.md)
- [Deriving Error Enums](error-handling/deriving-error-enums.md)
- [Adding Context to Errors](error-handling/error-contexts.md)
@@ -182,12 +182,12 @@
- [Unbounded Channels](concurrency/channels/unbounded.md)
- [Bounded Channels](concurrency/channels/bounded.md)
- [Shared State](concurrency/shared_state.md)
- [`Arc`](concurrency/shared_state/arc.md)
- [`Mutex`](concurrency/shared_state/mutex.md)
- [Arc](concurrency/shared_state/arc.md)
- [Mutex](concurrency/shared_state/mutex.md)
- [Example](concurrency/shared_state/example.md)
- [`Send` and `Sync`](concurrency/send-sync.md)
- [`Send`](concurrency/send-sync/send.md)
- [`Sync`](concurrency/send-sync/sync.md)
- [Send and Sync](concurrency/send-sync.md)
- [Send](concurrency/send-sync/send.md)
- [Sync](concurrency/send-sync/sync.md)
- [Examples](concurrency/send-sync/examples.md)
- [Exercises](exercises/day-4/morning.md)
- [Dining Philosophers](exercises/day-4/dining-philosophers.md)