
How to Translate a Hugo Blog Post with Babeldown

[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers].

As part of our multilingual publishing project, and with funding from the R Consortium, we’ve worked on the R package babeldown for translating Markdown-based content using the DeepL API. In this tech note, we’ll show how you can use babeldown to translate a Hugo blog post!

Motivation

Translating a Markdown blog post from your R console is not only more comfortable (when you’ve already written said blog post in R), but also less frustrating. With babeldown, unlike when copy-pasting the content of a blog post into some translation service, the Markdown syntax won’t be broken¹, and code chunks won’t be translated. This works because, under the hood, babeldown uses tinkr to convert the Markdown to XML, which it then sends to the DeepL API, flagging some tags as not to be translated. It then converts the XML translated by DeepL back into Markdown.
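To make the idea of "flagging some content as not to be translated" concrete, here is a minimal base-R sketch. It is not how babeldown is implemented (babeldown goes through XML with tinkr, as described above); it merely protects fenced code blocks by tracking fence lines, and `fake_translate()` is a made-up stand-in for a translation service.

```r
# Toy stand-in for a translation service (hypothetical; NOT the DeepL API).
fake_translate <- function(text) toupper(text)

# Apply `translate` to prose lines only, leaving fenced code blocks untouched.
# Simplification: this tracks fence lines instead of parsing Markdown to XML.
translate_prose_only <- function(lines, translate = fake_translate) {
  is_fence <- grepl("^```", lines)
  # A line is protected if it is a fence or sits between an opening
  # and a closing fence (odd number of fences seen so far).
  in_code <- (cumsum(is_fence) %% 2 == 1) | is_fence
  lines[!in_code] <- translate(lines[!in_code])
  lines
}

md <- c("Some *prose* here.", "```r", "x <- 1", "```", "More prose.")
translate_prose_only(md)
```

Only the first and last lines are passed to the translator; the code chunk comes back unchanged.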

Now, as you might expect, this machine-translated content isn’t perfect yet! You will still need a human or two to review and amend the translation. Why not have the humans translate the post from scratch then? We have observed that editing an automatic translation is faster than translating the whole post, and that it frees up mental space for focusing on implementing translation rules such as gender-neutral phrasing.

Setup

Pre-requisites on the Hugo website

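babeldown::deepl_translate_hugo() assumes the Hugo website uses leaf bundles (each post in its own folder, for instance content/path-to-leaf-bundle/index.md) and multilingual content by file name, so that a post in (for instance) Spanish lives in content/path-to-leaf-bundle/index.es.md.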

babeldown could be extended to work with other Hugo multilingual setups. If you’d be interested in using babeldown with a different setup, please open an issue in the babeldown repository!

Note that babeldown won’t be able to determine the default language of your website², so even if your website’s default language is English, babeldown will place an English translation in a file called “.en.md”, not “.md”. Hugo will recognize the new file all the same (at least in our setup).
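As an illustration of this naming convention, the sketch below computes where a translation would land next to the source post. `translated_path()` is a hypothetical helper, not a babeldown function, and deriving the file suffix from the first two letters of the target language code is our assumption, based on the “EN-US” to “index.en.md” pairing used later in this post.

```r
# Hypothetical helper (not part of babeldown): path of the translated post,
# assuming leaf bundles and a lower-case two-letter language suffix.
translated_path <- function(post_path, target_lang) {
  lang_code <- tolower(substr(target_lang, 1, 2))
  file.path(dirname(post_path), sprintf("index.%s.md", lang_code))
}

translated_path(
  file.path("content", "blog", "my-post", "index.es.md"),
  target_lang = "EN-US"
)
```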

DeepL pre-requisites

First check that your desired source and target languages are supported by the DeepL API! Look up the docs of the source_lang and target_lang API parameters for a full list.

Once you know you’ll be able to take advantage of the DeepL API, you’ll need to create an account for DeepL’s translation service API. Note that even getting a free account requires registering a payment method with them.

R pre-requisites

You’ll need to install babeldown from rOpenSci’s R-universe:

install.packages('babeldown', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org'))

Then, in each R session, set your DeepL API key via the environment variable DEEPL_API_KEY. You could store it once and for all with the keyring package and retrieve it in your scripts like so:

Sys.setenv(DEEPL_API_KEY = keyring::key_get("deepl"))

Lastly, the DeepL API URL depends on your API plan. babeldown uses the DeepL free API URL by default. If you use a Pro plan, set the API URL in each R session/script via

Sys.setenv("DEEPL_API_URL" = "https://api.deepl.com")
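If you’d rather fail early than discover a missing key halfway through a translation, you could wrap both settings in a small helper. This is only a sketch: `setup_deepl_session()` is a hypothetical function of ours, not part of babeldown, and it relies solely on the two environment variables mentioned above.

```r
# Hypothetical helper (not part of babeldown): check the DeepL API key and
# set the API URL for the current R session according to the plan.
setup_deepl_session <- function(plan = c("free", "pro")) {
  plan <- match.arg(plan)
  if (!nzchar(Sys.getenv("DEEPL_API_KEY"))) {
    stop("DEEPL_API_KEY is not set for this session.", call. = FALSE)
  }
  # babeldown defaults to the free API URL, so only the Pro plan needs a change.
  if (plan == "pro") {
    Sys.setenv(DEEPL_API_URL = "https://api.deepl.com")
  }
  invisible(plan)
}

# Usage, after setting DEEPL_API_KEY:
# setup_deepl_session("pro")
```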

Translation!

You could run the code below

babeldown::deepl_translate_hugo(
 post_path = <path-to-post>,
 source_lang = "EN",
 target_lang = "ES",
 formality = "less" # that's how we roll here!
)

but we’d recommend a tad more work for your own good.

Translation using a Git/GitHub workflow

If you use version control, having the translation as a diff is very handy!

First: In words and pictures

Again: In code

Now let’s go over this again, but with a coding workflow. Here, we’ll use fs and gert (but you do you!), and we’ll assume your current directory is the root of the website folder, and also the root of the git repository.

# Copy the Spanish post as a placeholder for the English version
fs::file_copy(
 file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.es.md"),
 file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.en.md")
)
# Commit and push the placeholder
gert::git_add(file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.en.md"))
gert::git_commit("Add translation placeholder")
gert::git_push()
# Create a new branch for the automatic translation
gert::git_branch_create("translation-tech-note")
# Translate the post; force = TRUE so that the placeholder is overwritten
babeldown::deepl_translate_hugo(
 post_path = file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.es.md"),
 force = TRUE,
 yaml_fields = c("title", "description", "tags"),
 source_lang = "ES",
 target_lang = "EN-US"
)

You can also omit the post_path argument if you’re running the code from the RStudio IDE and if the open and focused file (the one you see above your console) is the post to be translated.

# Alternative to the previous call when the post is the focused file in RStudio
babeldown::deepl_translate_hugo(
 force = TRUE,
 yaml_fields = c("title", "description", "tags"),
 source_lang = "ES",
 target_lang = "EN-US"
)
# Commit and push the automatic translation
gert::git_add(file.path("content", "blog", "2023-10-01-r-universe-interviews", "index.en.md"))
gert::git_commit("Add translation")
gert::git_push()

Summary of branches and PRs

In the end there should be two to three branches:

The PRs are merged in this order:

Real example

Yanina tweaked the automatic translation by suggesting changes on the PR, then accepting them.

YAML fields

By default babeldown translates the YAML fields “title” and “description”. If you have text in more of them, use the yaml_fields argument of babeldown::deepl_translate_hugo().

Note that if babeldown translates the title, it updates the slug.

Glossary

Imagine you have a few preferences for some words – something you’ll build up over time.

readr::read_csv(
 system.file("example-es-en.csv", package = "babeldown"),
 show_col_types = FALSE
)

## # A tibble: 2 × 2
##   Spanish     English
##   <chr>       <chr>
## 1 paquete     package
## 2 repositorio repository

You can record these preferred translations in a glossary in your DeepL account:

babeldown::deepl_upsert_glossary(
 <path-to-csv-file>,
 glossary_name = "rstats-glosario",
 target_lang = "Spanish",
 source_lang = "English"
)

You’d use the exact same code to update the glossary, hence the name “upsert” for the function. You need one glossary per source language / target language pair: the English-Spanish glossary can’t be used for Spanish to English, for instance.
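If you don’t have a glossary file yet, a two-column CSV can be written with base R. The sketch below mirrors the layout of the example file shipped with babeldown (one column per language, named after the language); the column order and the file name are our assumptions, so check the babeldown reference before relying on them.

```r
# Build a small English-to-Spanish glossary (illustrative entries only).
glossary <- data.frame(
  English = c("package", "repository"),
  Spanish = c("paquete", "repositorio")
)

# Write it to a temporary CSV file; in practice you'd keep it in your project.
glossary_path <- file.path(tempdir(), "glossary-en-es.csv")
write.csv(glossary, glossary_path, row.names = FALSE)

# Read it back to check the structure.
read.csv(glossary_path)
```

As the glossary grows, you would edit the CSV and run the same upsert call again.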

In your babeldown::deepl_translate_hugo() call you then use the glossary name (here “rstats-glosario”) for the glossary argument.

Formality

deepl_translate_hugo() has a formality argument. Now, the DeepL API only supports this for some languages as explained in the documentation of the formality API parameter:

Sets whether the translated text should lean towards formal or informal language. This feature currently only works for target languages DE (German), FR (French), IT (Italian), ES (Spanish), NL (Dutch), PL (Polish), PT-BR and PT-PT (Portuguese), JA (Japanese), and RU (Russian). (…) Setting this parameter with a target language that does not support formality will fail, unless one of the prefer_… options are used.

Therefore, to be sure a translation will work, instead of writing formality = "less" you can write formality = "prefer_less", which will only use formality if available.
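If the same script translates into several target languages, you may want to map the strict values to their lenient counterparts in one place. Here is a small sketch; `lenient_formality()` is a hypothetical helper of ours, not a babeldown function, and it only rewrites the value you would then pass as the formality argument.

```r
# Hypothetical helper (not part of babeldown): turn a strict formality value
# into its "prefer_" counterpart, which does not fail for unsupported languages.
lenient_formality <- function(formality = c("default", "more", "less",
                                            "prefer_more", "prefer_less")) {
  formality <- match.arg(formality)
  switch(formality,
    more = "prefer_more",
    less = "prefer_less",
    formality # "default" and "prefer_*" values are returned unchanged
  )
}

lenient_formality("less")
```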

Conclusion

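In this post we explained how to translate a Hugo blog post using babeldown. Although the gist is to use one call to babeldown::deepl_translate_hugo(), we recommend pairing it with a Git/GitHub workflow, and paying attention to YAML fields, glossary, and formality.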

babeldown has functions for translating Quarto book chapters, any Markdown file, and any Markdown string, with similar arguments and recommended usage, so explore its reference!

We’d be happy to hear about your use cases.


  1. But you should refer to the tinkr docs to see what might change in the Markdown syntax style.

  2. Adding code to handle Hugo’s “bewildering array of possible config locations” and two possible formats (YAML and TOML) is out of scope for babeldown at this point.
