Consistent Translations Glossary across R

Author

Saranjeet Kaur Bhogal

Affiliation

Imperial College London

Published

2025-03-25

Signatories

Project team

Saranjeet Kaur Bhogal will be the lead developer for this project. She is a Research Software Engineer in the central Research Software Engineering team at Imperial College London and has experience in developing R packages and contributing to the R community. She will be responsible for developing the proposed R package and coordinating with stakeholders, in particular the R Contribution Working Group (RCWG), to ensure the package integrates with their translation process. Saranjeet is a member of the RCWG, where she has co-authored the R Development Guide and has also contributed to translations of R messages to Hindi. She has also contributed to the R community through the R Dev Days.

Contributors

The following people (in alphabetical order) have contributed to the proposal:

Consulted

The following people (in alphabetical order) have been consulted and have provided feedback on the proposal:

The Problem

This proposal addresses the inconsistency in the translation glossaries across languages in R. It proposes an R data package that maintains a consistent, shared glossary to improve the quality of translations. This will make R more accessible to users who do not speak English and improve the overall user experience of R. The problem has previously been discussed within the R Contribution Working Group (RCWG).

The proposal

The R Contribution Working Group (RCWG) has been involved in a number of projects to foster a wider, more diverse, community of contributors to base R. The group facilitates active contribution by the community, especially through the R Dev Days. This proposal is specifically related to efforts to support contributions to the translations of messages in base R. The aim is to improve the technical infrastructure of the translations process by providing a common glossary that can be used by all translation projects in the R ecosystem, sharing the translations across projects to avoid duplicated work.

Overview

The goal of this project is to improve the technical infrastructure of the translations process by providing a common, shared glossary across different translation projects in the R ecosystem. It will help in improving the quality of translations and make it easier for users to contribute to the translations process. This will help in making R more accessible to users who do not speak English and improve the overall user experience of R.

At present, different R projects use different translation workflows:

  • base R and the recommended packages use Weblate, with a common glossary across base R and the recommended packages.
  • rOpenSci uses a GitHub-based workflow and has a work-in-progress glossary.
  • Bioconductor uses Crowdin and is most likely not using any glossaries.

One-off projects, such as translating an R book, could use one of the above or yet another workflow, as the translators prefer.

Hence, we want to store the glossary and translations in an R package, provide a process for proposing changes outside of Weblate, and provide tools for syncing with Weblate (possibly also Crowdin), potentially benefiting base R and recommended packages, rOpenSci, Bioconductor, and others working on internationalization within the R community.

Detail

As part of R Dev Days, R contributors have compiled a combined list of terms from the language-specific glossaries on the R Project Weblate server and flagged 175 terms as “terminology” in the English glossary. This flag ensures that this common set of terms appears in all the language-specific glossaries. Most languages use essentially these terms (most have a glossary of 178 terms, including a few that remain in the glossary without the “terminology” flag). The Translation Team leads can also add further terms to the terminology set.

This project aims for the following:

  • Creation of common glossary, starting by combining Weblate glossary terminology terms and rOpenSci WIP glossary.
    • Columns to include: date added, source string, translations for different languages, optional explanation for different languages (e.g. to explain that a particular word should be kept in English, or to note synonyms), explanation in English (can be used to explain the meaning of a term), added via (Weblate/GitHub), updated via (Weblate/GitHub), Weblate (include/exclude/NA).
  • Documented process for people to suggest additions to and deletions from the common glossary.
    • Preferred: pull request on the underlying CSV so that it is easy to update the common glossary once proposed changes are reviewed
    • Alternative: proposal via GitHub issue, perhaps using template to ensure essential information is given
  • Tools to update common glossary with updates from Weblate
    • Get current glossary and translations from Weblate.
    • Update common glossary with new terminology and their translations (set the added_via column to Weblate and the weblate column to include).
    • Update common glossary with new/updated translations of existing terms (set the updated_via column to Weblate).
    • Update common glossary with deleted terms. These terms should not be removed, but the weblate column should be set to exclude.
    • GitHub Action to regularly update common glossary based on updates to Weblate
  • Tools to update Weblate with changes in common glossary
    • Get current glossary and translations from Weblate. Update Weblate glossary with terms added via GitHub that are flagged to be included. (Should be added to English glossary with flag “terminology”.)
    • Update Weblate glossary with new/updated translations of terminology terms.
    • GitHub Action to regularly update Weblate based on changes to common glossary. Need to ask Gergely if this is possible; it may need to be a semi-manual process.
  • Tools for Crowdin
    • Could be similar to those for Weblate, but currently no glossary to test on. Could perhaps start one on an active project.
  • Maintenance tools (for tasks requiring manual review, that may need doing from time to time).
    • Identify strings in common glossary that have been removed from Weblate (added_via is Weblate but weblate is exclude). Select to keep (change added_via to GitHub) or delete (remove term from common glossary).
    • Identify strings in common glossary that are not on Weblate (added_via is GitHub and weblate is NA). Select to add (set weblate to include) or ignore (set weblate to exclude).
    • Identify strings in the Weblate glossary that are not flagged as “terminology”. Select to add terminology flag or remove.
    • Identify strings in the Weblate glossary that are not flagged as “terminology”. Remove from the Weblate glossary for all languages.
    • Identify strings that are in language-specific glossaries but not in the English glossary. Select to add terminology flag or remove (language leads can propose new terms, but a term should end up being used for all languages or none).
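
The "update common glossary with updates from Weblate" steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual design: the function name sync_from_weblate(), the single translation column (the real glossary would hold one per language), and the sample terms and placeholder translations are all hypothetical. Only the added_via, updated_via and weblate columns and their values come from the proposal.

```r
# Illustrative sketch only: sync_from_weblate(), the single `translation`
# column and the sample terms are assumptions, not the package's actual API.
sync_from_weblate <- function(glossary, weblate_terms, today = Sys.Date()) {
  # 1. Terms deleted on Weblate are kept, but marked as excluded.
  deleted <- glossary$weblate %in% "include" &
    !(glossary$source %in% weblate_terms$source)
  glossary$weblate[deleted] <- "exclude"

  # 2. New or updated translations of existing terms.
  idx <- match(glossary$source, weblate_terms$source)
  new_tr <- weblate_terms$translation[idx]
  changed <- !is.na(idx) & !is.na(new_tr) &
    (is.na(glossary$translation) | glossary$translation != new_tr)
  glossary$translation[changed] <- new_tr[changed]
  glossary$updated_via[changed] <- "Weblate"

  # 3. Terms that are new on Weblate are appended.
  is_new <- !(weblate_terms$source %in% glossary$source)
  if (any(is_new)) {
    added <- data.frame(
      date_added  = rep(today, sum(is_new)),
      source      = weblate_terms$source[is_new],
      translation = weblate_terms$translation[is_new],
      added_via   = "Weblate",
      updated_via = NA_character_,
      weblate     = "include",
      stringsAsFactors = FALSE
    )
    glossary <- rbind(glossary, added)
  }
  glossary
}

# Hypothetical data: "xx" stands in for a real language code.
glossary <- data.frame(
  date_added  = as.Date(c("2025-01-10", "2025-01-10", "2025-02-01")),
  source      = c("argument", "closure", "tibble"),
  translation = c("arg-xx", NA, "tib-xx"),
  added_via   = c("Weblate", "Weblate", "GitHub"),
  updated_via = NA_character_,
  weblate     = c("include", "include", NA),
  stringsAsFactors = FALSE
)
weblate_terms <- data.frame(
  source      = c("argument", "environment"),
  translation = c("arg-xx-v2", "env-xx"),
  stringsAsFactors = FALSE
)

synced <- sync_from_weblate(glossary, weblate_terms,
                            today = as.Date("2025-03-25"))
print(synced)
```

In this sketch a term added via GitHub whose weblate value is NA is left untouched, so the decision to publish it to Weblate stays with a reviewer.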

Design principles:

  • Automate sync as much as possible, but allow control over what is included in each glossary.
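
One way to keep that control is to have the maintenance tools only list candidates for review and leave the decision to a person. The sketch below assumes the added_via and weblate columns described under Detail; the function names and sample terms are hypothetical.

```r
# Illustrative sketch: function names and sample terms are assumptions.
glossary <- data.frame(
  source    = c("closure", "tibble", "argument"),
  added_via = c("Weblate", "GitHub", "Weblate"),
  weblate   = c("exclude", NA, "include"),
  stringsAsFactors = FALSE
)

# Terms removed from Weblate: a reviewer keeps or deletes each one.
removed_from_weblate <- function(g) {
  g[g$added_via %in% "Weblate" & g$weblate %in% "exclude", , drop = FALSE]
}

# Terms added via GitHub with no Weblate decision yet.
pending_for_weblate <- function(g) {
  g[g$added_via %in% "GitHub" & is.na(g$weblate), , drop = FALSE]
}

# Reviewer decision "keep": the term is re-attributed to GitHub.
keep_term <- function(g, term) {
  g$added_via[g$source == term] <- "GitHub"
  g
}

print(removed_from_weblate(glossary))
print(pending_for_weblate(glossary))
reviewed <- keep_term(glossary, "closure")
```

Using %in% rather than == keeps rows with missing values out of the review lists instead of producing NA rows.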

Minimum Viable Product

A data package on R-universe that contains the common glossary and translations. The package will have a function that will allow users to update the glossary. The package will also have a function that will allow users to sync the glossary with Weblate.
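
Since changes are expected to arrive as edits to the underlying CSV, the update function could run a basic consistency check first. The following is a sketch under assumptions: validate_glossary() and the column names are hypothetical, and only a subset of the planned columns is checked.

```r
# Illustrative sketch: validate_glossary() and the CSV layout are assumptions.
validate_glossary <- function(g) {
  required <- c("source", "added_via", "weblate")
  missing_cols <- setdiff(required, names(g))
  if (length(missing_cols) > 0) {
    return(paste("missing column:", missing_cols))
  }
  problems <- character(0)
  if (anyDuplicated(g$source) > 0) {
    dups <- unique(g$source[duplicated(g$source)])
    problems <- c(problems, paste("duplicated term:", dups))
  }
  if (!all(g$added_via %in% c("Weblate", "GitHub"))) {
    problems <- c(problems, "added_via must be Weblate or GitHub")
  }
  if (!all(g$weblate %in% c("include", "exclude", NA))) {
    problems <- c(problems, "weblate must be include, exclude or NA")
  }
  problems
}

# A hypothetical proposed change with three deliberate mistakes in the last row.
csv <- "source,added_via,weblate
argument,Weblate,include
closure,GitHub,NA
closure,Email,maybe"
proposed <- read.csv(text = csv, stringsAsFactors = FALSE)
problems <- validate_glossary(proposed)
print(problems)
```

An empty result means the proposed file passes these checks; a GitHub Action could run the same function on each pull request.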

Architecture

The package will use the Weblate API. The package will have a data frame that will store the glossary. The package will also have a function that will allow users to update the glossary.
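
As a rough sketch of how the data frame could be filled from the API: Weblate's REST API exposes the strings of a translation as "units", each with source and target arrays, under a translations/<project>/<component>/<language>/units/ path. The host, the project and component slugs, and both function names below are placeholders, and the page object is hand-built to mimic a parsed JSON response, so no network access or extra package is needed.

```r
# Illustrative sketch: host, slugs and function names are assumptions.
weblate_units_url <- function(base_url, project, component, language) {
  sprintf("%s/api/translations/%s/%s/%s/units/",
          sub("/+$", "", base_url), project, component, language)
}

# Flatten one parsed page of units into glossary rows.
# Empty or missing targets become NA (term not yet translated).
units_to_glossary <- function(units, language) {
  first_or_na <- function(x) {
    if (length(x) == 0 || !nzchar(x[[1]])) NA_character_ else as.character(x[[1]])
  }
  data.frame(
    source      = vapply(units, function(u) first_or_na(u$source), character(1)),
    language    = rep(language, length(units)),
    translation = vapply(units, function(u) first_or_na(u$target), character(1)),
    stringsAsFactors = FALSE
  )
}

url <- weblate_units_url("https://weblate.example.org/", "r-project",
                         "glossary", "hi")

# Hand-built stand-in for the "results" field of a parsed API response.
page <- list(
  list(source = list("argument"), target = list("arg-xx")),
  list(source = list("closure"),  target = list(""))
)
rows <- units_to_glossary(page, "hi")
print(url)
print(rows)
```

A real client would also need to send an API token and follow the pagination links in each response; both are omitted here.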

Assumptions

The assumptions for this project are:

The Weblate API will be available and accessible.

Project plan

Start-up phase

The project will be set up as a GitHub repository, and the steps in the development will be opened as GitHub issues. Development will take place, and be tracked, through pull requests associated with those issues. An appropriate license will be chosen for the package. Contributors to the project will be acknowledged using the All Contributors bot on GitHub. Work on the project will be reported regularly to the community through the R Contribution Working Group meetings and occasionally through the R Contributor Office Hours.

Technical delivery

Package development is expected to run from around July 2025 to October 2025. The development work will be associated with milestones and deliverables, which will be tracked through GitHub. A final blog post will be published to share the work done, along with social media announcements. If possible, the work will also be publicised at relevant events, including R-Ladies+ and RUG meetups.

Other aspects

The project will be promoted through the RCWG to ensure that the R community is aware of the project. Regular updates will be provided to the R community on the progress of the project. Feedback will be sought from the community to ensure that the package meets the needs of the users. The updates will be provided through regular blog posts and announcements on social media platforms. The project will also be discussed with local R-Ladies and R User Groups.

Requirements

The project requires the following to make it happen:

  1. People: The project requires a lead developer who has experience in developing R packages and has contributed to the R community. The lead developer will be responsible for developing the R package and coordinating with the RCWG.

  2. Processes: The project will follow the best practices for R package development and will seek feedback from the R community to ensure that the package meets the needs of the users.

  3. Tools & Tech: The project will use R, devtools, testthat, roxygen2, pkgdown, GitHub, and the Weblate API to develop the package. The package will eventually be released on CRAN so that it is easily accessible to users.

  4. Funding: The project requires funding to support the development of the R package. The funding will be used to cover the costs associated with the development, testing, documentation, and release of the package by the lead developer.

The project will require coordination between the lead developer, the RCWG, and the R community to ensure that the package meets the needs of the users and is easy to use.

People

Saranjeet Kaur Bhogal, who is a Research Software Engineer at Imperial College London, will be the lead developer for this project. She has experience in developing R packages and has contributed to the R community. She will be responsible for developing the R package and coordinating with the RCWG to ensure that the glossary is consistent across languages. Saranjeet is a member of the RCWG, where she has co-authored the R Development Guide and has also contributed to translations of R messages into Hindi.

The RCWG will be involved in this project to provide guidance and support. The RCWG is a group of volunteers who are actively involved in improving the R ecosystem and have experience in translation projects.

Feedback on the work will be sought from the R community at large, especially from users who are involved in the translations process. This will help in ensuring that the package meets the needs of the community and is easy to use.

Processes

The project will follow these processes:

  1. Development: The R package will be developed using the devtools package. The package will be hosted on GitHub and will follow the best practices for R package development.

  2. Testing: The package will be tested using the testthat package. Unit tests will be written to ensure that the package functions as expected.

  3. Documentation: The package will be documented using roxygen2 and pkgdown. The documentation will be available on the package website.

  4. Translation: Feedback will be sought from the R community to ensure that the glossary is consistent across languages.

  5. Release: The plan is to eventually release the package on CRAN so that it is easily accessible to users.

  6. Community engagement: Regular updates will be provided to the R community on the progress of the project through the R Contributor Office Hours. Feedback will be sought from the community to ensure that the package meets the needs of the users.

  7. Handover: The package will be handed over to the R community at large so that they can contribute to its development and maintenance.

  8. Code of conduct: A code of conduct will be put in place to ensure that the project is a safe and welcoming space for all contributors.

  9. Governance: The project will be governed by the RCWG to ensure that it aligns with the goals of the R community.

  10. Sustainability: The project will be designed to be sustainable in the long term.

  11. Feedback: Feedback will be sought from the R community to ensure that the package meets the needs of the users.

Tools & Tech

The project will use the following tools and technologies:

  1. R: The project will be developed using the R programming language.

  2. devtools: The package will be developed using the devtools package.

  3. testthat: The package will be tested using the testthat package.

  4. roxygen2: The package will be documented using the roxygen2 package.

  5. pkgdown: The documentation will be available on the package website.

  6. GitHub: The package will be hosted on GitHub.

  7. Weblate API: The package will use the Weblate API.

  8. CRAN: The plan is to eventually release the package on CRAN.

Most of the tools stated above are open-source and widely used in the R community. The Weblate API is used by the RCWG to track and translate messages in R.

Funding

The project seeks funding from the R Consortium Infrastructure Steering Committee (ISC) to support the development of the R package. The funding will be used to cover the costs associated with the development, testing, documentation, and release of the package by the lead developer.

The total funding expected for the project is $8,000. The breakdown of the costs is as follows:

Summary

The project requires funding to support the development of an R data package and to streamline the translation process for the R community. The lead developer will be responsible for developing the R package and coordinating with the RCWG to ensure that the glossary is consistent across languages. The project will follow best practices for R package development and will seek feedback from the R community to ensure that the package meets the needs of the users. This will improve the translations infrastructure in R and make it easier for users to contribute to the translations process. The funding will be used to cover the costs associated with the development, testing, documentation, and release of the package by the lead developer.

Success

The project will be considered a success if the following criteria are met:

  • The key deliverables are met: the package includes the planned functionality, which is documented and tested.
  • The documentation is available on the package website.
  • At least 75% of the open issues on the planned work are closed or at least become work-in-progress.
  • Feedback is sought (and incorporated as far as possible) from the R community.
  • At least two community members that are not directly involved in the project contribute to the project, by submitting issues, making a pull request or making a commit to the git repository.

Definition of done

The project will be considered done when the following criteria are met:

  • The R package is developed and tested.

  • The package is documented and available on the package website.

  • Feedback has been sought from the R community to ensure that the package meets the needs of the users.

Measuring success

The success of the project will be measured by the following metrics:

  • At least 75% of the open issues on the planned work are closed or at least become work-in-progress.

  • At least 85% of the pull requests are reviewed and merged.

  • The package is documented and available on the package website.

Future work

The project can be extended in the following ways:

  1. Improve the translation process by adding more languages to the glossary.

  2. Add more features to the package to make it more useful for users.

  3. Improve the documentation and make it more accessible to users.

Key risks

There are no major risks associated with this project. The idea for the project is based on an actual need of the R community and has the support of the RCWG. The lead developer has experience in developing R packages and has contributed to the R community. The project will follow the best practices for R package development and will seek feedback from the R community at appropriate stages.