MTFirst: From Default to Localized

machine translation of technical documentation

 


Since July 2021, Red Hat has greatly expanded one aspect of our service to our Japanese customers: 100% of new or updated product manuals are now translated into Japanese, all within one week of the English publication or revision of a document. Our aim was to maximize our translation coverage and minimize customers' wait time.

In the past, we received more requests than we could keep up with to update Japanese documents to match the original English documents, or simply to translate documents that had no Japanese version at all. The Red Hat content team creates or updates 1.5 million words of content per week.

We solved this by implementing two strategies: MTfirst and light post-editing. The table below shows the results.

 

 

Period              Coverage                         Median wait time for Japanese translation
Before July 2021    Less than 45% of documentation   Over 100 days
Now                 100% of documentation            4 days

            

So, what is the MTfirst approach?

MTfirst is a strategy where we translate and publish documents that incorporate machine translation (MT), then have them checked by human translators only after publication. This approach has been made possible by the tremendous advances in MT and machine learning (ML) in the last few years.

Our old process looked as follows:

  1. MT was incorporated in the preparation of the document.
  2. Human translators checked the document.
  3. We published the translated document.


Our new process looks as follows:

  1. MT is incorporated in the preparation of the document.
  2. We publish the translated document.
  3. Human translators check the document.

The only difference between the old and new processes is that steps 2 and 3 have swapped places. This way, customers do not have to wait for the Japanese translation to be reviewed, nor are they forced to use a generic MT service like Google Translate.

Localization Services has been using MT as part of Computer-Assisted Translation (CAT) tools for over two years. In this process, new or changed text in documents is prefilled with MT as a suggestion to translators. In the past, translators would fully post-edit this hybrid document before publication. With that process, the volume of documents we published in Japanese was limited by our team's capacity to human-check every document. This limit left many documents untranslated: we defaulted to English.

However, with the new process in place, instead of providing high-quality translation at a slower pace, we aim to produce massive volumes of translation at the “OK” quality level that MT engines now provide. MT output is sometimes still awkward, but serious errors are now very rare. Only a few years ago, the poor quality of MT output from English into Japanese made this impossible. We now default to localized.

Benefits of the MTfirst approach

Today, MT is everywhere, even built into browsers. So why don't we just leave content in English for customers to machine-translate themselves?

Our process offers Red Hat customers two crucial advantages:

●      The documents we publish are hybrids that include as much existing human-checked translation as possible. This text usually comes from older versions of the same document, but occasionally comes from other documents in the same product suite (or even from completely unrelated documents in our library). A generic machine translation of a whole page will not incorporate this.

●      We train our MT engine with previous Red Hat documents that have been translated or checked by our human translators. This means that the translated document we publish is much more likely to represent Red Hat technical terms and concepts correctly than a page translated by a generic MT engine.
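The idea of a hybrid document can be illustrated with a minimal sketch. It assumes a translation memory that is a plain dictionary of exact sentence matches and a stand-in MT function; the names `Segment`, `build_hybrid`, and `fake_mt` are hypothetical and do not describe Red Hat's actual tooling. Real CAT tools typically also use fuzzy (partial) matches, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    source: str  # English source sentence
    target: str  # Japanese text chosen for publication
    origin: str  # "tm" = reused human-checked text, "mt" = machine translation


def build_hybrid(
    sentences: List[str],
    translation_memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[Segment]:
    """Reuse human-checked translations where they exist; use MT for the rest."""
    segments = []
    for sentence in sentences:
        if sentence in translation_memory:
            # Exact match: reuse the existing human-checked translation.
            segments.append(Segment(sentence, translation_memory[sentence], "tm"))
        else:
            # New or changed text: prefill with machine translation.
            segments.append(Segment(sentence, machine_translate(sentence), "mt"))
    return segments


def fake_mt(text: str) -> str:
    """Hypothetical stand-in for a trained MT engine."""
    return f"[MT] {text}"


# Hypothetical data: one sentence is already in the memory, one is new.
memory = {"Install the package.": "パッケージをインストールします。"}
document = ["Install the package.", "Restart the service."]

hybrid = build_hybrid(document, memory, fake_mt)
for segment in hybrid:
    print(segment.origin, segment.target)
```

Tracking the origin of each segment is a design choice of this sketch: it would let a later human check concentrate on the machine-translated parts.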

The revision step

We decide the order in which human translators revise documents based on:

●      Document type

●      Regional product priority

●      Readership


Some document types are more likely to have a customer impact than others. We prioritize release notes, installation and migration content, getting started guides, and security guides.

Localization Services has regular meetings with Red Hat leadership in the regions that we service. Based on this information, we prioritize products that are important to regional strategies and trends.

Lastly, we monitor page views of documentation and prioritize revising more-used content over less-used content.
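The three criteria above could be combined into a revision queue along the following lines. This is only a sketch: the article names the factors but not how they are weighted, so the ordering used here (document type first, then regional priority, then page views) is an assumption, and the `Doc` class, `revision_key` function, and sample data are hypothetical.

```python
from dataclasses import dataclass

# Document types the article lists as having high customer impact.
PRIORITY_DOC_TYPES = {
    "release notes",
    "installation",
    "migration",
    "getting started",
    "security",
}


@dataclass
class Doc:
    title: str
    doc_type: str
    regional_priority: bool  # product flagged as important by regional leadership
    page_views: int


def revision_key(doc: Doc):
    """Sort key: tuples compare element by element (assumed order of importance)."""
    return (
        doc.doc_type in PRIORITY_DOC_TYPES,
        doc.regional_priority,
        doc.page_views,
    )


# Hypothetical sample documents.
docs = [
    Doc("API reference", "reference", False, 9000),
    Doc("Release notes", "release notes", True, 1200),
    Doc("Security guide", "security", False, 3000),
    Doc("Admin guide", "administration", True, 500),
]

# Highest-priority documents come first.
queue = sorted(docs, key=revision_key, reverse=True)
for doc in queue:
    print(doc.title)
```

A weighted score would work equally well; the tuple key simply makes the assumed order of the criteria explicit.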

Light post-editing requirements

Post-editing and revising MT output to the quality level that professional human translators achieve can easily take as long as translating the text from scratch.

“Light” post-editing (LPE) aims to ensure that the MT included in documents is “good enough” or “fit for purpose”, instead of bringing it in line with what human translators could achieve.

Thanks to improvements in MT quality, and the volume of past translations that Localization Services now has in our Translation Memory database, we can use light post-editing to greatly increase our translation coverage and speed. Compared to the effort required to bring MT output to full human quality, light post-editing is typically three times faster and, in optimal cases, may be up to eight times faster.

The translation industry association Translation Automation User Society (TAUS) compares full post-editing and light post-editing as follows.

Full Post-Editing requirements

●      Aim for grammatically, syntactically and semantically correct translation.

●      Ensure that key terminology is correctly translated and that untranslated terms belong to the client’s list of “Do Not Translate” terms.

●      Ensure that no information has been accidentally added or omitted.

●      Edit any offensive, inappropriate or culturally unacceptable content.

●      Use as much of the raw MT output as possible.

●      Basic rules regarding spelling, punctuation and hyphenation apply.

●      Ensure that formatting is correct.

Light Post-Editing requirements

●      Aim for semantically correct translation.

●      Ensure that no information has been accidentally added or omitted.

●      Edit any offensive, inappropriate or culturally unacceptable content.

●      Use as much of the raw MT output as possible.

●      Basic rules regarding spelling apply.

●      No need to implement corrections that are of a stylistic nature only.

●      No need to restructure sentences solely to improve the natural flow of the text.

 

Changes on the translators’ side

As our approach has changed, we translators have changed our work style as well:

●      adjusting our style guide to align to standard MT output

●      choosing the right MT engine for specific use-cases

●      enhancing training data for MT engines to further streamline our MTfirst process.

Adjusting our style guide to align to standard MT output means fewer changes to pre-filled MT entries, which reduces our workload and increases our throughput.

Secondly, we are deciding which MT engines to use for which products. There are many MT engines available, each with different pros and cons. For example, one engine produces higher-quality output but costs more, a second produces lower-quality output but costs nothing, and a third produces lower-quality output but follows our style guide.

Thirdly, we enhance the training data for our MT engines by diving into each product category (i.e., platform, middleware, and cloud) and cleansing the data.

Effects of the MTfirst Approach

Since our first tentative steps into the world of machine translation, we've learned how to:

●      Monitor and measure what Content Services is doing from week to week.

●      Adjust our style of working to rely on "good enough" machine translation, with expert human translators improving and adding value to the most valuable content.

●      Build tools and a repeatable workflow to deliver hundreds of translated documents in just a few hours.

For every hour we invest in translators improving the quality of our Japanese content, we deliver around 30x that amount of published content.

And compared to what we could deliver with human translation alone, we now deliver 90x as much.

With the reduced time needed to translate documents, Localization Services is exploring more areas for localization. We are currently checking whether we can localize the Red Hat Blog sites into Japanese.

Feedback from Customers

The localized documents are very useful for supporting customers. Some of our staff members are not good at English, so they normally check the localized documents to understand the contents. Then, they compare it with the English version to see whether the information in the localized version is the latest and accurate.

“We spend less time comparing Japanese and English documents, now that all of the documents are localized.” (Associate Manager, CCS Technical Support)