
Post-editing 2.0: Dynamic Quality Targets in the Post-editing Supply Chain

Welocalize


Monday, 24 November, 2014

Lena explores the concept of “dynamic quality” and ways it ties in with various levels of post-editing. Drawing on her years in the field, she looks at different types of translation tasks and possible quality evaluation (QE) scenarios, which in turn help define the necessary level of post-editing.


In recent years there has been a renewed interest in translation quality and how to measure it. Stakeholders involved in the translation and localization process have been asking for a more “dynamic” understanding of quality and for metrics that reflect this.

This interest in a more dynamic, flexible, multidimensional understanding of quality is motivated by several factors. One of them would certainly be the oft-quoted “content explosion”, with increasing amounts of text published on the internet plus entirely new content types and authors. Companies increasingly recognize the value of these newer content types (like user-generated content) as powerful tools for increasing their reach, “going local” while growing globally, and entering new markets. At the same time, more traditional text types have continued to grow, while becoming more diversified as they do so. We see text types becoming more fluid (e.g. highly technical eLearning courses with voice-over, technical documentation with embedded links to videos) and adapting to different formats (e.g. website, smartphone, tablet).

Another key factor is the coming-of-age of newer approaches to translation, primarily machine translation, post-editing, and crowd translation. They are going mainstream and offering companies new, more affordable options for translating a never-ending stream of content. And needless to say, cost and time constraints also play a crucial role in driving exploration of new translation and quality models.

In this article, I will look at this “dynamic” concept of translation quality, and how it comes into play in different post-editing scenarios. Based on our experience at Welocalize, concepts such as “light” and “full” post-editing seem too vague and abstract to be useful for professional translators, and can initially confuse them. It always comes down to the question, “What is the final quality expected, and which errors will be penalized and how?” We must therefore first look at different types of translation assignments and possible quality evaluation (QE) scenarios which will help define the post-editing task.

What do we mean by “dynamic” quality?

Reintroducing translation purpose

After wide acceptance of quality standards such as the LISA QA Model, which at the time offered more objective models and toolkits for the assessment of translation quality, the industry now seems ready to take the next step towards more adaptable and flexible quality models and metrics (while remaining as objective as possible).

Such quality models are centered on the purpose or impact of a given translation assignment, adding factors such as perishability, business value or ROI, and time and cost constraints to the equation, alongside more commonplace questions such as end user, format, etc. These are important factors that should be considered before deciding on the most appropriate translation strategy, on whether to evaluate quality, and, if so, on which metrics to apply.

Time, Cost AND Quality

A frequent concern around more dynamic, flexible, multidimensional quality models is that they might imply that “bad quality is now OK.” However, this is clearly not the idea: the objective is rather to align the quality expectations with the purpose, expected impact, and business value of the translation assignment.

Quality concerns continue to be raised as soon as machine translation and post-editing are introduced. Two aspects need to be considered in this regard: first, an increasing number of comparative analyses (both presented at MT conferences and based on our own data) confirm that quality does not suffer from switching to machine translation and post-editing per se. (The key lies in setting out expectations clearly and sharing them with the (qualified) post-editors.) Second, translation requests involving machine translation tend to be motivated by cost and time pressures – whether raw MT or “fully” post-edited.

As mentioned earlier, this drives the demand for aligned quality levels and evaluation methods. In other words, the quality does not suffer because machine translation was injected into the process, but quality expectations may have become more relaxed (why run a full-blown quality review cycle, penalizing spelling and punctuation, when I requested 6000 words of “information only” database FAQs by tomorrow?) and these expectations need to be communicated in practical detail to the supply chain.

But what does all this mean in practical terms?

The Translation Buyer

For a company looking to invest in the translation of their content, the overarching questions would be: “Why should this content be translated?” and “What is the return I expect to see from this investment?” This goes hand-in-hand with an analysis of the content itself:

  • What is its main purpose? (Sell my products, facilitate communication inside the company, provide support to customers, provide documentation for legal purposes, provide a chat platform to internal staff, provide our main international branches with a gist of the latest proposals…)
  • Who authored it? (Technical staff, creative writers, non-native speakers, a third-party publisher with professional technical writers…)
  • Who uses it and in what format? (Website, printed collaterals, a popular app currently only available with certain technologies, optional help feature embedded in the software…)
  • What is its life expectancy? (Archived for legal purposes, relevant for the duration of a marketing campaign, “old by the time it is written”, critical information for the use of all related products on the market…)
  • How much of it is there and does all of it warrant translation? (Do all the different materials fulfill the same purpose and criteria?)

And of course, the available budget and any time constraints must be factored in. Once this is clearer, it becomes easier to look at possible translation approaches.

Here is an example of a decision-making grid:

 

Example A
  • Content characteristics: Website content authored by creative staff; medium volumes but ongoing; visible
  • Translation purpose: Of high strategic importance; increase reach, win new markets; “glocalization”
  • Translation approach: Transcreation professionals or professional translators with expertise in marketing
  • Quality expectations: Idiomatic feel; marketing appeal; linguistic accuracy (spelling, grammar, punctuation)

Example B
  • Content characteristics: eLearning course for in-house staff on a major new energy-saving component in the company’s product line; authored by the company’s technical writers; big, one-off drop with half-yearly updates
  • Translation purpose: Provide training to existing and future internal technical staff in all local branches
  • Translation approach: Machine translation with post-editing by professionals with expertise in the domain
  • Quality expectations: Technical accuracy; didactic focus; consistent terminology and phrasing; functionality (embedded links, titles, quizzes, etc.); stylistically simple; linguistic accuracy insofar as it impacts technical accuracy

Example C
  • Content characteristics: Online hotel reviews; authored by a variety of users; high volumes, constant stream of new content, highly perishable
  • Translation purpose: Bigger presence on international markets; fast throughput of high volumes
  • Translation approach: Machine translation with crowd post-editing or crowd human translation; resources fluent in the target language with solid knowledge of the source language; basic research skills
  • Quality expectations: Maintain the “many voices” of the original user-generated content; transfer idioms / colloquialisms / acronyms appropriately
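For teams that want to operationalize such a grid, here is a minimal sketch, in Python, of how the rows above might be captured as structured data so that the agreed approach and quality expectations can be looked up consistently across projects. The field names and the query at the end are illustrative assumptions, not part of any standard or tool.

```python
from dataclasses import dataclass, field

@dataclass
class ContentProfile:
    """One row of the decision-making grid (fields are illustrative)."""
    name: str
    characteristics: str   # who authors it, volume, visibility
    purpose: str           # why translate it at all
    approach: str          # transcreation, MT + post-editing, crowd...
    quality_expectations: list[str] = field(default_factory=list)

# The three examples from the grid above, captured as data.
GRID = [
    ContentProfile("A",
                   "Website content by creative staff; ongoing; visible",
                   "High strategic importance; win new markets",
                   "Transcreation / marketing-specialist translators",
                   ["idiomatic feel", "marketing appeal", "linguistic accuracy"]),
    ContentProfile("B",
                   "eLearning course by technical writers; one-off drop",
                   "Train internal technical staff in all branches",
                   "MT + post-editing by domain professionals",
                   ["technical accuracy", "consistent terminology",
                    "functionality (links, quizzes)"]),
    ContentProfile("C",
                   "Hotel reviews by users; high volume; perishable",
                   "Bigger international presence; fast throughput",
                   "MT + crowd post-editing or crowd translation",
                   ["keep the 'many voices' of UGC",
                    "transfer idioms appropriately"]),
]

# Example query: what was agreed for content type B?
profile = next(p for p in GRID if p.name == "B")
print(profile.approach, "->", ", ".join(profile.quality_expectations))
```

The point of writing the grid down in this form is simply that the same agreed answers are then available, unambiguously, to everyone downstream – including the quality auditors discussed next.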

 

The Quality Auditor

Once the translation buyer is satisfied with the assessment of translation needs and the most suitable approach has been decided upon, the next step is to agree on what form of quality evaluation (QE) – if any – is needed. This step is critical in translating vague ideas of quality into measurable quality metrics. It is also valuable as a source of detailed instructions for translators, post-editors, and crowd resources, and for defining realistic throughputs and discounts.

Just as importantly, defining the model and metrics together with the quality auditors on the translation buyer’s side (as well as any additional QE layers) ensures that they are on board and in agreement. A frequent scenario is a translation buyer who understands the need for an adjusted quality model based on their request. The buyer might, for instance, request “light post-editing” from the translation service provider, with corresponding discount and turnaround expectations. However, the quality auditors might insist on the stringent QE models already in place and penalize translations that are perfectly in line with “light post-editing” definitions. If this gap is not addressed, the delivered quality is likely to be misaligned, resulting in a confused and frustrated supply chain and many hours spent on quality review arbitration.

There are several proposals that provide guidance and toolkits for setting up more flexible QE processes; the TAUS Dynamic Quality Framework (DQF) and QTLaunchpad’s Multidimensional Quality Metrics (MQM) are examples. These, as well as older models, can be adapted to allow for more purpose-targeted QE.
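Penalty-point models of the kind these frameworks support generally work by logging errors per category and severity, converting them into penalty points, and normalizing against the reviewed word count. Here is a minimal sketch of that idea in Python; the severity weights and pass threshold are illustrative assumptions that would be negotiated per scenario, which is exactly where the “dynamic” part comes in.

```python
# Minimal penalty-point QE sketch. The severity weights and the pass
# threshold are illustrative assumptions, not values from any published
# standard; a real program would agree them per content type.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def qe_score(errors, words_reviewed, threshold_per_1000=25):
    """Return (penalty points per 1,000 words, pass/fail).

    errors: (category, severity) pairs logged by the reviewer, e.g.
            [("mistranslation", "major"), ("punctuation", "minor")].
    """
    points = sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)
    per_1000 = points * 1000 / words_reviewed
    return per_1000, per_1000 <= threshold_per_1000

# The same error log scores differently depending on how a scenario
# classifies language errors: as major (example A below) or minor (B).
log_a = [("mistranslation", "major"), ("grammar", "major")]
log_b = [("mistranslation", "major"), ("grammar", "minor")]
print(qe_score(log_a, words_reviewed=1000))  # (10.0, True)
print(qe_score(log_b, words_reviewed=1000))  # (6.0, True)
```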

Using the examples above, an aligned QE process could look as follows:

 

Example A
  • Translation approach: Transcreation professionals or professional translators with expertise in marketing
  • Quality expectations: Linguistic accuracy (spelling, grammar, punctuation); idiomatic feel; marketing appeal
  • QE model: Focus on brand appeal and “idiomacy”; QE scoring process with penalties for mistranslation and language errors (counted as major)

Example B
  • Translation approach: Machine translation with post-editing by professionals with expertise in the domain
  • Quality expectations: Technical accuracy; didactic focus; consistent terminology and phrasing; functionality (embedded links, titles, quizzes, etc.); stylistically simple; linguistic accuracy insofar as it impacts technical accuracy
  • QE model: Focus on utility; functionality checks (DTP); QE scoring process with penalties for inaccurate translation, terminology and consistency errors, and language errors (counted as minor)

Example C
  • Translation approach: Machine translation with crowd post-editing; resources fluent in the target language with solid knowledge of the source language; familiar with “UGC speak”; basic research skills
  • Quality expectations: Natural-sounding; maintain the “many voices” of the original user-generated content; transfer idioms, colloquialisms, and acronyms appropriately
  • QE model: No formal process, no language checks; community feedback, or success measured by click rates on the website

(*Note: QE is considered separate from any review steps deemed necessary on the translation provider’s side, given the experience of their resources and the volume and timelines of the project. For the purposes of this article, such review steps are at the discretion of the translation provider, and “translation” refers to the delivered product.)

Example C in the above table, on user-generated content, is a newer area experiencing increasing interest and growing volumes. It is a vast field with a huge variety of content types, and with differences in purpose as well as in quality at source. Even perishability levels differ: some technical user forums can remain useful references for a long period, whereas most social media chat forums tend to have short-term relevance.

The content volumes generated by users are phenomenal, and companies have discovered this content to be a key asset in their global marketing strategies and in engaging with their customers directly. Translating this content presents a new challenge for machine translation, post-editing, translation, and QE models. Quality evaluation in this context is entirely different, as it requires moving away from formal QE methods towards more interactive approaches. Quality can, for example, be measured by “likes” or “dislikes” from fellow users.

The Post-editor

As mentioned in the introduction, two terms frequently used to express different levels of final quality in the context of machine translation are “light” and “full” post-editing. Light post-editing is generally understood to deliver understandable quality, while full post-editing delivers publishable quality “at the same level as human translation.”

However, these labels can be problematic in practice. While the underlying idea and concept are clear enough, it is not so simple to apply them in an actual translation scenario. Post-editors – and possibly also quality auditors themselves – often remain unsure of what exactly is required of them.

In the context of full post-editing, we observe that seemingly simple instructions like “use as much of the raw MT output as possible” can be the cause of endless debate and even frustration, as post-editors consider them to be in direct contradiction with stringent 60-page Style Guides and standard QE processes (which penalize non-conformity with said Style Guide). This type of instruction can also lead to under-editing and final products that do not meet the quality expectations.

Light post-editing usually causes even more uncertainty, partly because there is still less experience with it. Anecdotal feedback tells us that professional translators can find it harder to decide what to accept and leave unedited than simply to edit up to the Style Guide quality they are familiar with.

Based on our experience, the approach that tends to render the best results is to keep theoretical concepts out of the actual translation brief (though they are very useful for post-editor training) and instead establish clearly what needs to be edited, what final quality is expected, and what kinds of issues will or will not be penalized, based closely on the QE process agreed with the client and quality auditors.

In other words, the brief should contain specifications like the following (see the sketch after this list for how such a brief might be captured in structured form):

  • The translation purpose is utility / branding / gisting / fulfilling legal requirements
  • Final quality needs to be stylistically of marketing level, show creativity, be upbeat and fun, be tailored very specifically to the target culture and to a specific user group / be simple and consistent / be instructive / stay close to the source
  • There is specific terminology provided by the client that must be referenced
  • Technical terminology must be accurate and in line with industry standards, errors will be penalized
  • General terminology (not client product-specific) has to be in line with specific references
  • Measurements need to be adjusted to the locale-specific format
  • Software options need to be translated in line with a specific reference
  • Software options need to be surrounded by brackets / quotation marks
  • Hyperlinks need to be replaced with the correct link to the localized website for the respective language
  • Language errors are a major / minor / no concern and will be penalized accordingly
  • Punctuation needs to comply with target language standard / needs to be kept as per source for specific code reasons
  • Spelling variations are acceptable
  • And more
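By way of illustration, here is a minimal sketch of how such a brief might be captured as structured data in Python, so that it can be handed to the supply chain alongside the QE definition. Every key and value is a hypothetical example of what one project might agree, not a standard schema.

```python
# A hypothetical post-editing brief as structured data. All keys and
# values are illustrative examples; a real brief would be agreed per
# project with the client and the quality auditors.
POST_EDITING_BRIEF = {
    "purpose": "utility",                 # vs. branding, gisting, legal
    "style": ["simple", "consistent", "close to source"],
    "terminology": {
        "client_glossary_required": True,
        "technical_terms": "industry standard; errors penalized as major",
    },
    "mechanics": {
        "measurements": "convert to locale-specific format",
        "software_options": "translate per reference; keep surrounding brackets",
        "hyperlinks": "replace with link to localized website",
        "punctuation": "comply with target-language standard",
    },
    "penalties": {                        # feeds the agreed QE model
        "language_errors": "minor",
        "terminology_errors": "major",
        "spelling_variations": "not penalized",
    },
}

# A post-editor (or an automated QA check) can then read off an
# unambiguous answer instead of interpreting "light post-editing":
print(POST_EDITING_BRIEF["penalties"]["language_errors"])  # "minor"
```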

The list of items that could be covered will depend on the particular translation assignment, and additional items can be added in the form of post-editing instructions as the program matures. But the salient idea is that once variables like the level of expected quality, and how much work is required from the post-editor to meet it, have been established, it becomes easier and fairer to assess whether these expectations are in line with turnaround and discount expectations.

Lena Marg has been working in the localization industry since 2005, after graduating in translation and conference interpreting. Her focus in the localization industry has always been on machine translation, from several years of hands-on post-editing, to system customisation and output evaluation as well as supply chain on-boarding. She currently holds the position of Senior Training Manager in the Language Tools team at Welocalize.
