Effective Post-Editing in Human and Machine Translation Workflows
By Stephen Doherty & Federico Gaspari, QTLaunchPad Project
As a relatively new skill in the industry, post-editing is becoming a widespread activity all over the world. While clear advantages in industry applications in terms of productivity have been demonstrated, the absence of best practice and the lack of training materials and accessible resources often mean that we approach post-editing by trial and error or not at all due to unknown return-on-investment and potential risk.
Post-editing is the human intervention of fixing or tidying up raw machine translation (MT) output to the desired quality level. The extent of post-editing required will vary from one job to another. However, the key skill is to be able to assess the amount of editing required, so as to avoid spending valuable time and resources on fixing errors when a human translation from scratch would be faster.
Post-editing is often used in tandem with other techniques to maximize the effectiveness of MT. Successful approaches have typically carried out some form of linguistic pre-processing of the input before it goes into the MT system. This often takes the form of concise style guides, comprehensive controlled language/authoring, and adherence to a permissible glossary or a set of in-house rules.
Alongside system customization for domains and text types, both pre-processing and post-editing activities can be linked to further optimize the MT processes. Often referred to as human-in-the-loop workflows, these approaches capture the post-editing carried out by humans in order to automate as much of the process as possible, by augmenting the MT system with the post-edited translations or by adding an automated post-editing module to the system, so that the repetitive errors already corrected by the post-editor do not reoccur. Such post-editing modules are often simple regular expressions such as find and replace routines, and can extend into more complex natural language processing functions using rule-based and statistical approaches.
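As a rough illustration of the simplest kind of automated post-editing module mentioned above, the following Python sketch applies a short list of find-and-replace rules to each MT segment. The rules themselves and the function name auto_post_edit are hypothetical; in practice they would be derived from corrections that post-editors have made repeatedly for a given system and language pair.

```python
import re

# Hypothetical correction rules: (pattern, replacement) pairs that a
# post-editor might record after fixing the same MT error several times.
RULES = [
    # Terminology fix: the (assumed) glossary requires "database".
    (re.compile(r"\bdata base\b", re.IGNORECASE), "database"),
    # Remove stray whitespace before punctuation marks.
    (re.compile(r"\s+([,.;:!?])"), r"\1"),
    # Collapse an immediately repeated word ("the the" -> "the").
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),
]


def auto_post_edit(segment: str) -> str:
    """Apply every rule, in order, to one MT output segment."""
    for pattern, replacement in RULES:
        segment = pattern.sub(replacement, segment)
    return segment


if __name__ == "__main__":
    raw = "Open the the data base , then save ."
    print(auto_post_edit(raw))
```

Rules of this kind are applied in a fixed order, so each one should be checked against real output before it is added, since an overly broad pattern can introduce new errors as easily as it removes old ones.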
Productivity gains (e.g. time and throughput) depend upon: the experience of the post-editor with the post-editing task itself; their expertise in the domain; familiarity with the language pair; and knowledge of the specific MT software being used. In addition to such gains, post-editing is helpful in translating texts that would otherwise remain inaccessible to users outside of that language.
While the aim of post-editing is to improve MT output to the desired quality level, this may not always mean that a perfect translation is required. In some cases, usable and comprehensible translations may be fitting, especially for more perishable content or internal communications, rather than the high quality that may be expected for external dissemination. In this sense, the priority should be to save time, effort, and money.
Depending on the client’s needs and the intended usage of the translation, there are three main options:
- no post-editing (e.g. internal circulation, gisting);
- minimum post-editing (e.g. internal circulation of more sensitive nature, on-the-fly information and instructions);
- full post-editing (e.g. external publications, inter-organization communications).
Critical skills for effective post-editing:
- Excellent word-processing and editing skills;
- Ability to work and make corrections directly on-screen;
- Strong knowledge of general MT and its shortcomings;
- Specific knowledge of the approach of the MT systems in use, especially their weaknesses and the reason for producing the encountered errors;
- Knowledge of the source and target language and audience;
- Ability to make quick decisions and ‘fire fight’ errors as needed;
- Ability to balance post-editing speed and effort with respect to the required quality level;
- Knowledge of CAT environments and of the differences between post-editing MT output as opposed to editing the past human translations present in fuzzy matches.
Considerations should be made with respect to the MT system in use. More generally, we should be aware that MT systems do not have any real-world knowledge or contextual awareness. Errors from MT are possible at any level: lexical, grammatical, syntactic, etc. MT errors may also be extra-linguistic: while MT systems are less likely than human translators to make factual mistakes such as errors in sequences of numbers or measures, they can produce rigid and garbled output as well as relatively subtle errors that are difficult to detect and correct, e.g. statistical MT systems sometimes omit negations.
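Subtle errors such as omitted negations can be partly caught by a simple automated check run before or during post-editing. The Python sketch below is only a heuristic under stated assumptions: it supposes an English-to-Spanish job, uses deliberately tiny, hypothetical word lists, and flags a segment for closer human inspection when the source contains a negation but the MT output contains none. It will miss many cases and raise false alarms, so it supplements the post-editor's judgment and does not replace it.

```python
import re

# Hypothetical, deliberately small negation lists for an assumed
# English-to-Spanish job; real lists would be far more complete.
SOURCE_NEGATIONS = {"not", "no", "never", "cannot"}
TARGET_NEGATIONS = {"no", "nunca", "jamás", "ni"}


def tokens(text: str) -> set:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def may_have_dropped_negation(source: str, target: str) -> bool:
    """Return True if the source is negated but the target shows no negation."""
    # Expand contractions such as "don't" into "do not" (ASCII apostrophe only).
    expanded = source.lower().replace("n't", " not")
    source_negated = bool(tokens(expanded) & SOURCE_NEGATIONS)
    target_negated = bool(tokens(target) & TARGET_NEGATIONS)
    return source_negated and not target_negated


if __name__ == "__main__":
    print(may_have_dropped_negation("Do not press the button.", "Pulse el botón."))
    print(may_have_dropped_negation("Do not press the button.", "No pulse el botón."))
```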
Given the nature of post-editing, an error-based approach is typically taken in order to: evaluate the output to see error types; focus on specific error types; refine the MT system or inform linguistic pre-processing; and avoid repetitive errors (saving time and frustration!). However, it is often the case that the post-editor has little or no control of the MT process and pre-processing, so errors can be repeated and even propagated. Thus the onus of input and process quality is often shared, unknown, or impossible to consider.
In terms of training, we argue that post-editing provides a variety of unique and transferable skills that are valuable to many professionals in the translation and localization industry. These transferable skills are directly applicable to other aspects of translation and language tasks, namely, editing, and pre-processing or adhering to style guides and glossaries. These go hand-in-hand with the improvement of general knowledge of computer-aided translation, direct experience with MT systems, and awareness of what they do well, what they cannot do, and where the human translator can add value to the MT process.
In closing, we note several recent projects and tools that aim to advance the theory and practice of post-editing:
- PET (Post-Editing Tool) is a stand-alone, open-source tool that allows post-editing and assessment of machine and human translations, while gathering detailed statistics about post-editing time and effort;
- MateCAT is a web-based CAT tool that uses MT, machine learning, and quality estimation techniques and provides an environment where post-editing activities carried out with the tool can be learned from;
- The Accept project aims to identify and incorporate post-editing strategies into a unified pre- and post-processing MT environment;
- Other popular commercial options are also available, including Microsoft Translator Hub, SmartMATE and KantanMT.
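To give a sense of the kind of effort statistic such tools can gather, the Python sketch below computes a word-level edit distance between the raw MT output and its post-edited version, then normalizes it by the length of the post-edited segment. This is a simplified measure chosen for illustration (insertions, deletions, and substitutions only, with naive whitespace tokenization); it is not the method used by any of the tools listed above, and the function names are hypothetical.

```python
def word_edit_distance(mt_output: str, post_edited: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the MT output into the post-edited text."""
    a, b = mt_output.split(), post_edited.split()
    # prev[j] holds the distance between the words of `a` processed so far
    # and the first j words of `b`.
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited segment (0.0 means no edits)."""
    reference_length = max(len(post_edited.split()), 1)
    return word_edit_distance(mt_output, post_edited) / reference_length


if __name__ == "__main__":
    mt = "the cat sat on mat"
    pe = "the cat sat on the mat"
    print(word_edit_distance(mt, pe), round(edit_rate(mt, pe), 3))
```

A low edit rate suggests that the raw output needed little work, while a consistently high rate for a given content type may indicate that translating from scratch would be faster.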
Lastly, the EU-funded QTLaunchPad project has developed a set of freely available Multidimensional Quality Metrics (MQM) that can be used for assessing translation quality and post-editing effort. MQM offers a free, open, and flexible platform that supports requirements specification, bidding, translation quality assessment/assurance, and other business processes in one uniform model, as well as in-line mark-up with issue resolution and audit trails. Its standardization builds upon existing ISO specifications and the popular models described above, and it remains open to and compatible with these existing models, which allows existing workflows to be kept while still taking advantage of MQM’s features and extensions.
Stephen Doherty is a lecturer and researcher at the University of New South Wales, where he teaches specialized translation and translation technology. His research examines cognitive and affective human-computer interactions in translation and language processing, including translation technology, quality assessment, and the usability of translation tools and machine translation.
Federico Gaspari has a background in translation studies and holds a PhD from the University of Manchester. He has 15 years' experience as a researcher and lecturer in translation technology, specialized translation and corpus linguistics, previously at the Universities of Manchester and Salford in the UK, and currently at the Universities of Bologna (Forlì campus) and Macerata in Italy. In addition, he holds a postdoctoral research position at Dublin City University, where he works on European projects related to translation technology and translation quality evaluation.