From Lab to Market: Can an industry research collaboration fix the post-editing pricing problem?
By: John Moran (Centre for Next Generation Localisation)
14 November 2014
John Moran shares his views on an aspect of post-editing that is both of great interest and of utmost importance for buyers and service providers alike: what is the right and, most importantly, the fairest way of pricing the post-editing effort?
For me, the clearest indication that the pricing model for MT post-editing is immature came from a conversation I had a few weeks ago with a freelance German to English translator at the Association for Machine Translation in the Americas (AMTA) conference in Vancouver. We had met two years earlier at the same conference in San Diego. As he specializes in medical device work and had worked on a project for a translation company I co-own, I was aware of his word rates and average daily throughput.
Over coffee between the sessions I asked him how much post-editing work he had done since we had first met. He seemed embarrassed that his only project had been a three-week post-editing job for his main client, a well-known LSP with several thousand employees. The job had been a nightmare. Not only had the MT been useless, the source material was what we translators often refer to as “treacle.” Almost every sentence contained terms that required research in the translation memory or on the web.
Had the MT not been provided, he would have earned about two-thirds of his usual daily rate. Unfortunately, he had accepted a 40% discount on his normal rate because the project manager who contacted him told him the MT would help him translate the material at least that much faster. In the end, his daily earnings on the project were about one third of his average. On an unusually hard job, the MT discount had simply added insult to injury.
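A quick back-of-the-envelope check makes clear just how badly the numbers worked out for him. The figures below are only the rough ones from the story, normalized to his usual daily earnings:

```python
# Back-of-the-envelope check on the translator's numbers
# (illustrative only; figures are the approximate ones from the anecdote).

normal_daily = 1.0          # normalize usual daily earnings to 1
no_mt_earnings = 2 / 3      # hard source text: ~2/3 of usual, at full word rate
discount = 0.40             # discount accepted on his word rate
actual_earnings = 1 / 3     # what he actually earned per day on the job

# Throughput implied by his actual earnings at the discounted rate:
implied_throughput = actual_earnings / (1 - discount)
print(round(implied_throughput, 2))  # ~0.56 of his normal daily word count

# So with the "helpful" MT he produced fewer words per day (~56% of
# normal) than he would have from scratch (~67%), and on top of that
# was paid 40% less per word.
```

In other words, the useless MT likely slowed him down below even his from-scratch speed on that difficult text, and the discount compounded the loss.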
We both agreed he should have turned down the work but sometimes in technical translation a project that looks straightforward can turn into an uphill struggle once you begin. At that point you just have to push through to the end as a matter of professional pride. Unsurprisingly, he does not plan to take on any more post-editing for the LSP in question.
A superficial analysis of this situation might conclude that either the agency or the buyer did well on this deal; the translator earned over two thousand dollars less than usual. But MT can save translators time. Speed gains of 30% to 60% are not unusual for certain kinds of technical content. MT can also improve terminological consistency, and discounts of 40% for post-editing can, in fact, be fair. If that translator never accepts another post-editing job, the LSP in question has forfeited hundreds of thousands of dollars of time saved over his working lifetime. Maybe he will not eschew post-editing projects for that client indefinitely. After all, this was a translator who paid thousands of dollars to attend two MT conferences. But what about the translators who are skeptical of or opposed to MT? Unfair discounts just confirm their prejudice.
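The link between speed gain and a fair discount is simple arithmetic, but it is worth spelling out. If the translator's hourly earnings are to stay constant, the discounted word rate multiplied by the higher throughput must equal the old word rate multiplied by the old throughput. A minimal sketch (the function name is mine, for illustration):

```python
def fair_discount(speed_ratio):
    """Break-even per-word discount for a given MT speed-up.

    speed_ratio: words-per-hour with MT divided by words-per-hour
    without MT (e.g. 1.5 means 50% faster with MT).

    Keeping hourly earnings constant requires
        new_rate * speed_ratio == old_rate,
    so the maximum fair discount is 1 - 1/speed_ratio.
    """
    if speed_ratio <= 0:
        raise ValueError("speed ratio must be positive")
    return max(0.0, 1.0 - 1.0 / speed_ratio)

# A 40% discount only breaks even at a speed ratio of about 1.67,
# i.e. the MT must make the translator roughly 67% faster.
print(round(fair_discount(5 / 3), 2))  # 0.4
print(round(fair_discount(1.3), 2))    # 0.23: a 30% speed gain justifies ~23%
```

Seen this way, a 40% discount quietly assumes the MT delivers a speed gain at the very top of the 30% to 60% range, which is exactly why it needs to be measured rather than asserted.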
Fast-forward a few days to LocWorld. In a conversation over dinner with an agency owner, I was informed that MT post-editing was becoming a more common request in his 35-person agency. Like most LSPs his business model is to work with freelance translators with in-house project management and quality control. He told me in the language pairs he deals with most, it is not hard to find translators who will post-edit user-generated content or carry out light post-editing to remove egregious translation errors from Google Translate output. However, finding specialized freelance translators willing to post-edit customized MT on technical accounts where the MT should not impact quality was difficult.
For the past four years I have been working to find a solution to this problem. My research was mainly done as part of a Ph.D. in computer science with the Centre for Next Generation Localisation (CNGL / ADAPT), a large Irish research center focused on multilingual and adaptive computing.
I am also a programmer, so my approach was to adapt an open-source CAT tool called OmegaT to measure my working speed as a translator. The core idea is to be able to do something we call segment-level A/B testing. Words-per-hour translation speed is calculated separately for segments that contain MT proposals (A) and segments that must be translated from scratch (B), so that a post-translation analysis can report on the speed ratio.
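In outline, the analysis itself is straightforward; the hard part is gathering clean editing-time data. The sketch below is a minimal illustration of the idea, not the iOmegaT implementation, computing an A/B speed ratio from hypothetical logged segment records:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    words: int          # source word count
    seconds: float      # active editing time logged by the CAT tool
    had_mt: bool        # True if an MT proposal was shown (group A)

def speed_wph(segments):
    """Aggregate words-per-hour over a list of segments."""
    words = sum(s.words for s in segments)
    hours = sum(s.seconds for s in segments) / 3600.0
    return words / hours if hours else 0.0

def ab_speed_ratio(segments):
    """Ratio of MT-assisted speed (A) to from-scratch speed (B).

    A ratio of 1.5 would mean the translator worked 50% faster on
    segments where an MT proposal was available.
    """
    a = speed_wph([s for s in segments if s.had_mt])
    b = speed_wph([s for s in segments if not s.had_mt])
    return a / b if b else float("inf")

log = [
    Segment(words=12, seconds=60, had_mt=True),
    Segment(words=10, seconds=50, had_mt=True),
    Segment(words=11, seconds=110, had_mt=False),
    Segment(words=9, seconds=90, had_mt=False),
]
print(ab_speed_ratio(log))  # 2.0: MT segments were edited twice as fast
```

In practice the logging has to discount pauses, e-mail breaks, and other distractions, which is exactly what timesheets cannot do.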
The idea is similar to research published by Mirko Plitt and Francois Masselot at Autodesk, makers of AutoCAD. In their study they found that the fastest post-editor also made the most changes to the segments presented to him. He also achieved the best QA scores. This is counter-intuitive for most people: common sense dictates that it takes longer to edit more. On average this is true, but nonetheless it seems some translators are better than others at leveraging MT proposals to improve their working speed. We gathered speed data from about 80 translators who worked redundantly on the same two-day handoffs per language pair, and we found the same pattern as Autodesk: translators vary greatly in terms of how MT impacts their working speed. For readers who do not mind technical detail, the studies are described in a paper titled “Towards desktop-based CAT tool instrumentation,” available online in the AMTA proceedings.
The paper’s take-home message is that to maximize return on investment, vendor managers and project managers need to be able to identify translators who benefit from MT in terms of speed. It does not matter if a translator translates 2,000 words per day or 4,000 words per day without MT. What matters is the ratio of speed with MT to speed without. A string comparison of raw versus post-edited MT cannot provide that information.
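For context, the kind of string comparison that discounts are often based on today looks like the sketch below, using Python's standard difflib (a simple character-level similarity, not the exact metric any particular vendor uses). Note what it measures: how much the translator changed, not how long the change took, which is why it cannot substitute for speed data:

```python
import difflib

def pe_similarity(raw_mt: str, post_edited: str) -> float:
    """Similarity between raw MT output and its post-edited form.

    Returns a value in [0, 1]: 1.0 means the MT proposal was accepted
    unchanged; lower values mean heavier editing.
    """
    return difflib.SequenceMatcher(None, raw_mt, post_edited).ratio()

raw = "The device must been calibrated before use."
edited = "The device must be calibrated before use."
print(round(pe_similarity(raw, edited), 2))  # close to 1.0: a light edit
```

As the Autodesk study showed, a translator can edit heavily and still be the fastest, so a high or low similarity score by itself says nothing about the speed ratio that should drive the discount.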
I am very grateful to one of our CNGL industrial partners, Welocalize, for making the research possible. Based on an early prototype I showed to their lead MT engineer, Dave Clarke, in 2011, I was invited to work in-house to refine the idea. Welocalize already had some large-volume post-editing accounts, but they felt that measuring the impact of MT by asking busy translators to fill in timesheets was error-prone. The data could not be validated. Breaks to answer e-mail and other distractions are hard to account for. At the end of the day, it is hard to know exactly how many hours were worked.
In 2012 I spent nearly a year on site with them in Dublin. Aside from the changes to OmegaT, I also developed utilities and analysis components to output very detailed segment-level post-translation speed reports. The productivity tests had to work in environments where normally Trados was used, so we had to develop workflows to work with files from enterprise-grade Translation Management Systems. Specifically, SDL TMS, SDL WorldServer and, of course, Welocalize’s own free and open-source TMS, GlobalSight. In fact, Welocalize integrated OmegaT into GlobalSight for use as a production CAT tool, which makes it easier to carry out longer productivity tests.
In late 2012 we received a grant to commercialize the software, and in March 2013 I was joined by Christian Saam, another software developer, who had studied computational linguistics with me in the early 1990s. Together we developed the software into the first iteration of a product we call the iOmegaT Translation Productivity Testbench. The “i” stands for instrumented, but the iOmegaT name will change later when we merge the logging code with the OmegaT project. Though we have already licensed the software to Welocalize and Hewlett-Packard, it is not yet openly available. We are still looking for evaluation partners. In 2015 we plan to commercialize the software openly.
Even if testing could only ever be carried out in OmegaT, that would still be useful. iOmegaT productivity tests demonstrated to Welocalize that OmegaT is a perfectly good offline CAT tool. With nearly 10,000 downloads per month and download numbers doubling every four years, OmegaT is one of the most popular and fastest-growing CAT tools in terms of translator uptake. Welocalize sponsored many new features that are very popular with the OmegaT translator community, and there is a vibrant support group with several thousand members.
However, translators should be allowed to work in whichever CAT tool they are most comfortable with. Translation is hard enough without wondering where the keyboard shortcut for the concordance function is. Though the analysis software took years to refine, it only took a few days to add the logging functions to OmegaT. The same logging of User Activity Data could be added to other CAT tools. In essence, the problem we are trying to solve is how to measure words per hour productivity so that translation speed can be improved using technology. We want to do it both with the knowledge and blessing of the translator and without getting in their way.
To this end, next year we plan to begin working on a User Activity Data specification and privacy model we call CAT-UAD. CNGL ADAPT has a good deal of experience with format standardization. XLIFF 2.0 and ITS 2.0 standardization efforts were both coordinated by CNGL researchers (Dr. David Filip and Prof. David Lewis respectively). We think this will be useful, as many large translation buyers do not want to dictate to their suppliers which CAT tool to use. In an ideal world, translators, translation buyers, and MT providers would be able to benefit from User Activity Data recorded in any CAT tool.
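To make the idea concrete, here is a purely hypothetical sketch of what an opt-in, CAT-UAD-style event record might look like. The field names are invented for illustration; no such specification exists yet, and consent handling is exactly the kind of detail the specification would need to pin down:

```python
import json
import time

def make_uad_event(segment_id, event_type, consent_given, word_count=None):
    """Build one hypothetical User Activity Data event.

    Events are only emitted if the translator has opted in;
    otherwise nothing leaves the translator's PC.
    Field names here are invented for illustration.
    """
    if not consent_given:
        return None
    return {
        "segment_id": segment_id,
        "event": event_type,      # e.g. "segment_opened", "segment_closed"
        "timestamp": time.time(),
        "word_count": word_count,
    }

event = make_uad_event("seg-42", "segment_closed", consent_given=True, word_count=17)
print(json.dumps(event)["__class__"] if False else event["event"])  # segment_closed
```

The essential design point is that the consent check sits at the point of capture, so opting out means no data is recorded at all rather than recorded and discarded later.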
However, recording translation speed data has a dark side. Measuring an individual's working speed without permission is illegal in some countries. I was informed recently by a Microsoft employee that Microsoft had to retire an editing-time feature in MS Word a number of years ago because, in Germany, it would have required employee permission, and it was unrealistic to assume that this permission would always be forthcoming. This means that sharing speed data beyond the translator’s PC must be an opt-in feature, at least for translators on full-time employment contracts in Germany. The legal situation is less clear for freelance translators, but it seems only right that their privacy be respected as well.
An example of a speed report that is CAT tool-specific, but nonetheless useful for setting per-word discounts for MT post-editing, is the speed report in the newest version of MemoQ. Importantly, the words-per-hour data is not available on the MemoQ server, so translators need not worry that they will be asked for discounts as they become faster on regular accounts. However, our view is that translators will not mind sharing speed data when post-editing, at least intermittently. Ignoring the speed dimension with MT is the translation equivalent of drunk driving: you can get from A to B using the performance-enhancing technology, but the results are likely to be messy. If the translator I met at AMTA had been using MemoQ, he would have been able to send his client a speed report after one day of work to push back on the post-editing discount.
However, it is not just the translation industry that will benefit from CAT-UAD. Researchers should be able to make use of the data too. With a recent funding injection of €50m, CNGL ADAPT is now one of Ireland’s largest computing research centers and one of the largest collections of multilingual-computing and adaptive-systems researchers in the world. CAT-UAD and open-source CAT tools will make it much easier to field-test exotic new language technologies like quality estimation scores, interactive MT, speech recognition, example-based MT, and adaptive MT, as well as innovations in language-specific MT systems.
Researchers are data-tropic: like flowers following the sun, they are drawn to data, and the more the better. Small lab experiments on bilingual students only get you so far. MT and other technologies like speech recognition are imperfect anyway, so testing them on working translators is quite possible, so long as you can quickly recognize when a test is failing and fall back on a baseline.
The future is bright for language technology and enhanced translator productivity, but it is not all about MT. Open-source CAT tools like OmegaT, Virtaal, and the recently announced web-based tools translate5 and MateCAT are not just scaled-down versions of their more expensive equivalents. They are also innovative in their own right. For a select few languages, speech recognition software is becoming a productive and healthier alternative to solely keyboard-based translation. User Activity Data may help to make the case to extend that set of languages.
Also, translators are becoming savvier at using advanced functions like terminology extraction and auto-complete in CAT tools. To measure the impact of all of these technologies, we need unobtrusively gathered in-production translation speed data. Meanwhile, MT provision is becoming an increasingly crowded marketplace, with price pressure emerging similar to that in human translation. As speed reports like those found in MemoQ become a standard feature in CAT tools, it will be possible to evaluate MT systems objectively in terms of translation speed, quickly and cheaply. This will be good news for MT companies that can provide a service that enhances translator speed, but those who make promises they cannot keep are destined to fail. In that respect MT is no different from human translation.
John Moran has worked as a lecturer in translation at Trinity College Dublin, a translator in his own LSP, and a consultant software engineer for leading global companies like Cap Gemini, Siemens and Telefonica. He is currently completing a Ph.D. in computer science on the topic of CAT tool instrumentation.