MT and the Changing Face of Documentation
By: Ray Flournoy (Etsy)
06 March 2015
One of the first applications Ray Flournoy looked at when he entered the world of machine translation in 1999 was the use of MT on commercial documentation. Now at Etsy in 2015, he has seen the role of MT in documentation change before his eyes.
When I first entered the world of commercialized machine translation back in 1999, I was at a small start-up trying to figure out the use cases that would actually make someone willing to pay for MT. One of the first applications that we looked at was the use of MT on commercial documentation. The central question for this use case was (and continues to be) whether post-editing the MT output is faster than translating from scratch. If not, then there is no efficiency gain with MT, and thus no commercial justification for the technology. But if there are consistent gains in speed, then those savings in manpower can translate into a more cost-effective translation process.
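The central question above comes down to simple throughput arithmetic. The sketch below is only an illustration: the function name and the words-per-hour figures are hypothetical, and it assumes both processes end at the same quality.

```python
def hours_saved(words, scratch_wph, post_edit_wph):
    """Translator hours saved by post-editing MT output instead of
    translating from scratch.

    words          -- size of the job in source words
    scratch_wph    -- words per hour when translating from scratch (assumed)
    post_edit_wph  -- words per hour when post-editing MT output (assumed)

    A negative result means post-editing is slower, so MT offers no gain.
    """
    if scratch_wph <= 0 or post_edit_wph <= 0:
        raise ValueError("throughput must be positive")
    return words / scratch_wph - words / post_edit_wph


# Hypothetical job: 10,000 words, 400 words/hour from scratch.
print(hours_saved(10_000, 400, 500))  # 5.0   (post-editing is faster)
print(hours_saved(10_000, 400, 320))  # -6.25 (post-editing is slower)
```

Whether the first or the second case applies is exactly what has to be measured for a given engine, language pair, and content type.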
At the time, MT technology was at a state where the baseline quality even for language pairs such as English and French was fairly low. As a result, the studies that showed up in the academic literature seemed to be fairly split over whether post-editing MT was faster than translation from scratch. My impression was that even when a study showed a speed-up, it was for a highly constrained case. If the inputs were perfectly grammatical, short, and within an extremely narrow domain, then maybe, just maybe, post-editing was more efficient.
In 2008 I returned to the question of post-editing when I joined the Adobe Globalization Group. By this time the statistical revolution had happened, and suddenly the baseline quality of MT had jumped significantly. Furthermore, statistical training opened up new ways to customize engines beyond simple dictionary entries.
This was a perfect backdrop for exploring MT post-editing at Adobe. Within Adobe, the use of Translation Memories was established and centralized across all products and for a large number of languages, with regular cleanup of the databases and a heavy emphasis on terminology consistency. This meant that there was a large volume of clean, high-quality training data that corresponded closely to the new text that was being translated with every update of the documentation.
Most importantly, the volume of documentation produced was prodigious. That meant a tremendous ongoing need for document translation, so any efficiency gain would be multiplied across that volume.
During my time at Adobe, however, the software business continued an evolution that had started many years earlier. Already, hefty user manuals had stopped being included with boxed software, replaced with a reliance on manuals available online. Then the software itself stopped being sold in shrink-wrapped boxes; it too was delivered digitally.
Users were no longer reading or wanting large amounts of documentation. Knowledge had been broken down into more granular chunks that were more easily found via web searches on specific questions. Furthermore, attention turned to forums where knowledgeable users were stepping up to answer the questions of other users.
User-generated discussion is much harder to customize for with machine translation. Additionally, the lower volumes of documentation mean that there are fewer economies of scale or other efficiencies to realize. While Adobe was still going strong with MT post-editing for many products and many languages, the tide was definitely turning against our ability to do effective engine customization and to justify the effort with cost savings.
My current company, Etsy, is an e-commerce platform, and thus is in a very different commercial space from Adobe. But it represents a further step in the evolution of how companies communicate with their customers. At Etsy, the amount of documentation that is produced in-house is small, but the forums and other user-to-user communication avenues are rich with information. As a result, we have little useful data for engine customization, and the amount of professional translation that we do is significantly smaller than at a traditional software company.
This has presented some interesting questions as we consider how to implement machine translation. With little data that matches our translation targets, how can we perform useful customization of engines? And with smaller budgets for human translation, can the cost savings from MT post-editing justify the overhead of the entire technology?
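The second question can be framed as a break-even volume: how many words must be post-edited before the savings cover the fixed cost of running MT at all. The sketch below is a minimal model under stated assumptions. The function name, the single fixed `overhead` figure, and all numbers are hypothetical, not Etsy's or Adobe's actual costs.

```python
import math


def break_even_words(overhead, hourly_rate, scratch_wph, post_edit_wph):
    """Words that must be post-edited before MT pays for itself.

    overhead       -- fixed cost of the MT technology (assumed, one lump sum)
    hourly_rate    -- translator cost per hour (assumed)
    scratch_wph    -- words per hour translating from scratch (assumed)
    post_edit_wph  -- words per hour post-editing MT output (assumed)

    Returns math.inf when post-editing is not faster, because no volume
    can then recover the overhead.
    """
    if scratch_wph <= 0 or post_edit_wph <= 0:
        raise ValueError("throughput must be positive")
    # Money saved on each word by post-editing rather than translating.
    saving_per_word = hourly_rate * (1 / scratch_wph - 1 / post_edit_wph)
    if saving_per_word <= 0:
        return math.inf
    return overhead / saving_per_word


# Hypothetical figures: 5,000 overhead, 40 per hour, 400 vs 500 words/hour.
print(round(break_even_words(5_000, 40, 400, 500)))  # 250000
```

With a smaller translation budget, annual volume may never reach that threshold, which is why lower volumes weaken the cost case for post-editing.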
And perhaps most crucially, can machine translation technology be applied effectively to the current unstructured mode of communication with our users? Can the untamed world of user-generated content, both questions and answers, work with MT to provide a better experience for users? Early adopters like Intel and Adobe, and now Etsy too, are betting yes, but it will require fresh thinking about the integration and customization of the technology.
Ray Flournoy is Staff Product Manager at Etsy, leading the Localization and Translation Group. Prior to Etsy he led the machine translation effort at Adobe and served as product owner for Yahoo! Babel Fish.