Corporate Needs, Requirements, and Best Practices for Machine Translation: An Overview
By: Bob Kuhns (Independent consultant)
31 January 2015
Growing localization needs are driving machine translation, but what are these needs? Once needs are realized, what are the requirements and best practices for successful MT deployment? Answering these questions solves only part of the problem—executing solutions is where the value is.
Machine translation (MT) has moved from research institutions to commercial settings, allowing companies to translate large volumes of text quickly for a wide variety of language pairs. Since the introduction of Google Translate, anyone with internet access can translate texts. Yet as MT goes mainstream, some companies have adopted it, while others remain reluctant to do so. To determine whether their reluctance is justified, companies should reassess their current and future translation needs and determine whether automation would have a positive impact on their localization processes.
Note that although the discussion is from a client’s perspective, much of what is discussed is applicable to LSPs interested in investing in MT.
The Current Translation Model
In the current client-LSP translation model (admittedly, oversimplified), the client produces content and hands it off to the LSP for translation. Once the material is translated and reviewed, the translated content is returned to the client for internal review and acceptance. While the LSP may use technologies such as translation memories (TMs), term banks, specialized editing software, and in some cases, MT with post-editing, the core translation effort relies on humans.
Expectations of clients are straightforward. They want accurate translations and quick translation turnaround while attempting to minimize costs. Expectations may strain the relationship, especially if the client’s demands are unreasonable, such as too large a volume to be translated in too short a period of time. Nevertheless, this model has been successful and is used by many companies. Because the needs of clients do change, companies could deviate from the standard model to one that includes MT.
Growing Needs of Clients
As companies evolve, their needs change in every area, from product development, sales and marketing, customer service, and documentation to localization. In terms of translation, certain factors may trigger a rethinking of the standard client-LSP model and act as a catalyst for introducing MT to the localization process. These factors include:
Reduced costs – Cost reduction estimates by MT vendors vary based on hardware, whether post-editing is used, and the percentage of leverage against TMs. However, cost reductions of 10–30% are typical and can be much higher under certain conditions, such as additional training of the MT engine. Companies would therefore be wise to perform their own cost analyses, based on their own data and translation environments, to determine what benefits MT would bring them.
Reduced time to market – By an oft-cited statistic, a human translates about 2,500 words per day, compared with 2,500+ words per minute for MT, depending on the MT configuration. Faster translations allow for simultaneous shipments of products to worldwide markets rather than a sequential rollout across the globe.
Ever-expanding volume of content – Companies constantly produce content, such as updated web sites and new product documentation. As a result, translation needs grow. The added volume can challenge LSPs trying to respond to translation demands from many clients simultaneously. Consequently, clients might be forced to delay translations.
Large amounts of untranslated content – In some situations, companies leave certain content untranslated due to cost or time constraints. Automation either removes or lessens these obstacles.
New markets – Companies planning to enter new markets face new translation challenges: specifically, new language pairs and more words to translate. They could then be forced to engage new LSPs to handle the new languages and larger translation volumes, thereby delaying translations and increasing costs.
If a company needs improved translation speed, reduced costs, or increased translation volume, then it is a candidate for MT. Careful analyses of requirements and best practices help foster a successful deployment of MT.
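The speed and cost figures quoted above can be turned into a rough back-of-the-envelope estimate. The following is only a sketch: the two throughput constants and the 10–30% range come from this article, while the project size, per-word rate, and number of translators are hypothetical values chosen for illustration. It ignores post-editing and review time, which a real analysis would have to include.

```python
import math

HUMAN_WORDS_PER_DAY = 2_500   # figure quoted in the article
MT_WORDS_PER_MINUTE = 2_500   # lower bound quoted in the article ("2,500+")


def human_days(words: int, translators: int = 1) -> int:
    """Working days needed for human-only translation, rounded up."""
    if translators < 1:
        raise ValueError("at least one translator is required")
    return math.ceil(words / (HUMAN_WORDS_PER_DAY * translators))


def mt_minutes(words: int) -> int:
    """Minutes of raw MT processing, rounded up (no post-editing or review)."""
    return math.ceil(words / MT_WORDS_PER_MINUTE)


def cost_after_mt(baseline_cost: float, reduction: float) -> float:
    """Cost after a fractional reduction, e.g. 0.10 for 10%."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_cost * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical project: these three values are assumptions, not article data.
    words = 500_000
    rate_per_word = 0.20
    translators = 4

    baseline = words * rate_per_word
    print("Human-only working days:", human_days(words, translators))
    print("Raw MT minutes:", mt_minutes(words))
    print("Cost at 30% reduction:", round(cost_after_mt(baseline, 0.30), 2))
    print("Cost at 10% reduction:", round(cost_after_mt(baseline, 0.10), 2))
```

Replacing the hypothetical inputs with a company's own volumes and rates is the point of the exercise; vendor averages are no substitute for an analysis on in-house data.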
Requirements and Best Practices
Unlike some other types of applications, MT is not software that is simply licensed and then deployed. Besides managerial support from a range of departments, it requires an assessment of existing language assets and engineering expertise. Choosing the MT vendor that best fits a company’s needs involves testing and, ideally, a pilot system, neither of which can be overemphasized.
A company considering MT must be aware that it will take time, staff, and therefore, money, in order to: gain management support, develop the infrastructure, evaluate and select an MT vendor, and determine the ROI with a pilot system. This involves a sizable amount of work even before fielding an MT system. The requirements and best practices described are broad and should provide a foundation for bringing MT to an organization.
To reflect the corresponding responsibilities within the company for any MT initiative, the requirements and best practices are grouped into three categories: Managerial, Linguistic, and Engineering.
Managerial Requirements and Best Practices
Buy-in – Transforming the translation process from the typical client-LSP model to one that incorporates automation alters the status quo and requires buy-in from a wide range of stakeholders including vice-presidents, localization and vendor managers, documentation managers, and engineering heads. Asking managers from the departments impacted by MT to appoint staff to be part of the team investigating MT has benefits. Early participation by these individuals helps relieve worries that MT will result in job cuts and brings a broader insight into how best to adopt MT.
Upfront costs are a factor that must be addressed at the outset. If a company chooses one or more vendors for pilot systems, then there will be costs for: compiling training and test data, creating evaluation metrics, training MT systems and processing the test corpus by vendors, evaluating translations, and performing an ROI analysis. Some of these tasks might require involving outside parties (vendors or consultants) that specialize in MT evaluations. The key for buy-in is to set realistic expectations from day one.
Evaluation – Vendor claims of translation improvements should be treated with caution, because what matters is how a particular MT engine improves a company’s own translation program. Initially, there will be many unknowns, but a carefully designed pilot project will yield quantifiable results on the benefits of MT. Working closely with vendors on structuring evaluations is invaluable in understanding whether a particular engine meets a company’s needs.
Vendor selection – Choosing one of the many MT vendors takes some research. A company will want a vendor that meets its language needs. However, a vendor having MT engines for a wide range of language pairs does not mean that the quality of translations for each pair is the same. Some vendors may specialize in Romance languages, while offering less-tuned engines for Asian languages. Realistically, companies looking at translating into different target languages may ultimately need more than one vendor.
Another major factor in vendor selection is cost. There is some expense for training and running a pilot system, but prior to a pilot system, a company should understand the software and maintenance costs for a fully configured MT engine.
Linguistic Requirements and Best Practices
Well-written source language – As LSPs are well aware, translators need well-written source material to be able to translate accurately and quickly. Good source quality is perhaps more important for MT, since it does not have the world knowledge and understanding of the subtleties of language to make good decisions on ill-formed or ambiguous language.
Terminology – Consistent terminology is required for documentation, marketing, and branding as well as for translations. Multilingual terminologies with company-approved translations of terms are not only helpful for human translators, but have been shown to improve the output of MT.
Suitable size of bilingual corpus – In choosing MT vendors, companies will come across either those offering statistical MT (SMT) or hybrid systems that combine rule-based components with SMT. In either case, these systems require training on a company’s content to maximize translation accuracy. The primary sources of training are the company’s TMs, bilingual corpora (source material that has been translated, but not aligned and imported to a TM), and source texts, which are used to generate statistics on word sequences. Vendors typically look for at least 100K source-target sentences as a minimum. However, better results are usually obtained if a client has 250K or more sentences.
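As a quick self-check before approaching vendors, the corpus thresholds mentioned above can be applied to an inventory of a company's own language assets. This is a minimal sketch: the 100K and 250K figures come from this article, while the function names, the asset labels, and the example counts are hypothetical. Individual vendors may apply different thresholds.

```python
MIN_SENTENCE_PAIRS = 100_000        # typical vendor minimum cited in the article
PREFERRED_SENTENCE_PAIRS = 250_000  # level at which results are usually better


def total_sentence_pairs(assets: dict[str, int]) -> int:
    """Sum source-target sentence pairs across assets (TMs, aligned corpora)."""
    if any(count < 0 for count in assets.values()):
        raise ValueError("sentence-pair counts cannot be negative")
    return sum(assets.values())


def corpus_readiness(sentence_pairs: int) -> str:
    """Classify a corpus size against the thresholds above."""
    if sentence_pairs < 0:
        raise ValueError("sentence-pair count cannot be negative")
    if sentence_pairs >= PREFERRED_SENTENCE_PAIRS:
        return "preferred"
    if sentence_pairs >= MIN_SENTENCE_PAIRS:
        return "minimum"
    return "insufficient"


if __name__ == "__main__":
    # Hypothetical inventory for one language pair.
    assets = {"translation_memory": 80_000, "aligned_corpus": 40_000}
    pairs = total_sentence_pairs(assets)
    print(pairs, corpus_readiness(pairs))
```

Counts should be taken per language pair, since a corpus that is ample for one pair says nothing about another.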
Engineering Requirements and Best Practices
As with other translation technologies, MT should be seamlessly integrated into the existing localization workflow. Engineering will be responsible for identifying and fixing any API or other incompatibilities and ensuring the successful integration of MT into the overall infrastructure.
Engineering will be needed to develop a testing platform for accepting training and test files provided by the team of translators and linguists, and components to collect metrics such as translation throughput and speed. The benchmarks will be used for deriving cost-benefits and the ROI of MT.
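A throughput benchmark of the kind described above might be sketched as follows. Everything here is an assumption for illustration: `translate` stands in for whatever call reaches the MT engine under evaluation, no particular vendor API is implied, and words are counted by splitting on whitespace, which is a crude measure that does not suit languages written without spaces.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ThroughputReport:
    segments: int
    source_words: int
    elapsed_seconds: float

    @property
    def words_per_minute(self) -> float:
        """Source words processed per minute; 0.0 if no time elapsed."""
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.source_words / self.elapsed_seconds * 60.0


def measure_throughput(
    translate: Callable[[str], str],
    segments: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> ThroughputReport:
    """Run `translate` over each test segment and time the whole batch."""
    start = clock()
    count = 0
    words = 0
    for segment in segments:
        translate(segment)  # output is discarded; only timing is measured here
        count += 1
        words += len(segment.split())  # whitespace word count (an assumption)
    return ThroughputReport(count, words, clock() - start)


if __name__ == "__main__":
    # Placeholder engine: returns the segment unchanged.
    report = measure_throughput(lambda s: s, ["Install the driver.", "Restart the system."])
    print(report.segments, report.source_words, report.elapsed_seconds)
```

The `clock` parameter is injectable so the harness itself can be tested with a fake timer. Quality metrics would need a separate component, since this sketch measures speed only.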
A Few Final Thoughts
Implementing MT takes time and requires buy-in and effort from many different stakeholders. Working together, these stakeholders can make MT a reality and meet a company’s needs.
Bob Kuhns is an independent consultant specializing in language technologies. His clients have included the Knowledge Technology Group in the Sun Microsystems Labs and Sun’s Globalization group. In the Labs, Bob was part of a team developing a conceptual indexing system and for the Globalization group, he was project manager and lead translation technology designer for a controlled language checker, a terminology management system, and a hybrid MT system. He was also responsible for developing translation metrics and leading a competitive MT evaluation. Bob has also conducted research and published reports with Common Sense Advisory, TAUS, and MediaLocate on a variety of topics including managed authoring, advanced leveraging, MT, and global social media.