
New Frontiers in Linguistic Quality Evaluation

By: Brad Ross (Director of Product) - Lingotek, Inc.


02 April 2018

This article accompanies the GALA webinar "New Frontiers in Linguistic Quality Evaluation & Scoring" that Brad Ross gave in January 2018. Readers who wish to explore this topic more deeply may want to listen to the webinar on GALA's global site, which is free for all GALA members.

Moving from labor-intensive spreadsheets to cloud TMS scoring and reporting

When it comes to translating content, everyone wants the highest quality translation at the lowest price. A recent Slator article on the results of an industry survey of over 550 respondents revealed that translation quality is over four times more important than cost. For translation buyers, finding a language service provider (LSP) that can consistently deliver high-quality translations is significantly more important than price. For LSPs, finding linguists who can deliver cost-effective, high-quality translation is key to customer happiness.

Meeting quality expectations is even more difficult with the demand for higher volume. The Common Sense Advisory Global Market Survey 2018 of the top 100 LSPs predicts a trend toward continued growth. Three-quarters of those surveyed reported increased revenue and 80 percent reported an increase in work volume. That’s why improving and automating the tools and processes for evaluating linguistic quality are more important than ever. LSPs and enterprise localization groups need to look for quality evaluation solutions that are scalable and agile enough to meet the growing demand.

Evaluating the quality of translation is a two-step process. The first step involves Quality Assurance (QA) systems and tools used by the translator to monitor and correct the quality of their work, and the second step is Translation Quality Assessment (TQA) or Linguistic Quality Evaluation (LQE), which evaluates quality using a model of defined values, parameters, and scoring based on representative sampling.

The Current State of Linguistic Quality Evaluation & Scoring

Many enterprise-level localization departments have staff specifically dedicated to evaluating translation quality. The challenge for these quality managers is creating and maintaining an easy-to-use system for efficiently scoring vendor quality.

Today, even the most sophisticated localization departments resort to spreadsheets and labor-intensive manual processes. The most commonly used LQE and scoring methods rely on offline, sequential processing. A March 2017 Common Sense Advisory, Inc. brief, “Translation Quality and In-Context Review Tools,” observed that the most widely used translation quality scorecards “suffer from a lack of integration.”

“Many LSPs continue to rely on in-house spreadsheet-based scorecards. They may be reluctant to switch to other tools that require process changes or that would raise direct costs. Unfortunately, these simple error-counting tools are typically inefficient[1] because they don’t tie in with other production tools and cannot connect errors to specific locations in translated content. In addition, they are seldom updated to reflect TQA best practices, and it is common for providers to use scorecards that were developed many years earlier with unclear definitions and procedures.”

In an age of digital transformation and real-time cloud technology, LQE is overdue for an automated, integrated solution.

Reducing manual processes = reducing human error

One critical step to ensure quality translation is to reduce the number of manual processes and to automate evaluation as much as possible. The more manual steps a process involves, the more likely errors become. These usually occur when cutting and pasting from the content management system (CMS) into spreadsheets and back again.

Evaluation scorecards, typically managed with spreadsheets, are labor-intensive. The spreadsheets usually include columns for languages, projects, sample word counts, categories, and error types. They can also include complex algorithms for scoring severity. Evaluating quality segment by segment requires copying and pasting each correction, its severity, and so on.

To perform sample testing, localization quality managers extract some percentage of the total project to examine. If the project contains thousands of documents, they may use a formula, such as ten percent of the total word count. They then export those documents, load them into ApSIC Xbench, Okapi CheckMate, or another tool for checking quality programmatically, and open a spreadsheet to enter quality feedback and/or issues. When the quality evaluation is complete, the feedback is copied and pasted back into the CAT tool, often with annotations.
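To make the sampling step concrete, here is a minimal sketch in Python of drawing a review sample that covers roughly ten percent of a project's total word count. The document names, word counts, and the random-selection approach are illustrative assumptions, not a description of any particular tool.

```python
# Illustrative only: pull a review sample covering ~10% of total word count.
# Document names and word counts are made up; a real TMS export would supply them.
import random

documents = {
    "ui_strings.xliff": 1200,
    "user_guide_ch1.xliff": 4800,
    "user_guide_ch2.xliff": 5300,
    "release_notes.xliff": 900,
    "marketing_brochure.xliff": 2100,
}

def draw_sample(docs, target_ratio=0.10, seed=42):
    """Randomly pick documents until the sample reaches ~target_ratio of total words."""
    total_words = sum(docs.values())
    target_words = total_words * target_ratio
    names = list(docs)
    random.Random(seed).shuffle(names)

    sample, sampled_words = [], 0
    for name in names:
        if sampled_words >= target_words:
            break
        sample.append(name)
        sampled_words += docs[name]
    return sample, sampled_words, total_words

sample, words, total = draw_sample(documents)
print(f"Sampled {words} of {total} words: {sample}")
```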

LSPs resort to these less-than-desirable scoring methods because, until now, there have not been tools on the market for creating or administering a quality program at scale.

The New Frontier of Linguistic Quality Evaluation

Centralized quality management inside a TMS

Top-tier cloud translation management system (TMS) platforms now make assessing vendor quality easier and more automated, with LQE and scoring inside the TMS. This functionality can be purchased as a TMS add-on, or clients can outsource quality evaluation and assessment to LSPs offering quality services built on this innovative LQE technology.

The centralized storage of information and the agile change management that a full API and cloud technology provide eliminate the need to rely on error-prone manual processes. An integrated solution centralizes quality management, supports flexible and dynamic scoring, and incorporates LQE as a seamless part of the workflow.

Currently, localization quality managers have to go into the TMS to get their sample, then bulk-select and download the information. With integrated LQE, there are no offline tasks to slow down the evaluation process or introduce human error. Quality evaluation is easily added to the workflow template by selecting from a list of published quality programs. From there, tasks are automatically assigned, and quality evaluation is performed in an integrated CAT tool/workbench, including running programmatic quality checks on the translated content.

Creating an LQE program inside the TMS

Creating and setting up a quality program can be challenging and time-consuming, and it requires a sophisticated level of experience; those who aren't particularly skilled at LQE run the risk of costly inefficiencies and unreliable reporting. Done well, however, it ensures that everyone identifies quality issues the same way, which simplifies and improves communication about what constitutes quality.

The latest LQE software has the ability to base a quality program on an industry standard, such as the TAUS Dynamic Quality Framework (DQF) or the EU Multidimensional Quality Metrics (MQM). Because these standards can be overly complex and may contain more error types than needed, the software allows you to create a custom quality program by selecting elements of each.

Define error types, categories and severities

Inside the TMS, quality managers can create and define the core components of their quality program by defining error types, categories, and severities.

Severity levels range from major errors, which can affect product delivery or legal liability, to minor errors that don't impact comprehension but could have been stated more clearly. An error-rate model counts the errors and produces a percentage score, starting at 100% and deducting points for each error. Because it is important to differentiate how serious each error is, a numerical multiplier is applied to account for severity. The less common rubric model begins at zero, and points are added when the translation meets specific requirements, for example, awarding points for adherence to terminology and style guides.
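As a rough illustration of the error-rate model, the sketch below starts at 100% and deducts a weighted point value per error. The categories, per-error penalty, and severity multipliers are invented for the example; a real quality program would define its own.

```python
# Illustrative sketch of an error-rate score: start at 100% and deduct points
# per error, weighted by a severity multiplier. All numbers below (penalty,
# multipliers, logged errors) are invented for the example.
SEVERITY_MULTIPLIER = {"minor": 1, "major": 5, "critical": 10}
POINTS_PER_ERROR = 1.0  # base deduction before the severity weighting

def error_rate_score(errors):
    """errors: list of (category, severity) tuples logged by the reviewer."""
    deduction = sum(
        POINTS_PER_ERROR * SEVERITY_MULTIPLIER[severity]
        for _category, severity in errors
    )
    return max(0.0, 100.0 - deduction)

logged = [
    ("terminology", "major"),  # wrong product term
    ("style", "minor"),        # awkward but understandable phrasing
    ("accuracy", "minor"),
]
print(error_rate_score(logged))  # 100 - (5 + 1 + 1) = 93.0
```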

Publishing

After creating your quality program, you need to think about how you are going to publish and distribute it. Change management can become a nightmare if the program isn't centralized. A cloud-based program allows you to publish, change, and unpublish quickly, so if you adjust a severity-level designation, you can notify all users of the change immediately.

A cloud LQE app lets you keep prior versions of quality programs for historical reference, so translations will be held to the standards that applied at the time of translation, and not necessarily the most current standard. If your TMS doesn’t include this functionality, consider publishing your quality program on a wiki or in one of the many options for cloud-storage. This provides a centralized place that everyone is referring back to, instead of an offline spreadsheet.
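One way to picture that version history: the hypothetical sketch below keeps every published version of a quality program and looks up the one in effect on the date a job was translated. The version labels, dates, and settings are invented.

```python
# Hypothetical sketch: keep every published version of a quality program and
# pick the one that was in effect on the date a job was translated.
from datetime import date

program_versions = [
    # (effective_from, version label, severity multipliers at that time)
    (date(2017, 1, 1), "v1", {"minor": 1, "major": 5}),
    (date(2018, 3, 1), "v2", {"minor": 1, "major": 5, "critical": 10}),
]

def version_in_effect(translated_on):
    """Return the latest program version published on or before the job date."""
    applicable = [v for v in program_versions if v[0] <= translated_on]
    return max(applicable, key=lambda v: v[0])

print(version_in_effect(date(2018, 2, 10))[1])  # "v1": predates the v2 update
print(version_in_effect(date(2018, 4, 1))[1])   # "v2"
```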

Flexible and dynamic scoring

Scorecards, as CSA mentioned, need to be dynamic, varying by content type, domain, connector, and so on, to manage translation into and out of the translation technology. Not all content requires the same quality level. A discussion forum or blog post may not need the level of review that a legal document or customer-facing brochure requires. The new frontier in flexible and dynamic scoring is an algorithm that can set up scorecards automatically depending on content type.
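As a rough sketch of that idea, the example below maps content types to different scorecard settings, with a larger sample and a stricter pass threshold for high-risk content. The content types, ratios, and thresholds are illustrative assumptions only.

```python
# Invented example: choose scorecard settings automatically by content type.
# A legal contract gets a larger sample and a stricter pass threshold than a
# forum post. None of these values come from a real quality program.
SCORECARDS = {
    "legal":     {"sample_ratio": 0.25, "pass_threshold": 99.0},
    "marketing": {"sample_ratio": 0.15, "pass_threshold": 97.0},
    "forum":     {"sample_ratio": 0.05, "pass_threshold": 90.0},
}
DEFAULT = {"sample_ratio": 0.10, "pass_threshold": 95.0}

def scorecard_for(content_type):
    """Pick the scorecard configuration for a piece of content."""
    return SCORECARDS.get(content_type, DEFAULT)

print(scorecard_for("legal"))  # strict settings
print(scorecard_for("blog"))   # falls back to the default scorecard
```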

The algorithm also lets you establish a standardized word count as a baseline for comparing quality scores among documents of different sizes. This gives you an apples-to-apples comparison, because the same number of errors should be viewed differently in a 500-word document than in a 5,000-word sample. That flexibility is important for creating an accurate and efficient weighting or total-error-point system.
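A minimal sketch of that normalization, assuming a baseline of 1,000 words; the error and word counts are made up for the example.

```python
# Illustrative normalization: express errors per 1,000 words so documents of
# different sizes can be compared directly. Figures below are made up.
BASELINE_WORDS = 1000

def errors_per_baseline(error_count, word_count):
    """Scale the raw error count to the standardized word-count baseline."""
    return error_count / word_count * BASELINE_WORDS

# Six errors in 500 words is far worse than six errors in 5,000 words.
print(errors_per_baseline(6, 500))   # 12.0 errors per 1,000 words
print(errors_per_baseline(6, 5000))  # 1.2 errors per 1,000 words
```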

Feedback loop

The most critical component for improving quality is making feedback accessible to all parties involved: linguists and translators, reviewers, quality managers, and clients. When all parties have access to feedback, communication improves and there is less back-and-forth over the subjective elements of scoring. Clear communication and consistently presented scoring help reviewers provide the appropriate feedback quickly and easily.

Continuous, real-time feedback also creates an opportunity for improvement that is immediate. In offline scoring, a linguist may continue making the same mistake in several other projects before learning about the error. Cloud LQE enables real-time feedback that not only corrects an issue, but also trains linguists to improve the quality for the next (or even current) project.

The transparency this provides moves the entire process toward more objectivity, and the more objective the feedback, the less discussion is required to get clarification when a quality issue arises.

Quality reporting

Once linguistic quality evaluation is complete, you want to be able to review the data for quality reporting purposes. Cloud LQE allows reporting to be shared, so that clients can see issues affecting quality over time. You can track quality over time, by project and by locale, for all targets. Easy-to-read pie charts display the number of quality issues in each category, such as terminology, style, language, and accuracy. This lets you monitor trends over time and use that objective data for insights into improving quality delivery.
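By way of illustration, the sketch below tallies logged quality issues by category and by locale, the kind of roll-up behind those pie charts and trend reports. The issue records are invented.

```python
# Invented example: aggregate logged quality issues by category and locale,
# the sort of roll-up behind per-category pie charts and trend reports.
from collections import Counter

issues = [
    {"locale": "de-DE", "category": "terminology"},
    {"locale": "de-DE", "category": "style"},
    {"locale": "fr-FR", "category": "terminology"},
    {"locale": "fr-FR", "category": "accuracy"},
    {"locale": "fr-FR", "category": "terminology"},
]

by_category = Counter(issue["category"] for issue in issues)
by_locale = Counter(issue["locale"] for issue in issues)

print(by_category)  # Counter({'terminology': 3, 'style': 1, 'accuracy': 1})
print(by_locale)    # Counter({'fr-FR': 3, 'de-DE': 2})
```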

Conclusion

The new frontier in LQE is a cloud-based solution that improves user experience by streamlining quality evaluation. It reduces ambiguity, improves communication, and creates an objective platform to discuss and resolve quality issues. 

With a single app for managing quality, LSPs and enterprise quality managers can streamline project setup and don't have to rely on labor-intensive spreadsheets to describe or score the quality program. The minimal effort required to set up an online program is more than offset by the efficiency gains. You no longer have to move from Microsoft Excel to Word and then to a computer-assisted translation (CAT) tool; it's all in one place.

Efficiency of communication is also improved, making it easier for everyone to be on the same page when it comes to creation, scoring, publishing, and feedback. Improved quality data collection and reporting lets you monitor trends over time and use the objective data to inform your strategic decision making to improve translation quality. 

As the industry survey cited earlier discovered, it's not the price of translation that matters most, it's the quality, so now may be the time to go boldly into this new LQE frontier.


[1] CSA Research, pages 2–4 in "Remove Process Waste for Greater Efficiency" (Jan 17) and pages 2–4 in "How LSPs Can Remove Waste in the Process" (Sep 16).
