A Tool-Based Approach to Terminology Consolidation and Validation
By: Andreas Ljungström (Language Technologies Consultant) - AMPLEXOR International S.A.
18 January 2017
Imagine a small team working on product documentation. They are used to working together and have made strict agreements on the terms they use for the products their company sells. It is quite easy for them to manage those terms: there is only a limited number of them, governed by good agreements, and the company sells its product in only one target area, in one language.
Now imagine that company growing and targeting different local markets. Suddenly, the team finds itself facing a terminology issue: the terms they’ve been using have to be translated and managed.
A year later, the company decides to branch out and hire another product documentation team located in another office, working on a related but new product series. Communication across teams about terminology gets more difficult, and after a while the teams notice that the use of terminology has diverged for lack of centralization.
The two teams start managing their own terminology in lists for each product. The two teams end up with lots of lists that they continue to send out to their language service provider (LSP) in order to update target language terms. Although meant to help, those terminology lists often raise concerns with the LSP, who argues that the terminology entries are inconsistent. And so, the lists are often returned cluttered with questions that take a great deal of effort for the teams to clarify. They are slowly realizing that managing their product terminology in Excel was perhaps not the best idea…
The creation of terminologically consistent product documentation gets increasingly difficult and time-consuming as time goes on. And what is more – they have noticed that their colleagues in Marketing have adapted much of the product wording to suit their own needs without previously liaising with the other teams on terminology questions.
This example shows us the bad news: valuable terminology assets are easy to pollute and they can grow uncontrollably over time. In the worst case, you end up with terminology silos that are different in structure and content, which may be of use for a specific user group but pose no added value for other stakeholders.
Given that we know how important it is to collect, maintain and disseminate valid terminology so that content creators, translators and revisers can do their jobs, is there anything we can do to clean up this mess in the terminology (hay-)stack and bring structure and order back into it? Is it possible to avoid unnecessary manual effort, and to estimate how much effort it will take to remove potential inconsistencies before actually doing it?
The good news: it is possible to harmonize diversely structured term assets from different sources using term expert tools. A polluted terminology stack can be freed from redundancy and conflicts before it is published to a wider audience.
Let’s take a look.
The starting point: where is the value?
Terminological assets are likely to grow in quantity over time, and with sophisticated mono- and bilingual term extraction tools available on the market this growth can indeed be achieved quite fast. Unfortunately, we see that growth and controlled collaboration do not go hand in hand. Terminology maintenance is rarely a corporate priority. Thus, these distributed, “departmentalized” assets seldom share the same structure or information level.
That’s a pity, because the assets lose their real value: the content is unstructured, inconsistent or unavailable for some of the stakeholders.
Terminology Management Systems often aim to address those challenges, yet most standard tools have no means of incorporating assets that vary in structure or content. They invariably fail to make the terminologist aware of inherent content-level conflicts within one asset or across several. The end result is the same:
- time is lost trying to manage messy terminology
- the once useful single source of truth for naming conventions fails to provide the answers you are looking for
- the quality of the terminology used remains low
The consequences of low-quality terminology
In fact, this topic merits a blog post of its own, but let’s touch upon at least one important aspect. We know that translators confronted with terminology that is riddled with conflicts spend more time researching recognized terms.
In a survey we conducted with our technical translators, respondents reported that they invest as much additional research time on checking poor terminology as they do when they are not presented with terminology at all and have to research from scratch. Presented with inconsistent terminology, the translator starts investigating, maybe even going back up the translation chain to get answers to this terminological enigma. Precious time is lost that could be better spent elsewhere!
Consequently, by providing low-quality terminology, you are indeed doing your translator a disservice. It distracts more than it helps.
The solution: a semi-automated, tool-based approach
Even today, many companies try to maintain their terminology in Excel files. Doing terminology in Excel is error-prone and full of limitations. Working with Excel, or even with standard Terminology Management Systems, will not empower users to find terminological overlaps and intersections, or near-duplicates with similar meaning. Also, with most standard tools it is impossible to estimate the effort of manually consolidating a terminology stack full of inconsistencies.
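To make the idea of near-duplicates concrete, here is a minimal sketch in Python. It is not the method of any particular tool: the term list, the function name and the 0.8 similarity threshold are assumptions chosen for illustration. It only catches spelling-level variants; synonyms with a similar meaning (such as "housing" and "enclosure") need linguistic or concept-level information that plain string comparison cannot provide.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(terms, threshold=0.8):
    """Return (term_a, term_b, similarity) for pairs that look like spelling variants.

    The threshold is an assumed value; it would need tuning on real data.
    """
    # Normalise case and whitespace first so identical terms collapse into one.
    normalised = sorted({" ".join(t.lower().split()) for t in terms})
    pairs = []
    for a, b in combinations(normalised, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Hypothetical terms collected from several lists.
TERMS = ["power supply", "Power-supply", "power supplies", "housing", "enclosure"]

pairs = near_duplicates(TERMS)
for a, b, ratio in pairs:
    print(f"{a!r} ~ {b!r} (similarity {ratio})")
```

Comparing every pair is quadratic in the number of terms, which is fine for a single list but would need some form of pre-grouping for very large term stacks.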
We need to do better than that. Terminologists need expert terminology tools developed specifically to address such problems. Today, there are extremely few tools on the market advanced enough to actually help terminologists automatically identify conflicts and inconsistencies, while simultaneously providing an upfront estimate of the consolidation and cleanup effort. To be really effective, we need sophisticated verification algorithms that compare term entries across all term assets and highlight conflicts in a dedicated user interface for easy recognition and resolution.
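As a rough illustration of what such a cross-asset check could look like, here is a minimal Python sketch. It assumes that every asset has already been flattened into rows of (asset name, source term, target language, target term); the asset names, the sample terms and the function names are invented for this example and do not describe any specific product.

```python
from collections import defaultdict

# Hypothetical rows: (asset name, source term, target language, target term).
ENTRIES = [
    ("docs_team_a", "power supply", "de", "Netzteil"),
    ("docs_team_b", "Power Supply", "de", "Stromversorgung"),
    ("marketing", "power supply", "de", "Netzteil"),
    ("docs_team_a", "housing", "de", "Gehäuse"),
    ("docs_team_b", "housing", "de", "Gehäuse"),
]


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_conflicts(entries):
    """Map (source term, language) to its competing target terms and their assets."""
    seen = defaultdict(lambda: defaultdict(list))
    for asset, source, lang, target in entries:
        seen[(normalise(source), lang)][normalise(target)].append(asset)
    # A conflict exists when one source term has more than one target term.
    return {key: dict(variants) for key, variants in seen.items() if len(variants) > 1}


conflicts = find_conflicts(ENTRIES)
for (source, lang), variants in conflicts.items():
    print(f"{source} [{lang}]: {variants}")
```

In this sample, only "power supply" is flagged for German, because two assets agree on one translation and a third uses another, while "housing" is consistent across assets. A real tool would additionally have to map differently structured assets onto a common model before a comparison like this is possible.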
Terminologists should be able to easily clean up polluted term entries and resolve conflicts at the click of a button – if possible within the comfort of a web browser. No more sorting and comparing in Excel, no more sending around lists to get the latest version approved or the target languages checked by a reviewer – just enter the application and start collaborating. The output is a harmonized, consistent terminology stack that can be reused in the tool of your choice.
Our past consolidation projects have shown that manual resolution of conflicts in Excel can be a near-insurmountable task. Faced with several input files to compare, each representing a multilingual term asset, just finding and resolving one conflict can take up to an hour. An automatic check algorithm should be able to reduce that to a fraction of a second. In other words, if we're assessing the manual effort of comparing and consolidating, say, 50 different Excel sheets, each with 24 languages, totaling 90,000 terminological concepts, a tool-based approach is indeed the only feasible option – unless you want to spend the next few months of your working life sifting through columns and rows in spreadsheets.
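The arithmetic behind such an estimate can be sketched in a few lines of Python. The 90,000 concepts and the figure of up to one hour per conflict come from the scenario above; the conflict rate of 1% is purely an assumption for illustration, since the real rate is only known once the assets have been checked.

```python
def manual_effort_hours(concepts, conflict_rate, hours_per_conflict=1.0):
    """Upper-bound estimate of manual cleanup effort in hours."""
    expected_conflicts = round(concepts * conflict_rate)
    return expected_conflicts * hours_per_conflict


CONCEPTS = 90_000             # from the scenario above
ASSUMED_CONFLICT_RATE = 0.01  # hypothetical: 1% of concepts are in conflict
HOURS_PER_CONFLICT = 1.0      # "up to an hour", so this is a worst case

hours = manual_effort_hours(CONCEPTS, ASSUMED_CONFLICT_RATE, HOURS_PER_CONFLICT)
working_days = hours / 8      # assuming eight-hour working days

print(f"Up to {hours:.0f} hours, or about {working_days:.1f} working days")
```

With these assumed inputs the worst case comes to 900 hours, or 112.5 eight-hour working days, for one person. Once an automated check has counted the actual conflicts, that count can replace the assumed rate.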
Given the importance of terminology and the severe impacts of terminology mismanagement, it’s time you take your terminology assets from the high shelf and find the proper cleanup tool to dust them off with. While you're at it, make sure to give your terminology assets a good polish before you offer your stakeholders the chance to use them.
Andreas Ljungström is a Language Technologies Consultant and Certified CAT trainer at AMPLEXOR Germany. His main areas of focus are language technology consultancy and professional services aimed at streamlining and automating terminology and CAT processes on both the customer and the LSP side. Find him at: https://de.linkedin.com/in/andreasljungstroem