Letter from the Guest Editor: Jochen Hummel, LT Innovate
By: Jochen Hummel (LT Innovate)
20 May 2015
This quarter’s GALAxy Guest Editor, Jochen Hummel, foresees a major disruption in the near future — one that will elevate the question of multilingual data to a C-level concern. How can Language Technology leverage Big Data and transform the industry? Within this framework, Hummel introduces this quarter's issue of GALAxy and invites readers to engage in this brave new world.
The Language Service Industry hasn’t seen much relevant innovation in recent years. In other industries, today’s topics are very different from those discussed a decade ago. In localization, however, the fundamental technologies of content management, term bases, translation memory, and even statistical machine translation were invented more than 25 years ago. What we have experienced in the intervening years are me-too products, more bells and whistles, and faster computers.
A main reason for this lack of innovation is that it remains hard to make computers process language. For the foreseeable future, recycling human-created content will remain the best way to automate translation. But even then, there is plenty of room for disruptive innovation. Today we use a different repository for every recycling strategy, and often a different product as well. We recycle words with term bases, fragments with SMT, sentences with TM, and larger chunks with CMS. Nobody has yet delivered a smooth integration of these approaches, and it is practically impossible to keep the data in these separate repositories in sync.
When innovation does not happen, it’s wise to look at the business side of things. For some strange reason, many LSPs develop their own tools. But translation itself is not what costs the most; for many languages its cost is almost negligible. In fact, successful LSPs do not really sell translation. What they sell is a reliable, scalable, integrable, and measurable process! Managing that process is costly and tedious because of the complexity of the current recycling technologies. So innovating in translation tools, whether by squeezing a few more percent of fuzzy matches out of a TM or by improving the BLEU score of SMT, is rather pointless from a business perspective. Yet rethinking the whole process would be a major threat to the current LSP business model.
I believe that sooner rather than later somebody, probably an outsider, will close that gap. The “Uber of localization” won’t be created by putting an e-commerce layer on top of today’s processes. It will be created by re-architecting the underlying tools with just two goals in mind: process simplification and data maintenance. If done well, that disrupter can directly link content creators with the underpaid translators who do the hard work that cannot easily be automated.
Difficult times ahead for LSPs? Only for their current business model. If they are smart, they will conquer new markets and significantly grow their business. Astute and agile LSPs will profit from two megatrends. First, the content they translated used to be created by companies and governments. Today, however, the vast majority of content is created by customers and citizens. Increasingly, companies prefer to curate rather than create content. But along with the content, the knowledge also moves extra-muros.
The second megatrend is Data, the oil of the 21st century. Or rather, the rock in which successful companies drill for information and refine it into insights. Big Data is big because it is created by billions of customers and citizens. And a large part of it is unstructured data, i.e. text, which is always multilingual. Companies and institutions can agree on specific working languages, but customers and citizens will always use their mother tongue.
Sometimes the data is not big but smart. Then it’s more about finding the needle in the haystack: for example, customer support requests pointing to a major product defect, or new competing products violating a company’s patent. One thing is certain: whoever does a better job of retrieving the knowledge buried in the gigantic heap of data has a decisive advantage over the competition. Mining knowledge from global content and building an organization capable of processing multilingual data will become what localization never was: a C-level concern.
How can we help these C-level managers? By using Language Technology, and lots of it: knowledge management, voice recognition, tokenization, machine learning, text analytics, search, machine translation, sentiment analysis, natural language generation, speech, etc. But none of these LT components works off the shelf. They need to be customized for domains and languages and complemented with human knowledge. LSPs have the multilingual data and the know-how to deal with this. Working together with LT companies, they could build systems such as a cross-border Online Dispute Platform for the EU or a global Customer Service application.
Even once such systems for processing multilingual data have been deployed, they will not be able to handle every case. This is where humans need to help out, and they need to be available for every supported language. LT companies want to develop and market technology; they are either not interested in delivering services or do not know how to. LSPs, however, possess a unique body of knowledge: how to resource and manage global supply chains. Together they can deliver multilingual data processing solutions and solve problems that go well beyond localization.
The following articles are intended to provide some inspiration about how language technologies can solve today’s and tomorrow’s business problems. At this year’s GALA conference in Seville, I had the privilege of presenting the above thoughts. I closed my keynote with a slide calling for collaboration between GALA and LT-Innovate members to help companies process multilingual data. Together we can deliver these multilingual solutions. They will keep our customers competitive and earn us healthy margins. Engage!
Jochen Hummel is an entrepreneur, director, and mentor with a background in coding. He founded TRADOS, the world leader in computer-assisted translation, and Metaversum, a highly innovative startup combining Web 2.0 and virtual worlds. As CEO of ESTeam AB, a solution provider for processing and searching multilingual information, he has a wealth of experience in international business: he has built global organizations, raised venture capital, been involved in M&A on both sides, and held executive positions in development, sales, and general management, as well as board seats. Jochen is interested in combining technology with innovative business models and connecting European skills with American virtues, or vice versa.