What’s New with Neural Machine Translation
I recently attended the TAUS Tokyo Summit, where neural machine translation (NMT) was a hot topic. As Macduff Hughes from Google put it, “Neural machine translation was a rumor in 2016. The first releases and tests happened six months ago and now it is here.”
It is still too early to make promises, and this time the translation industry is wise enough not to overpromise. Having been in the eye of the storm during the “statistical machine translation fever,” I can see that the results are extremely promising, but no MT professional can yet say where the technology is going to lead us.
Since 2007, machine translation companies have built sophisticated platforms based on pure statistical machine translation for related language pairs, or hybrids (rule-based and SMT) when the language pairs involved had little to no grammatical relation. Controlling the input and making it more palatable and predictable to machine learning when grammatical structures were too different seemed the key to establishing statistical patterns. Morphologically rich languages always behaved better with rule-based technology. Statistics were applied to smooth out the final sentence.
What is an Artificial Neural Network?
In short, an Artificial Neural Network (ANN) is a paradigm that processes information in a way reminiscent of our own biological nervous systems (with interconnected decision centers taking part and weighing in on a decision). The key element of this (not new) paradigm is the structure of the information processing system, which in the case of humans is composed of a large number of highly interconnected processing elements we call neurons.
Let us remember that humans are not the only ones capable of making decisions. Many mammals can too, based on intuition, experience, or something we have called “a kind of intelligence.” Neurons work in unison to solve a specific problem. Artificial Neural Networks learn by example, just like us. To that end they need samples, i.e. data, and massive amounts of it.
A neural network can be configured to learn many things, not just bilingual patterns between two languages (two systems). It can also be used for any kind of pattern recognition (such as handwriting) or data classification (pictures, objects, shapes, etc.).
A neural network requires a learning process, after which it can create fairly accurate representations of queries. Taking the human (or mammal) example, a biological system requires adjustments to the synaptic connections that exist between the neurons – and this also happens in artificial networks.
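To make the idea of “learning by example” more concrete, here is a minimal sketch of a single artificial neuron (a perceptron) in Python. It is only an illustration, not how an NMT system is built: the toy task (the logical OR of two inputs), the function names, the learning rate and the number of passes are all assumptions chosen for the example. The point is the loop in which each connection weight is nudged whenever the neuron gets an example wrong, loosely mirroring the adjustment of synaptic connections described above.

```python
# A single artificial neuron (perceptron) trained on examples.
# Hypothetical toy task: learn the logical OR of two binary inputs.

def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by a hard threshold."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(examples, epochs=20, learning_rate=0.1):
    """Adjust the weights a little after every wrongly classified example."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Each example pairs two inputs with the answer we want the neuron to give.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_EXAMPLES]
print(predictions)
```

A real network stacks many thousands of such units and is trained on millions of examples rather than four, which is why data volume matters so much.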
Neural networks do not work miracles. But if used sensibly, they can produce some amazing results.
The Beginning of a Neural Machine Translation Hype?
We may think it all started with Google’s post from their neural machine translation team. It seemed they had finally cracked the language barrier with outputs that were surprisingly human.
The truth is that neural networks have been around in academia for the last 30 years (I would recommend an inspiring paper from 1997 by Castaño & Casacuberta from Jaume I University of Castellón and Universitat Politècnica de Valencia in Spain called “Machine translation using neural networks and finite-state models”).
But much earlier still, back in 1943, the first artificial neuron was proposed by the logician Walter Pitts and the neurophysiologist Warren McCulloch. However, the technology available at that time did not allow them to do much. For many years, neural technology fell into disrepute and suffered from a lack of funding.
Research into statistical techniques and data processing, together with the availability of GPUs, has brought about a renewed interest in neural approaches. Neural networks are trained on the same type of graphics cards used by gamers, and there is a simple, good reason for it: those cards are extremely efficient at carrying out mathematical calculations (in gaming, the rendering of images). Neural networks are all about math. Whereas a CPU has to look after general housekeeping tasks such as controlling the hard disk, interfacing with the motherboard, monitoring the temperature, accessing RAM, etc., a GPU card does math and math only. The result is quality graphics in the case of gaming, and quality output in the case of a neural network for machine translation.
Google’s initial results, checked online by an army of professional translators and aficionados, caused a flurry of conversation around the new neural machine translation’s ability to produce quality translations. The linguistic flair of the output was impressive. It has been followed by a surge in academic publications mentioning the “neural” and by Facebook’s recent announcement that convolutional NMT can run 9 times faster and produce even better results.
The new buzzword is neural networks, and the hype about artificial intelligence can only grow. The truth is we do not know when and where the new technology will take us. This graph from our presentation at LocWorld 2011 illustrates our starting point. The presentation was a report from one of our clients, Sybase, on an English-German SMT engine trained on a fairly small amount of data at the time, 5M words.
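To give a rough idea of the kind of math involved, the sketch below computes one layer of a neural network in plain Python: a matrix-vector product, a bias, and a nonlinearity. The weights, biases, inputs and the choice of tanh are hypothetical values picked for illustration. Real systems run this same pattern over matrices with millions of entries, and it is those many independent multiply-and-add operations that a GPU can carry out in parallel.

```python
import math

def dense_layer(weights, biases, inputs):
    """One layer: a matrix-vector product, a bias, and a nonlinearity."""
    outputs = []
    for row, b in zip(weights, biases):
        # Each output neuron is an independent weighted sum of all inputs.
        total = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(math.tanh(total))
    return outputs

# Hypothetical numbers: 3 inputs feeding 2 neurons.
WEIGHTS = [[0.5, -0.2, 0.1],
           [0.3, 0.8, -0.5]]
BIASES = [0.0, 0.1]

activations = dense_layer(WEIGHTS, BIASES, [1.0, 2.0, 3.0])

# Number of multiply-add operations for this one tiny layer.
multiply_adds = len(WEIGHTS) * len(WEIGHTS[0])
print(activations, multiply_adds)
```

Because no weighted sum depends on any other, all of them can be computed at the same time, which is the property graphics hardware exploits.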
Then below we show the results of some of our initial NMT tests, for comparison.
Although the training sets were different, both were related to the software field. The point is that human evaluators ranked 10% of the SMT output as excellent and 30% as good, whereas 53% of the NMT output was ranked perfect or almost perfect and 39% as requiring only “light post-editing.” So that is 40% very good or good enough in German with SMT, compared to an impressive 92% with neural networks. The starting point is very, very promising: more than twice as high as with SMT.
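For readers who want to check the arithmetic, the figures quoted above can be added up in a few lines of Python. The variable names are ours, and the comparison is only indicative, since the two evaluations used different training sets and different rating categories.

```python
# Evaluation shares quoted in the article, in percent.
smt_excellent, smt_good = 10, 30
nmt_perfect, nmt_light_post_editing = 53, 39

# Share of output judged usable as is or with little effort.
smt_acceptable = smt_excellent + smt_good
nmt_acceptable = nmt_perfect + nmt_light_post_editing
ratio = nmt_acceptable / smt_acceptable

print(f"SMT: {smt_acceptable}%  NMT: {nmt_acceptable}%  ratio: {ratio:.1f}x")
```

The ratio works out to about 2.3, i.e. a little more than double the SMT starting point.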
NMT is still in its infancy. It makes unpredictable errors that are not easily spotted. There is no terminology management, so customization is much harder. It is, thus, unreliable. It is machine translation and natural language processing, let us not forget. A lot of the work that was done by the Moses community has to be done all over again (connectors to CAT tools and API services, to name just two). However, the hype is only natural because of the higher level of acceptability of machine translation in general and the higher quality of the results.