Pangeanic is part of the EU-funded EXPERT project on Empirical Approaches to Machine Translation and Post-Editing
Pangeanic is a key participant in the EU-funded EXPERT project (EXPloiting Empirical appRoaches to Translation). The project aims to train both novice and experienced researchers in the application of the latest techniques in hybrid machine translation. Pangeanic is in charge of practical approaches to hybrid machine translation in combinations of the six official UN languages, testing them and reporting the best outcomes as measured by automatic metrics and human evaluation. In this respect, EXPERT’s findings will set an agenda for new skills and jobs.
Factored, syntax-based and purely statistical approaches in the six languages will be tested on the two-year-old DIY SMT platform PangeaMT, and a set of “neutral” recommendations will be produced.
Pangeanic released and has since improved this web-based tool to organize by domain, maintain and clean training material (TM) for MT. The tool can now also create engines directly by domain or by TM, create new linguistic domains to which new data is added, filter suspicious material to be cleaned as new training material (TMX), etc. The company has tested the creation of new domains by mixing existing domains and training sets, and has included a full set of system statistics that provide information on engine progress and performance. With PangeaMT, engines are created or updated by domain. The web tool already incorporates hybrid features, and these will be tested, expanded and improved upon in EXPERT.
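As an illustration of what filtering suspicious training material can involve, here is a minimal Python sketch. It is not PangeaMT's actual implementation: the `TranslationUnit` type, the function names and the `max_ratio` threshold are all hypothetical. It assumes three simple heuristics, flagging a segment pair when one side is empty, when source and target are identical, or when their character lengths differ by more than a fixed ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationUnit:
    """One source/target segment pair (hypothetical type for this sketch)."""
    source: str
    target: str


def is_suspicious(unit, max_ratio=3.0):
    """Return True if the pair looks unreliable as training material.

    Heuristics assumed here: an empty side, an untranslated copy,
    or a character-length ratio above max_ratio.
    """
    src = unit.source.strip()
    tgt = unit.target.strip()
    if not src or not tgt:
        return True  # one side is missing
    if src == tgt:
        return True  # target is a copy of the source
    longer = max(len(src), len(tgt))
    shorter = min(len(src), len(tgt))
    return longer / shorter > max_ratio


def split_units(units, max_ratio=3.0):
    """Split units into (clean, suspicious) lists, preserving order."""
    clean, suspicious = [], []
    for unit in units:
        if is_suspicious(unit, max_ratio):
            suspicious.append(unit)
        else:
            clean.append(unit)
    return clean, suspicious
```

A character-length ratio is only a rough signal. The threshold would need tuning per language pair, especially for combinations such as EN/ZH, where segment lengths differ widely, and flagged units are better sent for review than deleted outright.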
The hybrid features will include general pre- and post-processing rules designed to improve MT output. In addition, tests will alter training sets and evaluate the impact of reordering in certain language combinations, measuring gains when using purely statistical, syntax-based or factored models.
Our role within EXPERT is to provide hardware and software resources for developing language-building tools, the MT system and the platform, as well as for optimising hybridisation performance. During the project, Pangeanic will give unlimited access to its entire computing infrastructure, providing the project with its massive data repositories and statistical language models, DIY platform, automated retraining features, etc. In particular, we will concentrate on results-driven testing of hybridisation in the six official UN languages, carrying out a series of experiments on EN/FR/ES/ZH/RU/AR.
The EU’s Marie Curie EXPERT project is based on the belief that a number of developments in both Example-Based MT and Statistical MT have already shown the potential of corpus-based approaches to produce fast, low-cost translations, which has significantly increased output at many organizations. Nevertheless, although we live in a data-hungry society that produces massive amounts of data, and results point to less human effort, time and cost in translation, realizing the full potential of machine translation and achieving wide adoption remain a challenge.
The main reason for this is that machine translation tools are not designed to aid professional translators. Shortcomings of machine translation technologies include unproductive or even unfriendly user interfaces, which often fail to take account of translators’ feedback in general or of post-editing in particular. Pangeanic has offered its latest web-based SMT management interface, capable of creating and re-training machine translation engines at will, for the project’s advancement.
The great potential of the new tools remains to be fully exploited. Thus, EXPERT will concentrate on the training of a new generation of researchers producing “neutral” recommendations and best practices that will serve as the basis for a new generation of technology adopters and for the industry in general.