NTEU: Providing Massive Data Resources and Production-Ready Engines to the Language Community

29 Apr 2021
08:00 AM to 09:00 AM
Pacific Standard Time (Mexico) (UTC-08:00)

This event has expired, video available

NTEU is a massive data-gathering and neural MT engine-building exercise by three MT companies (Pangeanic, KantanMT and Tilde) together with SEDIA (the Spanish agency for the digitalization and modernization of Public Administrations). The partners have been collecting data and manufacturing synthetic data to train near-human-quality engines in all language combinations for the benefit of Public Administrations. The engines will be made available via ELG (the European Language Grid), and the training data will be made available via ELRC-Share.

This 2-year project will culminate in a translation panel and in the availability of the engines in Docker form for academia, industry and researchers, together with the evaluation results. Production-ready engines will ensure that any Public Administration willing to automate communication can integrate the engines into its workflows, whilst academia will benefit greatly from massive datasets (15M segments) in all EU language combinations.

This webinar discusses the project goals and how the language community, Public Administrations and NLP professionals can benefit from the project.

Host organization: Pangeanic

Event Speakers

Manuel Herranz

MIT in Entrepreneurship, Manuel worked for major automobile manufacturers and in power co-generation in the UK in the 1990s, with postings in Argentina, Mexico and his native Spain. His background in machine translation comes from his mission to automate language processes for B.I Corp., the Japanese corporation for which he was European Director from 1998 to 2005. He has traveled to Japan and China extensively. Since 2009, he has focused on the development of Natural Language Processing technologies to provide process automation and true value to clients. A frequent speaker at industry events, Manuel's areas of interest cover statistics, deep neural networks, adaptive technologies, pattern recognition and deep learning applied to Natural Language Processing. His interest in data acquisition led him to make Pangeanic a founding member of TAUS and of data-sharing initiatives. Manuel is also committed to supporting NGO actions like the Malima Project for primary education in Central Africa, as well as Translators Without Borders, medical research into rare diseases and sports events. Manuel is a double graduate of Manchester University.

Artūrs Vasiļevskis

Artūrs Vasiļevskis is Head of Machine Translation Solutions at Tilde, where he leads the Machine Translation group, overseeing all aspects of MT sales and product development. Under his leadership, Tilde has delivered many major language technology projects for eGovernance, such as the EU Council Presidency Toolkit, the language technology platform hugo.lv and the enterprise solution Tilde MT, and has received global acknowledgment, including winning WMT2017, WMT2018 and WMT2019 over tech giants such as Microsoft and Google. For the solution hugo.lv, Tilde was nominated as Microsoft partner of the year (Latvia, 2019).

Maite Melero

Maite Melero is a senior researcher at BSC, where she leads the Machine Translation projects and the research on unsupervised learning and under-resourced languages. She advises the Spanish Plan on Language Technologies and is the current technical National Anchor Point for Spain in the European Language Resource Coordination (ELRC) network. She is also a member of the board of Linguapax, a non-governmental organization dedicated to the protection and revitalization of world linguistic diversity and to promoting dialogue and peace.