NTEU: Providing Massive Data Resources and Production-Ready Engines to the Language Community

29 Apr 2021
08:00 AM to 09:00 AM
Pacific Standard Time (Mexico) (UTC-08:00)

Members: Free
Non-Members: $75

NTEU is a massive data-gathering and neural MT engine-building exercise by 3 MT companies (Pangeanic, KantanMT and Tilde) together with SEDIA (the Spanish agency for digitalization and modernization of Public Administrations). Partners have been collecting data and manufacturing synthetic data to train engines of near-human quality in all language combinations for the benefit of Public Administrations. The engines will be made available via ELG (the European Language Grid) and the training data will be made available via ELRC-Share.

This 2-year project will culminate in a translation panel and in the release of the engines in Docker form, together with the evaluation results, for academia, industry and researchers. Production-ready engines will ensure that any Public Administration willing to automate communication can integrate the engines into its workflows, whilst academia will benefit greatly from massive datasets (15M segments) in all EU combinations.

This webinar will discuss the project goals and how the language community, Public Administrations and NLP professionals can benefit from the project.

Host organization: Globalization and Localization Association

Event Speakers

Manuel Herranz
Pangeanic

MIT in Entrepreneurship, Manuel worked for major automobile manufacturers and power co-generation in the UK in the 1990s with postings in Argentina, Mexico and his native Spain. His background in machine translation comes from his mission to automate language processes for B.I Corp., the Japanese corporation for which he was European Director from 1998 to 2005. He has traveled to Japan and China extensively. Since 2009, he has focused on the development of Natural Language Processing technologies to provide process automation and true value to clients. A frequent speaker at industry events, Manuel’s areas of interest cover statistics, deep neural networks, adaptive technologies, pattern recognition and deep learning applied to Natural Language Processing. His interest in data acquisition led him to make Pangeanic a founding member of TAUS and data-sharing initiatives. Manuel is also committed to supporting NGO actions like the Malima Project for primary education in Central Africa, as well as Translators Without Borders, medical research into rare diseases and sports events. Manuel is a double graduate from Manchester University.

Tony O'Dowd
KantanMT

Tony is Founder and Chief Architect at KantanMT.com, a cloud-based platform used by some of the largest companies in the world to machine translate billions of words of online content. Prior to this he was Founder of Alchemy Software Development, developers of Alchemy CATALYST, the market-leading CAT tool for software and web site localisation. Tony spent three years as a lecturer at Trinity College Dublin, teaching Microprocessor Design and Assembly Language Programming. He has a BSc in Computer Science from Trinity College Dublin, is a Fellow of the University of Limerick, and is a founder of FIT Ltd., a $20 million government training organization for the long-term unemployed. Tony is currently studying for an MSc in International Sales Management.

Artūrs Vasiļevskis
Tilde

Artūrs is Head of Machine Translation Solutions at Tilde, where he leads the Machine Translation group, overseeing all aspects of MT sales and product development. Under his leadership Tilde has realized many major language technology projects for eGovernance, such as the EU Council Presidency Toolkit, the language technology platform hugo.lv and the enterprise solution Tilde MT, and has received global acknowledgment, namely winning WMT2017, WMT2018 and WMT2019 over tech giants such as Microsoft and Google. For the solution hugo.lv, Tilde was nominated as Microsoft Partner of the Year (Latvia, 2019).

Maite Melero

Maite Melero is a senior researcher at BSC, where she leads the Machine Translation projects and the research on unsupervised learning and under-resourced languages. She advises the Spanish Plan on Language technologies and is the current technical National Anchor Point for Spain in the European Language Resource Coordination (ELRC) network. She is also a member of the board of Linguapax, a non-governmental organization dedicated to the protection and revitalization of world linguistic diversity and in favor of dialogue and peace.