Imagine: A World with No Barriers to Information
By: Diane Wagner and Mia Marzotto, Advocacy Officer at Translators without Borders
Imagine: being forced to flee your home but being unable to understand directions or receive assistance.
Imagine: loved ones dying from a disease – Ebola – that access to accurate sanitation information might have prevented.
Imagine: sending your child to school where no one speaks her language.
Access to information – so easy for many thanks to the Internet – is far more challenging for the world’s poorest, least educated and most marginalized people. It’s even harder for those who don’t speak or understand the world’s so-called “commercial” languages: languages such as Chinese, English, French or Japanese that are considered the lingua francas of technology and commerce. While access is part of the issue – the Internet remains expensive and/or limited by geography and coverage in many parts of the world – the larger issue is finding needed information in one’s own language. Commercial languages have abundant information resources available – from health care to education to the rights of women and children – but this information is not readily accessible in regional, or underserved, languages.
This reality reflects not only a bias toward commercial languages but also a massive data challenge: language automation requires human intellectual property, namely massive amounts of parallel data (the same content in two or more languages) in a usable (digital) format, which, for many languages, simply doesn’t exist. And, as rapid advancement in human-machine interaction continues – think Siri, OK Google, and real-time speech translation from Microsoft – commercial languages will continue to flourish with new capabilities. Many other languages – including those native to people in large parts of Africa, Asia and Europe – will languish, lacking the data required to bring them online.
As we celebrate Human Rights Day on December 10, linguistic rights – the right to critical information in one’s own language – remain an aspiration for speakers of non-dominant languages. Among several international conventions and declarations, the Universal Declaration of Human Rights and the Universal Declaration of Linguistic Rights have recognized the right to seek, impart, and receive information in one’s own language, and the importance of informed decision-making for the enjoyment of all other human rights. Yet greater efforts are needed to ensure multilingual, fair, and equitable access to information in the worldwide information and communications space.
Now, a new broad-based effort may offer a path forward for underserved language communities. The Common Language Initiative (CLI) is a cross-industry effort that brings together a coalition of technologists, native speaker communities, humanitarian organizations, content creators and owners, and private and public donors to fund investment in language data, making it useful and free of charge to all. The CLI is sponsored by Translators without Borders (TWB), a non-profit organization that provides humanitarian language services around the globe.
The CLI focuses on building data assets that make it easier to automate underserved languages. To be successful, the CLI’s methodology, process, and workflow must be efficient, cost-effective and—most critically—replicable. The CLI has been designed with this repeatable model in mind while also taking into account the end-to-end challenges associated with gathering language data and making it useful.
Here’s how it works:
- Step 1: create/license a test set of simple and accurate content that can be reused for any/all languages.
- Step 2: recruit, train and incentivize communities of translators able to translate and localize content for their community.
- Step 3: work with technology partners who host and manage large-scale language data and create new language engines.
- Step 4: partner with local developers and humanitarian agencies to bring each new language online in ways that are immediately useful.
- Step 5: with language communities and content providers, drive continuous improvement in language accuracy and quality through everyday use and the addition of new content sets. This step is critical to ensure quality, improvement and relevance over time.
In 2018, TWB and a coalition of partners will focus on developing a base, reusable content set, piloting language data collection for one to three languages, and then testing and measuring the results. The goal over the next decade is to bring 20 underserved languages online, creating a useful, sustainable and free asset to empower people through greater access to critical information, alleviate suffering, enhance communication and create opportunities to participate in our tech-driven world. By moving towards the realization of universal language rights, the CLI will allow speakers of these underserved languages not just to acquire information, including multilingual local content, but also to transform it into knowledge and understanding, thereby empowering them to improve their livelihoods and contribute to the social and economic development of their society.
If you’re interested in learning more about the CLI, there are many ways to participate:
- Read, and give TWB feedback on, its white paper, “The Common Language Initiative: Solutions for Underserved Languages.” Comments close on January 15, 2018.
- Contribute healthcare, crisis relief, education and/or job-skills content to grow the usefulness of the data set.
- Share parallel data – TWB’s focus in 2018 is on Bengali, Hausa and Kiswahili, but parallel data in any underserved language is welcome.
- Volunteer your expertise – in localization, project management, content creation and/or engineering.
- Follow the CLI’s progress on Facebook, Instagram, and Twitter.
- Donate unrestricted funding to Translators without Borders to support the CLI.