Exploring Plain Language Summaries and Writing for Machine Translation

Although plain language is not a new concept, its use is on the rise. Sometimes called plain writing or clear writing, plain language is often described as communication that readers can understand on first reading.

Momentum for the use of plain language has been growing since 2006, when the United Nations adopted the Convention on the Rights of Persons with Disabilities (CRPD), which has since been signed by more than 160 member states, making it one of the most successful UN treaties. Article 2 of the CRPD specifically includes plain language among the approaches that may be required to make communication accessible.

The right to understand is one strong motivation for writing in plain language, but others include improving the reader experience, reaching new audiences, and optimizing content for search engines or translation processes. In fact, interest in plain language has grown so much that the International Organization for Standardization published its first standard on the subject in June 2023: ISO 24495-1:2023, Plain language - Part 1: Governing principles and guidelines.

Plain language to reach a broader audience

One area where plain language is gaining a foothold is science communication, also called research communication. Traditionally, researchers disseminate their findings in the form of expert-to-expert communications such as journal articles or conference proceedings. Increasingly, however, researchers are being encouraged to share their findings with a broader audience that includes non-experts. This type of outreach can take different forms (e.g., infographics, videos), but one of the most common forms is a plain language summary. A growing number of journals, particularly in the health and science fields, encourage researchers to submit a plain language summary of their research findings along with the more traditional scientific abstract and article (e.g., Canadian Science Publishing, Taylor & Francis, Sage). In this way, people who are not part of the expert community can still engage with the research. Such people might include funding agencies (who want to understand how their research dollars are being spent), research participants (such as those participating in clinical trials or participatory action research projects), policy makers (who are trying to develop legislation in a specialized area), or researchers from adjacent fields (who might be exploring interdisciplinary research).

Researchers need plain language

Another group that is not often mentioned explicitly, but that could benefit from plain language summaries, is researchers who work in an additional language. This is actually a very large group. English has become the dominant language of research publication, even though only a small percentage of the world’s researchers and graduate students have English as their dominant language. As a result, many researchers must consult the scientific literature in an additional language.

The availability of an English plain language summary could help non-Anglophone researchers in at least two ways. Firstly, the plain language summary might be easier to read and digest than the traditional scientific abstract. This reduced cognitive load could help researchers decide more quickly whether the article is relevant to their work and therefore worth the effort of reading more deeply. Secondly, the plain language summary could be more translation-friendly, meaning that researchers could use machine translation to translate it from English into their dominant language, get the gist of the text, and decide whether it is worth engaging further with the accompanying scientific article.

Plain language for machine translation

As part of a broader project looking at plain language from a number of different angles, we conducted some preliminary research into the production and machine translation of plain language summaries.

The first investigation focused on who should produce plain language summaries. Many scientific journals encourage authors of research articles to produce a plain language summary of their work. However, while these authors are subject matter experts, they are not necessarily trained in writing or communication. In contrast, people working in the emerging field of science communication often have a dual education, such as a Bachelor of Science degree coupled with a Master's degree in journalism or communication.

Canadian Science Publishing (CSP), which publishes 23 scientific journals in various fields of science, engineering, and health sciences, provides two different types of plain language abstracts on its website. On the one hand, journal authors are encouraged to produce their own summaries of their research, which are hosted on CSP’s Medium website where they are freely available. On the other hand, CSP also maintains a public blog, and one category of blog posts is known as “Briefs,” which consist of plain language summaries produced by science communicators.

The science of communication

We constructed two different corpora corresponding to the two different types of plain language summaries - summaries produced by researchers and summaries produced by science communicators - and compared some of their characteristics.

One of the main findings is that plain language summaries produced by researchers tend to be shorter overall, but their average sentence length is longer and they contain more passive constructions. Using a variety of automated readability scores, such as Flesch Reading Ease and Flesch-Kincaid Grade Level, we calculated the average readability of the two types of summaries. According to all measures, the summaries produced by science communicators scored better than the summaries produced by researchers. This is not particularly surprising, since researchers are primarily scientists or engineers, while science communicators are trained communication professionals who know how to write for a non-specialist audience. Even so, the results provide tangible evidence that a publisher who wants to engage in science communication may be better served by hiring a science communicator than by relying on researchers to write the summaries.
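To make these measures concrete, here is a minimal Python sketch of how average sentence length, Flesch Reading Ease, and Flesch-Kincaid Grade Level can be computed. It is not the tooling used in the study. The two formulas are the standard published ones, but the function names (`count_syllables`, `readability`) are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for illustration; dedicated readability tools count syllables more carefully, so scores will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a linguistic rule):
    count runs of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Words such as "make" end in a silent 'e'; words such as "table" do not.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return average sentence length and two Flesch scores for English text."""
    # Naive segmentation: split sentences on ., ! or ? and keep alphabetic words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "avg_sentence_length": words_per_sentence,
        # Higher Flesch Reading Ease means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Lower Flesch-Kincaid Grade Level means easier to read.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    plain = "The cat sat on the mat. It was a good day."
    dense = (
        "Traditionally, researchers disseminate experimental findings "
        "through specialized communications."
    )
    for label, sample in (("plain", plain), ("dense", dense)):
        print(label, readability(sample))
```

Both scores depend only on words per sentence and syllables per word, which is why the longer sentences found in the researcher-written summaries weigh on their readability results.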

Next, we set out to investigate the translation friendliness of different types of summaries. In this case, we tested three different types of abstracts:

1) traditional scientific abstracts produced by researchers;

2) plain language summaries written by researchers; and

3) plain language summaries written by science communicators.

All source texts were written in English, and the free online version of DeepL was used to translate the texts into French.

The translations were then evaluated for accuracy and fluency. Once again, the plain language summaries written by the science communicators came out on top. The plain language summaries written by the researchers came in second, and the traditional scientific abstracts were found to be the least translation-friendly.

These results seem rather intuitive, especially since we know that, historically, combining machine translation with controlled language has proven to be more successful than using machine translation for uncontrolled natural language.

However, one of our inspirations for testing the combination of machine translation and plain language came from an article by Shaimaa Marzouk and Silvia Hansen-Schirra (2019), “Evaluation of the impact of controlled language on neural machine translation compared to other MT architectures.” These researchers found that controlled language, which is an artificially engineered type of language, did not work well with neural machine translation, which is a data-driven approach to machine translation. Because controlled language is not widely used, it is unlikely to be well represented in the typical training corpora used for neural machine translation engines. In contrast, the use of plain language has been growing steadily over the past few decades, and it is therefore quite likely to be represented in training corpora. Moreover, the types of guidelines that have been developed for writing in plain language overlap heavily with guidelines developed for writing for machine translation.

Conclusion

The use of plain language is growing, so it is important for us to understand more about it. This research is very preliminary, but it begins to address two questions: who should write plain language summaries, and how translation-friendly are those summaries?

 
