Welocalize Reveals Findings on Translation Quality of LLMs versus Generic and Custom MT

Generative AI and Large Language Models (LLMs) are expected to disrupt content services industries and professions in several ways, including putting the ability to generate multilingual content into the hands of content authors, streamlining workflows, and performing translations.

Welocalize, a pioneer of tech enablement within language services, is monitoring these forces. Language services companies have used AI to translate content for decades, and understanding how LLMs compare to that technology will help inform how the multilingual services landscape will evolve.

In a recent study, the company compared the performance of eight different LLMs and MT (Machine Translation) workflow variants, including a commercial MT system trained by Welocalize and currently in production. The company analyzed the quality of translations for customer support content from English into five target languages, including Arabic, Chinese, Japanese, and Spanish.

According to Welocalize's findings, the custom NMT (Neural Machine Translation) models outperformed all others, including both the ‘pure LLM’ output and the workflows combining NMT and LLMs. However, the output from both the LLM-augmented workflows and the ‘pure LLM prompts’ came very close to meeting a high industry-standard quality threshold, sometimes differing by mere tenths of a percentage point.

“It is particularly impressive that more challenging target languages like Arabic, Chinese, and Japanese saw promising results,” comments Elaine O’Curran, Senior AI Program Manager at Welocalize.

Although LLMs like GPT-4 may not yet quite match the raw translation performance of highly trained NMT engines, they come impressively close to achieving similar results.

As LLMs become fine-tuned and work their way into the corporate IT stack, their ability to achieve desired translation results with lighter prompting and minimal task-specific training will make them a compelling alternative.

“It is easy to imagine a future where LLMs outperform NMT, especially for specific applications, content types, or use cases. We will continue to compare and analyze their performance in the coming months,” adds O’Curran. It will also be interesting to see the performance of customized LLMs. As with MT, the idea is to fine-tune the model for a specific context, domain, task, or customer requirement to enhance its ability to provide more accurate translations for different use cases.

Moving Multilingual Content Generation Upstream

The integration of LLMs into content tools and workflows could reconfigure the translation industry. Companies will be able to produce content simultaneously in multiple languages, streamlining their processes and increasing efficiency.

LLMs represent a force of potential disruption within the translation industry. As they continue to evolve and become more accurate, this will lead to an uptick in automation and push translation and localization upstream in the content supply chain.

For more information, visit welocalize.com.

### 

About Welocalize 

Welocalize, Inc., ranked as one of the world’s largest LSPs by language industry intelligence firms CSA Research, Nimdzi, and Slator, offers innovative language services to help global brands reach audiences around the world in more than 250 languages. The company provides translation and localization services, linguistic talent management, language tools, automation, and technology, quality, and program management. Its range of managed language services includes machine translation, digital marketing, validation and testing, interpreting, multilingual data training, and enterprise translation management technologies. As a pioneer of tech enablement within language services and digital transformation, Welocalize is uniquely positioned to help its clients capitalize on recent developments in generative AI. Welocalize.com