Raw Machine Translation and Intellectual Property Rights (IPR) Work

One day in the life of a patent professional

104 patent documents. That was the number Sven needed to review before meeting with the technical team the next week.

The 104 documents came from Sven’s patent search. Now he had to work through each of them to determine whether it was relevant to the team's evaluation of whether their newest invention (code name Titan) should be patented. Finding all of the relevant existing patent documents, including already granted patents and patent applications, was key to ensuring that no resources were wasted writing a patent application for a new product or invention that was not novel and therefore not patentable. In fact, missing a relevant document was one of the bigger risks Sven faced, which is why patent searches had to be thorough. This one required a variety of search techniques, several iterations of searching, and a full 12 hours to complete. It was a bit like your basic Google search on steroids.

Sven’s next step was to review all 104 documents and mark the ones he didn’t think were relevant, meaning that they didn’t seem to have similarities or features that overlapped with Titan. Normally this step would be done by the technical team, but just now they were very busy, so he agreed to do the first review. Although Sven, as the patent professional, had better knowledge of the patent genre, the technical experts knew the technology better. It took both sides’ opinions to make a good judgement, and that is why they would meet the next week to discuss it.

Of the 104 documents to be reviewed, 41 were in languages other than English: 27 were in Chinese, 6 in Japanese, 4 in Korean, 3 in German and 1 in French. Sven was as responsible for reviewing the content of these documents as he was for the English-language ones.

The German ones were easy, since Sven’s German was good enough that he could read them directly and understand almost everything. The few terms he didn’t know, he threw into machine translation (MT).

He also knew some French but not enough to read the French document without help. Instead, he put it through MT and adjusted the tool settings so that the original text and translation would be shown side-by-side. He then took what he could understand from both texts and put together a general understanding of the meaning of the document.

Two different views of a patent from the European Patent Office’s Patent Translate tool. On the left is an excerpt of a bilingual PDF generated by the system; on the right is a view of the English translation in the tool. The user has hovered over the paragraph in red to see the original Korean text, which is then shown in the purple pop-up.


For the Chinese, Japanese and Korean documents, Sven relied entirely on MT. The quality he got from the MT for these languages was lower than for German and French, although he noticed that it had improved over time, and especially in the past 4 or 5 years. He translated all of the documents into his working language of English.

Sven’s main MT tool was the one provided directly in the patent search tool he used daily. It was developed by the same company and was meant specifically for patent documents. He also relied on a few other MT tools when needed, like when he ran across a passage he didn’t understand or wanted to verify that a passage actually said what the first MT tool claimed it said. For example, he might use the MT tools offered by the national patent offices of Japan and China, or the European Patent Office’s tool.

With the machine-translated documents, he needed to decide which would need human translation and which were good enough quality to allow him to understand the important parts (mostly the claims). In many ways this decision came down to a calculation of risk. If a document looked relevant and would be used in a high-risk IPR process, for example when evaluating freedom to operate, then he might immediately send it for human translation. If it were to be used in a lower-risk process but seemed relevant and the MT was just not understandable, that might be sent for human translation too. But if he and the technical team felt they had a good enough understanding and the document was relevant, they could use it in its machine-translated form throughout the long process of filing for and being granted a patent. If it became problematic during the process, they could send it for human translation at a later stage.

Sven then started the critical task of classifying documents as clearly irrelevant, possibly relevant, or very relevant, and documenting his findings in the Excel spreadsheet the team would use when reviewing the documents in their joint meeting. The goal of the meeting would be to identify a small set of the most relevant documents describing the most similar existing inventions.

Another thing Sven did to prepare for the meeting was make a list of the documents he wanted to go through more thoroughly with the team. He knew that there would be some of the newer team members at the meeting, so he planned on opening some of the machine-translated documents in the review to walk through them. Inexperienced people often had a funny first reaction to MT, laughing and saying it was gibberish. Reading through machine-translated texts together was a good way to get them beyond that to where they could see that it is possible to get meaning out of raw MT.

If the examination of the most relevant patent documents showed that Titan did not overlap with any similar existing inventions, the team would decide to proceed with writing a patent application. The set of relevant documents would then help them to shape the application so that it had no clear overlap with any of them. The application would be written and filed in English, and eventually translated if needed. That work would, of course, be done by human translators. 

After the filing, there would be a long process in which the application would be examined and open for commenting before the final granting of the patent. During this time other parties, for example, the competitors who held the most relevant patents, could comment on the patent application. Those parties would also have done a thorough search for relevant patent documents and might make objections to the application based on past documents, some of which might also be machine-translated. Throughout it all, the use of raw MT would be transparent to all parties and machine-translated documents would be marked as such.

Background to the story

In 2018–2019 I conducted a study of nine Scandinavian patent professionals, defined as IPR professionals who use their expertise in patenting to assist others in IPR processes. I explored how these professionals use raw MT in their everyday work. The study was part of my Ph.D., which focuses on the contexts in which people use raw MT and the factors in those contexts that influence their use and reception of the raw MT.

I was fascinated by many of the things found in the study. Patent professionals use raw MT on a very regular basis to “gist”, that is, to consume raw MT with the aim of understanding as much as they need for a specific purpose. In this case they use MT gisting to read patent documents that are in languages they don’t speak themselves. Since the amount of information they need to review is very large (sometimes “hundreds and hundreds of patents at a time” according to one of the study’s informants), it would simply be impossible to translate it all using human translators.

Prior to the study, I assumed that even if people used MT, they wouldn’t want to talk about it. But I discovered that the use of raw MT for gisting in this environment was open, widespread, taught to newcomers, and considered a legitimate way of accessing information in unknown languages. The legitimacy was visible in the transparency of the practice, in official patenting guidelines that also covered MT use, and also in the participants’ views on where the boundaries of that legitimacy lay. For example, the study’s informants consistently identified legal settings as out of bounds for raw MT. You would not use raw MT in court. You also wouldn’t use raw MT to apply for a patent. 

Another interesting thing was how the informants in my study described the risk assessment processes they used when deciding whether to send a document for human translation or rely on the raw MT. This was a group of MT users who did not need to be told about the risks of MT; they were very aware of them. However, the group also dealt with a number of other risks in their work and seemed well accustomed to a constant weighing of risks and benefits when making decisions.

I chose to write this description of how patent professionals use MT gisting as a story because explaining it always requires a lot of description, so the story genre works well. But also because, well, I like stories (see other stories of MT gisting at mt-stories.com). The story is nonetheless based on details taken directly from the studies listed at the bottom of this article.

Many details came from the survey of patent users done by Joho et al. (2010). The 104 patent documents Sven needed to review came from the survey’s finding that the average number of documents examined was 100 (I changed it to 104 because 100 was just too round). The 12 hours that went into Sven’s search was the average search time found in the survey. Even the character Sven came from Joho et al.: a majority of the survey respondents were male and the top country of origin of respondents was the Netherlands.

The figure of 41 documents that were in languages other than English started with Tinsley et al.’s (2012) estimate that on average, 30% of patent search results are not in English. Since the number of patent documents originating in China has skyrocketed in the past 10 years, I took a calculated guess that the current average might be roughly 40%. The breakdown of the languages those 41 documents were in came from 2020 statistics published by the World Intellectual Property Organization and the European Patent Office.
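
As a quick sanity check, the figures used in the story can be tallied in a few lines. This is only an illustrative sketch: the counts are the ones given in the story, while the variable names are made up here and do not come from any of the cited studies.

```python
# Counts taken from the story; the variable names are illustrative only.
total_documents = 104
non_english_by_language = {
    "Chinese": 27,
    "Japanese": 6,
    "Korean": 4,
    "German": 3,
    "French": 1,
}

# Add up the non-English documents and express them as a share of all results.
non_english_total = sum(non_english_by_language.values())
non_english_share = non_english_total / total_documents

print(non_english_total)           # 41
print(f"{non_english_share:.1%}")  # 39.4%
```

The language breakdown adds up to the 41 non-English documents, which is a little under the rough 40% estimate for a set of 104.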

Other details came from my own 2018–2019 study, which is reported on in articles [3] and [4] below.

Based on the following sources:

[1] Joho, Hideo, Leif Azzopardi, and Wim Vanderbauwhede. 2010. "A Survey of Patent Users: An Analysis of Tasks, Behavior, Search Functionality and System Requirements." Proceedings of the Third Symposium on Information Interaction in Context, 13–24.

[2] Nurminen, Mary. Machine Translation Stories 

[3] Nurminen, Mary. 2020. "Raw Machine Translation use by Patent Professionals: A Case of Distributed Cognition." Translation, Cognition & Behavior 3 (1): 100–121.

[4] Nurminen, Mary. 2019. "Decision-Making, Risk, and Gist Machine Translation in the Work of Patent Professionals." Proceedings of the 8th Workshop on Patent and Scientific Literature Translation: 32–42.

[5] Tinsley, John, Alexandru Ceausu, Jian Zhang, Heidi Depraetere, and Joeri Van de Walle. 2012. "IPTranslator: Facilitating Patent Search with Machine Translation." AMTA-2012: Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, 1–9.