Why Is Japanese Challenging?
By: Sachiyo Demizu, Nikon Precision Inc.
06 June 2013
Sachiyo Demizu (Nikon Precision Inc.) takes a look at the fascinating history of the Japanese language and how it has made translation to and from this language such an undertaking.
When you try to expand your localization business to Japanese, you may encounter all sorts of problems. Why?
A typical Japanese person would answer: “We are Japanese, so we are different. We have 2,000 years of independent history as an island country. Besides, our language is double-byte!” You may be puzzled, thinking that the United Kingdom has a long history as an independent island country but does not have any cultural issues, or Chinese is a double-byte language but does not give you so many problems. Let’s take a closer look.
Origin of Japanese language
Japanese is spoken by 125 million people. The origin of the language is unclear, but some speculate that it is closely linked with one of the ancient Korean languages. Japanese belongs to the Japonic language family*, although some scholars place it in the Altaic language family, which consists of Turkic, Mongolic, and Tungusic. The separation of Korean and Japanese is believed to have occurred about 10,000 years ago.
Origin of Japanese people
The language separation has a direct connection to geography. Up to the end of the last major ice age, about 10,000 years ago, a land bridge of sorts existed between the Korean peninsula and northwestern Kyushu in Japan. Genetic studies also strongly suggest that the Japanese people originated in northern Asia.
Origin of kanji
The first Japanese writing system, “kanji,” was brought in from China through Korea around the fourth century. The word “kanji” literally means “sinographs” (kan being the Japanese pronunciation of Han, the largest ethnic group in China, and ji meaning “graph”). It is said that the origin of the current kanji characters can be traced back to the oracle bone script used in the Shang (also called Yin) dynasty about 3,200 years ago.
Some of the earliest characters were pictograms. Two examples of kanji that evolved from pictograms are 象 (elephant) and 山 (mountain).
Other characters were developed by combining simple characters. For example, “sun” 日 + “moon” 月 became “bright” 明, and “mouth” 口 + “bird” 鳥 became “sing” 鳴. The vast majority of kanji, however, were created by combining phonetic and other structural components (logograms). While the Japanese adapted the Chinese phonetic readings of kanji, they also assigned purely Japanese readings to them. This means there are multiple ways to read individual kanji. For example, there are over 20 ways to read the kanji 生, depending on whether it is used by itself or in various combinations with other kanji.
Hiragana and katakana
Japanese has a second writing system called hiragana. It is a syllabary created in Japan in the Heian era (9th century) based on kanji whose pronunciation was similar to Japanese sounds. For example: 安 -> あ (a), 以 -> い (i), 宇 -> う (u), 衣 -> え (e), 於 -> お (o). Later, a third writing system, katakana, was developed by extracting one element from similar-sounding kanji characters. For example: 阿 -> ア (a), 伊 -> イ (i), 宇 -> ウ (u), 江 -> エ (e), 於 -> オ (o).
Modern Japanese writing systems
The modern hiragana and katakana systems each contain 46 characters, and about 3,000 kanji are typically used in newspapers and magazines. Major Japanese kanji dictionaries contain about 50,000 characters. In addition, the Roman alphabet is used when writing non-Japanese words and when giving the Romanized form of Japanese words. Japanese can be written horizontally (from left to right) or vertically (from top to bottom, with the columns running from right to left). Usually, there is no space between words.
Characteristics of Japanese language
- Agglutinative: Japanese sentence structure is quite different from that of Indo-European languages.
- There is no space between words.
- Japanese honorifics are complicated, and even many native Japanese speakers find it difficult to use them correctly.
- In Japanese, the subject is often omitted, and unlike the case with some Romance languages, verbs are not conjugated for person or number, so there are no grammatical clues to identify the absent subject; it must be found by reading the context carefully.
Impact on localization
Having four different writing systems causes various standardization problems, and the mixed use of double- and single-byte characters brings about many technical issues. Before Unicode, it was almost impossible to display letters from the extended Roman alphabet, such as ü, together with double-byte Japanese characters. Even today, you still see garbled characters in software that is not Unicode-compliant and in email files that go through various gateways.
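The byte-level issue can be reproduced in a few lines. The following is a minimal sketch in Python, assuming the legacy encoding in question is Shift_JIS (Python's `shift_jis` codec); the helper name `can_encode` and the sample strings are made up for illustration. It shows that kanji take two bytes where ASCII letters take one, that ü has no slot in the legacy encoding, and that reading the bytes with the wrong codec silently produces garbled text.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


japanese = "日本語"
mixed = "Müller 日本語"  # extended Roman letter next to Japanese text

# In Shift_JIS each kanji is stored in two bytes, each ASCII letter in one.
sjis_bytes = japanese.encode("shift_jis")
ascii_bytes = "ABC".encode("shift_jis")

# Decoding with the wrong codec raises no error; it just yields garbled text.
garbled = sjis_bytes.decode("latin-1")

print(len(sjis_bytes), len(ascii_bytes))
print(can_encode(mixed, "shift_jis"))  # the legacy encoding has no ü
print(can_encode(mixed, "utf-8"))      # a Unicode encoding holds both
print(repr(garbled))
```

A check like `can_encode` could be run on sample content when evaluating a tool chain, to find out early whether any step still depends on a legacy encoding.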
Memory recognition problem: Kanji-kana combination
Non-standardized writing systems can cause memory recognition problems in various CAT tools and present potential MT problems. For example, the Japanese term “to-ri-hi-ki” (business transaction) can be written in three different ways: 取引, 取り引き, and 取引き. When this word is registered in a dictionary or a termbase, there are four different ways to describe it phonetically: とりひき, トリヒキ, ﾄﾘﾋｷ, or torihiki. In addition, the verb can appear in a termbase under any of the three root forms above or under one of three forms with する: 取引する, 取り引きする, 取引きする.
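One way to reduce such mismatches is to normalize terms before they are looked up. The sketch below is only an illustration under stated assumptions: the variant table `OKURIGANA_VARIANTS` and the function names are hypothetical, a real termbase would need a far larger table or a morphological analyzer, and the romanized reading (torihiki) is not handled. It uses Unicode NFKC normalization to fold half-width katakana into full-width katakana, then shifts katakana to hiragana by code point.

```python
import unicodedata

# Hypothetical variant table: okurigana spellings mapped to one canonical entry.
OKURIGANA_VARIANTS = {
    "取り引き": "取引",
    "取引き": "取引",
}


def katakana_to_hiragana(text: str) -> str:
    """Shift full-width katakana (ァ..ヶ) to the matching hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def reading_key(reading: str) -> str:
    """Fold half-width and full-width katakana readings into hiragana."""
    return katakana_to_hiragana(unicodedata.normalize("NFKC", reading))


def canonical_term(term: str) -> str:
    """Map a spelling variant, with or without する, to its canonical entry."""
    term = unicodedata.normalize("NFKC", term)
    suffix = "する" if term.endswith("する") else ""
    stem = term[: len(term) - len(suffix)]
    return OKURIGANA_VARIANTS.get(stem, stem) + suffix


for variant in ["取引", "取り引き", "取引き", "取り引きする"]:
    print(variant, "->", canonical_term(variant))
for reading in ["とりひき", "トリヒキ", "ﾄﾘﾋｷ"]:
    print(reading, "->", reading_key(reading))
```

Keeping the noun and the する verb as separate canonical entries is a design choice made here; a termbase could just as well link them to a single concept.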
Memory recognition problem: Date format
Dates present similar problems. “April 28, 2013” can be translated as follows:
平成25年4月28日 (平成25年 translates as Heisei 25, the 25th year of the Heisei Imperial Era, which is 2013 on the Gregorian calendar.)
平成２５年４月２８日 (Same as above, but the numbers are double-byte.)
平成二十五年四月二十八日 (This form is usually used in the vertical writing system.)
２０１３年４月２８日 (Numbers in double-byte.)
2013年4月28日 (Numbers in single-byte.)
In Japanese-to-English translation, finding a CAT tool that can recognize that all the phrases above have the same meaning, and that can provide an accurate translation, is almost impossible.
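When the tool itself cannot do this, the date forms can be reduced to one value in a pre-processing step. The following Python sketch is an assumption-laden illustration, not a feature of any particular CAT tool: the names `parse_japanese_date`, `kanji_to_int`, and `ERA_OFFSETS` are invented here, only the Heisei era is covered, and kanji numerals are handled only below 100. NFKC normalization turns double-byte digits into single-byte digits before parsing.

```python
import re
import unicodedata
from datetime import date

# Gregorian year = era year + offset. Only Heisei is covered in this sketch.
ERA_OFFSETS = {"平成": 1988}

KANJI_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

NUM = "[0-9〇一二三四五六七八九十]+"
DATE_RE = re.compile(f"(平成)?({NUM})年({NUM})月({NUM})日")


def kanji_to_int(text: str) -> int:
    """Convert ASCII digits or a kanji numeral below 100 to an int."""
    if text.isascii() and text.isdigit():
        return int(text)
    if "十" not in text:
        # Positional style, one kanji per digit.
        return int("".join(str(KANJI_DIGITS[ch]) for ch in text))
    tens, _, ones = text.partition("十")
    tens_value = KANJI_DIGITS[tens] if tens else 1
    ones_value = KANJI_DIGITS[ones] if ones else 0
    return tens_value * 10 + ones_value


def parse_japanese_date(text: str) -> date:
    """Parse the date forms listed above into a datetime.date."""
    text = unicodedata.normalize("NFKC", text)  # double-byte digits -> ASCII
    match = DATE_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized date: {text!r}")
    era, year, month, day = match.groups()
    offset = ERA_OFFSETS[era] if era else 0
    return date(kanji_to_int(year) + offset,
                kanji_to_int(month), kanji_to_int(day))


forms = ["平成25年4月28日", "平成２５年４月２８日", "平成二十五年四月二十八日",
         "２０１３年４月２８日", "2013年4月28日"]
for form in forms:
    print(form, "->", parse_japanese_date(form).isoformat())
```

Once every form maps to the same `date` value, a translation memory can match on that value rather than on the surface string.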
Not having a space between words causes term recognition problems in CAT tools and search engines designed for Romance or Germanic languages. Since Japanese does not share a history with other languages (except with Chinese, and in some scientific fields), the number of cognates is very limited, and the relationship between terms originating in Indo-European cultures and those in Japanese is many-to-many, causing problems in concept-based terminology management.
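The word-boundary problem is easy to demonstrate. The Python sketch below contrasts whitespace splitting, which is what a tool built for European languages effectively does, with a naive greedy longest-match lookup against a termbase. The function name, the tiny termbase, and the sample sentence are hypothetical, and production tools generally rely on a morphological analyzer rather than this simple approach.

```python
def longest_match_segment(text: str, termbase: set, max_len: int = 8) -> list:
    """Greedy longest-match segmentation; unknown characters become
    single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in termbase:
                tokens.append(candidate)
                i += length
                break
    return tokens


termbase = {"取引", "開始", "する"}   # hypothetical termbase entries
sentence = "取引を開始する"           # "start a transaction"

print(sentence.split())                            # whitespace finds one token
print(longest_match_segment(sentence, termbase))   # dictionary finds the terms
```

Greedy matching fails whenever a longer dictionary entry happens to span a true word boundary, which is one reason term recognition for Japanese needs tools designed for the language.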
Extralinguistic factors
The Japanese have developed a unique set of symbols (such as ※, ▲, and ◎) that must be translated but are hard to register in a termbase, and that require extra effort on the part of the translator to handle properly. For example, ※ often means “Note” (but the referent is usually not clearly marked in the text), and can also indicate caution, warning, or danger. Left untranslated, these symbols can (and do) lead to significant misunderstandings.
One major cultural issue with Japanese translations is that in Japan, the price of service is included in the price of the product. Naturally, Japanese people expect a perfect translation to come with the product. Their tolerance for mistakes in English-to-Japanese translation is low. (This is not unique to Japanese; some Romance language speakers are also intolerant of flawed translations.) On the other hand, Japanese people tend not to notice mistakes in Japanese-to-English translation and do not realize that some of the translation problems originate in non-standardized Japanese. The list goes on.
Tips for successful localization
We have seen various reasons why Japanese is challenging. There are several things you can do to mitigate the problems.
- Get software that can handle Japanese properly
- Study the tools thoroughly for potential double-byte language issues before purchase
- Consult with experts to deal with linguistic, technical, and cultural issues
- Give projects enough lead-time and obtain accurate estimates beforehand (the hourly rate of Japanese translators may be the same as that of other linguists, but the actual translation/editing work may take a lot longer)
- Develop (or obtain from your client) a good style guide and terminology
- Talk to your client and understand their quality expectation and style/terminology preferences
Japanese localization is difficult. Part of the reason lies in Japan’s unique history and language. The best way to deal with this is to acknowledge the fact, plan ahead, and be prepared to invest time and resources.
* There are 17 or 18 language families worldwide. English is a member of the Indo-European language family, which includes virtually all European languages other than Finnish, Hungarian, and Basque, as well as Farsi, Hindi, Urdu, and other Indo-Iranian languages. Most Indo-European languages developed between 3,500 and 1,600 years ago, depending on the language.
Characters built by combining a phonetic component with other structural components are called xíngshēng 形聲 in Chinese.
As an example of a many-to-many relationship between terms, the Japanese term 薄膜 could be translated as “pellicle,” “membrane,” or “thin film,” etc., depending on the subject matter, and the English term “pellicle” could be translated as ペリクル, 被膜, 薄膜, and so on.
Sachiyo Demizu earned her Master's in Translation and Interpretation from the Monterey Institute of International Studies and has been working in the industry for more than 20 years. Currently, she works as a technical translator at Nikon Precision Inc. and is actively involved in the selection, testing, and implementation of computer-assisted translation tools and machine translation engines. She also has a Master of Accountancy from the University of Denver, earned her CPA in the State of Colorado, and is an experienced technical/financial translator.