16 May 2006

Right-to-left (RTL) is a writing system in which script runs from the right-hand side of a page to the left-hand side, as in Arabic, Hebrew and Urdu. As business development manager of a fast-growing regional company that works with Middle Eastern languages on a daily basis, I will try to give readers some basic information about these languages.

The map below shows where the Arabic alphabet is used.

Most of these languages, such as Farsi and Urdu, are written in scripts based on the Arabic alphabet with additional letters, so for the purposes of this article I will concentrate on Arabic only, to show where it differs from left-to-right (LTR) languages.

The Arabic alphabet is composed of 28 basic letters, and the script is cursive. Most letters have conditional glyph forms that depend on where they appear in a word (beginning, middle or end), so they may exhibit four distinct forms: initial, medial, final or isolated. Six letters, however, have only an isolated and a final form. They are never joined to the letter that follows them, so that next letter appears in its initial or isolated form even though it is not at the start of the word.
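The joining rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input contains nothing but the 28 basic letters (no hamza, diacritics, spaces or ligatures), and the names `NON_CONNECTING` and `contextual_forms` are made up for this example rather than taken from any library.

```python
# The six letters that never join to the letter after them:
# alef, dal, thal, reh, zain, waw.
NON_CONNECTING = set("\u0627\u062F\u0630\u0631\u0632\u0648")


def contextual_forms(word):
    """Return the form name for each letter of a word, in logical order.

    Assumption: every character is one of the 28 basic Arabic letters.
    """
    forms = []
    for i, letter in enumerate(word):
        # A letter joins backwards only if the previous letter can connect forward.
        joins_prev = i > 0 and word[i - 1] not in NON_CONNECTING
        # A letter joins forwards only if it can connect and something follows it.
        joins_next = i < len(word) - 1 and letter not in NON_CONNECTING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# beh, alef, beh: the last beh follows an alef, so it cannot be joined.
print(contextual_forms("\u0628\u0627\u0628"))  # ['initial', 'final', 'isolated']
```

In the sample word the final beh appears in its isolated form because the alef before it never connects forward, which is the case the paragraph above describes.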

For compatibility with previous standards, Unicode also encodes each of these forms as a separate character.
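As a small illustration, the four separately encoded forms of one letter (beh) can be inspected with Python's standard `unicodedata` module. The helper name `describe_forms` is an assumption made for this sketch; the code points are the ones Unicode assigns to the beh presentation forms.

```python
import unicodedata

# Isolated, final, initial and medial forms of the letter beh (U+0628).
BEH_PRESENTATION_FORMS = "\uFE8F\uFE90\uFE91\uFE92"


def describe_forms(forms):
    """List (code point, Unicode name, base letter code point) for each form."""
    rows = []
    for ch in forms:
        # Compatibility normalization folds a presentation form back to its base letter.
        base = unicodedata.normalize("NFKC", ch)
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), f"U+{ord(base):04X}"))
    return rows


for code, name, base in describe_forms(BEH_PRESENTATION_FORMS):
    print(code, name, "-> base letter", base)
```

Each of the four characters normalizes back to the same base letter, U+0628, which is the character normally stored in text; the choice of form is then left to the rendering software.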

What is Unicode?

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
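A short Python sketch can make both kinds of conflict concrete. It assumes three legacy code pages chosen only for illustration (Windows-1256, Windows-1252 and ISO 8859-6); the variable names are arbitrary.

```python
# Same number, different characters: byte 0xC7 depends on the code page.
RAW = b"\xc7"
as_arabic = RAW.decode("cp1256")   # Arabic letter alef in Windows-1256
as_western = RAW.decode("cp1252")  # Latin capital C with cedilla in Windows-1252

# Same character, different numbers: the Arabic letter tah (Unicode U+0637).
TAH = "\u0637"
in_windows = TAH.encode("cp1256")
in_iso = TAH.encode("iso8859_6")
in_utf8 = TAH.encode("utf-8")

print(f"0xC7 as cp1256: U+{ord(as_arabic):04X}, as cp1252: U+{ord(as_western):04X}")
print("tah in cp1256:", in_windows.hex(), "in ISO 8859-6:", in_iso.hex())
print("tah in UTF-8:", in_utf8.hex(), "code point:", f"U+{ord(TAH):04X}")
```

Whatever byte encoding is used for storage, the Unicode number for the letter stays U+0637, which is what lets data move between platforms without being misread.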

What is bidirectional?

Arabic script runs from right to left, but digits and numerals run from left to right, so a line of Arabic text that contains numbers is bidirectional.
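Unicode records this direction information per character, and Python's `unicodedata.bidirectional` exposes it. The sketch below only looks up the classes; it does not implement the reordering that layout software performs, and the function name `bidi_classes` is an assumption for this example.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Three Arabic letters, a space, then the number 12 (stored in reading order).
sample = "\u0628\u0627\u0628 12"
print(bidi_classes(sample))  # ['AL', 'AL', 'AL', 'WS', 'EN', 'EN']
```

Here `AL` marks right-to-left Arabic letters, `WS` is whitespace and `EN` marks European numbers, which keep their left-to-right order inside the right-to-left line.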


There are two kinds of numerals used in Arabic writing: Hindi digits, the standard, and Farsi or "East Arab" digits, used in Iran, Pakistan and India.
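For software work it may help to see how these digit sets look in Unicode. The sketch below assumes that the "Hindi" digits correspond to the Unicode block named ARABIC-INDIC DIGIT (U+0660 to U+0669) and the Farsi or "East Arab" digits to EXTENDED ARABIC-INDIC DIGIT (U+06F0 to U+06F9); the function `convert_digits` is a hypothetical helper, not a library call.

```python
import unicodedata

WESTERN = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + d) for d in range(10))


def convert_digits(text, target):
    """Rewrite every digit in text using the ten characters in target."""
    table = {}
    for digits in (WESTERN, ARABIC_INDIC, EXTENDED_ARABIC_INDIC):
        for value, ch in enumerate(digits):
            table[ord(ch)] = target[value]
    return text.translate(table)


year = convert_digits("2006", ARABIC_INDIC)
print(year, [unicodedata.name(ch) for ch in year])
# Python's int() accepts any Unicode decimal digits, so the value is unchanged.
print(int(year), int(convert_digits("2006", EXTENDED_ARABIC_INDIC)))
```

Only the glyphs change between the sets; the numeric value and the left-to-right order of the digits stay the same.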

How this affects DTP

Some desktop publishing (DTP) software does not support RTL languages. One example is Adobe FrameMaker, which is still quite a common application for DTP. This means a source FrameMaker file can be kept for reference only, as it is useless for the target layout. An RTL localization provider will therefore need to recreate the layout from scratch in another application that supports RTL languages, which has serious implications for production time. Because the layout is recreated, an additional QA cycle is also required, which ultimately affects the pricing of RTL languages. Deadlines will also differ from those for LTR languages: a standard RTL DTP production rate is 20-30 pages per day per single source, while LTR DTP production output is 60-100 pages per day. This naturally affects the overall localization production cycle, and the same applies to QA, so you can imagine all the extra time taken for fixing and checking!
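To put those rates in perspective, here is a rough calculation for a hypothetical 300-page manual handled by a single resource. The page count and the helper `dtp_days` are assumptions for illustration only; the rates are the ones quoted above, and QA time is not included.

```python
import math


def dtp_days(pages, slow_rate, fast_rate):
    """Return (best case, worst case) working days for a given page count."""
    return math.ceil(pages / fast_rate), math.ceil(pages / slow_rate)


PAGES = 300  # hypothetical document size
rtl_days = dtp_days(PAGES, 20, 30)
ltr_days = dtp_days(PAGES, 60, 100)
print("RTL:", rtl_days, "days; LTR:", ltr_days, "days")
```

Under these assumptions the RTL layout takes 10 to 15 working days against 3 to 5 for LTR, roughly three times as long before any extra QA cycle is counted.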

If you are looking to localize DTP materials into RTL languages, the best answer is to take all of the above points about production into account. To save time and money and retain high quality, you should create your materials in an application that supports RTL languages, for example Adobe InDesign, whose Middle East (ME) version is the most popular for RTL layouts. In closing, I hope that you find this information helpful if you are planning to localize into RTL languages.

Mohamed Ali is a founding partner of eLocalize, which is based in Cairo, Egypt. He can be reached at [email protected].