April 3, 2007

The Internationalization Tag Set 1.0 - A W3C Recommendation

The Internationalization Tag Set (ITS) version 1.0 has just been published as a W3C Recommendation. You can read the W3C press release.

ITS is a set of attributes and elements designed to help in the internationalization and localization of XML material. It can be used externally to the documents (a bit like the “DTD Settings” file in Trados), and it can be integrated within the XML documents themselves.

The ITS 1.0 Specification can be found here: www.w3.org/TR/its

You can find extensive examples, descriptions and links to implementations of ITS on the ITS Working Group page: www.w3.org/International/its.

The next step for the Working Group is to publish the “Best Practices for XML Internationalization”, which is currently a Working Draft.

February 7, 2007

Very hard to find Trados training in China

We are a Singapore-based software internationalization and localization provider with one of our branches in China. We currently need Trados training for our staff. However, we have searched the web and could not find any such training courses locally.

-Stefan
ESS Software (www.essware.com)

April 30, 2006

Google’s Language Tools II

Google has released an Arabic to English and English to Arabic machine translation engine. AFAIK, it uses a statistical approach built on bilingual corpora. I tried cutting and pasting Arabic texts from Al-Jazeera and translating them into English, and the results seemed to make a lot of sense.

Obviously, there is a cap on what machine translation can achieve (if linguistics were an exact science, jokes wouldn’t exist), but if the translations are accurate this seems to be a significant leap.

March 21, 2006

Google’s Language Tools

Google was recently given the highest rating in a government test (National Institute of Standards and Technology) of machine translation tools.

Google describes its "automatic translation" as being "…produced automatically by state-of-the-art technology without the intervention of human translators." This seems like a strange word choice, almost as if ‘intervention’ is being used pejoratively.

However, reading further down in the FAQ, I see that Google is careful to offer this disclaimer for possible inaccuracies in the translation: “While many engineers and linguists are working on the problem, it will be some time before anyone can offer a quick and seamless translation experience. In the interim, we hope the service we provide is useful for most purposes.”

Once again, an odd choice of words!

I think it will be some time before the translation experience sans humans is immediate, or in real-time as well as intelligible, but there are certainly varying expectations and definitions when it comes to requiring a “quick and seamless translation experience.”

I would like to think that, for what it’s worth, human intervention can be quite helpful, and what’s more, on those projects that go beyond the translation of a phrase or sentence, some of us are pretty darn good at offering our clients a quick and seamless translation experience.

Finally, I leave you with a humorous twist on the classic translation/back-translation every purveyor of human linguistic expertise loves to perform, using Google’s language tools. Yes, it’s been done a million times, but I hope you’ll appreciate the humor of this one.

English:
Sally’s mom is very nice.

Spanish:
La mama de la salida es muy agradable.

Back to English:
The breast of the exit is very pleasant.

(The component that really brings the entire house of cards down is, of course, the missing accent in “mamá”, which some of the other online MT tools remembered to add.)

November 22, 2005

W3C Internationalization Tag Set - First Working Draft

ITS (http://www.w3.org/TR/its/) is a set of elements and attributes that supports the internationalization and localization of schemas and XML documents. This first draft addresses the following types of information (called data categories in the document):

  • translatability
  • localization information
  • terminology
  • directionality
  • and Ruby text

For example, ITS provides attributes to identify within your XML document parts that should not be translated, or words/phrases that should be treated as “terms”, as shown below:

<para>
And he said:
You need a new <span its:translate='no'
its:term='yes'>motherboard</span>.
</para>
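
As a rough sketch of how a tool might consume this markup, the Python snippet below parses the example above and collects the text of elements flagged as non-translatable or as terms. The namespace URI bound to the its: prefix and the helper name are assumptions made for illustration; check the specification for the binding to use. The sketch only looks at attributes set directly on an element and does not model inheritance.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the its: prefix (not stated in this post).
ITS_NS = "http://www.w3.org/2005/11/its"

SAMPLE = f"""<para xmlns:its="{ITS_NS}">
And he said:
You need a new <span its:translate='no'
its:term='yes'>motherboard</span>.
</para>"""


def texts_with_flag(xml_text, attribute, value):
    """Hypothetical helper: text of every element whose its:<attribute> equals value."""
    root = ET.fromstring(xml_text)
    qualified = f"{{{ITS_NS}}}{attribute}"
    # Only attributes set directly on an element are checked here.
    return [el.text for el in root.iter() if el.get(qualified) == value]


print(texts_with_flag(SAMPLE, "translate", "no"))  # content to protect from translation
print(texts_with_flag(SAMPLE, "term", "yes"))      # content to treat as a term
```

A translation tool could use lists like these to lock segments or to feed a terminology database before the text reaches the translator.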

Each data category can be used in schemas, in-situ (within the content), or dislocated (defined somewhere other than where the corresponding content is located). XPath is used to provide the scoping mechanism.
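
To make the dislocated case more concrete, here is a minimal sketch of rules kept outside the content and scoped with XPath-style selectors. The rule format (a list of selector/flag pairs), the sample document, and the function name are all invented for illustration and are not ITS syntax. Note also that Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with no ITS markup of its own.
DOC = """<doc>
<para>Press <code>Ctrl+S</code> to save.</para>
<para>Then close the <term>dialog</term>.</para>
</doc>"""

# Hypothetical external rules: (selector, translate flag). Not ITS syntax.
RULES = [(".//code", "no")]


def apply_rules(xml_text, rules, default="yes"):
    """Return (tag, translate flag) for each element, in document order."""
    root = ET.fromstring(xml_text)
    flags = {}
    for selector, translate in rules:
        # Each selector picks the elements that the rule applies to.
        for el in root.findall(selector):
            flags[el] = translate
    return [(el.tag, flags.get(el, default)) for el in root.iter()]


for tag, translate in apply_rules(DOC, RULES):
    print(tag, translate)
```

The content stays untouched, and the same rule set can be reused across every document that shares the vocabulary.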

I think it is important for the localization and translation tool vendors who are not part of the ITS working group to provide feedback on this draft, so that the final version of ITS is well-suited to their applications. You can send your comments to www-i18n-comments@w3.org. Use “Comment on its tagset WD” in the subject line of your email. The comment archives are publicly available.