May 18, 2006

W3C Internationalization Tag Set - Last Call Working Draft

The Last Call Working Draft of the Internationalization Tag Set has been published.

Along with it, a new document, Best Practices for XML Internationalization, has been published as a First Working Draft.

I would encourage anyone who develops translation tools that support XML to review the ITS Specification document. Part of the specification addresses some of the XML localizability issues, offering a common way to specify localization information:

  • what should or should not be translated,
  • some terminology identification mechanism,
  • identification of inline codes and sub-flows,
  • localization notes,
  • and more…

You can use Bugzilla to point out issues and provide comments. The review period lasts until June 30 (6 weeks).

April 30, 2006

Google’s Language Tools II

Google has released an Arabic to English and English to Arabic machine translation engine. AFAIK, it uses a statistical approach to build the bilingual corpora. I cut and pasted Arabic texts from Al-Jazeera, translated them into English, and the results seemed to make a lot of sense.

Obviously, there is a cap on what machine translation can achieve (if linguistics were an exact science, jokes wouldn’t exist), but if the translations are accurate this seems to be a significant leap.

March 21, 2006

Google’s Language Tools

Google was recently given the highest rating in a government test (National Institute of Standards and Technology) of machine translation tools.

Google describes its "automatic translation" as being "…produced automatically by state-of-the-art technology without the intervention of human translators." This seems like a strange word choice, almost as if ‘intervention’ is being used pejoratively.

However, reading further down in the FAQ, I see that Google is careful to offer this disclaimer for possible inaccuracies in the translation: “While many engineers and linguists are working on the problem, it will be some time before anyone can offer a quick and seamless translation experience. In the interim, we hope the service we provide is useful for most purposes.”

Once again, an odd choice of words!

I think it will be some time before the translation experience sans humans is immediate, or in real-time as well as intelligible, but there are certainly varying expectations and definitions when it comes to requiring a “quick and seamless translation experience.”

I would like to think that, for what it’s worth, human intervention can be quite helpful, and what’s more, on those projects that go beyond the translation of a phrase or sentence, some of us are pretty darn good at offering our clients a quick and seamless translation experience.

Finally, I leave you with a humorous twist on the classic translation/back-translation every purveyor of human linguistic expertise loves to perform, using Google’s language tools. Yes, it’s been done a million times, but I hope you’ll appreciate the humor of this one.

English:
Sally’s mom is very nice.

Spanish:
La mama de la salida es muy agradable.

Back to English:
The breast of the exit is very pleasant.

(The component that really brings the entire house of cards down is, of course, the missing accent in “mamá”, which some of the other online MT tools remembered to add.)

February 25, 2006

VMware Server free as in beer

Nowadays, VMware has become an essential tool for ApSIC’s localization and internationalization testing team, effectively phasing out the former arrangements based on disk images.

So it is good news that VMware Server, the entry-level product in VMware’s server-grade line-up, will be available for free. The aim is to entice businesses to start deploying server virtualization, in the hope that some of them will later convert to VMware’s currently pricey high-end server virtualization solutions.

I first learned about VMware back in 1999, when I read a press release about a start-up that had released a product which would allow running Linux and Windows virtual machines on a Linux or Windows host operating system.

At that time we were frequently dual booting Windows NT Workstation and OS/2, so I wrote to them to ask if they were planning to add OS/2 as either a guest or a host operating system. They replied almost instantly, saying that due to certain technical particularities, support for OS/2 was not feasible. (Fortunately, weeks later we deployed a solution to avoid frequent dual booting with a combination of Windows NT Terminal Server 4.0 and OS/2 client machines.)

We forgot about VMware for a while. What brought us back to it a couple of years ago was that, when we expanded the testing lab with newer machines, the only way to test Windows NT was on a virtual machine, because there were no NT drivers for the new hardware.

Ironically, lack of NT support for newer hardware was a good show-stopper to run into, because VMware has since become instrumental in our testing efforts!

February 9, 2006

Will Web 3.0 Be Localized?

Web 2.0 is defined many ways, but one simple definition is “the era when people have come to realize that it’s not the software that enables the web that matters so much as the services that are delivered over the web.”

By my estimation, the social networking and blogging phenomena are dying down and losing their appeal as web business models, since everyone has exhausted and copied what you can do with sharing information with your friends and making new friends. The trend seems to be toward creating little online applications using existing client-server technologies or APIs (Application Programming Interfaces) provided by other toolmakers, like a site that shows you where all the good beer is in town (http://www.beerhunter.ca/), one that plots your jogging routes (http://www.walkjogrun.net/), or one for sharing travel information in a journalistic, collaborative way (travbuddy.com) through Google’s maps API.

Borrowing from a couple of “Top 10 Web 2.0 Innovations” lists, I have attempted to examine some of the applications/websites in terms of their potential for future localization.

del.icio.us is a social bookmarking tool. Everyone shares favorite URLs with their friends, so why not make a website where you can share your Favorites with the rest of the world? Yahoo! recently purchased it, and although Yahoo! isn’t localized as heavily as Google, they obviously have an interest in markets beyond their front door. It stands a fairly good chance of becoming a localized application, or perhaps the similar existing Yahoo! application http://myweb2.search.yahoo.com/ will be.

netvibes.com is a personalized page to display your favorite newsfeeds, shopping alerts, weather alerts, etc. It looks to me an awful lot like my customized my.yahoo.com page, only the content is aggregated from any RSS source. I wasn’t especially impressed with it, but it bears mentioning because it is one of the only Web 2.0 apps I could find that has bothered to make an attempt at localizing its interface. If you click on the languages at the bottom of netvibes’ homepage, you will still see most of the content in English, obviously, because the sample aggregation is pulling in English newsfeeds.

flickr.com was cited as a top Web 2.0 app of 2005, even though it has been around for a while longer than just last year. Photosharing is pretty hot stuff with a lot of people, and flickr has an appealing interface. As an aside, I think that online digital photo-sharing is preferred by the hipper younger crowd, and am not sure why companies like HP continue to market photoprinting devices so heavily to this target market. Flickr is another cool web app purchased by Yahoo!, and I look for its interface to be localized in the next year or so. Native English-speaking people aren’t the only ones who take digital photos and share them with each other.

Looking for a light, online word processing tool that you can easily use to collaborate with other users? You have quite a few choices, and I predict that Google or Yahoo! will have an entire office suite of online applications in the not-so-distant future in an attempt to compete with each other as well as anyone else who makes an office suite. In the meantime, there are several online word-processing applications springing up: writely.com, writeboard.com, rallypointhq.com, zohowriter.com. The last one mentions multilingual support, i.e., you can write in your "mother tongue". Once again, however, the portal and the interface are English-only.

In a similar vein, a few "online project management" suites have sprung up: basecamphq.com, centraldesktop.com, sidejobtrack.com. Once again, every single one of them has forgone the option of localizing the interface.

Like the multitude of blogging tools that came before them, these apps are used by multilingual users everywhere. It seems obvious to me that companies developing an application to be used by the entire world should pause to consider that most of the online world consists of non-native English speakers.

If you click through blogger.com’s random-blog button, you discover that there are probably more bloggers writing in Portuguese or Spanish than in English. Yet, after more than five years of existence, Blogger still has no localized user interface whatsoever. As an aside, Blogger happens to be owned by Google, which is highly praised for its localized site. However, looking past its search interface and the machine translation tool for websites, Google has failed to localize most of its tools (http://www.google.com/intl/en/options/).

Hopefully, the case will be different for the more robust online applications that are popping up everywhere. The decision to localize a web application interface doesn’t involve the level of investment risk it once did. A successful web app localization requires a relatively small investment yet will dramatically increase use and exposure in major non-English language markets.

If the thin-client, online software trend continues (the bulk of the data processing occurs on the server rather than client-side) and isn’t simply a fad of online application gadgetry, then all of our beloved office tools will move completely online. At some point, this will have implications for the translation and localization industry, in terms of how files, translation memories and projects are managed, and how collaboration takes place across the globe on translation projects. Of course we already see this to some degree, but will this increase, or even become the norm? How do you see the technologies of Web 2.0 affecting/being used by the localization industry, if at all? Are we all going to be “Linked In” as one hive, translating mind?

However, the more pressing question as to how this new trend of applications relates to the translation industry is, why are so few of them being localized?

Further reading:
http://www.web2con.com/
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
http://www.web20workgroup.com/
http://webservices.sys-con.com/read/172417.htm
http://www.articledashboard.com/Article/Top-10-Innovative-Web-2-0-Applications-of-2005/10891
http://web2.wsj2.com/the_best_web_20_software_of_2005.htm
http://news.zdnet.com/2100-3513_22-6031272.html

February 3, 2006

SDL Trados ‘06 in the pipeline

Following on the heels of the Trados 7.1 upgrade in late December, there seems to be an upcoming release, SDL Trados 2006, some time in March. This release is about bundling the Trados Workbench and SDLX products, making them available together and, according to proz TGB, at a price level comparable to each of the former CAT tools separately. The fact that SDLX does not appear in the name of the product seems to suggest that the SDLX brand is the one that will eventually be phased out.

So it seems that SDLX and Trados will finally start merging functionality more seriously, but at this pace of releases, I wonder if we are facing a shift to a rent-a-CAT model, with upgrades every few months, almost a subscription!

Regarding the tools themselves, what I liked about SDLX was how it simplified things for the translator, and what I liked about Trados was the scalability (in terms of response time) of its translation memory engine. What I disliked about SDLX was the little annoyances here and there in its features, which IMHO needed more testing before release, and about Trados, the feature stagnation of the translator’s front-end. I’m anxious to see the outcome of this upcoming merger of tools. I’m also anxious to learn about the few CAT alternatives that are said to be in the pipeline, and that should hit the market over the next few months.

December 3, 2005

Look it later

From time to time I google for new tools, ideas, and news on the localization business. When I work remotely, I normally VPN and use Remote Desktop from several different desktops to the office. I tend to avoid carrying a laptop if I can. However, I always surf from the machine I am sitting at rather than through the Remote Desktop session, because the surfing experience locally is much better than with Remote Desktop.

One issue that I have with this research surfing is that I tend to lose some of the interesting links I find, because I don’t have a stable Favorites place anywhere. That means I have to rely on my own memory to google the page again afterwards (the most common method, but getting less and less reliable as we speak), or I send myself a reminder mail, but it has to be something really important to make me go through all the hassle of sending a mail to myself.

Well, one Web-based tool has just come to the rescue: LookLater. It is very new, just a few days old indeed, and it has quite a neat and simple interface a la Google. It’s very easy and lightweight to install (for example, in Firefox, you only have to drop a link to the link bar and you are all set in one second). I wouldn’t be surprised if the big guys eventually launched versions of this cool idea linked with their ad services.

I also find LookLater especially useful for translators. Translation in general and technical translation in particular are extremely knowledge-intensive, much more than most people seem to believe. After all, nobody can translate what he or she does not understand (that applies also to machine translation), and that means that the professional translator must often do intensive background research to get it right.

I think that translators can use this type of tool to quickly bookmark links, pages, and snippets, when they are looking for information or bilingual corpuses on the subject matter of their current job, and after the research, just do a quick recap on the saved links to qualify and organize them.

November 22, 2005

W3C Internationalization Tag Set - First Working Draft

ITS (http://www.w3.org/TR/its/) is a set of elements and attributes that supports the internationalization and localization of schemas and XML documents. This first draft addresses the following types of information (called data categories in the document):

  • translatability
  • localization information
  • terminology
  • directionality
  • and Ruby text

For example, ITS provides attributes to identify within your XML document parts that should not be translated, or words/phrases that should be treated as “terms”, as shown below:

<para>
And he said:
You need a new <span its:translate='no'
its:term='yes'>motherboard</span>.
</para>

Each data category can be used in schemas, in-situ (within the content), or dislocated (defined somewhere other than where the corresponding content is located). XPath provides the scoping mechanism throughout.
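
As a rough illustration of how a tool might consume the in-situ markup above, here is a minimal Python sketch using only the standard library. It is not taken from the ITS draft: the namespace URI, the xmlns:its declaration added to the sample, and the collect_marked helper are assumptions made for the example, and the sketch ignores the schema-level and dislocated usages entirely.

```python
import xml.etree.ElementTree as ET

# Assumed ITS namespace URI; check the draft for the actual value.
ITS_NS = "http://www.w3.org/2005/11/its"

# The sample from the post, with a namespace declaration added so it parses.
SAMPLE = """<para xmlns:its="http://www.w3.org/2005/11/its">
And he said:
You need a new <span its:translate='no'
its:term='yes'>motherboard</span>.
</para>"""


def collect_marked(xml_text):
    """Return (protected, terms): text marked translate='no' and term='yes'."""
    root = ET.fromstring(xml_text)
    protected, terms = [], []
    for elem in root.iter():
        text = "".join(elem.itertext()).strip()
        # ElementTree exposes namespaced attributes as "{uri}localname".
        if elem.get("{%s}translate" % ITS_NS) == "no":
            protected.append(text)
        if elem.get("{%s}term" % ITS_NS) == "yes":
            terms.append(text)
    return protected, terms


protected, terms = collect_marked(SAMPLE)
print("Do not translate:", protected)
print("Terms:", terms)
```

A translation tool could use the first list to lock segments against editing and the second to feed a terminology lookup.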

I think it is important for the localization and translation tools vendors who are not part of the ITS working group to provide feedback on this draft, so that the final version of ITS is well-suited to their applications. You can send your comments to www-i18n-comments@w3.org. Use “Comment on its tagset WD” in the subject line of your email. The comments archives are publicly available.

November 16, 2005

Windows Workflow Foundation in the works

I recently learnt about the release of a beta of Windows Workflow Foundation, a .NET library which is currently in the works both for Windows client and server operating systems.

Among other, more sophisticated possibilities, a workflow can be sequential or state-oriented. Sequential largely means that the computer is in charge and outsources some tasks to human beings. State-oriented means that human beings decide the actual transitions from state to state according to their own judgment, while the role of the computer is basically to limit the acceptable transitions for each state and to track the status. This is easy to say but awkward to code in a model that gives the level of flexibility that LSPs need. Hopefully this upcoming library will make it simpler for developers.
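
To make the state-oriented case more concrete, here is a minimal sketch in plain Python. It does not use Windows Workflow Foundation or any of its API; the job states, the ALLOWED table, and the Job class are all hypothetical names invented for illustration.

```python
# Hypothetical states for a translation job and the transitions allowed
# from each one. The computer only enforces this table; people decide
# which allowed transition to take.
ALLOWED = {
    "received": {"in_translation"},
    "in_translation": {"in_review"},
    "in_review": {"in_translation", "delivered"},  # a reviewer may send it back
    "delivered": set(),
}


class Job:
    """Tracks the status of one job and rejects transitions not in ALLOWED."""

    def __init__(self, name):
        self.name = name
        self.state = "received"
        self.history = ["received"]

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                "%s: cannot go from %s to %s" % (self.name, self.state, new_state)
            )
        self.state = new_state
        self.history.append(new_state)


job = Job("manual-es")
job.move_to("in_translation")
job.move_to("in_review")
job.move_to("in_translation")  # the reviewer decided to send it back
print(job.history)
```

Keeping the transitions in a table rather than in code is one way to get some of the flexibility mentioned above, since each client or project could supply its own table.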

The RTM (Release-to-Manufacturing) version of this framework is still a few months away, but I think that its mere existence significantly increases the chances both that Tool Vendors will consider workflow in their product roadmaps and that other existing frameworks (or future copycat frameworks) will empower developers to add more features to their tools.

November 14, 2005

It’s all about matching requirements

In the localization tools market there are a number of organizations which deliver various localization and translation tools.

They all focus on specific areas of localization, which in general have been driven by the localization needs of clients at the time the tools were designed.

As soon as documentation translation became a requirement, tools to enable the translation of this type of material started surfacing.

In general, the same applied to software localization tools and to localization project management applications, which were designed and released in response to the huge demand for better control over the localization process.

Based on your requirements, you will find that the commercially available tools fall into a number of more or less independent groups.

o Tools which will handle your Software localization requirements.

o Other tools which will take care of your Documentation localization requirements.

o Tools which will handle the project management side of your localization projects.

o Tools which will limit the translation cost by reducing documentation source material, and the typical single source publishing tools which are available in a number of different flavors.

o And the tools which will focus on Machine Translation.

Depending on your organizational structure and translation requirements, you will find that in most cases the variety of translation requests can’t be covered with just one tool.

When the functionality and file type support of a single tool do not match your requirements, it becomes clear that, if you decide to purchase, you’ll need at least two tools to handle your projects.

As a result, a number of large corporations have decided to find out whether they could design and develop localization software internally to fill the gaps between the commercial tools available on the market and their internal requirements.

Many organizations have followed this path and have built tools ranging from simple batch files on one side to highly complex and fully integrated localization solutions on the other.

But what are the drawbacks of that approach?

Wouldn’t it be much easier to discuss your requirements in the blog and let us (Tool Vendors) know all about them?

We may be able to deliver the solution…
