July 9, 2008

XML and single source publishing

Rapid globalization has had a profound effect on the documentation requirements of many forward-thinking companies. The need to quickly and accurately localize content for distribution in a host of different languages, while at the same time adhering to strict budgetary requirements, means that many companies have to rethink their legacy documentation technologies and workflows.

XML and single source publishing have revolutionized content management, document exchange, and multilingual communications by separating content structure from appearance. An XML-based documentation system can greatly reduce costs by easing conversion for delivery to many different data formats and types of applications. However, to take full advantage of the benefits XML provides, changes must be made to the traditional documentation workflow.

Throughout the documentation workflow, checks and balances are in place to ensure high-quality content delivery. The single source concept ensures that these processes (e.g., conversion and edits) do not have to be repeated or reworked: all content in the repository requires only minimal restructuring and promotion before being loaded into the respective applications for delivery.

In a global setting, where documentation needs to be distributed simultaneously in a variety of different languages, archival XML source documents can easily be translated by applying translation scripts as well as rendering scripts that localize formatting attributes based on language-specific requirements. In addition, translation and maintenance costs can be significantly reduced by normalizing content for an international market.

XML Documentation

XML has long been lauded by the publishing industry as a cost-cutting solution to many process-related issues in content production and delivery. Having content converted to XML allows for enhancements in content organization, indexing, linking, storage, reuse and delivery/display. But just having content converted to XML does not allow it to reach its full benefit. XML and its associated technologies call for redesigned workflows to demonstrate their enormous potential.

An optimized workflow for content publishers requires minimal process repetition. Once content is delivered, it is edited and converted to XML and stored in a centralized single source repository within the content management architecture. The XML files themselves will be minimally defined (tagged) so as to allow maximum flexibility. This repository now becomes the core storage mechanism for all deliverable content.

It is on the delivery side that this process model demonstrates its primary benefits. Storing content in the single source repository transforms exporting the content to different formats and applications for delivery into a primarily automated process. There is no need for additional conversions or edits each time content is to be delivered to a different medium. In addition, any complications that arise will now be instantly recognized as process-oriented rather than data-oriented.

Translation

Legacy translation memory databases attempt to modularize content by segmenting source and translated text and storing it in a searchable database for reuse. Though these partially automated systems have been proven to reduce costs when compared to manual processes, the fact that translation is generally done at the sentence level means that text is often taken out of context and therefore often loses its meaning.

XML documents, on the other hand, are inherently modular and do not require the extensive parsing applied by traditional translation memory systems. In addition, XML assets can easily be encoded (with metadata, for example) and tracked throughout the translation process, ensuring that they remain closely associated with the contextual information often required by translators.

An XML-enabled single source publishing model is designed to leverage content reuse, enabling organizations to save significant time and money by reducing or even eliminating repeated translations. XML gives publishers the ability to conceptually segment content assets for translation purposes, while at the same time keeping them closely tied to context.

In the case of document frameworks, such as technical publishing, where text is often repeated in many different places, the ability to consolidate resources offers potentially enormous savings in translation costs alone.
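As a rough illustration of where those savings come from, the sketch below counts how many distinct paragraphs a document contains compared with the total number of paragraphs. It is a minimal sketch, not a production workflow: the sample document and the `para` element name are assumptions made for the example, and only exact repeats (after whitespace normalization) are treated as reusable.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; the <para> element name is an assumption.
SAMPLE = """<manual>
  <section>
    <para>Turn off the power before servicing.</para>
    <para>Remove the cover.</para>
  </section>
  <section>
    <para>Turn off the power before servicing.</para>
    <para>Replace the filter.</para>
  </section>
</manual>"""


def unique_segments(xml_text, tag="para"):
    """Return (unique segments in first-seen order, occurrence counts)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter(tag):
        # Join all text inside the element and collapse runs of whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if text:
            counts[text] += 1
    # Counter keeps insertion order, so list(counts) is first-seen order.
    return list(counts), counts


segments, counts = unique_segments(SAMPLE)
total = sum(counts.values())
print(f"{total} paragraphs, {len(segments)} unique segments to translate")
```

Each unique segment only needs to be translated once; the repeats can be filled in from the stored translation.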

Standards

A growing number of emerging standards are designed to aid in the localization of document frameworks. Methodologies for translation workflows and document exchange are designed to streamline content management architectures for multilingual environments. These include:

• Translation Memory eXchange (TMX) – a vendor-neutral XML standard for the exchange of translation memory data between tools and/or translation vendors
• Term Base eXchange (TBX) – an open XML-based standard for exchanging structured terminological data
• Open Lexicon Interchange Format (OLIF) – an open, XML-compliant standard for the exchange of terminological and lexical data
• XML Localization Interchange File Format (XLIFF) – an XML-based vocabulary for the exchange of localizable software and document-based objects and related metadata (XLIFF is also represented in the DITA Translation Subcommittee)
• Translation Web Services (TransWS) – specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects
• XML-based Text Memory (xml:tm) – an open XML standard for embedding text memory directly within an XML document using XML namespace syntax
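
To give a feel for what one of these formats looks like in practice, here is a minimal sketch that reads a small TMX 1.4 style document into a source-to-target lookup using only the Python standard library. The sample translation unit and the `load_tmx` helper are made up for illustration, and inline markup inside `<seg>` is simply flattened to plain text, which a real tool would not do.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample translation memory with a single translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Remove the cover.</seg></tuv>
      <tuv xml:lang="de"><seg>Entfernen Sie die Abdeckung.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_tmx(tmx_text, source="en", target="de"):
    """Build a {source text: target text} dict from TMX translation units."""
    memory = {}
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # Flatten any inline markup inside <seg> to plain text.
                by_lang[tuv.get(XML_LANG)] = "".join(seg.itertext())
        # Keep only units that have both languages.
        if source in by_lang and target in by_lang:
            memory[by_lang[source]] = by_lang[target]
    return memory


memory = load_tmx(SAMPLE_TMX)
print(memory)
```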

The extensible nature of XML lends itself to the creation of a wide variety of industry specifications, many of which enable businesses to streamline business processes and improve communication.

Formatting

In today’s global marketplace, organizations are often challenged with having to produce content in a variety of different languages. In a traditional documentation workflow model, this is an extremely arduous process. Legacy publishing software such as Quark, PageMaker, FrameMaker, etc. require expensive and resource-intensive desktop publishing and engineering processes for repurposing. In addition, these page layout applications are generally not well suited for delivery to multiple output formats.

XML is inherently extensible, offering an infinite number of ways to define and structure markup. This flexibility also enables it to handle arbitrary data structures and convey information both to human users and to machines for processing. XML also provides broad support for Unicode characters, enabling the automation of text normalization processes and making it natively accessible to multilingual environments.

An XML documentation framework offers significant productivity enhancements to the localization workflow. The separation of content from structure and appearance that is inherent to XML gives companies the ability to translate text while at the same time maintaining the document structure dictated by an XML Schema or DTD. Any additional formatting that is required can be applied at the same time through XSLT and/or XSL-FO stylesheets.
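
The idea of translating text while leaving the structure alone can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `topic`, `title`, and `para` names and the translation table are invented for the example, and only the leading text of each element is translated (text following a child element is left as-is).

```python
import xml.etree.ElementTree as ET

# Hypothetical source document and translation table.
SAMPLE = '<topic id="t1"><title>Safety</title><para>Remove the cover.</para></topic>'
TRANSLATIONS = {
    "Safety": "Sicherheit",
    "Remove the cover.": "Entfernen Sie die Abdeckung.",
}


def translate_tree(xml_text, translations):
    """Replace element text found in the table; tags and attributes are untouched."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text and elem.text.strip() in translations:
            elem.text = translations[elem.text.strip()]
    return root


def skeleton(elem):
    """Nested (tag, attributes, children) tuple that ignores all text."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        tuple(skeleton(child) for child in elem),
    )


translated = translate_tree(SAMPLE, TRANSLATIONS)
print(ET.tostring(translated, encoding="unicode"))
```

Comparing `skeleton(translated)` with the skeleton of the source is a cheap way to confirm that a translation pass did not disturb the markup before the file is validated against its schema or DTD.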

Altova® Tools for XML-based Single Source Publishing in a Global Environment

There are, of course, several different methods for internationalizing content with XML technologies. Included below are just a few examples of how Altova tools can be used to streamline global publishing workflows.

Multiple Output Formats

StyleVision® is a graphical stylesheet design tool that enables users to create one design for simultaneous output to HTML, RTF, PDF, Word 2007 (OOXML), and Authentic® electronic forms.

Language-specific Stylesheets

StyleVision also supports user-defined parameters that allow designers to maintain the modularity of their XML assets through the application of variables. This enables publishers to add unlimited new languages to their documentation by importing language-specific stylesheets and leaving XML content untouched.

Of course, this approach to multi-lingual publishing can lead to the creation of an enormous number of stylesheets that are increasingly difficult to maintain.

SchemaAgent®, Altova’s XML-based file management system, offers advanced support for managing XSLT (as well as XML Schema and WSDL) document relationships in a large publishing environment.

XSL Lang() Function

StyleVision also supports the XSL lang() function, which pulls the correct translation from XML source document(s) based on the xml:lang attribute. In this scenario, the translations could be stored together in one XML instance as specified in the xml:tm standard, or stored separately in language-specific directories.
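
Outside of a stylesheet, the same selection logic can be approximated in a few lines of Python. The sketch below is not how StyleVision works internally; it only mimics the matching rule of lang(), where "fr" also matches "fr-CA". The `strings`, `label`, and `text` element names and the fallback-to-English behavior are assumptions for the example, and unlike the real function it does not look at xml:lang inherited from ancestor elements.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical source with all translations stored in one instance.
SAMPLE = """<strings>
  <label id="save">
    <text xml:lang="en">Save</text>
    <text xml:lang="de">Speichern</text>
    <text xml:lang="fr-CA">Enregistrer</text>
  </label>
</strings>"""


def lang_matches(value, wanted):
    """True if value equals wanted or is a sublanguage of it (case-insensitive)."""
    if value is None:
        return False
    value, wanted = value.lower(), wanted.lower()
    return value == wanted or value.startswith(wanted + "-")


def pick_text(label, wanted, fallback="en"):
    """Return the text for the wanted language, else the fallback, else None."""
    for lang in (wanted, fallback):
        for text in label.findall("text"):
            if lang_matches(text.get(XML_LANG), lang):
                return text.text
    return None


label = ET.fromstring(SAMPLE).find("label")
for code in ("de", "fr", "ja"):
    print(code, pick_text(label, code))
```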

WYSIWYG Authoring Tool

Authentic gives content contributors the opportunity to edit XML directly through e-Forms based on the stylesheet design created in StyleVision. Authentic is available through a free license so that it can be deployed to an unlimited number of users without increasing costs. This enables translators to work directly with XML, rather than having it transposed at a later date for publishing.

Authentic also includes a multi-lingual spell-checker that references built-in dictionaries in 18 different languages and vocabularies, allowing writers and translators to ensure the accuracy of their work.

Conclusion

Single source publishing calls for the creation of a centralized store of content that can be accessed, reused, and deployed to a variety of different mediums. This enables the integrity of the content to be maintained throughout an infinite number of iterations. In a large documentation localization pool, the ability to adapt to different language and formatting requirements provides significant business advantages.

There are several different approaches to maintaining single source content for a global audience. A careful and informed approach to preparing and storing content assets can ensure a variety of benefits including increased quality and consistency, reduction of translation costs, and increased longevity of translation investments.

In addition, the XML-enabled single source publishing model facilitates document repurposing for delivery to a variety of different formats, making content accessible to end-users in HTML, RTF, PDF, Word 2007 (OOXML), etc. Incorporating this system within an organization’s documentation workflow enables the presentation of accurate, consistent, and standardized information. XSL transformations apply format-specific processing instructions while ensuring that document content and structure remain intact.

Migrating content to XML-based single source publication workflows requires some initial planning and technology investment, but the rewards are numerous. Cost reductions in translation and typesetting, faster time-to-market, and the ability to adapt to new language and data structure requirements in the future make the relatively small investment worthwhile.

Discover how single source publishing can optimize your global documentation workflows. Download a free 30-day trial of StyleVision today!

*Please note that StyleVision and the other products mentioned above are available as part of Altova’s software bundle, MissionKit™, which offers XML and data management tools for distributed publishing environments. Click here for more information on the MissionKit.

May 12, 2008

Internationalization WebSeminar June 12th

Internationalization Webseminar Announcement

Is your software global-ready?

- Can it be efficiently translated into multiple languages?
- Can it support Asian languages?
- Can it work in multiple date/time formats or handle address, phone number and other information that will vary worldwide?

Not sure? Then this interactive two-hour online course is right for you. Join us for a live WebSeminar and learn how to make internationalization - the process of adapting source code to support worldwide locale requirements - a smooth effort and avoid iterative, pernicious, and expensive delays to global releases and revenue.

WebSeminar: Global-Ready Applications / Programming for the World

Please email webseminars@lingoport.com for information on a GALA discount.

May 5, 2008

Are test translations a waste of time?

GALAxy recently published an article about test translations and their potential (or lack thereof) to provide a true test of the quality of a company’s work. What do you think about test translations?

Read the full article at this link: http://www.gala-global.org/GALAxy-article-why_sample_translations_break_all_the_rules-8668.html.

February 13, 2008

2008 GALA Webinars Series

GALA Technology webinars have provided a platform for tool providers to give an in-depth introduction to their tools. Many of you will have attended a GALA webinar and have contributed to their success over the last few years. These have been non-sales events where you got a good overview of the tool from the people who built it. This year we have decided to extend this program. We now have expert presentations as well as the presentations from tool providers.

GALA will shortly be announcing the 2008 series of webinars. Among the tool providers who have already committed to presentations are across, Alchemy, Plunet, Beetext, AIT and Kilgray. The expert presentations will start in March with a presentation from Richard Sikes called ‘Global Customer Satisfaction – Quality at the Source’. We will also have Yves Savourel from ENLASO on ‘The Internationalization Tag Set (ITS)’. XLIFF has recently become an official OASIS standard and I will be giving a presentation on it called ‘Introduction to XLIFF’. Balázs Kis from Kilgray will be talking about ‘Term Extraction Algorithms’. Adam Aasnes from Lingoport has confirmed that he will be giving a presentation, and towards the end of the year David Pooley from SDL will give us an ‘Introduction to TMX’.

These are just some of the presentations that have been lined up so far. We are very interested in hearing your ideas for improving this series. If you want to suggest an expert presentation, or are a tool provider who would like to take part, please contact Amy, who will work with you on this.

I hope you enjoy this series of presentations.

May 18, 2007

Why is WYSIWYG important for software localization?

If you are new to software localization and visit the web sites of software tool vendors, they will tell you that What-You-See-Is-What-You-Get (WYSIWYG) is an extremely important feature. We all know it is important for desktop publishing. WYSIWYG editing eliminates the need to print a flyer again and again to see how changes look. But why is WYSIWYG important to software localization? [Read More on The Localization Tool…]

April 3, 2007

The Internationalization Tag Set 1.0 - A W3C Recommendation

The Internationalization Tag Set (ITS) version 1.0 has just been published as a W3C Recommendation. You can read the W3C press release.

ITS is a set of attributes and elements designed to help in the internationalization and localization of XML material. It can be used externally to the documents (a bit like the “DTD Settings” file in Trados), and it can be integrated within the XML documents themselves.
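
As a small taste of the in-document form, the sketch below collects the text a translator should see while skipping anything marked with the local its:translate="no" attribute. It is a simplified illustration rather than a conforming ITS processor: the sample `doc`, `para`, and `code` elements are invented, and external or embedded global rules (its:rules) are not handled at all.

```python
import xml.etree.ElementTree as ET

# ITS 1.0 namespace, written in ElementTree's {namespace}name form.
ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

# Hypothetical sample document using local ITS markup only.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <para>Press <code its:translate="no">Ctrl+S</code> to save.</para>
  <para its:translate="no">build-4711</para>
</doc>"""


def translatable_text(elem, inherited=True):
    """Collect text chunks whose nearest its:translate setting is 'yes'.

    Elements default to translatable, and the setting is inherited by
    child elements until overridden.
    """
    flag = elem.get(ITS_TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    found = []
    if translate and elem.text and elem.text.strip():
        found.append(elem.text.strip())
    for child in elem:
        found.extend(translatable_text(child, translate))
        # Text after a child belongs to this element, not to the child.
        if translate and child.tail and child.tail.strip():
            found.append(child.tail.strip())
    return found


print(translatable_text(ET.fromstring(SAMPLE)))
```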

The ITS 1.0 Specification can be found here: www.w3.org/TR/its

You can find extensive examples, descriptions and links to implementations of ITS on the ITS Working Group page: www.w3.org/International/its.

The next step for the Working Group is the publishing of the “Best Practices for XML Internationalization” which is currently a Working Draft.

March 8, 2007

Vista glossaries now available at MSDN

Earlier this week Microsoft posted an update to their software glossaries, and that update includes the long-awaited Vista glossaries. Many thanks to Nick Rosenthal for letting me know; it was great news this week. If you are an MSDN Subscriber, you will find the steps to download them in Nick’s blog.

This is very good news for the localization industry, and I hope that eventually Microsoft decides to make them available again to the general public. There are many freelance translators out there for whom a subscription to MSDN does not really make sense financially, and freelance translators can definitely play a role in getting IT content translated consistently with the platform if they have the right tools.

February 9, 2007

Microsoft Glossaries: Cancel or Allow?

Many years ago, Microsoft decided to make the software glossaries for all their products available in MSDN and as a free download. This was shocking at first, as it was a lot of IP and, potentially, trade secrets.

But the reasoning behind it was clear: if Microsoft wanted to be “the platform” globally, they had to open up to the entire ecosystem, also globally. Steve Ballmer made it very clear that it was not about users, but developers. And developers, and by extension localizers, need to have useful and full access to the platform information to build a true ecosystem.

We developed ApSIC Xbench, a free download, with a view to providing convenient access to bilingual information, and that included support for the Microsoft Software Glossaries in their .csv form.

Late last year, Microsoft decided to pull the software glossaries and replace them with a Master Glossary. The announcement mentioned that former software glossaries would continue to be available to MSDN subscribers. But that did not mean that current ones are available. We at ApSIC are subscribers of MSDN Universal Edition and the latest software glossaries are as of July 2006. This means: no Vista glossaries, no Office 2007 glossaries, no Exchange 2007 glossaries.

I think that with this decision Microsoft is moving away from the role of being the platform. It is simply more difficult and expensive for Microsoft hardware and software partners to integrate globally and seamlessly with the platform.

Hey, what about the master glossary? IMHO, a master glossary helps you to translate something that ‘smells’ like the platform, but professional translation requires access to the exact strings in any relevant product to provide international users with a true high quality experience.

And there is a fundamental problem with a master glossary: there is no single market force pushing to make it right. No end user wants a master glossary. End users want software that they can understand well. The software strings should become the master glossary, because localized software “wants” to be clear and accurate if it is to sell well. Actually, I would see it as a symptom of a big problem if software glossaries cannot become the real master glossary after the product has shipped.

At least now we have Mac OS X glossaries (which we now support in ApSIC Xbench 2.7), which seem to be next in line as an extensive, publicly available reference.

I don’t rule out that Microsoft sees publishing software glossaries as a security threat (hence the title of this post). I hope they reconsider the value they bring to the entire ecosystem by continuing to be a platform and that we see the glossaries available publicly again in the future.

February 7, 2007

Very hard to find Trados training in China

We are a Singapore-based software internationalization and localization provider with a branch in China. We currently need Trados training for our staff. However, we have searched the web and could not find such training courses locally.

-Stefan
ESS Software (www.essware.com)

November 30, 2006

GALA Members Search Engine

I just learnt about a relatively new feature of Google: the ability to define your own search engine. After some tries on our own website, I found it interesting for limiting searches to some content we planned to add (namely our tools documentation), but the requirement to add advertisements was a turn-off. I definitely do not want random ads on our website.

I looked into the Custom Search Engine settings and saw that non-profits are allowed to have no advertisements. Then suddenly an idea came to me. What if there were a search engine whose search results included only content from the websites of GALA members?

It turned out that implementing it was simpler than I thought, and here is the result:

gala logo

The custom search engine could be added somewhere in the GALA website (and maintained as new members join GALA). Please note that the search engine in this blog post only lists websites for members as of the time of this writing.

I hope you like the idea of a GALA syndicated search.
