XML and single source publishing
Rapid globalization has had a profound effect on the documentation requirements of many forward-thinking companies. The need to localize content quickly and accurately for distribution in many languages, while adhering to strict budgets, means that many companies have to rethink their legacy documentation technologies and workflows.
XML and single source publishing have revolutionized content management, document exchange, and multilingual communications by separating content structure from appearance. An XML-based documentation system can greatly reduce costs by making it easy to convert content for delivery in many different data formats and to many types of applications. However, to take full advantage of the benefits XML provides, the traditional documentation workflow must change.
Throughout the documentation workflow, checks and balances are in place to ensure high-quality content delivery. The single source concept ensures that these processes (conversion, edits, and so on) do not have to be repeated or reworked, and that all content in the repository requires only minimal restructuring and promotion before being loaded into the respective applications for delivery.
In a global setting, where documentation needs to be distributed simultaneously in a variety of languages, archival XML source documents can be translated by applying translation scripts together with rendering scripts that localize formatting attributes according to language-specific requirements. In addition, translation and maintenance costs can be significantly reduced by normalizing content for an international market.
XML has long been lauded by the publishing industry as a cost-cutting solution to many process-related issues in content production and delivery. Converting content to XML allows for enhancements in content organization, indexing, linking, storage, reuse, and delivery/display. But conversion alone does not deliver the full benefit. XML and its associated technologies call for redesigned workflows before they can demonstrate their full potential.
An optimized workflow for content publishers requires minimal process repetition. Once content is delivered, it is edited and converted to XML and stored in a centralized single source repository within the content management architecture. The XML files themselves will be minimally defined (tagged) so as to allow maximum flexibility. This repository now becomes the core storage mechanism for all deliverable content.
It is on the delivery side that this process model demonstrates its primary benefits. Storing content in the single source repository transforms exporting the content to different formats and applications for delivery into a primarily automated process. There is no need for additional conversions or edits each time content is to be delivered to a different medium. In addition, any complications that arise will now be instantly recognized as process-oriented rather than data-oriented.
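The delivery-side automation described above can be illustrated with a minimal sketch. It assumes a hypothetical, minimally tagged `topic` document (the element names `topic`, `title`, and `para` are invented for this example) and uses only Python's standard library to render the same source to two formats without touching the content.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, minimally tagged source document held in the repository.
SOURCE = """<topic id="install">
  <title>Installing the product</title>
  <para>Run the installer &amp; follow the prompts.</para>
  <para>Restart the application.</para>
</topic>"""


def to_html(topic):
    """Render the topic as an HTML fragment, escaping markup characters."""
    parts = [f"<h1>{escape(topic.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in topic.findall("para")]
    return "\n".join(parts)


def to_text(topic):
    """Render the same topic as plain text with an underlined title."""
    title = topic.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [p.text for p in topic.findall("para")]
    return "\n".join(lines)


# One parse of the single source feeds every output format.
topic = ET.fromstring(SOURCE)
html_output = to_html(topic)
text_output = to_text(topic)

print(html_output)
print()
print(text_output)
```

Adding another output format in this sketch means adding another rendering function; the stored XML is not edited or converted again.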
Legacy translation memory databases attempt to modularize content by segmenting source and translated text and storing it in a searchable database for reuse. Though these partially automated systems have been proven to reduce costs when compared to manual processes, translation is generally done at the sentence level, which means that text is often taken out of context and can lose its meaning.
XML documents, on the other hand, are inherently modular and do not require the extensive parsing applied by traditional translation memory systems. In addition, XML assets can easily be encoded (with metadata, for example) and tracked throughout the translation process, ensuring that they remain closely associated with the contextual information translators often require.
An XML-enabled single source publishing model is designed to leverage content reuse, enabling organizations to save significant time and money by reducing or even eliminating repeated translations. XML gives publishers the ability to conceptually segment content assets for translation purposes, while at the same time keeping them closely tied to context.
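As a rough illustration of encoding and tracking assets with metadata, the sketch below assumes a hypothetical convention in which each `para` carries an `id` and a `translate` attribute, and a `status` attribute records progress. None of these names come from a particular standard or product; they are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: the id, translate, and status attributes are
# illustrative conventions, not part of any specific standard.
SOURCE = """<topic id="safety">
  <para id="p1" translate="yes">Disconnect power before servicing.</para>
  <para id="p2" translate="no">Model XR-200</para>
  <para id="p3" translate="yes">Wear protective gloves.</para>
</topic>"""


def extract_jobs(topic):
    """Collect translatable segments along with their context metadata."""
    jobs = []
    for para in topic.iter("para"):
        if para.get("translate") == "yes":
            jobs.append({
                "topic": topic.get("id"),    # context: which topic it belongs to
                "segment": para.get("id"),   # context: which paragraph it is
                "text": para.text,
            })
            para.set("status", "sent")       # track progress on the asset itself
    return jobs


topic = ET.fromstring(SOURCE)
jobs = extract_jobs(topic)
for job in jobs:
    print(job)
```

Because each job keeps the topic and segment identifiers, a translated string can be matched back to its exact position in the source.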
In the case of document frameworks, such as technical publishing, where text is often repeated in many different places, the ability to consolidate resources offers potentially enormous savings in translation costs alone.
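To make the potential savings concrete, the following sketch counts how many paragraphs in a hypothetical manual are unique and how many are repeats. The sample content and the `manual`/`topic`/`para` element names are invented, and real savings depend on how a translation vendor prices repeated segments.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical manual in which the same instructions recur across topics.
SOURCE = """<manual>
  <topic><para>Press the power button.</para><para>Wait for the green light.</para></topic>
  <topic><para>Press the power button.</para><para>Open the cover.</para></topic>
  <topic><para>Wait for the green light.</para><para>Press the power button.</para></topic>
</manual>"""


def segment_counts(root):
    """Count occurrences of each paragraph after collapsing whitespace."""
    return Counter(" ".join(p.text.split()) for p in root.iter("para") if p.text)


counts = segment_counts(ET.fromstring(SOURCE))
total = sum(counts.values())   # segments as they appear in the document
unique = len(counts)           # segments that need translating once

print(f"{total} segments, {unique} unique, {total - unique} repeats")
```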
A growing number of emerging standards are designed to aid in the localization of document frameworks. Methodologies for translation workflows and document exchange are designed to streamline content management architectures for multilingual environments. These include:
- Translation Memory eXchange (TMX) – a vendor-neutral XML standard for the exchange of translation memory data between tools and/or translation vendors
- Term Base eXchange (TBX) – an open XML-based standard for exchanging structured terminological data
- Open Lexicon Interchange Format (OLIF) – an open, XML-compliant standard for the exchange of terminological and lexical data
- XML Localization Interchange File Format (XLIFF) – an XML-based vocabulary for the exchange of localizable software and document-based objects and related metadata (XLIFF is also represented in the DITA Translation Subcommittee)
- Translation Web Services (TransWS) – specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects
- XML-based Text Memory (xml:tm) – an open XML standard for embedding text memory directly within an XML document using XML namespace syntax
The extensible nature of XML lends itself to the creation of a wide variety of industry specifications, many of which enable businesses to streamline processes and improve communication.
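Because these exchange formats are plain XML, they can be read with general-purpose tools. The sketch below parses a small hand-written TMX sample (the sample content and the `load_memory` helper are assumptions for illustration, not taken from any vendor's export) and builds a source-to-target lookup table.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written TMX sample: translation units (tu) hold one variant (tuv)
# per language, each with its text in a seg element.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern Sie Ihre Änderungen.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="de"><seg>Öffnen Sie die Datei.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_memory(tmx_text, source_lang, target_lang):
    """Return a dict mapping source segments to target segments."""
    memory = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        if source_lang in segs and target_lang in segs:
            memory[segs[source_lang]] = segs[target_lang]
    return memory


memory = load_memory(TMX, "en", "de")
print(memory["Save your changes."])
```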
In today’s global marketplace, organizations are often challenged with having to produce content in a variety of different languages. In a traditional documentation workflow model, this is an extremely arduous process. Legacy publishing software such as Quark, PageMaker, and FrameMaker requires expensive and resource-intensive desktop publishing and engineering processes for repurposing. In addition, these page layout applications are generally not well suited for delivery to multiple output formats.
XML is inherently extensible, offering an open-ended number of ways to define and structure markup. This flexibility enables it to handle arbitrary data structures and to convey information both to human readers and to machines for processing. XML also provides broad support for Unicode characters, enabling the automation of text normalization processes and making it natively suited to multilingual environments.
An XML documentation framework offers significant productivity enhancements to the localization workflow. The separation of content from structure and appearance that is inherent to XML gives companies the ability to translate text while at the same time maintaining the document structure dictated by an XML Schema or DTD. Any additional formatting that is required can be implemented at the same time through the application of XSLT and/or XSL-FO stylesheets.
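One example of an automated normalization step is Unicode normalization, which makes visually identical strings compare equal. The sketch below applies NFC normalization to every text node of a small invented document; the choice of NFC and the `terms`/`term` element names are assumptions for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The first term spells "é" as "e" plus a combining acute accent (U+0301);
# the second uses the single precomposed character U+00E9.
SOURCE = "<terms><term>cafe\u0301</term><term>caf\u00e9</term></terms>"


def normalize_text(root, form="NFC"):
    """Normalize every text node in place and return the root element."""
    for el in root.iter():
        if el.text:
            el.text = unicodedata.normalize(form, el.text)
        if el.tail:
            el.tail = unicodedata.normalize(form, el.tail)
    return root


root = ET.fromstring(SOURCE)
before = [t.text for t in root.iter("term")]
normalize_text(root)
after = [t.text for t in root.iter("term")]

print(before[0] == before[1])  # the two spellings differ before normalization
print(after[0] == after[1])    # and compare equal afterwards
```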
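The idea of translating text while leaving structure intact can be sketched as follows. The lookup table, the element names, and the `translate` and `shape` helpers are hypothetical; a production workflow would validate the result against the governing XML Schema or DTD rather than compare tag shapes.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical source topic and translation table.
SOURCE = """<topic id="intro" xml:lang="en">
  <title>Welcome</title>
  <para>Read this guide first.</para>
</topic>"""

TRANSLATIONS = {
    "Welcome": "Willkommen",
    "Read this guide first.": "Lesen Sie zuerst diese Anleitung.",
}


def translate(root, table, lang):
    """Return a translated copy; only text and xml:lang are changed."""
    translated = copy.deepcopy(root)
    translated.set(XML_LANG, lang)
    for el in translated.iter():
        if el.text and el.text.strip() in table:
            el.text = table[el.text.strip()]
    return translated


def shape(el):
    """Describe the element tree by tag names only, ignoring text."""
    return (el.tag, [shape(child) for child in el])


source = ET.fromstring(SOURCE)
german = translate(source, TRANSLATIONS, "de")

print(german.findtext("title"))
print(shape(source) == shape(german))
```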
Altova® Tools for XML-based Single Source Publishing in a Global Environment
There are, of course, several different methods for internationalizing content with XML technologies. Included below are just a few examples of how Altova tools can be used to streamline global publishing workflows.
Multiple Output Formats
StyleVision® is a graphical stylesheet design tool that enables users to create one design for simultaneous output to HTML, RTF, PDF, Word 2007 (OOXML), and Authentic® electronic forms.
StyleVision also supports user-defined parameters that allow designers to maintain the modularity of their XML assets through the application of variables. This enables publishers to add unlimited new languages to their documentation by importing language-specific stylesheets and leaving XML content untouched.
Of course, this approach to multi-lingual publishing can lead to the creation of an enormous number of stylesheets that are increasingly difficult to maintain.
SchemaAgent®, Altova’s XML-based file management system, offers advanced support for managing XSLT (as well as XML Schema and WSDL) document relationships in a large publishing environment.
XPath lang() Function
StyleVision also supports the XPath lang() function used in XSLT stylesheets, which selects the correct translation from the XML source document(s) based on the xml:lang attribute. In this scenario, the translations could be stored together in one XML instance as specified in the xml:tm standard, or stored separately in language-specific directories.
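The matching rules behind lang() can be sketched in plain Python. This is an emulation for illustration only, not how StyleVision or any XSLT processor implements it: a node matches when the nearest xml:lang value on the node or one of its ancestors equals the requested language or is a sublanguage of it, ignoring case. The sample document and the helper names are assumptions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical instance holding all translations together.
SOURCE = """<messages>
  <greeting xml:lang="en">Hello</greeting>
  <greeting xml:lang="en-GB">Good day</greeting>
  <greeting xml:lang="de">Hallo</greeting>
  <section xml:lang="fr"><greeting>Bonjour</greeting></section>
</messages>"""


def matches_lang(value, wanted):
    """True if value equals wanted or is a sublanguage of it (case-insensitive)."""
    value, wanted = value.lower(), wanted.lower()
    return value == wanted or value.startswith(wanted + "-")


def select_by_lang(node, tag, wanted, inherited=""):
    """Collect elements named tag whose effective language matches wanted."""
    current = node.get(XML_LANG, inherited)  # xml:lang is inherited by descendants
    found = []
    if node.tag == tag and matches_lang(current, wanted):
        found.append(node)
    for child in node:
        found.extend(select_by_lang(child, tag, wanted, current))
    return found


root = ET.fromstring(SOURCE)
english = [g.text for g in select_by_lang(root, "greeting", "en")]
french = [g.text for g in select_by_lang(root, "greeting", "fr")]
british = [g.text for g in select_by_lang(root, "greeting", "en-GB")]

print(english, french, british)
```

In an actual stylesheet the equivalent selection would be written as an XPath predicate such as `greeting[lang('en')]`.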
WYSIWYG Authoring Tool
Authentic gives content contributors the opportunity to edit XML directly through e-Forms based on the stylesheet design created in StyleVision. Authentic is available through a free license so that it can be deployed to an unlimited number of users without increasing costs. This enables translators to work directly with XML, rather than having their work converted at a later date for publishing.
Authentic also includes a multi-lingual spell-checker that references built-in dictionaries in 18 different languages and vocabularies, allowing writers and translators to ensure the accuracy of their work.
Single source publishing calls for the creation of a centralized store of content that can be accessed, reused, and deployed to a variety of different mediums. This enables the integrity of the content to be maintained throughout an infinite number of iterations. In a large documentation localization pool, the ability to adapt to different language and formatting requirements provides significant business advantages.
There are several different approaches to maintaining single source content for a global audience. A careful and informed approach to preparing and storing content assets can ensure a variety of benefits including increased quality and consistency, reduction of translation costs, and increased longevity of translation investments.
In addition, the XML-enabled single source publishing model facilitates document repurposing for delivery in a variety of different formats, making content accessible to end users in HTML, RTF, PDF, Word 2007 (OOXML), etc. Incorporating this system within an organization’s documentation workflow enables the presentation of accurate, consistent, and standardized information. XSL transformations apply format-specific processing instructions while ensuring that document content and structure remain intact.
Migrating content to XML-based single source publishing workflows requires some initial planning and technology investment, but the rewards are numerous. Cost reductions in translation and typesetting, faster time-to-market, and the ability to adapt to new language and data structure requirements in the future make the relatively small investment worthwhile.
Discover how single source publishing can optimize your global documentation workflows. Download a free 30-day trial of StyleVision today!
*Please note that StyleVision and the other products mentioned above are available as part of Altova’s software bundle, MissionKit™, which offers XML and data management tools for distributed publishing environments.
About Altova: Founded in 1992, Altova is a commercial software development company with headquarters in Beverly, MA, USA and Vienna, Austria that produces integrated XML, database, UML, and data management software development tools.