XML:TM 1.0

XML Text Memory (xml:tm) 1.0 Specification

26 February 2007

[2011-07-05:  This document is reproduced with permission of the Localization Industry Standards Association. All emendations to the original text are indicated by striking through the original text and inserting new text in red bold face in square brackets.]

[Note: This document introduces no technical changes to the xml:tm Standard with respect to the versions published on the LISA Website. All changes relate to locations of files on the Internet and to copyright licences.]

Editors:

Andrzej Zydroń <[email protected]>

Rodolfo M. Raya <[email protected]>

Bartosz Bogacki <[email protected]>

[[email protected]> (for post-LISA porting tasks only)]

[The following changes to the copyright and license agreement are made with the permission of the LISA Board and Director as part of dissolution proceedings for the Localization Industry Standards Association.

This document is made available by the Localization Industry Standards Association [LISA] under the terms of the Creative Commons Attribution 3.0 Unported (CC BY 3.0) license:

License Summary

Visit http://creativecommons.org/licenses/by/3.0/ for full license details.


You are free:

  • to Share:  to copy, distribute and transmit the work
  • to Remix:  to adapt the work
  • to make commercial use of the work

Under the following conditions:

  • Attribution:  You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

With the understanding that:

  • Waiver:  Any of the above conditions can be waived if you get permission from the copyright holder.
  • Public Domain:  Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
  • Other Rights:  In no way are any of the following rights affected by the license:
    • Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
    • The author's moral rights;
    • Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

    Copyright © The Localization Industry Standards Association [LISA] 2007. All Rights Reserved.

    This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to LISA.

    The limited permissions granted above are perpetual and will not be revoked by LISA or its successors or assigns.

    [All inquiries concerning rights not addressed in the emended license agreement above should be sent to Mr. Micheal Anobile at [email protected].]


    This Specification has been created by XML-INTL Ltd. and donated to the LISA OSCAR standards committee for consideration as a LISA OSCAR standard for the benefit of those individuals and organizations who are involved in the translation and localization of XML based documents.


    Abstract

    This document defines the XML based Text Memory specification ( xml:tm ). The purpose of this vocabulary is to store text memory information within an XML document using the XML namespace syntax.

    Status of this Document

    Recommendation

    Table of Contents

    1. Introduction

    2. General Structure

    2.1. Main Text Memory Element

    2.2. Version History

    2.3. Text Element

    2.4. Text Unit

    2.5. Inline Translatable Attribute

    2.6. Non-inline Translatable Attribute

    2.7. Individual Text Unit Modification History

    2.8. Inline Spanning Elements

    2.9. Inline Spanning Attribute Elements

    3. Detailed Specifications

    3.1. Text Memory Namespace Declaration

    3.2. Elements

    3.2.1. Text Memory

    3.2.2. Version History

    3.2.3. Text Elements

    3.2.4. Text Units

    3.2.5. Inline Translatable Attributes and Subflows

    3.2.6. Non-inline Translatable Attribute

    3.2.7. Text Unit Modification History

    3.2.8. Text Unit History

    3.2.9. Span Elements

    3.2.10. Span Attribute Elements

    3.3. Attributes

    3.3.1. Text Memory Attributes

    3.3.2. XML Namespace Attributes

    4. Text Memory Matching

    4.1. Exact Matching

    4.2. In-document Leveraged Matching

    4.3. In-document Fuzzy Matching

    4.4. Non-translatable Text

    Appendices

    A. Text Memory Tree Structure

    B. Text Memory Schema

    C. References


    1. Introduction

    XML based Text Memory (xml:tm) is a new approach to the problem of how to store and use translation memory. It is fully integrated with XML and uses XML syntax to define memory segments. It borrows from the LISA TMX 2.0 specification, mandates the use of the LISA SRX segmentation standard and the W3C ITS Document Rules definition, and is designed to integrate tightly with the OASIS XLIFF 1.2 specification for the extraction and translation of text. However, its goals and requirements differ enough from those of the above to warrant its own format.

    The xml:tm namespace is added to a document during a process that is called Text Memory Namespace Application. The xml:tm namespace application is driven by a W3C ITS Document Rules definition document. The W3C ITS Document Rules is an XML vocabulary that specifies the following:

    1. Which elements contain non-translatable text.
    2. Which elements are inline ('within text' in W3C ITS terminology), that is they do not force a segment break.
    3. Which inline elements form a 'subflow', that is they do not form part of the linguistic entity within which they occur.
    4. Which elements have translatable attributes.

    An additional aspect of the xml:tm namespace application process is the subdivision of text into identifiable sentences which are referred to as individual text units. The text unit is the lowest level of granularity within xml:tm enabled documents. The xml:tm standard mandates that this segmentation process is driven by SRX rules. Unique identifier attributes are allocated to each text unit in a document. These identifiers are designed to be immutable for the lifetime of the document.

    During the document's life cycle the xml:tm namespace and the unique text unit identifiers are maintained by a process referred to as DOM differencing. This process compares the current and previous versions of the document using the Document Object Model (DOM) structure of the document and allocates unique identifiers accordingly.
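    The identifier-preservation idea behind this process can be illustrated with a minimal Python sketch. It is not the differencing algorithm itself: text units are compared here as plain strings rather than as DOM nodes, and the function name reallocate_ids, the next_id counter and the simple "u<n>" identifier format are assumptions made for illustration only.

```python
def reallocate_ids(previous, current_texts, next_id):
    """Carry identifiers over from the previous version of a document.

    previous      -- dict mapping text unit id -> text in the previous version
    current_texts -- list of text unit strings in the current version
    next_id       -- first unused numeric id (ids are never reused)

    Returns (list of (id, text) pairs, updated next_id).
    """
    # Index the previous text by content so an identical unit keeps its id.
    available = {}
    for unit_id, text in previous.items():
        available.setdefault(text, []).append(unit_id)

    result = []
    for text in current_texts:
        if available.get(text):
            unit_id = available[text].pop(0)  # unchanged unit: keep its id
        else:
            unit_id = f"u{next_id}"           # new or modified unit: fresh id
            next_id += 1
        result.append((unit_id, text))
    return result, next_id


previous = {"u1": "Author memory", "u2": "Translation memory"}
current = ["Author memory", "Translation memory for XML"]
units, next_id = reallocate_ids(previous, current, next_id=3)
print(units)    # [('u1', 'Author memory'), ('u3', 'Translation memory for XML')]
print(next_id)  # 4
```

    The identifier u2 is retired rather than handed to the modified sentence, which mirrors the rule that identifiers are immutable and never reused.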

    For the purposes of translation the xml:tm namespace greatly simplifies the creation of an XLIFF form of the document's translatable text as well as a skeleton file for merging the translated text. xml:tm is designed to work tightly with XLIFF and to simplify the process of translating XML documents.

    Once the XLIFF text has been translated and merged with the skeleton file to form a target version of the document, the source and target documents are perfectly aligned at the text unit level. In the next iteration of the document when it comes to translation those text units that remain unchanged are automatically allocated the target language text in a process known as Exact Matching.
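    As a rough illustration of this alignment, the following Python sketch reuses a stored translation whenever a text unit keeps both its identifier and its text. The dictionary representation, the function name exact_matches and the German sample strings are hypothetical and are not part of the specification.

```python
def exact_matches(new_source, old_source, old_target):
    """Return {id: translation} for text units that are unchanged.

    Each argument maps a text unit id to its text: the new source version,
    the previous source version and the previous target version.
    """
    matches = {}
    for unit_id, text in new_source.items():
        unchanged = old_source.get(unit_id) == text
        if unchanged and unit_id in old_target:
            matches[unit_id] = old_target[unit_id]
    return matches


old_source = {"u3.1": "Author memory", "u4.1": "Translation memory"}
old_target = {"u3.1": "Autorenspeicher", "u4.1": "Übersetzungsspeicher"}
new_source = {"u3.1": "Author memory", "u5.1": "Text memory"}

print(exact_matches(new_source, old_source, old_target))
# {'u3.1': 'Autorenspeicher'}
```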

    The xml:tm namespace can be stripped from the document by means of a simple XSLT transformation if non-namespace versions of the document are required for specific forms of processing such as print composition.

    2. General Structure

    Text Memory is an XML namespace application, and as such it is designed to co-exist within any well formed XML document. The Text Memory namespace needs to be declared before the top level Text Memory element is inserted, ideally within the top level element of the document using the following namespace declaration syntax:

    xmlns:tm="urn:xmlintl-tm-tags"

    For consistency it is recommended that all Text Memory namespace elements are prefixed with the text memory namespace identifier tm: . If the tm: namespace has already been used in the document then another appropriate namespace identifier should be chosen. For the purposes of this specification it is assumed that the tm: namespace is used.
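    Because the prefix is only a local alias for the namespace URI, processing code is best written against the URI rather than against the literal tm: prefix. The following Python sketch, which uses the standard xml.etree.ElementTree module, finds text units in two documents that bind the namespace to different prefixes. The sample documents are hypothetical and omit most required attributes for brevity.

```python
import xml.etree.ElementTree as ET

TM_URI = "urn:xmlintl-tm-tags"

# The same namespace bound to two different prefixes (tm: and mem:).
DOC_A = (
    f'<doc xmlns:tm="{TM_URI}"><tm:tm><p>'
    '<tm:te id="e1"><tm:tu id="u1.1">Hi.</tm:tu></tm:te>'
    "</p></tm:tm></doc>"
)
DOC_B = (
    f'<doc xmlns:mem="{TM_URI}"><mem:tm><p>'
    '<mem:te id="e1"><mem:tu id="u1.1">Hi.</mem:tu></mem:te>'
    "</p></mem:tm></doc>"
)


def text_unit_ids(xml_text):
    """Return the ids of all text unit elements, whatever prefix is used."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced tags as "{uri}localname".
    return [tu.get("id") for tu in root.iter(f"{{{TM_URI}}}tu")]


print(text_unit_ids(DOC_A))  # ['u1.1']
print(text_unit_ids(DOC_B))  # ['u1.1']
```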

    All ID attribute values for xml:tm namespace elements are deemed to be immutable for the life of the document. If an element is deleted its ID is never reused.

    It is recommended that the encoding scheme for xml:tm documents be either Unicode UTF-8 or UTF-16, as this simplifies handling of the character data in the source and translated documents. Other encodings may be used if supported by the actual XML parser and DOM and SAX libraries used to process the document. The actual encoding must be declared using the XML encoding declaration at the very start of the document. The values to use for the encoding declaration are defined in the [IANA Charsets] listing. For example:

     <?xml version="1.0" encoding="utf-8"?> 

    The Text Memory hierarchical structure view of a document is relatively flat compared to that of a normal XML document. The following is a list of the xml:tm elements:

    tm:tm

    The top element is the actual text memory element itself. The tm:tm element can contain any number of tm:te text elements and any number of tm:ta non-inline translatable attribute elements.

    tm:vh

    The text memory version history contains one empty element per version, recording the version number and the date of that version as attributes.

    tm:te

    Text elements are elements that contain PCDATA text. They will contain one or more tm:tu text unit elements.

    tm:ta

    Any translatable attributes in non-inline elements are placed in a tm:ta element which is the immediate child of the element within which the translatable attribute occurred.

    tm:tu

    The tm:tu element contains the actual text of a tm:te element or any subdivision of the text into recognizable sentences. The LISA SRX standard should be used for segmenting tm:te element text into tm:tu elements.

    tm:ti

    Any translatable attributes of elements that occur within tm:tu elements are pulled out into a tm:ti element, which becomes the immediate child of the element that carried the attribute.

    tm:span

    Replaces the start and end tags of inline elements that would otherwise span across multiple segments.

    tm:attr

    Closely linked to tm:span elements, each tm:attr element contains the name and value of each original attribute of the parent span element.

    tm:mh

    If a tm:tu text unit element has changed by less than 30% a tm:mh modification history is established for that text unit.

    tm:th

    Previous versions of modified tm:tu elements are held in tm:th elements.

    An example of an XML document with Text Memory namespace:

    Given the following sample XML document, where translatable text is rendered in blue:

                
    <?xml version="1.0" encoding="UTF-8" ?>
     <office-document xmlns:text="http://openoffice.org/2000/text">
      ..........
      <text:p text:style-name="Text body" text:index-qualifier="xml:tm description">
            xml:tm is a radical new approach
            <text:index name="radical new approach"/>
            to dealing with the problems of translation
            memory for XML documents by using XML syntax to embed memory
            directly into the XML documents themselves.
            It makes extensive use of XML namespace.
       </text:p>
      <text:p text:style-name="Text body">
            The “tm” stands for “text memory”.
            There are two aspects to text memory:
      </text:p>
      <text:ordered-list text:continue-numbering="false" text:style-name="L1">
      <text:list-item>
      <text:p text:style-name="P3">
            Author memory
       </text:p>
       </text:list-item>
       <text:list-item>
       <text:p text:style-name="P3">
             Translation memory
        </text:p>
        </text:list-item>
        ..........
     </office-document>

    The same XML document with xml:tm namespace would look like this:

    <?xml version="1.0" encoding="UTF-8" ?>
     <office-document xmlns:text="http://openoffice.org/2000/text"
         xmlns:tm="urn:xmlintl-tm-tags">
      <tm:tm te="543" ta="41" version="2.0" id="8fba2f33" source-language="en-US" date="20031218T13:06:52Z"
          xmltm-version="1.0" tool-name="XYZ Tool" tool-version="1.23">
        <tm:vh version="1.0" date="20030502T14:15:03Z"/>
      ..........
      <text:p text:style-name="Text body">
        <tm:ta id="a1" name="text:index-qualifier" version="1.0">
           xml:tm description
        </tm:ta>
        <tm:te id="e1" tu="2" version="1.0">
          <tm:tu id="u1.1" ti="1" crc="3275b242" version="1.0">
            xml:tm is a radical new approach
            <text:index>
               <tm:ti id="i1.1.1" name="text:name" crc="9114ce48" version="2.0">
                radical new approach
               </tm:ti>
            </text:index>
            to dealing with the problems of translation
            memory for XML documents by using XML syntax to embed memory
            directly into the XML documents themselves.
          </tm:tu>
          <tm:tu id="u1.2" crc="306bf701" version="1.0">
            It makes extensive use of XML namespace.
          </tm:tu>
        </tm:te>
      </text:p>
      <text:p text:style-name="Text body">
        <tm:te id="e2" tu="2" version="1.0">
          <tm:tu id="u2.1" crc="f8c012ff" version="1.0">
            The “tm” stands for “text memory”.
          </tm:tu>
          <tm:tu id="u2.2" crc="270af770" version="1.0">
            There are two aspects to text memory:
          </tm:tu>
        </tm:te>
      </text:p>
      <text:ordered-list text:continue-numbering="false" text:style-name="L1">
      <text:list-item>
      <text:p text:style-name="P3">
         <tm:te id="e3" tu="1" version="1.0">
           <tm:tu id="u3.1" crc="851603a2" version="1.0">
            Author memory
           </tm:tu> 
         </tm:te>
       </text:p>
       </text:list-item>
       <text:list-item>
       <text:p text:style-name="P3">
         <tm:te id="e4" tu="1" version="1.0">
           <tm:tu id="u4.1" crc="313af159" version="1.0">
             Translation memory
           </tm:tu> 
         </tm:te>
        </text:p>
        </text:list-item>
        ..........
      </tm:tm>
     </office-document>

    The complete tree structure view is available in Appendix A .

    2.1. Main Text Memory Element

    The <tm:tm> element is the top level of the xml:tm hierarchy. It signals the start of the text memory namespace DOM tree. Its direct children are zero or more <tm:vh> version history elements (one per version history), zero or more <tm:te> text elements (XML elements that contain PCDATA ) and zero or more <tm:ta> main element translatable attribute elements.

    2.2. Version History

    Each time a source language xml:tm namespace document is updated and the IDs of its xml:tm elements are updated through the DOM differencing process, a new <tm:vh> is added as a direct child of the <tm:tm> element. The <tm:vh> elements have no content. The version number and date of the history element are specified via the "version" and "date" attributes.

    2.3. Text Element

    Each XML element that contains PCDATA (parsable character data) is allocated a <tm:te> element. Each <tm:te> element has a unique ID attribute value and the version number when it was first created. The version number corresponds to the value of the current <tm:tm> "version" attribute or the <tm:vh> "version" attribute. Each <tm:te> element is allocated a unique ID value for the life of the document.

    2.4. Text Unit

    Every <tm:te> text element must have at least one <tm:tu> text unit element. If there is more than one identifiable sentence in the PCDATA of a <tm:te> element, then each sentence will have a separate <tm:tu> element. Each <tm:tu> element is allocated a unique ID value for the life of the document as well as the "version" attribute from the current <tm:tm> "version" attribute when it was created.

    2.5. Inline Translatable Attribute

    Any translatable attributes for inline elements are placed in their own <tm:ti> element as the direct child of the inline element. If that element had no content, then it is deemed to have the content of <tm:ti> for the purposes of the xml:tm namespace. Each <tm:ti> element is allocated a unique ID value for the life of the document as well as the "version" attribute from the current <tm:tm> "version" attribute when it was created.

    2.6. Non-Inline Translatable Attribute

    Any translatable attributes for non-inline elements are placed in their own <tm:ta> element as the direct child of their element. If that element had no content, then it is deemed to have the content of <tm:ta> for the purposes of the xml:tm namespace. Each <tm:ta> element is allocated a unique ID value for the life of the document as well as the "version" attribute from the current <tm:tm> "version" attribute when it was created.

    2.7. Individual Text Unit Modification History

    Where during DOM differencing it is found that the contents of a <tm:tu> element have changed by less than 30% (70% or more of the contents are unchanged), then a modification history is created for that element. The role of <tm:mh> elements is to provide previous source and target versions of the <tm:tu> during translation as a form of in-document fuzzy matching.
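    The specification does not say how the percentage of change is measured. Purely as an illustration, the following Python sketch assumes a character-level similarity ratio from the standard difflib module; the function name needs_modification_history and the choice of metric are assumptions, and a real implementation may well measure change differently.

```python
from difflib import SequenceMatcher


def needs_modification_history(old_text, new_text, threshold=0.70):
    """Return True if the unit changed while 70% or more of it is unchanged.

    Assumption: "unchanged" is approximated by difflib's similarity ratio,
    which is 1.0 for identical strings and 0.0 for completely different ones.
    """
    if old_text == new_text:
        return False  # nothing changed, so there is no history to record
    similarity = SequenceMatcher(None, old_text, new_text).ratio()
    return similarity >= threshold


# A small edit keeps the unit above the threshold.
print(needs_modification_history(
    "It makes extensive use of XML namespace.",
    "It makes extensive use of XML namespaces."))   # True

# A rewritten unit falls below it and is treated as new text.
print(needs_modification_history(
    "Author memory",
    "See Appendix C for details"))                  # False
```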

    2.8. Inline Spanning Elements

    Where an inline element spans multiple segments it is necessary to convert the start and end tags for the element into tm:span elements, otherwise it would not be possible to segment the data into multiple tm:tu elements. In this way the effect of the inline element on segmentation is neutralized. If the spanning element has one or more attributes, then the name and value of the attributes are held in tm:attr inline span attribute elements. This allows for the complete reconstruction of the original spanning element together with its attributes when creating a non tm namespace version of the document.

    For example the following XML content:

            	
       <text:p text:style-name="P11">
         This is used to generate a format <text:span text:style-name="T5" text:emph="normal"> as required.
       	 The generated format can then be applied to the output.
       	 See Appendix C for details</text:span>.
       </text:p>

    Would generate the following tm: code:

            	
       <text:p text:style-name="P11">
          <tm:te id="e9" tu="5" version="1.0">
           <tm:tu crc="ebda4c34" id="u9.1" version="1.0">
       	     This is used to generate a format 
       	     	<tm:span name="text:span" type="start">
                      <tm:attr name="text:style-name" value="T5"/>
                      <tm:attr name="text:emph" value="normal"/>
                     </tm:span> as required.
           </tm:tu>
           <tm:tu crc="a0bcf600" id="u9.2" version="1.0">
             The generated format can then be applied to the output.
           </tm:tu>
           <tm:tu crc="7e489ac6" id="u9.3" version="1.0">
             See Appendix C for details<tm:span name="text:span" type="end"/>.
           </tm:tu>
         </tm:te>
       </text:p>
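    The mapping between a spanning element and its tm:span pair can be sketched in Python with the standard xml.etree.ElementTree module. The helper names make_span_pair and rebuild_start_tag are hypothetical; the sketch only shows that the name and attributes survive the round trip, not how a tool decides where segments break.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

TM_URI = "urn:xmlintl-tm-tags"
ET.register_namespace("tm", TM_URI)  # serialize with the tm: prefix


def make_span_pair(name, attributes):
    """Build the start and end tm:span elements for one spanning element."""
    start = ET.Element(f"{{{TM_URI}}}span", {"name": name, "type": "start"})
    for attr_name, attr_value in attributes.items():
        # Each original attribute becomes an empty tm:attr child.
        ET.SubElement(start, f"{{{TM_URI}}}attr",
                      {"name": attr_name, "value": attr_value})
    end = ET.Element(f"{{{TM_URI}}}span", {"name": name, "type": "end"})
    return start, end


def rebuild_start_tag(span):
    """Reconstruct the original start tag text from a tm:span start element."""
    attrs = "".join(
        f' {attr.get("name")}={quoteattr(attr.get("value"))}'
        for attr in span.findall(f"{{{TM_URI}}}attr")
    )
    return f'<{span.get("name")}{attrs}>'


start, end = make_span_pair(
    "text:span", {"text:style-name": "T5", "text:emph": "normal"})
print(ET.tostring(start, encoding="unicode"))
print(rebuild_start_tag(start))
# <text:span text:style-name="T5" text:emph="normal">
```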

    2.9. Inline Spanning Attribute Elements

    Inline spanning attribute elements tm:attr are used to hold the details of the span element's attributes and their values. For full details and examples please refer to Section 2.8 Inline Spanning Elements.

    3. Detailed Specifications

    3.1. Text Memory Namespace Declaration

    Text Memory only exists as a namespace within another XML document. It is not designed to have an independent existence. The Text Memory namespace must be declared as an attribute of any preceding element of that document, although for clarity it is recommended that this declaration be placed within the attributes of the top document element:

    xmlns:tm="urn:xmlintl-tm-tags"

    3.2. Elements

    xml:tm elements can be divided into the following categories: top-level elements, text elements, versioning elements and span elements. Some core attributes are shared among them.

    Top-level elements: <tm:tm>, <tm:te>.
    Text elements: <tm:ta>, <tm:tu>, <tm:ti>, <tm:th>.
    Versioning elements: <tm:vh>, <tm:mh>.
    Span elements: <tm:span>, <tm:attr>.

    3.2.1. Text Memory

    The main text memory element has the following format:

    <tm:tm>

    Text Memory Element - The <tm:tm> element encloses all the other xml:tm elements of the document.

    Required attributes:

    id - the unique document ID.

    te - the next unique text element tm:te identifier.

    ta - the next unique main element translatable attribute tm:ta identifier.

    version - the current version identifier for the tm:tm namespace for this document.

    date - the date that the current version of the tm:tm namespace was created for this document.

    source-language - the language in which this document is authored.

    xmltm-version - The version of the xml:tm specification.

    tool-name - The name of the tool that generated the text memory.

    tool-version - The version identifier of the tool that generated the text memory.

    Optional attributes:

    target-language - if present signifies the language of this document.

    doctype-public - the public doctype identifier of the original document's DOCTYPE declaration if any.

    doctype-system - the system doctype details of the original document's DOCTYPE declaration if any.

    Contents:

    Zero or more <tm:vh> elements, zero or more <tm:ta> elements, zero or more <tm:te> elements.
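    A tool could check the required attributes listed above with a few lines of Python. This is a minimal sketch using the standard xml.etree.ElementTree module; the function name missing_tm_attributes and the sample document are hypothetical, and the check covers attribute presence only, not attribute values.

```python
import xml.etree.ElementTree as ET

TM_URI = "urn:xmlintl-tm-tags"

# Required <tm:tm> attributes, as listed in this section.
REQUIRED_TM_ATTRIBUTES = (
    "id", "te", "ta", "version", "date", "source-language",
    "xmltm-version", "tool-name", "tool-version",
)


def missing_tm_attributes(xml_text):
    """Return the required attributes absent from the first tm:tm element."""
    root = ET.fromstring(xml_text)
    tm = next(root.iter(f"{{{TM_URI}}}tm"), None)
    if tm is None:
        raise ValueError("no tm:tm element found")
    return [name for name in REQUIRED_TM_ATTRIBUTES if name not in tm.attrib]


SAMPLE = """<office-document xmlns:tm="urn:xmlintl-tm-tags">
  <tm:tm te="543" ta="41" version="2.0" id="8fba2f33" source-language="en-US"
         date="20031218T13:06:52Z" xmltm-version="1.0" tool-name="XYZ Tool"/>
</office-document>"""

print(missing_tm_attributes(SAMPLE))  # ['tool-version']
```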

    3.2.2. Version History

    The xml:tm text memory version histories of this document:

    <tm:vh>

    Version History Element - There is one <tm:vh> element for each version of the xml:tm namespace for the document.

    Required attributes:

    version - the version identifier for this previous tm:tm namespace for this document.

    date - the date that the version of the xml:tm namespace was created for this document.

    Optional attributes:

    source-language - The original source language of this document.

    Contents:

    EMPTY

    3.2.3. Text Elements

    The xml:tm text memory element tag for elements that contain text:

    <tm:te>

    Text Element - There is one <tm:te> element within each native element that has text ( PCDATA ) content.

    Required attributes:

    id - A unique immutable identifier for this text element. This will always begin with the character 'e' followed by digits.

    version - the version identifier of the tm:tm element when this tm:te element was created.

    tu - The maximum current identifier for any tm:tu elements that are children of this tm:te element.

    Optional attributes:

    xml:lang - A language code as described in RFC 4646. Denotes that the text contained in the child tm:tu elements relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.

    xml:space - specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.

    Contents:

    One or more <tm:tu> elements.

    3.2.4. Text Units

    The xml:tm text memory element for individual tm:te element contents or subdivision of the same into recognizable sentences:

    <tm:tu>

    Text Unit - There is one <tm:tu> element within each native element that has text ( PCDATA ) content, or a subdivision of the contents into recognizable sentences.

    Required attributes:

    id - A unique immutable identifier for this text unit element. This will always begin with the character 'u' followed by digits.

    crc - The AUTODIN II polynomial crc hex value for the text contents of this element.

    version - the version identifier of the tm:tm element when this tm:tu element was created.

    flag - Used in target language version of the element to indicate if the translation for the element is merged with the preceding element, or not to be used for leveraged matching.

    type - The general type of this text unit. Possible values are "text" , "alphanumeric" , "numeric" , "measurement" , "punctuation" , "markup" or "notrans" .

    Optional attributes:

    ti - The maximum current identifier for any inline <tm:ti> translatable attribute elements that are children of this <tm:tu> element.

    translate - indicates the 'translatability' of the contents. The default value is "yes" .

    xml:lang - A language code as described in RFC 4646. Denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.

    xml:space - specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.

    Contents:

    1. Zero or one <tm:mh> text unit modification elements.
    2. The text unit text, along with any inline elements of the main document namespace.
    3. Zero or more <tm:ti> translatable attribute elements.
    4. Zero or more <tm:span> elements.

    PCDATA and any inline elements and inline element <tm:ti> translatable attribute elements.

    3.2.5. Inline Translatable Attributes and Subflows

    The xml:tm text memory element for individual inline translatable attributes and subflows:

    <tm:ti>

    Translatable inline attribute and subflow text - Any translatable attributes of inline elements are expanded into a direct child of the inline element. The same element is used for any inline element content that should be treated as a subflow. A 'subflow' is text that occurs within the current text, but linguistically does not form part of the current text stream. Examples are footnote text, index text etc. Both translatable inline element attributes and subflow text should be extracted separately from the text within which they occur.

    Required attributes:

    id - A unique immutable identifier for this inline translatable attribute element. This will always begin with the character 'i' followed by digits.

    crc - The AUTODIN II polynomial crc hex value for the text contents of this element.

    version - the version identifier of the <tm:tm> element when this <tm:ti> element was created.

    flag - Used in target language version of the element to indicate if the translation for the element is not to be used for leveraged matching.

    type - The general type of this inline element attribute contents. Possible values are "text" , "alphanumeric" , "numeric" , "measurement" , "punctuation" , "markup" or "notrans" .

    name - the name of the attribute.

    subflow - indicates that the element is an inline subflow as opposed to a translatable attribute. A value of "yes" indicates the contents of an inline subflow element. The default value is "no" .

    Optional attributes:

    translate - indicates the 'translatability' of the contents. The default value is "yes" .

    xml:lang - A language code as described in RFC 4646. Denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.

    xml:space - specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.

    Contents:

    The translatable attribute text.

    3.2.6. Non-inline Translatable Attribute

    The xml:tm text memory element for non-inline translatable attributes:

    <tm:ta>

    Translatable non-inline attribute - Any translatable attributes of non-inline elements are expanded into a direct child of the non-inline element.

    Required attributes:

    id - A unique immutable identifier for this translatable attribute element. This will always begin with the character 'a' followed by digits.

    crc - The AUTODIN II polynomial crc hex value for the text contents of this element.

    version - the version identifier of the <tm:tm> element when this <tm:ta> element was created.

    flag - Used in target language version of the element to indicate if the translation for the element is not to be used for leveraged matching.

    type - The general type of this inline element attribute contents. Possible values are "text" , "alphanumeric" , "numeric" , "measurement" , "punctuation" , "markup" or "notrans" .

    name - the name of the attribute

    Optional attributes:

    translate - indicates the 'translatability' of the contents. The default value is "yes" .

    xml:lang - A language code as described in RFC 4646. Denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.

    xml:space - specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.

    Contents:

    The translatable attribute text.

    3.2.7. Text Unit Modification History

    The xml:tm text unit modification history:

    <tm:mh>

    Text Unit Modification History - If an individual text unit has changed by less than 30% between revisions (70% or more of its contents are unchanged), a note is made of previous versions of this text unit.

    Required attributes:

    NONE

    Contents:

    One or more <tm:th> text unit history elements.

    3.2.8. Text Unit History

    The xml:tm text unit history:

    <tm:th>

    Text Unit History - Previous id of the text unit. Where a text unit has been modified this element contains the id of the text unit's previous incarnation. There must be at least one <tm:th> element for a <tm:mh> element.

    Required attributes:

    id - A unique immutable identifier for this text unit element. This will always begin with the character 'u' followed by digits.

    crc - The AUTODIN II polynomial crc hex value for the text contents of this element.

    version - the version identifier of the <tm:tm> element when this <tm:tu> element was created.

    Optional attributes:

    NONE

    Contents:

    EMPTY

    3.2.9. Span Elements

    The tm:span element for inline elements that span across multiple tm:tu elements:

    <tm:span>

    Spanning inline elements - Where an inline element spans multiple segment boundaries it can be escaped by use of a tm:span element. The effect of the tm:span element is to remove its role as an element with content and replace this with an inline no-content element for both the start and end elements. Without use of the tm:span it would not be possible to break the text into the appropriate segments. tm:span elements must always occur in pairs to cover the start and end elements.

    Required attributes:

    name - the name of the element that is being escaped.

    type - The type of span element. This can only have the value "start", where the start tag is being replaced, or "end", where the closing tag is being replaced.

    Optional attributes:

    original attributes - The original attributes of the start element appear in their original form in the tm:span element.

    Contents:

    Zero or more tm:attr elements.

    3.2.10. Span Attribute Elements

    The tm:attr element contains the attribute name and values of its parent tm:span element's original attributes:

    <tm:attr>

    Spanning attribute elements - Where an inline element spans multiple segment boundaries it can be escaped by use of a tm:span element. The effect of the tm:span element is to remove its role as an element with content and replace this with an inline no-content element for both the start and end elements. Where an element is being escaped in this way its attribute names and values are stored in child tm:attr elements.

    Required attributes:

    name - the name of the attribute that is being stored.

    value - The value of the attribute that is being stored.

    Optional attributes:

    NONE

    Contents:

    EMPTY
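
    The relationship between a tm:span start element and the tag it replaces can be illustrated with a short Python sketch. It is not part of this specification. It assumes the namespace URI "urn:xmlintl-tm-tags" from the schema in Appendix B and uses a fragment adapted from the example given for the name attribute in section 3.3.1. The helper name original_tag is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

TM_NS = "urn:xmlintl-tm-tags"  # target namespace declared by the schema in Appendix B

# Fragment adapted from the tm:span example in section 3.3.1.
FRAGMENT = (
    '<tm:tu xmlns:tm="urn:xmlintl-tm-tags" crc="ebda4c34" id="u9.1" version="1.0">'
    'This is used to generate a format '
    '<tm:span name="text:span" type="start">'
    '<tm:attr name="text:style-name" value="T5"/>'
    '<tm:attr name="text:emph" value="normal"/>'
    '</tm:span> as required.'
    '</tm:tu>'
)


def original_tag(span: ET.Element) -> str:
    """Rebuild the tag that a tm:span element stands in for (hypothetical helper)."""
    name = span.get("name")
    if span.get("type") == "end":
        return f"</{name}>"
    # Each child tm:attr stores one attribute of the escaped start tag.
    attributes = "".join(
        f" {attr.get('name')}={quoteattr(attr.get('value'))}"
        for attr in span.findall(f"{{{TM_NS}}}attr")
    )
    return f"<{name}{attributes}>"


text_unit = ET.fromstring(FRAGMENT)
start_tag = original_tag(text_unit.find(f"{{{TM_NS}}}span"))
print(start_tag)
```

    The sketch only rebuilds the tag text. A real tool would also have to pair each start span with its end span across text units.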

    3.3. Attributes

    This section lists the attributes used in the xml:tm elements. An attribute is never specified more than once for each element. Some attributes are accompanied by "Recommended Attribute Values". Values for these attributes are case sensitive. These lists are purely informative; the goal is to specify a preferred syntax so tools can have some level of compatibility.
     

    xml:tm attributes crc , date , doctype-public , doctype-system , flag , id , name , source-language , subflow , target-language , ta , te , ti , translate , tu , tool-name , tool-version , type , value , version , xmltm-version
    XML namespace attributes xml:lang , xml:space.

    3.3.1. Text Memory Attributes

    crc

    CRC - The crc hash value of the serialized contents of this element.

    Value description:

    The AUTODIN II polynomial crc hex value of the serialized contents of this element. The crc attribute is critical for validating the consistency of xml:tm memory alignment.

    Default value:

    Undefined

    Used in:

    <tm:ta> , <tm:th> , <tm:ti> , <tm:tu> .
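
    As an illustration only, a crc value could be computed as sketched below in Python. The AUTODIN II polynomial is the generator polynomial used by the common CRC-32 checksum, which is what zlib.crc32 implements. This specification does not state the initial value, bit order or character encoding, so the zlib variant and the UTF-8 encoding are assumptions, and the function name tm_crc is hypothetical.

```python
import zlib


def tm_crc(serialized_text: str) -> str:
    """Return a CRC-32 of the text as eight lowercase hex digits (hypothetical helper)."""
    # Assumption: the serialized contents are encoded as UTF-8 before hashing.
    checksum = zlib.crc32(serialized_text.encode("utf-8")) & 0xFFFFFFFF
    return f"{checksum:08x}"


# "123456789" is the conventional check string for CRC algorithms.
check_value = tm_crc("123456789")
print(check_value)
```

    Two tools produce comparable crc values only if they agree on how element contents are serialized before hashing.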

    date

    Date - The date attribute indicates when a given element was created or modified.

    Value description:

    Date in [ ISO 8601 ] Format. The recommended pattern to use is: YYYYMMDDThhmmssZ  
    Where: YYYY is the year (4 digits), MM is the month (2 digits), DD is the day (2 digits), hh is the hours (2 digits), mm is the minutes (2 digits), ss is the second (2 digits), and Z indicates the time is UTC time. For example:

                        date="20020125T210600Z"
    is January 25, 2002 at 9:06pm GMT
    is January 25, 2002 at 2:06pm US Mountain Time
    is January 26, 2002 at 6:06am Japan time

    Default value:

    Undefined

    Used in:

    <tm:tm> , <tm:vh> .
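
    A minimal Python sketch of producing and reading a value in the recommended YYYYMMDDThhmmssZ pattern follows. The function names are hypothetical, and the fixed UTC offsets used for Japan and US Mountain Time are simplifying assumptions that ignore daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

XMLTM_DATE_FORMAT = "%Y%m%dT%H%M%SZ"  # YYYYMMDDThhmmssZ


def format_tm_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC date attribute value."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime(XMLTM_DATE_FORMAT)


def parse_tm_date(value: str) -> datetime:
    """Parse a date attribute value into a UTC datetime."""
    return datetime.strptime(value, XMLTM_DATE_FORMAT).replace(tzinfo=timezone.utc)


stamp = format_tm_date(datetime(2002, 1, 25, 21, 6, 0, tzinfo=timezone.utc))
# Fixed offsets chosen for illustration: UTC+9 (Japan) and UTC-7 (US Mountain Time).
japan = parse_tm_date(stamp).astimezone(timezone(timedelta(hours=9)))
mountain = parse_tm_date(stamp).astimezone(timezone(timedelta(hours=-7)))
print(stamp, japan.isoformat(), mountain.isoformat())
```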

    doctype-public

    DOCTYPE PUBLIC IDENTIFIER - the PUBLIC identifier of the original document's DOCTYPE declaration.

    Value description:

    For the main <tm:tm> element this attribute holds the PUBLIC identifier of the original document's DOCTYPE declaration if a DOCTYPE was declared for the original document.

    Default value:

    Undefined

    Used in:

    <tm:tm> .

    doctype-system

    DOCTYPE SYSTEM DETAILS - the SYSTEM URI of the original document's DOCTYPE declaration.

    Value description:

    For the main <tm:tm> element this attribute holds the value of the original document's DOCTYPE declaration SYSTEM URI if a DOCTYPE was declared for the original document.

    Default value:

    Undefined

    Used in:

    <tm:tm> .

    flag

    Flag - Used to indicate if the text unit has been merged with the preceding text unit, or if the translation should not be used for leveraged memory:

    The possible values are:

    normal

    This is the default value.

    merged

    Signifies that the translation for the current text unit has been merged with that of a preceding text unit. Not used for <tm:ta> and <tm:ti> elements.

    noequiv

    Signifies for a target version of a text unit that the contents of the text unit should not be used for leveraged memory.

    Default value:

    normal

    Used in:

    <tm:ta> , <tm:ti> , <tm:tu> .

    id

    ID - The unique ID identifier for this element.

    Value description:

    For the main <tm:tm> element this is a unique value based on the CRC of the whole file plus a random 64 bit number in hex character form. For other instances it is the first character of the element name plus a unique sequential number. The sequential number is based on the number of the parent plus a unique number for the current element. The following list details how the individual identifiers are constructed for the very first occurrence of that element:

    <tm:tm>

    CRC of the whole file plus a random 64 bit number in hex character form, e.g. "8fba2f33".

    <tm:ta>

    "a1" - formed by the letter 'a' followed by the value of the "ta" attribute of the <tm:tm> element plus one.

    <tm:te>

    "e1" - formed by the letter 'e' followed by the value of the "te" attribute of the <tm:tm> element plus one.

    <tm:th>

    "u1.1" - inherited from the original <tm:tu> element.

    <tm:ti>

    "i1.1.1" - formed by the letter 'i' followed by the "id" value of the parent <tm:tu> element less the leading letter plus the period character plus value of the "ti" attribute of the <tm:tu> element plus one.

    <tm:tu>

    "u1.1" - formed by the letter 'u' followed by the "id" value of the parent <tm:te> element less the leading letter plus the period character plus value of the "tu" attribute of the <tm:te> element plus one.

    Default value:

    Undefined

    Used in:

    <tm:ta> , <tm:th> , <tm:ti> , <tm:tu> .
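
    The construction rules listed above can be sketched in Python as follows. This is an illustration, not a normative algorithm. The function names are hypothetical, and each function returns the new identifier together with the incremented counter value that would be written back to the parent element.

```python
from typing import Tuple


def next_ta_id(ta_counter: int) -> Tuple[str, int]:
    """'a' + (value of the tm:tm "ta" attribute plus one)."""
    new_value = ta_counter + 1
    return f"a{new_value}", new_value


def next_te_id(te_counter: int) -> Tuple[str, int]:
    """'e' + (value of the tm:tm "te" attribute plus one)."""
    new_value = te_counter + 1
    return f"e{new_value}", new_value


def next_tu_id(te_id: str, tu_counter: int) -> Tuple[str, int]:
    """'u' + parent tm:te id less its leading letter + '.' + (tm:te "tu" plus one)."""
    new_value = tu_counter + 1
    return f"u{te_id[1:]}.{new_value}", new_value


def next_ti_id(tu_id: str, ti_counter: int) -> Tuple[str, int]:
    """'i' + parent tm:tu id less its leading letter + '.' + (tm:tu "ti" plus one)."""
    new_value = ti_counter + 1
    return f"i{tu_id[1:]}.{new_value}", new_value


first_tu, tu_counter = next_tu_id("e1", 0)
first_ti, ti_counter = next_ti_id(first_tu, 0)
print(first_tu, first_ti)
```

    Because the counters only ever increase, an identifier is not reused after its element has been deleted.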

    name

    Name

    1. When used in tm:ta and tm:ti elements denotes the attribute name of translatable attributes.

      Value description:

      The name of the attribute for translatable attributes that have been pulled out as a direct child of their element. Translatable attributes for inline ( <tm:ti> ) and non-inline ( <tm:ta> ) elements are pulled out from their element into an xml:tm element as a direct child.

      For example:
      Original document

       <elm trans="Please translate this"/>

      xml:tm version of the document

       <elm><tm:ta id="a1" name="trans">Please translate this</tm:ta></elm>

      Default value:

      Undefined

    2. When used in tm:attr denotes the attribute name of one of the parent tm:span attributes.

      Value description:

      The name of the attribute that the child tm:attr element represents regarding its parent tm:span element.

      For example:
      Original document

              	
         <text:p text:style-name="P11">
           This is used to generate a format <text:span text:style-name="T5" text:emph="normal"> as required.
         	 The generated format can then be applied to the output.
         	 See Appendix C for details</text:span>.
         </text:p>

      xml:tm version of the document

              	
         <text:p text:style-name="P11">
            <tm:te id="e9" tu="5" version="1.0">
             <tm:tu crc="ebda4c34" id="u9.1" version="1.0">
         	     This is used to generate a format 
         	     	<tm:span name="text:span" type="start">
                        <tm:attr name="text:style-name" value="T5"/>
                        <tm:attr name="text:emph" value="normal"/>
                       </tm:span> as required.
             </tm:tu>
             <tm:tu crc="a0bcf600" id="u9.2" version="1.0">
               The generated format can then be applied to the output.
             </tm:tu>
             <tm:tu crc="7e489ac6" id="u9.3" version="1.0">
               See Appendix C for details<tm:span name="text:span" type="end"/>.
             </tm:tu>
           </tm:te>
         </text:p>

      Default value:

      Undefined

    3. When used in tm:span denotes the name of the original element that is being spanned.

      Value description:

      The name of the original element that is being replaced by the tm:span element pair. Please refer to the previous item for examples.

      Default value:

      Undefined

    Used in:

    <tm:ta> , <tm:ti> , <tm:span> , <tm:attr> .

    source-language

    Source language - The language for the main <tm:tm> element.

    Value description:

    A language code as described in RFC 4646. The values for this attribute follow the same rules as the values for xml:lang . Unlike the other xml:tm attributes, the values for xml:lang are not case-sensitive. For more information see the section on xml:lang in the XML specification , and the erratum E11 (which replaces RFC 1766 by RFC 4646).

    Default value:

    Undefined

    Used in:

    <tm:tm> , <tm:vh> .

    subflow

    Subflow indicator - defined for a tm:ti element if it is an inline element that is to be treated as a subflow for translation, rather than an inline translatable attribute.

    Value description:

    For use with inline elements such as footnotes or index markers to indicate that the contents of the inline element are to be treated separately for translation purposes (appear in their own XLIFF trans-unit element) and do not form part of the linguistic text unit entity.

    Default value:

    no

    Used in:

    <tm:ti> .

    <footnote><tm:ti subflow="yes" id="i1.1.1" crc="3dedf1">footnote text</tm:ti></footnote> .

    target-language

    target language - The language for the current document if it is a translation of the source language document.

    Value description:

    A language code as described in RFC 4646. The values for this attribute follow the same rules as the values for xml:lang . Unlike the other xml:tm attributes, the values for xml:lang are not case-sensitive. For more information see the section on xml:lang in the XML specification , and the erratum E11 (which replaces RFC 1766 by RFC 4646)

    Default value:

    Undefined

    Used in:

    <tm:tm> .

    ta

    Non-inline translatable element attribute counter - The maximum value of the <tm:ta> "id" attribute within this document.

    Value description:

    Each time a new non-inline translatable attribute element is created it is allocated a unique ID identifier formed from the character 'a' plus the integer value of the <tm:tm> "ta" attribute value plus one. The <tm:tm> "ta" attribute is then also incremented to reflect the new maximum value.

    Default value:

    0

    Used in:

    <tm:tm> .

    te

    Translatable text element attribute counter - The maximum value of the <tm:te> "id" attribute within this document.

    Value description:

    Each time a new element containing text and/or inline elements is created it is allocated a unique ID identifier formed from the character 'e' plus the integer value of the <tm:tm> "te" attribute value plus one. The <tm:tm> "te" attribute is then also incremented to reflect the new maximum value.

    Default value:

    0

    Used in:

    <tm:tm> .

    ti

    Inline translatable element attribute counter - The maximum value of the <tm:ti> "id" attribute within this <tm:tu> element.

    Value description:

    Each time a new inline translatable attribute element is encountered within a <tm:tu> element it is allocated a unique ID identifier formed from the character 'i' plus the integer value of the <tm:tu> "ti" attribute value plus one. The <tm:tu> "ti" attribute is then also incremented to reflect the new maximum value.

    Default value:

    0

    Used in:

    <tm:tu> .

    tool-name

    Tool name - The identifier of the tool used to create the text memory.

    Value description:

    the name of the xml:tm creation tool.

    Default value:

    Undefined

    Used in:

    <tm:tm> .

    tool-version

    Tool version - The version identifier of the tool used to create the text memory.

    Value description:

    the version identifier of the xml:tm creation tool.

    Default value:

    Undefined

    Used in:

    <tm:tm> .

    translate

    Translatability indicator - Indicates if the contents of the <tm:tu> , <tm:ta> or <tm:ti> element is translatable or not.

    Value description:

    This attribute has the possible values "yes", or "no" .

    Default value:

    yes

    Used in:

    <tm:ta> , <tm:ti> , <tm:tu> .

    tu

    Text unit element attribute counter - The maximum value of the <tm:tu> "id" attribute within a <tm:te> element.

    Value description:

    This attribute contains the maximum value of <tm:tu> id attributes allocated for this <tm:te> element. Each time a new text unit element is created within a <tm:te> element it is allocated a unique ID identifier formed from the character 'u' plus the integer value of the <tm:te> "tu" attribute value plus one. The <tm:te> "tu" attribute is then also incremented to reflect the new maximum value.

    Default value:

    0

    Used in:

    <tm:te> .

    type

    Type

    1. When used in: tm:tu, tm:ti, tm:ta, denotes the basic translation classification of the PCDATA text content of this element.

      Value description:

      The possible values are:

      alphanumeric

      The text is made up solely of alphanumeric words and punctuation characters, e.g. "104AGC" .

      numeric

      The text is made up solely of numeric words and punctuation characters, e.g. "10.254" .

      measurement

      The text is made up solely of recognizable measurements and punctuation characters, e.g. "10.52 mm" .

      punctuation

      The text is made up solely of punctuation characters, e.g. "-" .

      markup

      The content of this element is made up of nothing but inline elements with no text or translatable attribute content, e.g. "<inline/>".

      notrans

      Non translatable text

      x-

      An extension mechanism is provided to allow user defined type values. These must begin with the sequence x- and will be assumed to be non-translatable.

      text

      Normal text

      Default value:

      text

    2. When used in tm:span, denotes if this is the 'start' or 'end' element of the tm:span element pair.

      Value description:

      The possible values are:

      start

      Denotes that the tm:span is the start of the tm:span pair.

      end

      Denotes that the tm:span is the end of the tm:span pair.

      Default value:

      NONE

    Used in:

    <tm:ta> , <tm:ti> , <tm:tu> , <tm:span> .

    value

    Value - The value of the tm:attr element attribute.

    Value description:

    The value of the original attribute denoted by the tm:attr element.

    Default value:

    Undefined

    Used in:

    <tm:attr> .

    version

    Version - The version number.

    Value description:

    The version number of this tm namespace element.

    Default value:

    Undefined

    Used in:

    <tm:tm> , <tm:vh> , <tm:te> , <tm:ta> , <tm:ti> , <tm:th> , <tm:tu> .

    xmltm-version

    xml:tm Version - The version number of the xml:tm specification implemented.

    Value description:

    The version number of this specification

    Default value:

    Undefined

    Used in:

    <tm:tm> .

    3.3.2. XML Namespace Attributes

    xml:lang

    Language - The xml:lang attribute specifies the locale language variant of the xml:tm document.

    Value description:

    A language code as described in RFC 4646. This declared value is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:lang attribute. Unlike the other xml:tm attributes, the values for xml:lang are not case-sensitive. For more information see the section on xml:lang in the XML specification , and the erratum E11 (which replaces RFC 1766 by RFC 4646)

    The use of xml:lang for the <tm:tm> element is not recommended. The source-language attribute must be used to denote the original source language of the document. The target-language attribute must be used to denote the target language of the document where applicable. The use of xml:lang would therefore be ambiguous with regard to target language documents.

    The use of xml:lang for xml:tm elements apart from <tm:tm> denotes that the text contained in the element relates to a language other than that declared in the source-language or, for target language documents, declared in the target-language attribute for the document.

    Default value:

    Undefined

    Used in:

    <tm:ta> , <tm:ti> , <tm:te> , <tm:tu> .

    xml:space

    White spaces - The xml:space attribute specifies how white spaces (ASCII spaces, tabs and line-breaks) should be treated.

    Value description:

    default or preserve . The value default signals that applications' default white-space processing modes are acceptable for this element; the value preserve indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

    For more information see the section on xml:space in the XML specification .

    Default value:

    default .

    Used in:

    <tm:ta> , <tm:ti> , <tm:te> , <tm:tu> .

     

    4. Text Memory Matching

    The use of the xml:tm namespace facilitates various types of advanced translation memory matching at the XML document level:

    1. Exact Matching
    2. In-document Leveraged Matching
    3. In-document Fuzzy Matching
    4. Non-translatable Text

    Once a document has been translated there is an exact alignment between the source language version and the target versions. If the source language version is subsequently updated, then there is an exact link between the source and target language tm:tu elements which is maintained at the 'id' attribute level. This information, along with the other xml:tm namespace facilities, allows for translation memory matching at the document level.

    Within xml:tm it is assumed that all translation memory matching will be done during the creation of an XLIFF file and that all translation operations will occur at the XLIFF file level.

    4.1. Exact Matching

    Where a tm:tu element has the same id attribute as in the previous version, an Exact Match can be declared. Note that if the text unit immediately before an unchanged text unit has been changed or deleted, the exact match should be degraded to a leveraged match: the change could affect the translation of the unchanged text unit, so it needs to be reviewed for correctness.
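
    One possible reading of this rule is sketched below in Python. It is an interpretation, not a normative algorithm. It assumes that each version of the document is available as an ordered list of (id, crc) pairs, that a unit counts as unchanged when both its id and its crc are the same in both versions, and that "deleted" means the id no longer appears. The function name classify_units and the labels it returns are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Unit = Tuple[str, str]  # (id, crc) of a tm:tu element


def classify_units(previous: List[Unit], current: List[Unit]) -> Dict[str, str]:
    """Label each current unit as 'exact', 'leveraged' or 'none' (hypothetical labels)."""
    prev_crc = dict(previous)
    curr_crc = dict(current)
    # Remember which unit came immediately before each unit in the previous version.
    prev_predecessor: Dict[str, Optional[str]] = {}
    for index, (unit_id, _) in enumerate(previous):
        prev_predecessor[unit_id] = previous[index - 1][0] if index > 0 else None

    result: Dict[str, str] = {}
    for unit_id, crc in current:
        if prev_crc.get(unit_id) != crc:
            result[unit_id] = "none"  # new or changed unit: no exact match
            continue
        predecessor = prev_predecessor[unit_id]
        predecessor_intact = (
            predecessor is None or curr_crc.get(predecessor) == prev_crc[predecessor]
        )
        # A changed or deleted predecessor degrades the exact match.
        result[unit_id] = "exact" if predecessor_intact else "leveraged"
    return result


previous_version = [("u1.1", "aaaa1111"), ("u1.2", "bbbb2222"), ("u1.3", "cccc3333")]
current_version = [("u1.1", "aaaa1111"), ("u1.2", "dddd4444"), ("u1.3", "cccc3333")]
matches = classify_units(previous_version, current_version)
print(matches)
```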

    4.2. In-document Leveraged Matching

    The tm:tu 'crc' attribute value can be used for in-document leveraged matching.
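
    A minimal Python sketch of such a lookup follows. It assumes that the source units are available as a mapping from id to crc and the aligned target units as a mapping from id to translated text. These data structures and the function names are illustrative assumptions, not something this specification defines.

```python
from typing import Dict, Optional


def build_crc_index(source_crcs: Dict[str, str], target_texts: Dict[str, str]) -> Dict[str, str]:
    """Map each source crc to the translation of the first aligned unit that has it."""
    index: Dict[str, str] = {}
    for unit_id, crc in source_crcs.items():
        if unit_id in target_texts:
            index.setdefault(crc, target_texts[unit_id])
    return index


def leveraged_match(index: Dict[str, str], crc: str) -> Optional[str]:
    """Return a previously translated text with the same crc, if any."""
    return index.get(crc)


source_crcs = {"u1.1": "ebda4c34", "u1.2": "a0bcf600"}
target_texts = {"u1.1": "Translation of unit one", "u1.2": "Translation of unit two"}
crc_index = build_crc_index(source_crcs, target_texts)
found = leveraged_match(crc_index, "a0bcf600")
missing = leveraged_match(crc_index, "7e489ac6")
print(found, missing)
```

    Since distinct texts can in principle share a crc value, a tool may want to compare the source text as well before accepting a match.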

    4.3. In-document Fuzzy Matching

    If a tm:tu element has a modification history (tm:mh), then the first text unit history element (tm:th) can be used to locate the previous target translation as an in-document fuzzy match.

    4.4. Non-translatable Text

    If a tm:tu element has a 'type' attribute value other than 'text' then the contents can be described as non-translatable in the resultant XLIFF file and no match needs to take place.
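
    In Python, the check described above might be sketched as follows. The list of (id, type) pairs and the function name split_by_type are assumptions made for illustration; the default value 'text' is taken from the description of the type attribute in section 3.3.1.

```python
from typing import List, Optional, Tuple


def split_by_type(units: List[Tuple[str, Optional[str]]]) -> Tuple[List[str], List[str]]:
    """Separate unit ids that need matching from those marked non-translatable."""
    to_match: List[str] = []
    non_translatable: List[str] = []
    for unit_id, type_value in units:
        # A missing type attribute is treated as the default value 'text'.
        if (type_value or "text") == "text":
            to_match.append(unit_id)
        else:
            non_translatable.append(unit_id)
    return to_match, non_translatable


units = [("u1.1", "text"), ("u1.2", "numeric"), ("u1.3", None), ("u1.4", "x-partnumber")]
to_match, non_translatable = split_by_type(units)
print(to_match, non_translatable)
```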

    A. Text Memory Tree Structure

    The following figure shows the possible structure as a tree. Each element is followed by notation indicating its possible occurrence according to the corresponding legend.

    (legend: 1 = one
             + = one or more
             ? = zero or one
             * = zero, one or more)
    
    <tm:tm>1
    |
    +--- <tm:vh>*
    |
    +--- <tm:ta>*
    |
    +--- <tm:te>*
         |
         +--- <tm:tu>+
              |
              +--- <tm:ti>*
              |
              +--- <tm:span>*
              |    |
              |    +--- <tm:attr>*
              |
              +--- <tm:mh>?
                   |
                   +--- <tm:th>+

    B. Text Memory Schema

    <?xml version="1.0" encoding="UTF-8"?>
    <!--
       Document        : tm.xsd
       Version         : 1.0
       Created on      : 26 February 2007
       Authors         : [email protected], [email protected]
       Description     : This XML Schema defines the structure of the xml:tm namespace
       Note            : Final version approved 26 February 2007
    
      Copyright © 2007 The Localisation Industry Standards Association [LISA]. 
      All Rights Reserved.
      
    -->
    
    <xs:schema xmlns:tm="urn:xmlintl-tm-tags" targetNamespace="urn:xmlintl-tm-tags"
    	xml:lang="en" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    	elementFormDefault="qualified">
    	<xs:import namespace="http://www.w3.org/XML/1998/namespace"
    		schemaLocation="http://www.w3.org/2001/xml.xsd"/>
    	<!--
    	================================================== 
    	Restrictions
    	================================================== 
    	-->
    	<!-- Restrictions for "type" attribute -->
    	<xs:simpleType name="type">
    		<xs:restriction base="xs:token">
    			<xs:enumeration value="text"/>
    			<xs:enumeration value="alphanumeric"/>
    
    			<xs:enumeration value="numeric"/>
    			<xs:enumeration value="measurement"/>
    			<xs:enumeration value="punctuation"/>
    			<xs:enumeration value="markup"/>
    			<xs:enumeration value="notrans"/>
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- "flag" attribute for text units -->
    	<xs:simpleType name="flag1">
    
    		<xs:restriction base="xs:token">
    			<xs:enumeration value="normal"/>
    			<xs:enumeration value="merged"/>
    			<xs:enumeration value="noequiv"/>
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- "flag" attribute for tm:ta and tm:ti -->
    	<xs:simpleType name="flag2">
    		<xs:restriction base="xs:token">
    
    			<xs:enumeration value="normal"/>
    			<xs:enumeration value="noequiv"/>
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- "yes/no" values -->
    	<xs:simpleType name="YesNo">
    		<xs:restriction base="xs:token">
    			<xs:enumeration value="yes"/>
    			<xs:enumeration value="no"/>
    
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- "start/end" values -->
    	<xs:simpleType name="StartEnd">
    		<xs:restriction base="xs:token">
    			<xs:enumeration value="start"/>
    			<xs:enumeration value="end"/>
    		</xs:restriction>
    	</xs:simpleType>
    
    		<!-- Restrictions for "id" attribute in tm:tu -->
    	<xs:simpleType name="UnitID">
    		<xs:restriction base="xs:string">
    			<xs:pattern value="u[0-9]+\.[0-9]+"/>
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- Restrictions for "crc" "id" attribute in tm:tm -->
    	<xs:simpleType name="CRC">
    		<xs:restriction base="xs:string">
    
    			<xs:pattern value="[0-9a-fA-F]+"/>
    		</xs:restriction>
    	</xs:simpleType>
    		<!-- Restrictions for "id" attribute in tm:te -->
    	<xs:simpleType name="ElementID">
    		<xs:restriction base="xs:string">
    			<xs:pattern value="e[0-9]+"/>
    		</xs:restriction>
    	</xs:simpleType>
    
    	<!-- Restrictions for "id" attribute in tm:ti -->
    	<xs:simpleType name="InlineID">
    		<xs:restriction base="xs:string">
    			<xs:pattern value="i[0-9]+(\.[0-9]+)+"/>
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- Restrictions for "id" attribute in tm:ta -->
    	<xs:simpleType name="AttributeID">
    		<xs:restriction base="xs:string">
    
    			<xs:pattern value="a[0-9]+"/>
    		</xs:restriction>
    	</xs:simpleType>
    	<!-- Restrictions for "date" attributes -->
    	<xs:simpleType name="Date">
    		<xs:restriction base="xs:string">
    			<xs:pattern value="[1-2][09][0-9][0-9][0-1][0-9][0-3][0-9]T[0-2][0-9]([0-5][0-9]){2}Z"/>
    		</xs:restriction>
    	</xs:simpleType>	
    	<!-- Restrictions for user-defined attribute values -->
    
    	<xs:simpleType name="Custom">
    		<xs:restriction base="xs:string">
    			<xs:pattern value="x-[^\s]+"/>
    		</xs:restriction>
    	</xs:simpleType>
    		<!-- Restrictions for xml:space attribute -->
    	<xs:simpleType name="space">
    		<xs:restriction base="xs:token">
    			<xs:enumeration value="default"/>
    
    			<xs:enumeration value="preserve"/>
    		</xs:restriction>
    	</xs:simpleType>	
    	<!--
    	================================================== 
    	Structural Elements 	
    	================================================== 
    	-->
    	<!-- The main tm object -->
    	<xs:element name="tm">
    		<xs:complexType>
    			<xs:sequence>
    				<xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:ta"/>
    
    				<xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:te"/>
    				<xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:vh"/>
    			</xs:sequence>
    			<xs:attribute name="id" use="required" type="tm:CRC"/>
    			<xs:attribute name="te" use="required">
    				<xs:simpleType>
    					<xs:restriction base="xs:integer">
    						<xs:minInclusive value="0"/>
    					</xs:restriction>
    
    				</xs:simpleType>
    			</xs:attribute>
    			<xs:attribute name="ta" use="required">
    				<xs:simpleType>
    					<xs:restriction base="xs:integer">
    						<xs:minInclusive value="0"/>
    					</xs:restriction>
    				</xs:simpleType>
    			</xs:attribute>
    
    			<xs:attribute name="version" use="required" type="xs:ID"/>
    			<xs:attribute name="date" use="required" type="tm:Date"/>
    			<xs:attribute name="source-language" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="xmltm-version" use="required">
    				<xs:simpleType>
    					<xs:restriction base="xs:string">
    						<xs:enumeration value="1.0"/>
    					</xs:restriction>
    				</xs:simpleType>
    
    			</xs:attribute>
    			<xs:attribute name="tool-name" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="tool-version" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="target-language" use="optional" type="xs:NMTOKEN"/>
    			<xs:attribute name="doctype-public" use="optional" type="xs:string"/>
    			<xs:attribute name="doctype-system" use="optional" type="xs:string"/>			
    		</xs:complexType>
    	</xs:element>
    	<!-- The version history for this object -->
    
    	<xs:element name="vh">
    		<xs:complexType>
    			<xs:attribute name="version" use="required" type="xs:ID"/>
    			<xs:attribute name="date" use="required" type="tm:Date"/>
    			<xs:attribute name="source-language" use="optional" type="xs:NMTOKEN"/>
    		</xs:complexType>
    	</xs:element>
    	<!-- Translatable attributes for non-inline elements -->
    	<xs:element name="ta">
    
    		<xs:complexType mixed="true">
    			<xs:attribute name="id" use="required" type="tm:AttributeID"/>
    			<xs:attribute name="crc" use="required" type="tm:CRC"/>
    			<xs:attribute name="version" use="required" type="xs:IDREF"/>
    			<xs:attribute name="flag" use="required" type="tm:flag2"/>
    			<xs:attribute name="type" use="required">
    				<xs:simpleType>
    					<xs:union memberTypes="tm:type tm:Custom"/>
    				</xs:simpleType>					
    			</xs:attribute>
    
    			<xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="translate" use="optional" default="yes" type="tm:YesNo"/>
    			<xs:attribute ref="xml:space" default="default"/>
    			<xs:attribute ref="xml:lang"/>
    		</xs:complexType>
    	</xs:element>
    	<!-- Text elements -->
    	<xs:element name="te">
    		<xs:complexType>
    
    			<xs:sequence>
    				<xs:element minOccurs="1" maxOccurs="unbounded" ref="tm:tu"/>
    			</xs:sequence>
    			<xs:attribute name="id" use="required" type="tm:ElementID"/>
    			<xs:attribute name="version" use="required" type="xs:IDREF"/>
    			<xs:attribute name="tu" use="required">
    				<xs:simpleType>
    					<xs:restriction base="xs:integer">
    						<xs:minInclusive value="1"/>
    
    					</xs:restriction>
    				</xs:simpleType>
    			</xs:attribute>
    			<xs:attribute ref="xml:space" default="default"/>
    			<xs:attribute ref="xml:lang"/>
    		</xs:complexType>
    	</xs:element>
    	<!-- Text units -->
    	<xs:element name="tu">
    
    		<xs:complexType mixed="true">
    			<xs:choice minOccurs="0" maxOccurs="unbounded">
    				<xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:ti"/>
    				<xs:element minOccurs="0" maxOccurs="1" ref="tm:mh"/>
    				<xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:span"/>
    			</xs:choice>
    			<xs:attribute name="id" use="required" type="tm:UnitID"/>
    			<xs:attribute name="crc" use="required" type="tm:CRC"/>
    			<xs:attribute name="version" use="required" type="xs:IDREF"/>
    
    			<xs:attribute name="flag" use="required" type="tm:flag1"/>
    			<xs:attribute name="type" use="required">
    				<xs:simpleType>
    					<xs:union memberTypes="tm:type tm:Custom"/>
    				</xs:simpleType>					
    			</xs:attribute>
    			<xs:attribute name="ti">
    				<xs:simpleType>
    					<xs:restriction base="xs:integer">
    
    						<xs:minInclusive value="0"/>
    					</xs:restriction>
    				</xs:simpleType>
    			</xs:attribute>
    			<xs:attribute name="translate" use="optional" default="yes" type="tm:YesNo"/>			
    			<xs:attribute ref="xml:space" default="default"/>
    			<xs:attribute ref="xml:lang"/>
    		</xs:complexType>
    	</xs:element>
    
    	<!-- Translatable in-line element attributes -->
    	<xs:element name="ti">
    		<xs:complexType mixed="true">
    			<xs:attribute name="id" use="required" type="tm:InlineID"/>
    			<xs:attribute name="crc" use="required" type="tm:CRC"/>
    			<xs:attribute name="version" use="required" type="xs:IDREF"/>
    			<xs:attribute name="flag" use="required" type="tm:flag2"/>
    			<xs:attribute name="type" use="required">
    				<xs:simpleType>
    
    					<xs:union memberTypes="tm:type tm:Custom"/>
    				</xs:simpleType>					
    			</xs:attribute>
    			<xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="subflow" use="required" type="tm:YesNo"/>
    			<xs:attribute name="translate" use="optional" default="yes" type="tm:YesNo"/>
    			<xs:attribute ref="xml:space" default="default"/>
    			<xs:attribute ref="xml:lang"/>
    		</xs:complexType>
    
    	</xs:element>
    	<!-- Modification history -->
    	<xs:element name="mh">
    		<xs:complexType>
    			<xs:sequence>
    				<xs:element minOccurs="1" maxOccurs="unbounded" ref="tm:th"/>
    			</xs:sequence>
    		</xs:complexType>
    	</xs:element>
    
    	<!-- Text history -->
    	<xs:element name="th">
    		<xs:complexType>
    			<xs:attribute name="id" use="required" type="tm:UnitID"/>
    			<xs:attribute name="crc" use="required" type="tm:CRC"/>
    			<xs:attribute name="version" use="required" type="xs:IDREF"/>
    		</xs:complexType>
    	</xs:element>
    	<!-- Span element -->
    
    	<xs:element name="span">		
    		<xs:complexType>
    			<xs:sequence>
    				<xs:element minOccurs="0" maxOccurs="unbounded" ref="tm:attr"/>
    			</xs:sequence>
    			<xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="type" use="required" type="tm:StartEnd"/>
    		</xs:complexType>
    	</xs:element>
    	<!-- Attribute element -->
    	<xs:element name="attr">
    		<xs:complexType>
    			<xs:attribute name="name" use="required" type="xs:NMTOKEN"/>
    			<xs:attribute name="value" use="required" type="xs:string"/>
    
    		</xs:complexType>
    	</xs:element>
    </xs:schema>
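
    For tools that want to check identifiers without running a full schema validator, some of the pattern restrictions above can be mirrored in Python as sketched below. The patterns are copied from the UnitID, InlineID, AttributeID, CRC and Custom types. XML Schema patterns match the whole value, so re.fullmatch is used. The dictionary keys and the function name is_valid_id are hypothetical.

```python
import re

# Patterns copied from the simple types of the schema above.
ID_PATTERNS = {
    "tu": re.compile(r"u[0-9]+\.[0-9]+"),     # UnitID
    "ti": re.compile(r"i[0-9]+(\.[0-9]+)+"),  # InlineID
    "ta": re.compile(r"a[0-9]+"),             # AttributeID
}
CRC_PATTERN = re.compile(r"[0-9a-fA-F]+")     # CRC
CUSTOM_TYPE_PATTERN = re.compile(r"x-[^\s]+")  # Custom


def is_valid_id(element: str, value: str) -> bool:
    """Check an id value against the pattern for the given element (hypothetical helper)."""
    return ID_PATTERNS[element].fullmatch(value) is not None


checks = {
    "u9.1": is_valid_id("tu", "u9.1"),
    "u9": is_valid_id("tu", "u9"),
    "i1.1.1": is_valid_id("ti", "i1.1.1"),
    "a1": is_valid_id("ta", "a1"),
}
print(checks)
```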
    

    C. References

    Normative

    [DOM]

    Document Object Model (DOM) Level 3 Core Specification. W3C Document Object Model, April 2004

    [IANA Charsets]

    IANA Names for Character Sets. IANA (Internet Assigned Numbers Authority), Aug 2001

    [ISO 639]

    Codes for the Representation of Names of Languages. ISO (International Organization for Standardization), Nov 2001.

    [ISO 3166]

    Codes for the representation of names of countries and their subdivisions. ISO (International Organization for Standardization), Jun 2000.

    [ISO 8601]

    Representation of dates and times. ISO (International Organization for Standardization), Dec 2000.

    [RFC 3066]

    RFC 3066 Tags for the Identification of Languages. IETF (Internet Engineering Task Force), Jan 2001. This has now been replaced with RFC 4646.

    [RFC 4646]

    RFC 4646 Tags for the Identification of Languages. IETF (Internet Engineering Task Force), Sep 2006.

    [TMX 2.0]

    TMX Format Specifications. LISA (Localization Industry Standards Association), January 2007.

    [SRX 2.0]

    SRX 2.0 Specification. LISA (Localization Industry Standards Association), January 2007.

    [XLIFF 1.2]

    XLIFF 1.2 Specification. OASIS XLIFF Committee Specification, June 2006.

    [XML 1.0]

    Extensible Markup Language (XML) 1.0 Second Edition. W3C (World Wide Web Consortium), Oct 2000.

    [XML Names]

    Namespaces in XML. W3C (World Wide Web Consortium), Jan 1999.

    [XSLT]

    XSL Transformations (XSLT) Version 1.0. W3C (World Wide Web Consortium), Nov 1999.

    [W3C ITS Document Rules]

    Internationalization Tag Set (ITS) Version 1.0. W3C (World Wide Web Consortium), May 2006.

    Non-Normative

    [ISO]

    International Organization for Standardization Web site.

    [LISA]

    Localization Industry Standards Association Web site.

    [OASIS]

    Organization for the Advancement of Structured Information Standards Web site.

    [Unicode]

    Unicode Web site.

    [W3C]

    World Wide Web Consortium Web site.

    [XML Schema]

    XML Schema is a language for describing the structure and constraining the contents of XML documents.