
TBX

TBX (TermBase eXchange) is the LISA standard for exchanging terminology.

For information on more file formats, see conformance.

References

You might also be interested in reading about TBX-Basic - a simpler, reduced version of TBX with most of the useful features included.

Standard conformance

Done

Todo

Implementation notes for missing features

Note here:

Synonyms

NLS: Extra listing
TBX:

<termNote type="termNote">synonym</termNote>

according to this TBX documentation. In another place:

<termNote type="termType">synonym</termNote>

inside a <termGrp>, following <term>

Definition

NLS: term {definition/contextual information}
TBX

<descripGrp>
   <descrip type="definition">The longish definition of the term</descrip>
</descripGrp>

inside <langSet>. <descrip> can probably also be used directly under <langSet>.

Context

NLS: term {definition/contextual information} (see above)
TBX:

<descrip type="context">A usually somewhat longer contextual sentence.</descrip>

inside <ntig>

Parts of speech

NLS: term v. (or adj, or n.)
TBX:

<termNote type="partOfSpeech">noun</termNote>

following <term>

Cross reference

NLS: alternate term → real lemma
TBX: <ref> TODO

Abbreviations

NLS: same as alternate term: a.m. → before noon
TBX: TODO

TBX cheat sheet

  1. source word in English
  2. definition in English
  3. translation of source word to XX
  4. definition in XX
  5. comment
  6. syntactic group
  7. one or more tags
  8. a reference number
<termEntry id="4324 (8)">
    <note>tag1, tag2, tag3 (7) -
(Actually not clear what the best mapping to TBX is in this case.)</note>
    <langSet xml:lang="en">
        <tig>
            <term>sound (1)</term>
            <termNote type="partOfSpeech">noun (6)</termNote>
        </tig>
        <descripGrp>
            <descrip type="definition">Something you can hear (2) -
(definition with an associated external source)</descrip>
            <xref type="xSource" target="http://www.something.org/?id=234">Glossmaster</xref>
        </descripGrp>
        <note>Any random note about the term. (5)
(Actually there are ways of storing pretty specific stuff in specific spaces,
but while it seems the comment could be a more verbose definition, examples,
usage notes or anything else, we'll use this generic way.)
        </note>
    </langSet>
    <langSet xml:lang="af">
        <tig>
            <term>klank (3)</term>
        </tig>
        <descrip type="definition">Iets wat jy kan hoor (4) -
(definition without an external source)</descrip>
        <note>A note in the target language (5).</note>
    </langSet>
</termEntry>

Note that the <xref> tags are optional (as is just about everything except termEntry, langSet and tig). They allow linking to an external source. An internal source can also be specified, or the definition can be given without a source, as shown for the term “klank”.
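
As a rough illustration of how the cheat-sheet entry can be read back, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The entry is a trimmed copy of the one above without the "(n)" annotations, and the helper name summarize_entry is hypothetical, not part of any toolkit API.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace, so ElementTree exposes it
# under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the cheat-sheet entry, without the "(n)" annotations.
ENTRY = """
<termEntry id="4324">
    <langSet xml:lang="en">
        <tig>
            <term>sound</term>
            <termNote type="partOfSpeech">noun</termNote>
        </tig>
        <descripGrp>
            <descrip type="definition">Something you can hear</descrip>
            <xref type="xSource" target="http://www.something.org/?id=234">Glossmaster</xref>
        </descripGrp>
    </langSet>
    <langSet xml:lang="af">
        <tig>
            <term>klank</term>
        </tig>
        <descrip type="definition">Iets wat jy kan hoor</descrip>
    </langSet>
</termEntry>
"""


def summarize_entry(xml_text):
    """Return {language: {"terms": [...], "definition": str or None}}."""
    entry = ET.fromstring(xml_text)
    summary = {}
    for lang_set in entry.findall("langSet"):
        # ".//descrip" also finds a definition wrapped in <descripGrp>.
        definition = lang_set.find(".//descrip[@type='definition']")
        summary[lang_set.get(XML_LANG)] = {
            "terms": [term.text for term in lang_set.iter("term")],
            "definition": definition.text if definition is not None else None,
        }
    return summary


summary = summarize_entry(ENTRY)
print(summary["en"]["terms"], summary["af"]["definition"])
```

The same lookup works whether or not the definition carries a source, because the search descends into <descripGrp> when it is present.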

TBX requirements by Galician translation team (Proxecto Trasno)

Here is a list of the TBX requirements of the Galician translation team (Proxecto Trasno). Its English translation is below. A draft specification of the terminology management system software is available at http://translate.sourceforge.net/wiki/developers/terminology_management_system

A very important feature is pretty-printed export (as in the first example below), since the exported glossaries should be readable by both humans and software.
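
One possible way to get pretty-printed output, sketched with Python's standard library only: ET.indent (available since Python 3.9) adds the line breaks and indentation before serialization. The function name export_pretty and the id "cid-1" are illustrative assumptions, not something the requirements prescribe.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xml:lang attribute; it is serialized as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def export_pretty(entry):
    """Serialize an element with one tag per line and four-space indents."""
    ET.indent(entry, space="    ")  # requires Python 3.9 or later
    return ET.tostring(entry, encoding="unicode")


# Build a tiny entry in memory; "cid-1" is an illustrative id.
entry = ET.Element("termEntry", {"id": "cid-1"})
lang_set = ET.SubElement(entry, "langSet", {XML_LANG: "es"})
tig = ET.SubElement(lang_set, "tig")
ET.SubElement(tig, "term").text = "ordenador"

pretty = export_pretty(entry)
print(pretty)
```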

Before the example there is a list prioritizing the features, from the most interesting and needed to the least.

The chosen TBX tags are determined by the needs of our terminology management system (that of the Galician translation team). That system needs several glossaries. Each glossary has several concepts, and each concept can have several definitions (only one definition per language in a given concept), several translations (several per language in a given concept), and several associated links for getting more information (several links per language in a given concept). Several languages also need to be defined. Now that we have a list of all the needed entities, here is the list of attributes for each of those entities:

Each glossary has a name and a description.

Each concept has a unique id and a subject field (which is another concept in the same glossary). It can have several concepts that people may wish to see (let's call them related concepts), and it can also have a parent concept (broader concept).

Each link has a type (image, Wikipedia page,…), the address of the link, and a tiny description.

Each definition has a definition text.

We want to save the ISO 639 code of each language.

Each translation has a translation text, a unique id, the part of speech, the grammatical gender (if applicable), the grammatical number (if applicable), a field that indicates whether the translation is an abbreviation or an acronym, an explanatory note, examples of use (created by the people who make the terminology), links to examples of real use (a corpus or translation database), and a field that indicates whether the translation is complete or still incomplete (completion status). We also need to save the translation's administrative status (whether it is a recommended translation, a not recommended one, or a forbidden one) and the reason why the translation has its current administrative status (a simple text string), which only applies when the administrative status is other than “recommended”.
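
The entities and attributes listed above could be modelled roughly as follows. This is only a sketch in Python dataclasses: every class and field name is an assumption made for illustration, and the status strings simply mirror the wording used in this list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Link:
    kind: str               # e.g. "image" or "Wikipedia page"
    address: str
    description: str = ""


@dataclass
class Translation:
    id: str
    text: str
    part_of_speech: Optional[str] = None
    gender: Optional[str] = None        # only if applicable
    number: Optional[str] = None        # only if applicable
    term_type: Optional[str] = None     # "abbreviation" or "acronym"
    note: str = ""
    examples: List[str] = field(default_factory=list)
    corpus_links: List[Link] = field(default_factory=list)
    completed: bool = False
    status: str = "recommended"         # or "not recommended", "forbidden"
    status_reason: Optional[str] = None

    def __post_init__(self):
        # The reason only applies when the status is not "recommended".
        if self.status == "recommended" and self.status_reason is not None:
            raise ValueError("a recommended translation takes no reason")


@dataclass
class Concept:
    id: str
    subject_field: Optional[str] = None     # id of another concept
    related: List[str] = field(default_factory=list)
    parent: Optional[str] = None            # id of the broader concept
    # Keys are ISO 639 language codes; one definition per language.
    definitions: Dict[str, str] = field(default_factory=dict)
    translations: Dict[str, List[Translation]] = field(default_factory=dict)
    links: Dict[str, List[Link]] = field(default_factory=dict)


@dataclass
class Glossary:
    name: str
    description: str
    concepts: List[Concept] = field(default_factory=list)


# Usage with made-up translation ids:
standard = Concept(id="cid-27")
standard.translations["gl"] = [
    Translation(id="tid-1", text="estándar"),
    Translation(id="tid-2", text="standard",
                status="not recommended", status_reason="anglicism"),
]
glossary = Glossary("Localization glossary", "Test glossary", [standard])
```

Keeping definitions as one string per language code and translations as a list per language code reflects the "one definition, several translations per language" rule stated above.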

Having listed the needs, we read the TBX standard (ISO 30042) in search of the elements that support them, and we found at least one tag (or attribute) for every need except a few. Note that TBX groups information by concept. Within each concept, part of the information is stored at the beginning of the concept, and the rest (the language-dependent information) is split between the different languages; within every language section it is split again between the translations of that language. This gives a three-level structure: concept level, language level and translation level (also called term level).
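
To make the three levels concrete, the following sketch walks a small entry and labels each piece of data with the level it is stored at. It uses Python's standard xml.etree.ElementTree; the sample is a shortened version of a concept from the example further down, and describe_levels is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<termEntry id="cid-27">
    <descrip type="subjectField">computer science</descrip>
    <langSet xml:lang="en">
        <descrip type="definition">A technical standard is an established norm or requirement.</descrip>
        <tig>
            <term>standard</term>
            <termNote type="partOfSpeech">noun</termNote>
            <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
        </tig>
    </langSet>
</termEntry>
"""


def describe_levels(entry):
    """Yield (level, owner, data type, value) for each piece of data."""
    # Concept level: direct children of <termEntry>.
    for descrip in entry.findall("descrip"):
        yield ("concept", entry.get("id"), descrip.get("type"), descrip.text)
    for lang_set in entry.findall("langSet"):
        lang = lang_set.get(XML_LANG)
        # Language level: direct children of <langSet>.
        for descrip in lang_set.findall("descrip"):
            yield ("language", lang, descrip.get("type"), descrip.text)
        # Term level: everything attached to a <tig>.
        for tig in lang_set.findall("tig"):
            term = tig.findtext("term")
            for note in tig.findall("termNote"):
                yield ("term", term, note.get("type"), note.text)


rows = list(describe_levels(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```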

Next we list the needs and the tag chosen for that need, indicating the level in which the tag goes:

Below you can see a diagram that shows the levels and the data that goes in each level.

Feature prioritization

The upper ones are the most needed and interesting:

Example for Galician TBX requirements

 
 
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE martif SYSTEM 'TBXcoreStructV02.dtd'>
<martif type='TBX' xml:lang='en'>
    <martifHeader>
        <fileDesc>
            <titleStmt>
                <title>Localization glossary</title>
            </titleStmt>
            <sourceDesc>
                <p>Test glossary with concepts from software localization...</p>
            </sourceDesc>
        </fileDesc>
        <encodingDesc>
            <p type='XCSURI'>http://www.lisa.org/fileadmin/standards/tbx/TBXXCSV02.xcs</p>
        </encodingDesc>
    </martifHeader>
    <text>
        <body>
 
 
            <termEntry id="cid-23">
                <descrip type="subjectField">computer science</descrip><!-- enclosed text in English since it is the glossary
                language (see martif opening tag) -->
                <ref type="crossReference" target="cid-12">microprocessor</ref><!-- enclosed text in English since it is the
                glossary language (see martif opening tag) -->
                <ref type="crossReference" target="cid-16">keyboard</ref><!-- enclosed text in English since it is the glossary
                language (see martif opening tag) -->
                <descrip type="broaderConceptGeneric" target="cid-7">hardware</descrip><!-- enclosed text in English since it is
                the glossary language (see martif opening tag) -->
 
                <langSet xml:lang="en">
                    <descrip type="definition">A computer is a programmable machine that receives input, stores and manipulates 
data, and provides output in a useful format.</descrip>
                    <xref type="xGraphic" target="http://en.wikipedia.org/wiki/File:HPLaptopzv6000series.jpg">computer image</xref>
                    <xref type="externalCrossReference" target="http://en.wikipedia.org/wiki/Computer">English Wikipedia computer page</xref>
 
                    <tig id="tid-59">
                        <term>computer</term>
                    </tig>
                    <tig>
                        <term>PC</term>
                        <termNote type="termType">acronym</termNote><!-- "PC" is an acronym of "Personal Computer" -->
                        <termNote type="administrativeStatus">admittedTerm-admn-sts</termNote>
                        <termNote type="usageNote">Do not overuse this translation.</termNote>
                    </tig>
                    <tig>
                        <term>comp.</term>
                        <termNote type="termType">abbreviation</termNote><!-- "comp." is an abbreviation of "computer" -->
                        <termNote type="administrativeStatus">admittedTerm-admn-sts</termNote>
                    </tig>
                </langSet>
 
                <langSet xml:lang="es">
                    <descrip type="definition">Máquina electrónica que recibe y procesa datos para convertirlos en información 
útil</descrip><!-- definition text in Spanish -->
 
                    <tig>
                        <term>sistema</term>
                        <termNote type="administrativeStatus">admittedTerm-admn-sts</termNote>
                    </tig>
                    <tig>
                        <term>equipo</term>
                        <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>
                        <termNote type="processStatus">provisionallyProcessed</termNote>
                    </tig>
                    <tig>
                        <term>ordenador</term>
                        <termNote type="partOfSpeech">noun</termNote>
                        <termNote type="grammaticalGender">masculine</termNote>
                        <termNote type="grammaticalNumber">singular</termNote>
                        <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
                        <descrip type="context">El ordenador personal ha supuesto la generalización de la informática.</descrip><!-- example phrase -->
                        <xref type="corpusTrace" target="http://es.en.open-tran.eu/suggest/ordenador">ordenador en open-tran.eu</xref><!-- enclosed text in Spanish -->
                    </tig>
                    <tig>
                        <term>computador</term>
                        <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>
                    </tig>
                    <tig>
                        <term>computadora</term>
                        <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>
                    </tig>
                </langSet>
 
                <langSet xml:lang="fr">
                    <descripGrp><!-- Using descripGrp tags for enclosing the definition and its source -->
                        <descrip type="definition">Un ordinateur est une machine dotée d'une unité de traitement lui permettant 
d'exécuter des programmes enregistrés. C'est un ensemble de circuits électroniques permettant de manipuler des données sous forme 
binaire, ou bits. Cette machine permet de traiter automatiquement les données, ou informations, selon des séquences d'instructions 
prédéfinies appelées aussi programmes.
                        Elle interagit avec l'environnement grâce à des périphériques comme le moniteur, le clavier, la souris, 
l'imprimante, le modem, le lecteur de CD (liste non-exhaustive). Les ordinateurs peuvent être classés selon plusieurs critères 
(domaine d'application, taille ou architecture).</descrip>
                        <xref type="xSource" target="http://fr.wikipedia.org/wiki/Ordinateur">Wikipedia: ordinateur</xref>
                    </descripGrp>
 
                    <tig>
                        <term>ordinateur</term>
                    </tig>
                </langSet>
            </termEntry>
 
 
            <termEntry id="cid-27"><!-- Another concept -->
                <descrip type="subjectField">computer science</descrip>
 
                <langSet xml:lang="en">
                    <descrip type="definition">A technical standard is an established norm or requirement. It is usually a formal 
document that establishes uniform engineering or technical criteria, methods, processes and practices. In contrast, a custom, 
convention, company product, corporate standard, etc. which becomes generally accepted and dominant is often called a de facto standard.</descrip>
 
                    <tig>
                        <term>standard</term>
                        <termNote type="partOfSpeech">noun</termNote>
                        <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
                    </tig>
                </langSet>
 
                <langSet xml:lang="gl">
                    <descrip type="definition">Norma que mediante documentos técnicos fixa a especificación de determinado tema.</descrip>
 
                    <tig>
                        <term>estándar</term>
                        <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
                    </tig>
 
                    <tig>
                        <term>standard</term>
                        <termGrp><!-- Example of administrative status along with its reason -->
                            <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>
                            <note>Razón: anglicismo</note><!-- the translation of the enclosed text is: "Reason: anglicism" -->
                        </termGrp>
                    </tig>
                </langSet>
            </termEntry>
 
        </body>
    </text>
</martif>
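
As an example of what software can do with such a file, the sketch below groups the terms of one language by their administrative status. It is written with Python's standard xml.etree.ElementTree against a trimmed fragment of the entry above; the function name terms_by_status is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed fragment of the "cid-23" entry from the example above.
FRAGMENT = """
<termEntry id="cid-23">
    <langSet xml:lang="en">
        <tig>
            <term>computer</term>
        </tig>
    </langSet>
    <langSet xml:lang="es">
        <tig>
            <term>equipo</term>
            <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>
        </tig>
        <tig>
            <term>ordenador</term>
            <termNote type="partOfSpeech">noun</termNote>
            <termNote type="administrativeStatus">preferredTerm-admn-sts</termNote>
        </tig>
        <tig>
            <term>computador</term>
            <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote>
        </tig>
    </langSet>
</termEntry>
"""


def terms_by_status(entry, lang):
    """Map each administrative status to the terms of `lang` that carry it.

    Terms without a status are collected under the key None.
    """
    grouped = defaultdict(list)
    for lang_set in entry.findall("langSet"):
        if lang_set.get(XML_LANG) != lang:
            continue
        for tig in lang_set.findall("tig"):
            # ".//" also matches a status nested inside a grouping element.
            status = tig.findtext(".//termNote[@type='administrativeStatus']")
            grouped[status].append(tig.findtext("term"))
    return dict(grouped)


entry = ET.fromstring(FRAGMENT)
spanish = terms_by_status(entry, "es")
english = terms_by_status(entry, "en")
print(spanish)
```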