XML User's Guide

XML Support on TPF

XML4C parser 3.5.1 (APAR PJ28176) is a port of XML Parser for C++ (XML4C) Version 3.5.1 to the TPF 4.1 system. The parser is XML Version 1.0 compliant and allows TPF 4.1 applications written in the C++ language to do the following:

In addition, the parser fully implements the ability to use namespaces in support of unique tagging structures.

IBM contributed the XML4C parser to the Apache XML Project (http://xml.apache.org) as open source in November 1999. XML4C Version 3.5.1 is based on Xerces-C Version 1.5.0 and is fully compliant with the Unicode 3.0 specification. While the Apache Xerces-C parser can be updated by the open source community, the XML4C parser is maintained only by IBM and may differ from Xerces-C.

Character Encodings

XML documents, DTDs, and XML Schema documents require that you declare which version of XML and which character encoding you are using. This declaration appears on the first line of the document and is similar to the following:

  <?xml version="1.0" encoding="ISO-8859-1"?>
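To make the declaration concrete, the following C++ sketch pulls the encoding value out of a declaration string. The function name declaredEncoding is hypothetical and is not part of XML4C. The sketch assumes that the declaration has already been read into a string in the same single-byte character set as the program's own string literals, which is not true of every document.

```cpp
#include <string>

// Hypothetical helper, not part of XML4C: returns the value of the
// encoding pseudo-attribute in an XML declaration, or "" if it is absent.
std::string declaredEncoding(const std::string& decl)
{
    typedef std::string::size_type pos_t;
    const char* const blanks = " \t\r\n";

    // Only look inside the declaration itself: "<?xml ... ?>".
    if (decl.compare(0, 5, "<?xml") != 0) return "";
    const pos_t end = decl.find("?>");
    if (end == std::string::npos) return "";

    const pos_t key = decl.find("encoding", 5);
    if (key == std::string::npos || key > end) return "";

    // Skip optional white space around '=', then read the quoted value.
    pos_t pos = decl.find_first_not_of(blanks, key + 8);
    if (pos == std::string::npos || decl[pos] != '=') return "";
    pos = decl.find_first_not_of(blanks, pos + 1);
    if (pos == std::string::npos) return "";

    const char quote = decl[pos];
    if (quote != '"' && quote != '\'') return "";
    const pos_t close = decl.find(quote, pos + 1);
    if (close == std::string::npos || close > end) return "";

    return decl.substr(pos + 1, close - pos - 1);
}
```

For the sample declaration shown earlier, this helper would return ISO-8859-1; for a declaration that carries only version="1.0", it returns an empty string.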

Note: Parsers can often auto-detect certain encodings. When using this version of the XML4C parser, you do not need to specify the encoding if your documents are written in UTF-8, UTF-16 Little Endian, or UTF-16 Big Endian.
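As an illustration of how auto-detection can work in principle, the following C++ sketch inspects the first bytes of a document for the byte patterns described in Appendix F of the XML 1.0 specification. It is not the XML4C implementation; the names DetectedEncoding and sniffEncoding are hypothetical, and only the three encodings named in the note are covered.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; these are not XML4C identifiers.
enum DetectedEncoding { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16_LE, ENC_UTF16_BE };

// Guess the encoding from the leading bytes of a document.
DetectedEncoding sniffEncoding(const unsigned char* buf, std::size_t len)
{
    // UTF-8 byte order mark: EF BB BF.
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;

    // FF FE 00 00 marks UCS-4 little endian, which this sketch does not
    // cover. Check it first so that it is not mistaken for UTF-16.
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return ENC_UNKNOWN;

    // UTF-16 byte order marks: FF FE (little endian), FE FF (big endian).
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return ENC_UTF16_BE;

    // No byte order mark: look for "<?" stored as 16-bit code units.
    if (len >= 4 && buf[0] == 0x3C && buf[1] == 0x00 &&
        buf[2] == 0x3F && buf[3] == 0x00)
        return ENC_UTF16_LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x3C &&
        buf[2] == 0x00 && buf[3] == 0x3F)
        return ENC_UTF16_BE;

    // Anything else needs the encoding declaration to be read.
    return ENC_UNKNOWN;
}
```

A caller that receives ENC_UNKNOWN would fall back to the encoding named in the XML declaration.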

The following table shows which character encodings are supported on TPF. The first column indicates the encoding and the second column lists common names associated with that encoding. The third column shows acceptable values for the encoding= portion of the XML declaration. An X in the fourth column indicates that the encoding is supported on TPF, and an X in the last column indicates that the encoding is supported in XML4C Version 3.5.1. Note that some encodings supported in XML4C Version 3.5.1 are not supported on TPF, and some encodings supported on TPF are not supported in XML4C Version 3.5.1.

Table 1. XML Character Encodings Supported on TPF

| Encoding | Common Name | Declaration (encoding=) | Supported on TPF 4.1 | Supported in XML4C |
|---|---|---|---|---|
| ASCII | | US-ASCII, USASCII, ASCII, US_ASCII | X | X |
| IBM037 (1) | EBCDIC US | EBCDIC-CP-US, IBM037 | X | X |
| IBM500 (1) | | IBM-500 | X | |
| IBM1047 (1, 2) | | IBM-1047 | X | |
| IBM1140 (1) | EBCDIC with Euro symbol | IBM1140 | X | X |
| ISO-8859-1 | ISO Latin 1 | ISO8859-1, ISO-8859-1, ISO_8859-1, IBM-819, IBM819, LATIN1, LATIN-1, LATIN_1 | X | X |
| UTF-8 | 8-bit Unicode | UTF-8, UTF8 | X | X |
| UTF-16 Little Endian | | UTF-16 (LE), UTF-16LE, UTF-16, UCS2, IBM1200, IBM-1200 | X | X |
| UTF-16 Big Endian | | UTF-16 (BE), UTF-16BE, UTF-16, UCS2, IBM1200, IBM-1200 | X | X |
| UCS4 Little Endian | | UCS-4 (LE), UCS-4LE, UCS4, UCS-4, UCS_4 | X | X |
| UCS4 Big Endian | | UCS-4 (BE), UCS-4BE, UCS4, UCS-4, UCS_4 | X | X |
| Windows-1252 | | WINDOWS-1252 | X | X |
| Big5 | Chinese, Big5 | | | X |
| euc-kr | Korean, Extended UNIX code | | | X |
| gb2312 | Chinese, PRC | | | X |
| ISO-8859-2 | ISO Latin 2 | | | X |
| ISO-8859-3 | ISO Latin 3 | | | X |
| ISO-8859-4 | ISO Latin 4 | | | X |
| ISO-8859-5 | ISO Latin Cyrillic | | | X |
| ISO-8859-6 | ISO Latin Arabic | | | X |
| ISO-8859-7 | ISO Latin Greek | | | X |
| ISO-8859-8 | ISO Latin Hebrew | | | X |
| ISO-8859-9 | ISO Latin 5 | | | X |
| koi8-r | Cyrillic | | | X |
| Shift_JIS | Japanese, Shift JIS | | | X |

Notes:

  1. This encoding is an EBCDIC code page.

  2. IBM1047 is the code page used by the C language compiler for TPF.
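An application that wants to check a declared encoding before handing a document to the parser could keep the TPF-supported rows of Table 1 in a small lookup table. The following C++ sketch shows one way to do this. The names EncodingAlias, kTpfEncodings, and findTpfEncoding are hypothetical and are not part of XML4C, and matching names without regard to case is an assumption of this sketch rather than documented parser behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// One accepted encoding= value from Table 1 (TPF-supported rows only).
struct EncodingAlias {
    const char* alias;     // value accepted in the XML declaration
    const char* encoding;  // name in the Encoding column of Table 1
    bool inXml4c;          // X in the "Supported in XML4C" column
};

// The shared UTF-16 and UCS4 aliases do not identify a byte order.
static const EncodingAlias kTpfEncodings[] = {
    {"US-ASCII", "ASCII", true},        {"USASCII", "ASCII", true},
    {"ASCII", "ASCII", true},           {"US_ASCII", "ASCII", true},
    {"EBCDIC-CP-US", "IBM037", true},   {"IBM037", "IBM037", true},
    {"IBM-500", "IBM500", false},       {"IBM-1047", "IBM1047", false},
    {"IBM1140", "IBM1140", true},
    {"ISO8859-1", "ISO-8859-1", true},  {"ISO-8859-1", "ISO-8859-1", true},
    {"ISO_8859-1", "ISO-8859-1", true}, {"IBM-819", "ISO-8859-1", true},
    {"IBM819", "ISO-8859-1", true},     {"LATIN1", "ISO-8859-1", true},
    {"LATIN-1", "ISO-8859-1", true},    {"LATIN_1", "ISO-8859-1", true},
    {"UTF-8", "UTF-8", true},           {"UTF8", "UTF-8", true},
    {"UTF-16 (LE)", "UTF-16 Little Endian", true},
    {"UTF-16LE", "UTF-16 Little Endian", true},
    {"UTF-16 (BE)", "UTF-16 Big Endian", true},
    {"UTF-16BE", "UTF-16 Big Endian", true},
    {"UTF-16", "UTF-16", true},         {"UCS2", "UTF-16", true},
    {"IBM1200", "UTF-16", true},        {"IBM-1200", "UTF-16", true},
    {"UCS-4 (LE)", "UCS4 Little Endian", true},
    {"UCS-4LE", "UCS4 Little Endian", true},
    {"UCS-4 (BE)", "UCS4 Big Endian", true},
    {"UCS-4BE", "UCS4 Big Endian", true},
    {"UCS4", "UCS4", true},             {"UCS-4", "UCS4", true},
    {"UCS_4", "UCS4", true},
    {"WINDOWS-1252", "Windows-1252", true}
};

// Compare two names, ignoring the case of letters.
static bool equalsIgnoreCase(const std::string& a, const char* b)
{
    std::size_t i = 0;
    for (; i < a.size() && b[i] != '\0'; ++i) {
        if (std::toupper(static_cast<unsigned char>(a[i])) !=
            std::toupper(static_cast<unsigned char>(b[i])))
            return false;
    }
    return i == a.size() && b[i] == '\0';
}

// Returns the matching table entry, or a null pointer if the declared
// name is not one of the TPF-supported values listed in Table 1.
const EncodingAlias* findTpfEncoding(const std::string& declared)
{
    const std::size_t n = sizeof(kTpfEncodings) / sizeof(kTpfEncodings[0]);
    for (std::size_t i = 0; i < n; ++i) {
        if (equalsIgnoreCase(declared, kTpfEncodings[i].alias))
            return &kTpfEncodings[i];
    }
    return 0;
}
```

With this table, a lookup of IBM-819 resolves to the ISO-8859-1 row, a lookup of IBM-1047 reports an encoding that is supported on TPF but not marked as supported in XML4C, and a lookup of Shift_JIS finds nothing because that encoding is not supported on TPF.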

If you would like support for additional encodings that are not currently supported on TPF, contact your TPF service representative to open a requirement or enhancement request.