Introduction
 

This document describes the differences between XML4J 4.0.5 and previous versions of XML4J. It points out problems that may arise during migration and discusses solutions to them. It also highlights the new features available in the current version of the product.

Since XML4J 4.0.5 is identical to XML4J 4.0.0 in terms of its behaviour, except that a number of significant bugs have been fixed, we only mention XML4J 4.0.5 here. The differences between the various PTF's of XML4J 4.0.x are documented below. Those between 4.0.5 and 4.0.4 are:

  • Problem decoding XML files on Turkish systems was fixed;
  • Performance patches were applied so that NullPointerExceptions and ArrayIndexOutOfBoundsExceptions are no longer used as signals for array initialization or resizing.

Between XML4J 4.0.3 and 4.0.4, the fixes that were applied were:

  • Translations of XML4J error messages have been fully tested, corrected and verified.
  • A fix for schema validation of hexbin and base64 on non-English locales was applied.
  • DTD handling was fixed so that large DTD's using many parameter entities (such as DocBook) would be parsed correctly.
  • A bug in the HTMLSerializer relating to static mutability was fixed.

The exact differences between XML4J 4.0.2 and 4.0.3 are:

  • Fixes to ensure XML4J is statically immutable. These mainly involved restricting access to member variables and methods that application code should not have been using. Application code that did not use code from the org.apache.xerces.impl package, or any of its subpackages, should be absolutely unaffected by these changes.
  • A bug was fixed in which schema parsing code was not reset before a parse, so the schema validation code can now reliably handle schemaLocation attributes with extended lists of schemas.
  • A bug with serializing namespaces was fixed, so that serialization is fully compatible with XML4J 3.2.1.
  • A fix was applied to XML4J's classloading strategy. Now XML4J will search for implementations on the system class loader if no context class loader has been set or if the context class loader throws a ClassNotFoundException when attempting to load the desired class.
  • XML4J will now recognize IANA aliases for IBM-1140, and a number of other mappings between IANA names and JDK names were fixed.
  • All XML4J messages have been fully localized, and translations into many different languages have been provided.
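
The classloading fallback described in the list above can be sketched as follows. This is only an illustration of the lookup order, not XML4J's actual code; the class and method names (ImplementationLoaderSketch, loadImplementation) are hypothetical.

```java
final class ImplementationLoaderSketch {

    /**
     * Hypothetical helper, not part of XML4J: tries the context class loader
     * first and falls back to the system class loader.
     */
    static Class<?> loadImplementation(String className) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return context.loadClass(className);
            } catch (ClassNotFoundException e) {
                // The context class loader cannot find the class;
                // fall through to the system class loader.
            }
        }
        try {
            return ClassLoader.getSystemClassLoader().loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("No implementation found: " + className, e);
        }
    }
}
```

The sketch reports a failure only after both loaders have been tried, which mirrors the order given in the text.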

There were also a number of bugs fixed between XML4J 4.0.1 and 4.0.2. They may be described as follows:

  • XML4J 4.0.2 reports errors in XML schema files with lines and column information appropriate for that file, instead of giving the line and column of the reference to the schema in the instance document as the location of the error.
  • XML Schema element declarations with mixed content and default values are now handled correctly when no value is present in the instance document.
  • All information pertinent to a request for the resolution of an XML schema document by an EntityResolver--e.g., the namespace of the schema, the context of the request, etc.--is now available via an internal API.
  • Methods were added to the PSVI implementation to permit a form of access to the schema components by which the element or attribute in the instance was validated.
  • PSVI information for empty elements (e.g., <x ... />) is now returned (formerly only PSVI information for elements with content was made available).
  • An NPE is no longer thrown from DOMASBuilderImpl when an ill-formed schema document is parsed by providing the parser with its URI.
  • Attribute values are provided even when schema validation is enabled but validation is disabled.
  • Error messages for XML Schema identity constraints have been systematically localized in an appropriate properties file.

The following problems in XML4J 4.0.0 were fixed in XML4J 4.0.1:

  • Problems decoding base64-encoded material on OS/390;
  • Problems with the XML serializer not producing namespace-wellformed documents;
  • Thread-safety problems in both the DTD and XML Schema validation routines;
  • Problems with default URI resolution when the URI contained a path of the form ../../;
  • Incorrect treatment of schema group declarations with empty content;
  • ArrayIndexOutOfBoundsExceptions thrown by the DTD validator for deeply nested elements;
  • The DOM implementation not being Java-serializable;
  • Corruption of CDATA sections by the deferred DOM implementation;
  • The DOM current-element-node property not working correctly;
  • An error in the DOM range package's NodeIterator which produced NullPointerExceptions under certain circumstances;
  • Performance problems when reading from files, especially in EBCDIC environments.

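In this document, we cover:

  • A quick summary of differences between the XML4J families;
  • Migrating from XML4J version 1.x;
  • Migrating from XML4J version 2.x;
  • Migrating from XML4J version 3.0.x;
  • Migrating from XML4J version 3.1.x;
  • Migrating from XML4J version 3.2.x.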

This new version of XML4J introduces the Xerces Native Interface (XNI), a complete framework for building parser components and configurations that is extremely modular. XML4J4 is the reference implementation of XNI, but other parser components, configurations, and parsers can be written using the Xerces Native Interface. XNI has achieved a high degree of stability, but changes are still possible.

For application writers preferring to use standard DOM level 1, level 2, SAX 1.0 or 2.0 or JAXP 1.1 API's to process XML, XML4J 4.0.1 provides at least the same level of conformance as XML4J 3.2.x. XML4J 4.0.1 now comes in two jarfiles: xmlParserAPIs.jar, containing the standard API's that are implemented, and xercesImpl.jar, containing the implementation of those API's. With this exception, users of standard API's should find XML4J 4.0.1 to be a drop-in replacement for XML4J 3.2.x.

In most cases, XML4J 4.0.1's various components are complete rewrites of the corresponding functionality of XML4J 3.2.x. The implementation of the W3C schema specification in particular has been rewritten so as to make schema construction and validation faster and less memory-intensive, and to remove the limitations that existed in the previous implementation. XML4J 4.0.1 also provides a measure of access to the post schema validation infoset (PSVI).


Quick summary of differences
 

The following table presents a quick summary of the major features which are present in each XML4J family. Symbols in the table have the following meanings:

  • "-": feature absent;
  • "X": completely supported;
  • "P": partially supported;
  • "D": present but deprecated.
Feature  XML4J 1.x  XML4J 2.x  XML4J 3.0.x  XML4J 3.1.x  XML4J 3.2.x  XML4J 4.0.1 
com.ibm.xml.parser.* 
com.ibm.xml.parsers.* 
org.apache.xerces.* 
Xerces Native Interface (XNI) internal API 
API's and Implementation in xerces.jar 
API's in xmlParserAPIs.jar; Implementation in xercesImpl.jar 
TX DOM API support 
DOM level 2 API's 
DOM level 3 API's 
SAX 2.0 support 
JAXP 1.0 API 
JAXP 1.1 API 
Conformant to Sun JCK and CTS Test Suite 
XML Schema WD (04/2000) 
XML Schema Rec (05/2001) 
XML Schema Grammar Caching 
XML Schema Preparsing 
Access to the PSVI 
systemId's are URI's, not filenames 
Built-in serialization of DOM trees 
support for XPointer 
TXCatalog support 
XCatalog support 
Revalidating DOM parser 

Migrating from XML4J Version 1.x
 
Deprecated and Obsolescent Interfaces
 

In XML4J version 1.x, most of XML4J's functionality was located in the com.ibm.xml.parser package. Most of the code which user applications may have relied upon is in the Parser.java (for DOM level 1 support) and SAXDriver.java (for SAX 1.0 support) classes. Neither of these classes is present in XML4J 4.0.1.

The DOM support is now found in the org.apache.xerces.parsers.DOMParser.java class, and the SAX support has been moved to the org.apache.xerces.parsers.SAXParser.java class. These classes support both the older DOM and SAX API's, so should be quite backward compatible.

Support for the TX DOM parser has also been completely removed from XML4J4.0.1. That is, none of the com.ibm.xml.parser.TX*.java classes are present. Code relying on this parser should be converted to rely on the DOM parser, through the org.w3c.dom.* classes or the org.apache.xerces.parsers.DOMParser class, or the SAX parser, through org.apache.xerces.parsers.SAXParser, depending on which aspects of the TXDOM interface are being used. For details regarding features of the TX DOM API that have no equivalent in currently-supported API's, see the section describing TX features with no equivalent.

The org.apache.xerces.parsers.DOMParser.java class has features which control whether it is validating, whether the construction of the DOM tree is deferred, etc.; the API docs should be consulted for details. The SAX parser has similar functionality.

Another change that will undoubtedly affect some users is that the SAX InputSource interface is now used for inputting XML streams rather than the com.ibm.xml.parser.Source class. This class is no longer included in XML4J.
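
As a rough illustration of supplying input through the SAX InputSource interface, the sketch below wraps a character stream in an InputSource and hands it to a DOM parser. It goes through JAXP so that it runs without XML4J-specific classes on the classpath; the class name InputSourceSketch, the helper parse, and the system identifier used in the usage note are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class InputSourceSketch {

    /** Parses XML text supplied through an org.xml.sax.InputSource. */
    static Document parse(String xml, String systemId) {
        InputSource source = new InputSource(new StringReader(xml));
        // The system identifier is the base for resolving relative references.
        source.setSystemId(systemId);
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse " + systemId, e);
        }
    }
}
```

A call such as InputSourceSketch.parse("<greeting>hi</greeting>", "file:///tmp/greeting.xml") returns the parsed Document; an InputSource can equally be built from a byte stream or from a system identifier alone.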


Modified Behaviour
 

The most pervasive change in XML4J's behaviour between the two versions is the fact that SystemId fields in DTD DOCTYPEs and schemaLocations should be URI's and not filenames. Thus, "c:\files\file.dtd" should be replaced with "file:///c:/files/file.dtd". XML4J 4.0.1 contains code which tries to convert DOS filenames into URI's, but we cannot guarantee that this will succeed in all cases; for interoperability with other parsers, URI's should always be used.
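
Applications that still hold DOS-style filenames can convert them before handing them to the parser. The following is a minimal sketch, assuming an absolute path with a drive letter; the helper name dosPathToFileUri is hypothetical and the code is not the conversion XML4J performs internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SystemIdSketch {

    /**
     * Hypothetical helper: turns an absolute DOS path such as c:\files\file.dtd
     * into a file: URI. Characters that are illegal in URIs (spaces, for
     * example) are percent-encoded by java.net.URI.
     */
    static String dosPathToFileUri(String dosPath) {
        String path = "/" + dosPath.replace('\\', '/');
        try {
            // An empty host yields the "file:///" form.
            return new URI("file", "", path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not convertible to a URI: " + dosPath, e);
        }
    }
}
```

Relative paths and UNC names are deliberately left out of this sketch; they need their own handling.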

In our effort to conform more strictly to specifications, we have also changed the behaviour by which text is added to element content in the DOM: in XML4J4.0.1, a "text" node must be created and added to an element as a child node.
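
With the standard DOM API, adding text to an element therefore takes two steps: create the text node, then append it. A minimal sketch follows, using JAXP to obtain an empty document; the class and method names are illustrative only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class TextNodeSketch {

    /** Builds an element whose only child is a text node holding the given content. */
    static Element elementWithText(String name, String content) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element element = doc.createElement(name);
        // Text is not attached to the element directly: a Text node is
        // created by the owning document and appended as a child.
        Text text = doc.createTextNode(content);
        element.appendChild(text);
        return element;
    }
}
```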


New Features
 

Many new features and API's have been added to XML4J since the 1.x release. These are described below.

As our codebase has matured, a vast number of conformance and performance bugs have also been fixed. To enhance performance, we have also implemented a feature (http://apache.org/xml/features/dom/defer-node-expansion) that allows the DOM parser to expand nodes of the DOM tree only when necessary.
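
The deferred-expansion feature can be switched on or off before parsing. The sketch below does so through the JAXP DocumentBuilderFactory.setAttribute call and assumes that the underlying implementation treats a Boolean attribute as a parser feature, as Xerces-derived implementations do. The class name DeferredDomSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DeferredDomSketch {
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses XML text, requesting deferred node expansion where it is supported. */
    static Document parse(String xml, boolean defer) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setAttribute(DEFER_NODE_EXPANSION, Boolean.valueOf(defer));
        } catch (IllegalArgumentException e) {
            // The implementation does not recognize the feature;
            // continue with its default behaviour.
        }
        try {
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting tree is the same whichever setting is used; only the point at which nodes are materialized differs.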



Migrating from XML4J Version 2.x
 
Deprecated and Obsolescent Interfaces
 

In XML4J version 1.x, most of XML4J's functionality was located in the com.ibm.xml.parser package. This code, in the Parser.java (for DOM level 1 support) and SAXDriver.java (for SAX level 1 support) classes, was preserved in XML4J 2.x for backward compatibility. Neither of these classes is present in XML4J 4.0.1.

The DOM support is now found in the org.apache.xerces.parsers.DOMParser.java class, and the SAX support has been moved to the org.apache.xerces.parsers.SAXParser.java class. These classes support both the older DOM and SAX API's, so should be quite backward compatible.

Support for the TX DOM parser has also been completely removed from XML4J4.0.1. That is, none of the com.ibm.xml.parser.TX*.java classes are present, nor are the com.ibm.xml.parsers.TXParser.java, com.ibm.xml.parsers.NonvalidatingTXParser.java or com.ibm.xml.parsers.RevalidatingTXParser.java classes. The TX DOM API effectively represents a superset of the W3C DOM level 1 API. Many of the TX DOM's functions, such as its handling of namespaces, are available in the DOM level 2 API, included in the org.w3c.dom.* classes and fully implemented in XML4J; see the org.apache.xerces.parsers.DOMParser class, for instance. However, there are certain features of the TX DOM API that have no equivalent within the DOM level 2 API. Some of the more significant are:

  1. The DOM has no "write validation"-- i.e., one cannot ask "What can legally be inserted here?"
  2. The DOM does not offer as much flexibility in terms of accessing the XML prolog (the version and encoding attributes, for example). That is, methods like com.ibm.xml.parser.TXDocument#getEncoding() have no equivalent in W3C DOM level 2.

However, it should be noted that experimental support for some aspects of DOM level 3 has been added; see DOM level 3 discussion for details. The parser also no longer offers support for XPointer. However, support for namespaces has been added to both the DOM level 2 and the SAX 2.0 API's which XML4J 4.0.1 fully supports. Code relying on any of these classes should be converted to rely on the DOM parser, through the org.w3c.dom.* classes or the org.apache.xerces.parsers.DOMParser class, or the SAX parser, through org.apache.xerces.parsers.SAXParser, depending on which aspects of the TXDOM interface are being used.

The org.apache.xerces.parsers.DOMParser.java class has features which control whether it is validating, whether the construction of the DOM tree is deferred, etc. The API docs should be consulted for details. The SAX parser has similar functionality.

Another change that will undoubtedly affect some users is that the SAX InputSource interface is now used for inputting XML streams rather than the com.ibm.xml.parser.Source class. This class is no longer included in XML4J3.2.x. It is also of note that TXCatalog support (implemented in the com.ibm.xml.internal.TXCatalog.java class) no longer exists in the parser; XCatalog support has been similarly discontinued.

In previous versions of XML4J (3.0.x and 3.1.x), four classes (for validating and nonvalidating DOM and SAX parsers) from the com.ibm.xml.parsers package had been preserved for backward compatibility. These classes are no longer included in versions of XML4J later than 3.2.0; applications must make use of the classes provided in the org.apache.xerces.parsers package.

It should be noted that, instead of supplying separate validating and nonvalidating parsers, a Configurable API is used to control whether the supplied parser is validating or not. Validation is turned on for either the SAX or DOM parser by setting the http://xml.org/sax/features/validation feature to true; please consult the SAX API on how to do this (the method to do this also works with the DOMParser class).
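
A minimal sketch of turning validation on through the SAX feature mechanism is given below. It obtains an XMLReader through JAXP rather than instantiating org.apache.xerces.parsers.SAXParser directly, so that it runs with any SAX 2.0 parser; the class name ValidationFeatureSketch and the helper validate are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ValidationFeatureSketch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Parses the document with validation enabled and returns the validity errors reported. */
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, true);
            // Validity errors are recoverable and arrive through error().
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }
}
```

Without an error handler, validity errors may go unnoticed, since the parser is free to continue after reporting them.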

The RevalidatingDOMParser class has also been completely removed because of difficulties relating to maintenance.


Modified Behaviour
 

The most pervasive change in XML4J's behaviour between the two versions is the fact that SystemId fields in DTD DOCTYPEs and schemaLocations should be URI's and not filenames. Thus, "c:\files\file.dtd" should be replaced with "file:///c:/files/file.dtd". XML4J 4.0.1 contains code which tries to convert DOS filenames into URI's, but we cannot guarantee that this will succeed in all cases; for interoperability with other parsers, URI's should always be used.


New Features
 

Many new features and API's have been added to XML4J since the 2.x release. Some of these are:

  • The DOM level 2 Core, Events, Ranges and Traversal API's have all been fully implemented;
  • SAX 2.0, which implements namespaces among other features, has been implemented. The SAX parser, both for SAX version 1.0 and 2.0, can also be used for validation;
  • JAXP versions 1.0 and 1.1 are now included;
  • In addition to being fully conformant to the XML 1.0 spec, XML4J 4.0.1 also conforms to the W3C's XML Schema Recommendation version 1.0;
  • Unlike previous versions of XML4J, XML4J4.0.1 offers a means to get at the information in the PSVI;
  • XML4J 4.0.1 provides a means for preparsing W3C schema documents, then caching the compiled versions for later use in validating instance documents (grammar caching);
  • XML4J 4.0.1 also supports many configuration options based on the SAX setFeature() interface;
  • XML4J 4.0.1's XNI API can be used to provide a tremendous amount of flexibility, including creating custom parsers;
  • A package, org.apache.xml.serialize, for serializing DOM trees is also included;
  • Packages to construct a DOM tree from an HTML (org.apache.html.dom) or a WML (org.apache.wml.dom) document have been added.
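
The SAX 2.0 namespace support mentioned in the list above can be seen in the arguments passed to startElement. The following sketch records each element as {namespace URI}local-name; it uses the JAXP SAXParserFactory, and the class name NamespaceSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class NamespaceSketch {

    /** Lists the elements of a document in {namespace URI}local-name form. */
    static List<String> elementNames(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add("{" + uri + "}" + localName);
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

Elements that are in no namespace are reported with an empty namespace URI.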

As our codebase has matured, a vast number of conformance and performance bugs have also been fixed. To enhance performance, we have also implemented a feature (http://apache.org/xml/features/dom/defer-node-expansion) that allows the DOM parser to expand nodes of the DOM tree only when necessary. It is noteworthy, however, that due to the addition of so many new features and API's, XML4J 4.0.1 may be somewhat slower when parsing certain files than XML4J 2.x, both in validating and nonvalidating mode.



Migrating from XML4J Version 3.0.x
 

Many new features have been added to XML4J since the XML4J v3.0.x parsers were released; see the description of new features as compared with XML4J 3.2.x for details. Nonetheless, in most cases, code that works with XML4J 3.0.x should work without change with XML4J4.0.1, except if it relies on the com.ibm.xml.parsers package, which was provided for backwards compatibility with XML4J 2.0.x. This package has been entirely removed, and code relying upon it needs to be modified to work with the org.apache.xerces.parsers package.


Migrating from XML4J 3.1.x
 

Many new features have been added to XML4J since the XML4J v3.1.x parsers were released; see the description of new features as compared with XML4J 3.2.x for details. Nonetheless, in most cases, code that works with XML4J 3.1.x should work without change with XML4J4.0.1. There are two main exceptions to this rule, however.

Code which relies on the com.ibm.xml.parsers package, which was provided for backwards compatibility with XML4J 2.0.x, will no longer function since this package has been entirely removed. Such code needs to be modified to work with the org.apache.xerces.parsers package.

XML4J 3.1.x parsers supported a subset of the W3C Schema 1.0 Working Draft of October, 2000. XML4J 3.2.x and XML4J 4.0.1 support the W3C Schema Recommendation of May 2001. As a result of changes in the specification between its Working Draft and final version, schema documents written for XML4J 3.1.x will no longer be accepted by either XML4J 3.2.x or XML4J 4.0.1. An understanding of the Schema recommendation will be needed to determine precisely how to modify the schemas so that they come into compliance with the W3C's recommendation, but it will help to realize that the schema namespace has changed to http://www.w3.org/2001/XMLSchema and that XML4J will only use schemas from this namespace. Instance documents should refer to the schemaLocation and noNamespaceSchemaLocation attributes taken from the http://www.w3.org/2001/XMLSchema-instance namespace.
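
When many schema documents have to be migrated, a first check is whether each one already uses the Recommendation namespace. The sketch below performs only that check, using a namespace-aware JAXP parse; it does not verify anything else about the schema, and the names SchemaNamespaceSketch and usesRecommendationNamespace are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SchemaNamespaceSketch {
    static final String XSD_2001 = "http://www.w3.org/2001/XMLSchema";

    /** Returns true if the root element of the schema document is in the 2001 namespace. */
    static boolean usesRecommendationNamespace(String schemaXml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI() to be meaningful
        try {
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            return XSD_2001.equals(doc.getDocumentElement().getNamespaceURI());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A schema that fails this check certainly needs updating; one that passes may still use constructs that changed between the Working Draft and the Recommendation.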

Like XML4J 3.2.x, XML4J 4.0.1 also fully implements the JAXP specification version 1.1. This is a change from the previous 3.1.x versions of the parser, which supported JAXP 1.0.


Migrating from XML4J Version 3.2.x
 

XML4J 4.0.1 is considerably more feature-rich than either XML4J 3.2.0 or XML4J 3.2.1. Care has been taken to make XML4J 4.0.1 as much of a drop-in replacement for XML4J 3.2.x as possible. Nonetheless, issues which affect some users exist and are discussed below, along with a summary of the many new features that have been added in this release.

Deprecated Interfaces and Behaviour Modifications
 

All XML4J 3.x distributions contained one parser jar file called xerces.jar. In order to lessen duplication with many other products that ship with similar API's as those implemented by XML4J, we have divided this jar in XML4J 4.0.1 into two files:

  • xmlParserAPIs.jar: contains the standard API's that XML4J implements--i.e., those in the org.w3c.dom, org.xml.sax, and javax.xml.parsers package hierarchies;
  • xercesImpl.jar: contains XML4J's implementation of these API's.

While XML4J 4.0.1 has only been thoroughly tested--and can only be supported--when both jar files are used, it may be possible in certain cases to use xmlParserAPIs.jar instead of API jarfiles supplied with other products, or vice versa depending on product requirements. A very thorough understanding of the API's included in the relevant jarfiles needs to be attained before this can be contemplated. As with any other technical question about the product, we would be pleased to help examine specific situations.

To help ease users' transitions, we provide in this release a distribution containing the old-style "unified" jar file. This distribution has the same name as the standard binary distributions, except that the word deprecated is prepended to the name.

Another change that will impact some users relates to the DOM level 3 support that was provided in XML4J 3.2.1. In order for XML4J 4.0.1 to conform to Sun's test suites for the J2EE 1.3 specification (the CTS tests) and the JCK 1.4 specification (the JCK tests), we were obliged to repackage our initial DOM level 3 support for this release. XML4J 4.0.1 offers some support for the DOM level 3 Core, Abstract Schemas, and Load/Save Working Drafts.

In summary, the DOM level 3 functionality that was present in XML4J 3.2.1 has been retained in XML4J 4.0.1; to access it, however, the user can no longer make method calls on, for instance, org.w3c.dom.Document objects directly. Instead, they must cast down to the org.apache.xerces.dom.DocumentImpl on which they may make the same method calls. The same is true of other DOM level 3 methods that XML4J 4.0.1 supports.

An indication of the DOM level 3 functionality that XML4J 4.0.1 supports can be found by examining the org.apache.xerces.dom3 package. The reader is also urged to consult the DOM level 3 section of this documentation.


New Features of XML4J 4.0.1
 

Many features have been added to XML4J 4.0.1 that did not exist, or were present only in an incomplete state, in XML4J 3.2.x. Perhaps the most interesting of these is the Xerces Native Interface (XNI) API. This API was designed as a general-purpose XML parsing API; modularity, flexibility, and information losslessness were its top goals. For more information on the API, see the manual, included in this documentation. Application writers who have specific needs should be able to write their own custom components, integrate them with standard components shipped with XML4J 4.0.1 and thus create software optimal for their specific needs.

As well as a completely re-architected XML Schema implementation, XML4J 4.0.1 also provides access to the post-schema validation infoset (PSVI) of an XML document validated by an XML Schema. For information on how XNI was exploited to bring this about, and how through XNI all of the PSV information can be accessed by an application, see the Core section of the XNI manual. For information on how XML4J 4.0.1 produces an XML representation of the PSVI similar to that produced by Henry Thompson's XSV tool, see the PSVIWriter and PSVIConfiguration sections of the XNI sample documentation.

XML4J 4.0.1 also provides a means of validating XML Schema documents without having to provide an instance. This same facility can be used, in conjunction with the DOMParser, to parse and store XML Schema grammars in advance of validating documents with them, and therefore to obtain a very considerable performance gain. For information on this, see the DOMAsBuilder sample in the DOM samples documentation.

Finally, XML4J 4.0.1 allows applications to make use of some Xerces-specific features when XML4J is available on the system in a way that will not interfere with normal operation when other parsers that do not support such features are used. For a brief description of this, see the release documentation after consulting the XNI manual to find out what parser configurations are.

The release of XML4J 4.0.1 also represents many bugfixes, especially with respect to the XML Schema implementation and conformance to JAXP 1.1 and SAX 2.0. Since, as has been mentioned above, XML4J 4.0.1 is a very substantial rewrite of XML4J 3.2.x, it is not known what performance characteristics the parser will exhibit in all conditions. Nonetheless, particularly if advanced features like XNI parser configurations are used to optimize the parser for the task at hand, we are confident that performance will generally be at least as good as in previous versions.