



















This document describes the difference between XML4J
4.0.5 and previous versions of XML4J. It points out
problems that may arise during migration and discusses
solutions to the problems. It also highlights the new
features available in the current version of the product.
Since XML4J 4.0.5 is identical to XML4J 4.0.0 in terms of
its behaviour, except that a number of significant bugs
have been fixed, we only mention XML4J 4.0.5 here. The
differences between the various PTF's of XML4J 4.0.x
are documented below. Those between 4.0.5 and
4.0.4 are:
- A problem decoding XML files on Turkish systems
was fixed;
- Performance patches were applied so that NullPointerExceptions and
ArrayIndexOutOfBoundsExceptions are no longer
used as signals for array initialization or resizing.
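To illustrate the kind of change the second item describes, the following minimal sketch allocates and grows an array using explicit null and bounds checks rather than by catching exceptions. It is not taken from the XML4J source; the class and method names (GrowableIntArray, set, get) are hypothetical and were chosen only for this illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not XML4J source code. */
final class GrowableIntArray {
    private int[] data; // allocated lazily on first write

    /** Stores value at a non-negative index, allocating or growing as needed. */
    void set(int index, int value) {
        if (data == null) {
            // Explicit null check instead of catching NullPointerException.
            data = new int[Math.max(8, index + 1)];
        } else if (index >= data.length) {
            // Explicit bounds check instead of catching
            // ArrayIndexOutOfBoundsException.
            data = Arrays.copyOf(data, Math.max(data.length * 2, index + 1));
        }
        data[index] = value;
    }

    /** Returns the value at a non-negative index, or 0 if it was never written. */
    int get(int index) {
        return (data != null && index < data.length) ? data[index] : 0;
    }
}
```

Checking the condition up front avoids the cost of constructing and unwinding an exception on every allocation or resize.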
Between XML4J 4.0.3 and 4.0.4, the following fixes were
applied:
- Translations of XML4J error messages have been
fully tested, corrected and verified.
- A fix for schema validation of hexbin and base64 on
non-English locales was applied.
- DTD handling was fixed so that large DTD's using many
parameter entities (such as DocBook) would be parsed
correctly.
- A bug in the HTMLSerializer relating to static
mutability was fixed.
The exact differences between XML4J 4.0.2 and 4.0.3 are:
- Fixes to ensure XML4J is statically immutable. These
mainly involved restricting access to member variables and
methods that application code should not have been using.
Application code that did not use code from the
org.apache.xerces.impl package, or any of its
subpackages, should be absolutely unaffected by these
changes.
- A bug was fixed in which schema parsing code was not reset before a
parse, so the schema validation code can now
reliably handle schemaLocation attributes with extended lists of
schemas.
- A bug with serializing namespaces was fixed, so that
serialization is fully compatible with XML4J 3.2.1.
- A fix was applied to XML4J's classloading strategy. XML4J
will now search
for implementations on the system class loader if no context
class loader has been set, or if the context class loader throws a
ClassNotFoundException when attempting to load the desired
class.
- XML4J will now recognize IANA aliases for IBM-1140, and a number of other
mappings between IANA names and JDK names were fixed.
- All XML4J messages have been fully localized, and
translations into many different languages have been
provided.
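The classloading order described in the list above can be sketched roughly as follows. This is an illustration of the lookup order only, not XML4J's actual code; the class name FallbackClassLoading and the method findClass are hypothetical.

```java
/** Hypothetical sketch of the lookup order; this is not XML4J source code. */
final class FallbackClassLoading {

    /** Tries the context class loader first, then the system class loader. */
    static Class<?> findClass(String className) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return context.loadClass(className);
            } catch (ClassNotFoundException e) {
                // The context class loader could not find the class;
                // fall through to the system class loader.
            }
        }
        // Used when no context class loader is set or when it failed above.
        return ClassLoader.getSystemClassLoader().loadClass(className);
    }
}
```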
A number of bugs were also fixed between XML4J 4.0.1 and
4.0.2:
- XML4J 4.0.2 reports errors in XML schema files with the line and column information appropriate
for that file, instead of giving the line and column of the reference to the schema
in the instance document as the location of the error.
- XML Schema element declarations with mixed content and
default values are now handled correctly when no value
is present in the instance document.
- All information pertinent to a request for the resolution of an XML schema
document by an EntityResolver--e.g., the namespace of the schema,
the context of the request, etc.--is now available via an internal API.
- Methods were added to the PSVI implementation to permit a form of access to the schema
components by which the element or attribute in the instance was validated.
- PSVI information for empty elements (e.g., <x ... />) is now returned
(formerly only PSVI information for elements with content
was made available).
- A NullPointerException is no longer thrown from DOMASBuilderImpl when an
ill-formed schema document is parsed by providing the parser with
its URI.
- Attribute values are provided even when schema validation
is enabled but validation is disabled.
- Error messages for XML Schema identity constraints have
been systematically localized in an appropriate properties
file.
The following problems in XML4J 4.0.0 were fixed in XML4J 4.0.1:
- Problems decoding base64-encoded material on OS/390;
- Problems with the XML serializer not producing
namespace-wellformed documents;
- Thread-safety problems in both the DTD and XML Schema
validation routines;
- Problems with default URI resolution when the URI
contained a path of the form ../../;
- Schema group declarations with empty content were not
treated correctly;
- The DTD validator threw ArrayIndexOutOfBoundsExceptions with
deeply nested elements;
- The DOM implementation was not Java-serializable;
- The deferred DOM implementation corrupted CDATA
sections;
- The DOM current-element-node property did not work
correctly;
- The DOM range package's NodeIterator had an
error which produced NullPointerExceptions under certain
circumstances;
- Performance problems when reading from files, especially
in EBCDIC environments.
This new version of XML4J introduces the Xerces Native Interface
(XNI), a complete and highly modular framework for building parser
components and configurations. XML4J 4
is the reference implementation of XNI, but other parser components,
configurations, and parsers can be written using the Xerces Native
Interface. XNI has achieved a high degree of stability, but changes
are still possible.
For application writers preferring to use the standard DOM level 1,
DOM level 2, SAX 1.0, SAX 2.0 or JAXP 1.1 API's to process XML, XML4J 4.0.1
provides at least the same level of conformance as XML4J 3.2.x. XML4J 4.0.1 now
comes in two jar files: xmlParserAPIs.jar, containing the
standard API's that are implemented, and
xercesImpl.jar, containing the
implementation of those API's. With this exception, users of the
standard API's should find XML4J 4.0.1 to be a drop-in replacement for
XML4J 3.2.x.
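As an illustration of what relying only on the standard API's looks like, here is a minimal sketch that parses a document through JAXP and the W3C DOM, with no reference to any parser-specific class. The class name JaxpDomExample and the method rootName are hypothetical; code written this way should not need to change when the underlying parser implementation is replaced.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical example that uses only the standard JAXP and DOM API's. */
final class JaxpDomExample {

    /** Parses XML text through JAXP and returns the root element's tag name. */
    static String rootName(String xml) throws Exception {
        // The factory locates whichever JAXP implementation is available.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName("<greeting>hello</greeting>"));
    }
}
```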
In most cases, XML4J 4.0.1's various components are complete
rewrites of the corresponding functionality of XML4J 3.2.x. The
implementation of the W3C schema specification in particular has
been rewritten so as to make schema construction and validation faster and less
memory-intensive, and to remove the limitations that existed in the
previous implementation. XML4J 4.0.1 also provides a measure of access to the post-schema validation infoset (PSVI).
|
 |  |  |  | Migrating from XML4J Version 1.x |  |  |  |  |
| |
 |  |  |  | Deprecated and Obsolescent Interfaces |  |  |  |  |
| |
In XML4J version 1.x, most of XML4J's functionality was
located in the com.ibm.xml.parser package. Most of the
code on which user applications may have relied is in
the Parser.java (for DOM level 1 support) and
SAXDriver.java (for SAX 1.0 support) classes.
Neither of these classes is present in XML4J 4.0.1.
The DOM support is now found in the
org.apache.xerces.parsers.DOMParser.java class, and
the SAX support has been moved to the
org.apache.xerces.parsers.SAXParser.java class. These
classes still support the older DOM and SAX API's, so
existing code should remain largely backward compatible.
Support for the TX DOM parser has also been completely
removed from XML4J 4.0.1. That is, none of the
com.ibm.xml.parser.TX*.java classes are
present.
Code relying on this parser should be converted to rely
on the DOM parser, through the org.w3c.dom.* classes or
the org.apache.xerces.parsers.DOMParser class, or the
SAX parser, through
org.apache.xerces.parsers.SAXParser,
depending on which aspects of the TX DOM interface are
being used. For details regarding features of the TX DOM
API that have no equivalent in currently supported API's,
see the section describing TX features
with no equivalent.
The org.apache.xerces.parsers.DOMParser.java class has
features which control whether it is validating, whether
the construction of the DOM tree is deferred, etc.; the
API docs should be consulted for details. The SAX
parser has similar functionality.
Another change that will undoubtedly affect some users is
that the SAX InputSource class is now used for
supplying XML input to the parser, rather than the
com.ibm.xml.parser.Source class. The latter class is no longer
included in XML4J.
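As a rough sketch of supplying input through the SAX InputSource class, the following example wraps a character stream in an InputSource, sets its system identifier, and hands it to a SAX parser. The parser is obtained through JAXP so that the sketch does not depend on a particular parser class; the names InputSourceExample and countElements are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example of feeding a parser through org.xml.sax.InputSource. */
final class InputSourceExample {

    /** Counts the start tags in the given XML text. */
    static int countElements(String xml, String systemId) throws Exception {
        final int[] count = {0};
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        });
        // InputSource can wrap a character stream, a byte stream or a URI.
        InputSource source = new InputSource(new StringReader(xml));
        // The system identifier is the base for resolving relative references.
        source.setSystemId(systemId);
        reader.parse(source);
        return count[0];
    }
}
```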
|
| |
The most pervasive change in XML4J's behaviour between
the two versions is the fact that SystemId
fields in DTD DOCTYPEs and schemaLocations should be URI's
and not filenames. Thus, "c:\files\file.dtd" should be
replaced with "file:///c:/files/file.dtd".
XML4J 4.0.1 contains code which tries to convert DOS filenames
into URI's, but we cannot guarantee that this will
succeed in all cases; for interoperability with other parsers,
URI's should always be used.
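The conversion shown in the example above can be sketched as a small helper. The class SystemIdExample and the method toFileUri are hypothetical, and the sketch assumes a simple absolute DOS path with a drive letter; it does not escape spaces or other characters that are reserved in URI's.

```java
/** Hypothetical helper; assumes a simple absolute DOS path with a drive letter. */
final class SystemIdExample {

    /** Converts a path such as c:\files\file.dtd to a file: URI string. */
    static String toFileUri(String dosPath) {
        // URI's always use forward slashes as path separators.
        String forward = dosPath.replace('\\', '/');
        // "file://" followed by an empty host, then "/" and the path.
        return "file:///" + forward;
    }

    public static void main(String[] args) {
        System.out.println(toFileUri("c:\\files\\file.dtd"));
    }
}
```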
In our effort to conform more strictly to the specifications,
we have also changed the way in which text is added
to element content in the DOM: in XML4J 4.0.1, a text
node must be created and added to the element as a child
node.
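Using only the standard W3C DOM API, adding text to an element in this way might look like the following sketch. The class name TextNodeExample, the method buildTitle and the element name "title" are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

/** Hypothetical example of adding text content through a text node. */
final class TextNodeExample {

    /** Builds a <title> element whose only child is a text node. */
    static Element buildTitle(String content) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element title = doc.createElement("title");
        doc.appendChild(title);
        // The text is created as a node of its own ...
        Text text = doc.createTextNode(content);
        // ... and then attached to the element as a child node.
        title.appendChild(text);
        return title;
    }
}
```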
|
| |
Many new features and API's have been added to XML4J
since the 1.x release. These are described below.
As our codebase has matured, a vast number of conformance
and performance bugs have also been fixed. To enhance
performance, we have also implemented a feature
(http://apache.org/xml/features/dom/defer-node-expansion) that
allows the DOM parser to expand nodes of the DOM
tree only when necessary.
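A minimal sketch of requesting deferred node expansion follows. It goes through DocumentBuilderFactory.setFeature(), which belongs to JAXP versions later than 1.1; with XML4J 4.0.1 itself, the corresponding call would be setFeature() on the org.apache.xerces.parsers.DOMParser class. The names DeferredDomExample and parseDeferred are hypothetical, and the sketch simply continues with default settings if the underlying parser does not recognize the feature.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical example of requesting deferred DOM node expansion. */
final class DeferredDomExample {
    static final String DEFER_NODE_EXPANSION =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses XML text, asking for deferred node expansion where supported. */
    static Document parseDeferred(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(DEFER_NODE_EXPANSION, true);
        } catch (ParserConfigurationException e) {
            // The underlying parser does not recognize this Xerces feature;
            // continue with its default behaviour.
        }
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

The resulting tree is used exactly like any other DOM tree; the feature affects only when the nodes are materialized.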
|
|
 |  |  |  | Migrating from XML4J Version 2.x |  |  |  |  |
| |
 |  |  |  | Deprecated and Obsolescent Interfaces |  |  |  |  |
| |
In XML4J version 1.x, most of XML4J's functionality was
located in the com.ibm.xml.parser package. This code, in
the Parser.java (for DOM level 1 support) and
SAXDriver.java (for SAX level 1 support) classes, was
preserved in XML4J 2.x for backward compatibility. Both
classes have been removed from XML4J 4.0.1.
The DOM support is now found in the
org.apache.xerces.parsers.DOMParser.java class, and the
SAX support has been moved to
the org.apache.xerces.parsers.SAXParser.java class. These
classes still support the older DOM and SAX API's, so
existing code should remain largely backward compatible.
Support for the TX DOM parser has also been completely
removed from XML4J 4.0.1. That is, none of the
com.ibm.xml.parser.TX*.java classes are present, nor are
the com.ibm.xml.parsers.TXParser.java,
com.ibm.xml.parsers.NonvalidatingTXParser.java
or com.ibm.xml.parsers.RevalidatingTXParser.java classes.
The TX DOM API effectively represents a superset of the
W3C DOM level 1 API. Many of the TX DOM's functions,
such as its handling of namespaces, are available in the
DOM level 2 API, which is included in the
org.w3c.dom.* classes and fully implemented in
XML4J; see the org.apache.xerces.parsers.DOMParser
class, for instance.
However, certain features of the TX DOM API have no
equivalent within the DOM level 2 API. Some of the more
significant are:
- The DOM has no "write validation"; that is, one cannot ask
"What can legally be inserted here?"
- The DOM does not offer as much flexibility in terms
of accessing the XML prolog (the version and encoding
attributes, for example). That is, methods like
com.ibm.xml.parser.TXDocument#getEncoding()
have no equivalent in W3C DOM level 2.
However, it should be noted that experimental support for
some aspects of DOM level 3 has been added; see the DOM level 3 discussion for details.
The parser also no longer offers support for XPointer.
However, support for namespaces has been added to both the DOM
level 2 and the SAX 2.0 API's, both of which XML4J 4.0.1 fully
supports.
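As a small illustration of the DOM level 2 namespace support, the sketch below counts elements by namespace URI and local name, irrespective of the prefix used in the document. The names DomNamespaceExample and countInNamespace are hypothetical, as is the namespace URI used in the usage note.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical example of the DOM level 2 namespace-aware methods. */
final class DomNamespaceExample {

    /** Counts the elements with the given namespace URI and local name. */
    static int countInNamespace(String xml, String namespaceUri,
                                String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP factories are not namespace-aware unless asked to be.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Matching is by namespace URI and local name, not by prefix.
        return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
    }
}
```

For a document in which the prefixes a and b are both bound to urn:example:x, a call such as countInNamespace(xml, "urn:example:x", "item") counts a:item and b:item elements alike, but not an unprefixed item element that is in no namespace.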
Code relying on any of these classes should be converted to rely
on the DOM parser, through the org.w3c.dom.* classes or
the org.apache.xerces.parsers.DOMParser class, or the
SAX parser, through
org.apache.xerces.parsers.SAXParser,
depending on which aspects of the TX DOM interface are
being used.
The
org.apache.xerces.parsers.DOMParser.java class has
features which control whether it is validating, whether
the construction of the DOM tree is deferred, etc. The
API docs should be consulted for details. The SAX
parser has similar functionality.
Another change that will undoubtedly affect some users is
that the SAX InputSource class is now used for
supplying XML input to the parser, rather than the
com.ibm.xml.parser.Source class. The latter class is no longer
included in XML4J 3.2.x.
It is also of note that TXCatalog support (implemented in
the com.ibm.xml.internal.TXCatalog.java class) no longer exists in the
parser; XCatalog
support has been similarly discontinued.
In previous versions of XML4J (3.0.x and 3.1.x),
four classes (for validating and
nonvalidating DOM and SAX parsers) from the
com.ibm.xml.parsers package had been preserved for backward
compatibility. These classes are no longer included in
versions of XML4J later than
3.2.0; applications must make use of the classes provided in
the org.apache.xerces.parsers package.
It should be noted that, instead of supplying
separate validating and nonvalidating parsers,
XML4J now uses a configurable API to control whether the
supplied parser is validating or not. Validation is
turned on for either the SAX or the DOM parser by setting the
http://xml.org/sax/features/validation
feature to true; please consult the SAX API documentation for
the setFeature() method, which also works with the DOMParser
class.
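The following sketch shows one way the validation feature might be switched on and off through the SAX 2.0 setFeature() method. The parser is obtained through JAXP so that the sketch does not depend on a particular parser class; the names ValidationFeatureExample and validityErrors are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example of toggling validation through a SAX feature. */
final class ValidationFeatureExample {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Parses the document and returns the validity errors that were reported. */
    static List<String> validityErrors(String xml, boolean validate)
            throws Exception {
        final List<String> errors = new ArrayList<>();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // One parser class serves both modes; the feature selects the mode.
        reader.setFeature(VALIDATION, validate);
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable and are reported here.
                errors.add(e.getMessage());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }
}
```

With a document whose internal DTD subset requires a child element that is absent, the returned list should be non-empty when validate is true and empty when it is false.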
The RevalidatingDOMParser class has also been completely
removed because of difficulties relating to maintenance.
|
| |
The most pervasive change in XML4J's behaviour between
the two versions is the fact that SystemId
fields in DTD DOCTYPEs and schemaLocations should be URI's
and not filenames. Thus, "c:\files\file.dtd" should be
replaced with "file:///c:/files/file.dtd".
XML4J 4.0.1 contains code which tries to convert DOS filenames
into URI's, but we cannot guarantee that this will
succeed in all cases; for interoperability with other parsers,
URI's should always be used.
|
| |
Many new features and API's have been added to XML4J
since the 2.x release.
Some of these are:
- The DOM level 2 Core, Events, Ranges and Traversal
API's have all been fully implemented;
- SAX 2.0, which supports namespaces among other
features, has been implemented. The SAX parser, both
for SAX version 1.0 and 2.0, can also be used for validation;
- JAXP versions 1.0 and 1.1 are now included;
- In addition to being fully conformant to the XML 1.0
spec, XML4J 4.0.1 also conforms to the W3C's XML Schema
Recommendation version 1.0;
- Unlike previous versions of XML4J, XML4J 4.0.1 offers
a means to get at the information in the PSVI;
- XML4J 4.0.1 provides a means for preparsing W3C
schema documents, then caching the compiled versions for
later use in validating instance documents (grammar
caching);
- XML4J 4.0.1 also supports many configuration options
based on the SAX setFeature() interface;
- XML4J 4.0.1's XNI API can be used to provide a
tremendous amount of flexibility, including creating
custom parsers;
- A package, org.apache.xml.serialize, for serializing
DOM trees is also included;
- Packages to construct a DOM tree from an HTML
(org.apache.html.dom) or a WML
(org.apache.wml.dom) document have been
added.
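As one illustration of the SAX 2.0 namespace support mentioned in the list above, the following sketch records the namespace URI and local name that the parser reports for each element. The names SaxNamespaceExample and expandedNames, and the "{uri}local" notation used for the result, are hypothetical choices made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example of namespace-aware SAX 2.0 element events. */
final class SaxNamespaceExample {

    /** Returns each element as "{namespace-uri}local-name", in document order. */
    static List<String> expandedNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP factories are not namespace-aware unless asked to be.
        factory.setNamespaceAware(true);
        final List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    // uri is the empty string for elements in no namespace.
                    names.add("{" + uri + "}" + localName);
                }
            });
        return names;
    }
}
```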
As our codebase has matured, a vast number of conformance
and performance bugs have also been fixed. To enhance
performance, we have also implemented a feature
(http://apache.org/xml/features/dom/defer-node-expansion) that
allows the DOM parser to expand nodes of the DOM
tree only when necessary. It is noteworthy, however, that due
to the addition of so many new features and API's,
XML4J 4.0.1 may be somewhat slower when parsing certain files than
XML4J 2.x, both in validating and nonvalidating mode.
|
|
 |  |  |  | Migrating from XML4J 3.1.x |  |  |  |  |
| |
Many new features have been added to XML4J since the XML4J
v3.1.x parsers were released; see the description of new features as
compared with XML4J 3.2.x for details. Nonetheless, in
most cases, code that works with XML4J
3.1.x should work without change with XML4J 4.0.1. There are,
however, two main exceptions to this rule.
Code which
relies on the com.ibm.xml.parsers package,
which was provided for backwards compatibility with
XML4J 2.0.x, will no longer function since this package has
been entirely removed.
Such code needs to be modified to work with
the org.apache.xerces.parsers package.
XML4J 3.1.x parsers
supported a subset of the W3C Schema 1.0 Working Draft of
October, 2000.
XML4J 3.2.x and XML4J 4.0.1 support the W3C Schema
Recommendation of May 2001.
As a result of changes in the specification between its
Working Draft and final version,
schema documents written for XML4J 3.1.x will no longer
be accepted by either XML4J
3.2.x or XML4J 4.0.1. An understanding of the Schema Recommendation
will be needed to determine precisely how to modify the
schemas so that they comply with the W3C's
Recommendation, but it will help to realize that the
schema namespace has changed to
http://www.w3.org/2001/XMLSchema and that XML4J
will only use schemas from this namespace. Instance
documents should use the schemaLocation and
noNamespaceSchemaLocation attributes from the
http://www.w3.org/2001/XMLSchema-instance
namespace.
Like XML4J 3.2.x, XML4J 4.0.1 also fully implements the JAXP specification
version 1.1. This is a change from the previous 3.1.x
versions of the parser, which supported JAXP 1.0.
|
 |  |  |  | Migrating from XML4J Version 3.2.x |  |  |  |  |
| |
XML4J 4.0.1 is considerably more feature-rich than either
XML4J 3.2.0 or XML4J 3.2.1. Care has been taken to make
XML4J 4.0.1 as much of a drop-in replacement for XML4J 3.2.x as
possible. Nonetheless, some issues will affect certain users; these
are discussed below, along with a summary of the many new features
that have been added in this release.
 |  |  |  | Deprecated Interfaces and Behaviour Modifications |  |  |  |  |
| |
All XML4J 3.x distributions contained one parser jar file
called xerces.jar. In order to lessen duplication
with the many other products that ship API's similar to
those implemented by XML4J, we have divided this jar in XML4J
4.0.1 into two files:
- xmlParserAPIs.jar: contains the standard
API's that XML4J implements--i.e., those in the
org.w3c.dom, org.xml.sax, and
javax.xml.parsers package hierarchies;
- xercesImpl.jar: contains XML4J's
implementation of these API's.
While XML4J 4.0.1 has only been thoroughly tested--and can
only be supported--when both jar files are used, it may be
possible in certain cases to use xmlParserAPIs.jar
instead of API jarfiles supplied with other products, or vice
versa depending on product requirements. A very thorough
understanding of the API's included in the relevant jarfiles
needs to be attained before this can be contemplated. As with
any other technical question about the product, we would be
pleased to help examine specific situations.
To help ease users' transitions, we provide in this
release a distribution containing the old-style
"unified" jar file. This distribution has the same name
as the standard binary distributions, except that the word
deprecated is prepended to the name.
Another change that will impact some users relates to the DOM
level 3 support that was provided in XML4J 3.2.1. In order
for XML4J 4.0.1 to conform to Sun's test suites for the
J2EE 1.3 specification (the CTS tests) and the JCK 1.4
specification (the JCK tests), we were obliged to repackage
our initial DOM level 3 support for this release. XML4J 4.0.1
offers some support for the DOM level 3 Core, Abstract
Schemas, and Load/Save Working Drafts.
In summary, the DOM level 3 functionality that was present in
XML4J 3.2.1 has been retained in XML4J 4.0.1; to access it,
however, the user can no longer make method calls on, for
instance, org.w3c.dom.Document objects directly. Instead,
the object must be cast down to
org.apache.xerces.dom.DocumentImpl, on which the same
method calls may be made. The same is true of the other DOM
level 3 methods that XML4J 4.0.1 supports.
An indication of the DOM level 3 functionality that XML4J
4.0.1 supports can be found by examining the
org.apache.xerces.dom3 package. The reader is
also urged to consult the DOM level 3
section of this documentation.
|
 |  |  |  | New Features of XML4J 4.0.1 |  |  |  |  |
| |
Many features have been added to XML4J 4.0.1 that did not
exist, or were present only in an incomplete state, in XML4J
3.2.x. Perhaps the most interesting of these is the Xerces
Native Interface (XNI) API.
This API was designed as a general-purpose XML parsing API;
modularity, flexibility, and information losslessness were its
top goals. For more information on the API, see the manual, included in this documentation.
Application writers who have specific needs should be able to
write their own custom components, integrate them with
standard components shipped with XML4J 4.0.1 and thus create
software optimal for their specific needs.
As well as a completely re-architected XML Schema
implementation, XML4J 4.0.1 also provides access to the
post-schema validation infoset (PSVI) of an XML document validated by
an XML Schema. For information on how XNI was exploited to
bring this about, and how through XNI all of the PSV
information can be accessed by an application, see the Core section of the XNI manual.
For information on how XML4J 4.0.1 produces an XML
representation of the PSVI similar to that produced by Henry
Thompson's XSV tool, see the PSVIWriter and
PSVIConfiguration sections of the XNI
sample documentation.
XML4J 4.0.1 also provides a means of validating XML Schema
documents without having to provide an instance. This same
facility can be used, in conjunction with the
DOMParser, to parse and store XML Schema grammars
in advance of validating documents with them, and therefore to
obtain a very considerable performance gain. For information
on this, see the DOMAsBuilder sample in the DOM samples documentation.
Finally, XML4J 4.0.1 allows applications to make use of
some Xerces-specific features when XML4J is available on the
system in a way that will not interfere with normal operation
when other parsers that do not support such features are used.
For a brief description of this, see the release documentation after consulting
the XNI manual to find out what parser
configurations are.
The release of XML4J 4.0.1 also represents many bugfixes,
especially with respect to the XML Schema implementation and
conformance to JAXP 1.1 and SAX 2.0. Since, as has been
mentioned above, XML4J 4.0.1 is a very substantial rewrite of
XML4J 3.2.x, it is not known what performance characteristics
the parser will exhibit in all conditions. Nonetheless,
particularly if advanced features like XNI parser
configurations are used to optimize the parser for the task at
hand, we are confident that performance will generally be
at least as good as in previous versions.