| Overview |
Nux is a small, natural, straightforward and surprisingly
effective open-source extension of the
XOM and
Saxon XML libraries.
Nux is geared towards versatile embedded integration and interchange,
in particular for high-throughput server container environments
(e.g. high-speed real-time data streaming applications,
large-scale Peer-to-Peer messaging network infrastructures over
high-bandwidth networks, scalable message-oriented middleware, etc.).
But its simplicity
also makes it useful for client-side XML query/transformation workflow pipelines.
Have you ever tried to take advantage of a robust and natural commodity Java tool set for
XML, XQuery, XPath, schema validation and related technologies,
yet were not ready to accept a significant performance penalty? Chances are that most tool sets
turned out to be neither particularly robust nor natural, that they incurred dramatic penalties
when used in a straightforward manner, and that their complex idiosyncrasies had a strong
tendency to distract from the real job and the use cases you wanted to get done in a timely
manner.
Nux helps to avoid XML nightmares, enabling you to mix and match powerful
XML tools that fit your needs, in a natural, straightforward, seamless and
effective manner.
Features include:
- Seamless W3C XQuery and XPath support for XOM
(see API).
Also see XQueryBenchmark.
- Efficient and flexible pools
and factories for XQueries, XSL Transforms, as well as
document Builders that validate against various schema languages, including W3C XML Schemas, DTDs, RELAX NG, Schematron, etc.
(see API).
- Optional serialization and deserialization of XOM XML documents to and from
an efficient and compact custom binary XML data format (bnux
format), without loss or change of any information.
Serialization and deserialization are much faster than with the standard textual XML format,
and the resulting binary data is more compact than textual XML
(see API).
- For simple and complex continuous queries and/or transformations over very
large or infinitely long XML input,
a convenient streaming path filter API combines full XQuery and XPath support with
straightforward filtering
(see API).
- Glue for integration with JAXB and for queries over ill-formed HTML
(see API).
- Well documented API. Ships in a jar file that weighs just 60 KB.
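The streaming idea in the feature list above can be illustrated with a minimal sketch. It does not use the Nux streaming path filter API; as a stand-in it relies only on the JDK's StAX parser, and it filters by element name rather than by a full location path. The class name StreamingFilterSketch, the method forEachText and the sample feed are hypothetical names chosen for illustration. The point is that each matching piece is handed to a callback and then dropped, so memory use stays flat however long the input is.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for a streaming filter; NOT the Nux API.
final class StreamingFilterSketch {

    /**
     * Streams through the XML input and passes the text of every element
     * with the given local name to the callback. Matching elements must be
     * text-only, because getElementText() rejects nested child elements.
     */
    static void forEachText(Reader xml, String localName, Consumer<String> callback) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    // Only this one text value is held in memory at a time.
                    callback.accept(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML input", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input; a real feed would arrive from a file or socket.
        String feed = "<feed><entry><title>first</title></entry>"
                    + "<entry><title>second</title></entry></feed>";
        forEachText(new StringReader(feed), "title", t -> System.out.println("title=" + t));
    }
}
```

A real streaming filter would materialize each matching subtree as a small document so that a full query can run against it; the sketch stops at text values to stay within the JDK.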
|
Motivation |
Have you ever tried to do queries and/or transformations over XML data sources?
Chances are that manual SAX/DOM processing was cumbersome at best, that XPath was not powerful or flexible enough,
or XSLT perhaps too complicated, and that most related APIs have a steep learning curve, and
contain quite a few bugs.
This is where the power and simplicity of XQuery come in.
Nux provides seamless XQuery support for XOM, leveraging the
standards compliance, efficiency and maturity of the Saxon engine,
in combination with a robust, lean and mean adapter for XOM that Nux contributed to Saxon.
Since XQuery is a superset of XPath 2.0, plain XPath expressions
can also be used as queries.
The implementation covers most of the features of the W3C Working Draft of 11 February 2005 and passes
several exhaustive test suites.
Have you ever tried to build an XML system that is straightforward, works correctly and
processes thousands or tens of thousands of small XML messages per second in non-trivial ways? Chances are you've encountered lots of
non-obvious obstacles down that path. For that scenario, Nux couples the simplicity and correctness qualities of XOM with
efficient and flexible pools
and factories for XQueries, XSL Transforms, as well as
document Builders that validate against various schema languages, including
W3C XML Schemas (leveraging Xerces),
RELAX NG, Schematron, etc. (leveraging MSV).
For particularly stringent performance requirements
an option for lightning-fast binary XML serialization and deserialization
is offered.
Glue for integration with JAXB
and for queries over ill-formed HTML is also provided.
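Why pooling matters for many small messages can be shown with a minimal sketch. It is not the Nux pool implementation; as a stand-in it caches compiled XPath 1.0 expressions from the JDK's javax.xml.xpath package, and the class name CompiledQueryCacheSketch, its methods and the sample message are hypothetical. The assumption illustrated is that compiling a query costs far more than running it against a small message, so each query string is compiled once per thread and then reused.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a query pool; NOT the Nux API.
final class CompiledQueryCacheSketch {

    // XPathExpression objects are not thread-safe, so each thread keeps its own cache.
    private static final ThreadLocal<Map<String, XPathExpression>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    /** Returns the compiled query, compiling it only on first use within the current thread. */
    static XPathExpression compiled(String query) {
        return CACHE.get().computeIfAbsent(query, q -> {
            try {
                return XPathFactory.newInstance().newXPath().compile(q);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Invalid query: " + q, e);
            }
        });
    }

    /** Runs the cached query against one small XML message and returns the string value of the result. */
    static String evaluate(String query, String xmlMessage) {
        try {
            return compiled(query).evaluate(new InputSource(new StringReader(xmlMessage)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Evaluation failed for: " + query, e);
        }
    }

    public static void main(String[] args) {
        String query = "/msg/price";
        for (int i = 0; i < 3; i++) {
            // Hypothetical message format; the query is compiled only on the first iteration.
            String message = "<msg><price>" + (100 + i) + "</price></msg>";
            System.out.println("price=" + evaluate(query, message));
        }
    }
}
```

A per-thread map avoids locking on the hot path at the cost of one compiled copy per thread, which is usually a reasonable trade in a server with a fixed worker pool.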
Note for anthropologists: The Goddess Nux (Night) was one of the four original primeval Greek deities,
who were the first entities to be formed out of Chaos, her siblings being
Eros (Love), Gaea (Earth) and Erebos (Darkness).
Nux vomica also refers to a tree native to the East Indies, as well as its nut-like seeds.
In homeopathy, it is one of the most commonly prescribed remedies.
|
Example Usage |
import java.io.File;
import nu.xom.*;          // Builder, Document, Node, Nodes
import nux.xom.xquery.*;  // XQuery, XQueryUtil

Document doc = new Builder().build(new File("/tmp/test.xml"));
// find the atom named 'Zinc' in the periodic table:
Node result = XQueryUtil.xquery(doc, "/PERIODIC_TABLE/ATOM[NAME = 'Zinc']").get(0);
System.out.println("result=" + result.toXML());
// equivalent via the more powerful underlying API:
XQuery xquery = new XQuery("/PERIODIC_TABLE/ATOM[NAME = 'Zinc']", null);
Node sameResult = xquery.execute(doc).next();
// count the number of elements in a document tree
int count = XQueryUtil.xquery(doc, "//*").size();
System.out.println("count=" + count);

// separate example: print the src attribute of every img element, in any namespace
Document doc = new Builder().build(new File("/tmp/test.xml"));
Nodes results = XQueryUtil.xquery(doc, "//*:img/@src");
//Nodes results = XQueryUtil.xquery(doc, "//*:img/@src[matches(., '.jpg')]");
for (int i=0; i < results.size(); i++) {
    System.out.println("node "+i+": " + results.get(i).toXML());
    //System.out.println("node "+i+": " + XOMUtil.toPrettyXML(results.get(i)));
}
<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year="{ $b/@year }">
        { $b/title }
      </book>
  }
</bib>

for $i in doc("items.xml")//item_tuple
let $b := doc("bids.xml")//bid_tuple[itemno = $i/itemno]
where contains($i/description, "Bicycle")
order by $i/itemno
return
  <item_tuple>
    { $i/itemno }
    { $i/description }
    <high_bid>{ max($b/bid) }</high_bid>
  </item_tuple>
To get started, you can use XQueryCommand,
which is a simple command line demo that runs a given XQuery against a set of files
and prints the result sequence.
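If no periodic table file is at hand, the Zinc query and the element count shown above can be tried on inline data. The following is a minimal sketch that does not use Nux at all: it evaluates the same two path expressions with the JDK's built-in javax.xml.xpath package, which supports XPath 1.0 only, so XQuery constructs and XPath 2.0 forms such as //*:img are out of its reach. The class name ZincQuerySketch, the select helper and the two-atom sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only stand-in; NOT the Nux API.
final class ZincQuerySketch {

    // Hypothetical sample data standing in for /tmp/test.xml.
    static final String SAMPLE =
        "<PERIODIC_TABLE>"
      + "<ATOM><NAME>Copper</NAME><SYMBOL>Cu</SYMBOL></ATOM>"
      + "<ATOM><NAME>Zinc</NAME><SYMBOL>Zn</SYMBOL></ATOM>"
      + "</PERIODIC_TABLE>";

    /** Evaluates an XPath 1.0 expression against an XML string and returns the matching nodes. */
    static NodeList select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // find the atom named 'Zinc' in the sample table
        NodeList atoms = select(SAMPLE, "/PERIODIC_TABLE/ATOM[NAME = 'Zinc']");
        System.out.println("matches=" + atoms.getLength());
        // count the number of elements in the document tree
        System.out.println("count=" + select(SAMPLE, "//*").getLength());
    }
}
```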
|
Querying Nasty HTML |
If you'd like to query non-XML documents such as the typical HTML that lives out there,
you can combine Nux with TagSoup,
which is a "SAX-compliant parser that, instead of parsing well-formed or valid XML,
parses HTML as it is found in the wild: nasty and brutish, though quite often far from short".
TagSoup plugs into XOM and makes ill-formed HTML appear as well-formed XML.
Just add tagsoup.jar to the classpath and try this:
// find the links of all images in an ill-formed HTML document
XMLReader parser = new org.ccil.cowan.tagsoup.Parser(); // tagsoup parser
Document doc = new Builder(parser).build("http://www.yahoo.com");
Nodes results = XQueryUtil.xquery(doc, "//*:img/@src");
for (int i=0; i < results.size(); i++) {
    System.out.println("node "+i+": " + results.get(i).toXML());
    //System.out.println("node "+i+": " + XOMUtil.toPrettyXML(results.get(i)));
}
|
FAQ |
- What about support for database integration and the
XQJ standard?
XQJ will be very useful, but it is still immature, in flux and without real implementations. Once XQJ stabilizes, and implementations of it become available,
we will also support plugging in any XQJ implementation via a factory interface.
But that day doesn't seem close yet.
|
Related Information |
A GUI XQuery editor helps you learn the query language and quickly try out
queries during early development stages. Such editors include
Oxygen
(commercial but with free trial, including Eclipse plugin)
and Stylus Studio
(commercial but with free trial).
Using such rapid prototyping GUIs before deploying into your Nux-based production
application can speed up development and early testing.
Incidentally, these GUIs also use the Saxon XQuery engine internally, just like Nux.
Easy-to-read tutorials and other material about XQuery include:
|
|