Overview

Nux is a small, natural, straightforward and surprisingly effective open-source extension of the XOM and Saxon XML libraries.

Nux is geared towards versatile embedded integration and interchange, in particular for high-throughput server container environments (e.g. high-speed real-time data streaming applications, large-scale Peer-to-Peer messaging network infrastructures over high-bandwidth networks, scalable message-oriented middleware, etc.). But its simplicity also makes it useful for client-side XML query/transformation workflow pipelines.

Have you ever tried to take advantage of a robust and natural commodity Java tool set for XML, XQuery, XPath, schema validation and related technologies, yet were not ready to accept a significant performance penalty? Chances are that most tool sets turned out not to be particularly robust and natural, that they incurred dramatic penalties when used in straightforward ways, and that their complex idiosyncrasies had a strong tendency to distract from the real job and the use cases you wanted to get done in a timely manner.

Nux helps to avoid XML nightmares, enabling you to mix and match powerful XML tools that fit your needs, in natural, straightforward, seamless and effective manners.

Features include:

  • Seamless W3C XQuery and XPath support for XOM (see API). Also see XQueryBenchmark.
  • Efficient and flexible pools and factories for XQueries, XSL Transforms, as well as document Builders that validate against various schema languages, including W3C XML Schemas, DTDs, RELAX NG, Schematron, etc. (see API).
  • Optional serialization and deserialization of XOM XML documents to and from an efficient and compact custom binary XML data format (bnux format), without loss or change of any information. Serialization and deserialization is much faster than with the standard textual XML format, and the resulting binary data is more compressed than textual XML (see API).
  • For simple and complex continuous queries and/or transformations over very large or infinitely long XML input, a convenient streaming path filter API combines full XQuery and XPath support with straightforward filtering (see API).
  • Glue for integration with JAXB and for queries over ill-formed HTML (see API).
  • Well documented API. Ships in a jar file that weighs just 60 KB.
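
The streaming path filter idea from the feature list can be illustrated without Nux itself. The following is a minimal sketch, assuming only the JDK: it uses StAX plus the built-in XPath 1.0 engine (not XOM, Saxon or XQuery). The class name StreamingFilterSketch and its filter method are hypothetical and not part of the Nux API. The sketch streams over the input, materializes each element found at a given location path as a small tree, runs a query against that tree only, and then discards it, so memory use is bounded by the size of one record rather than the whole document.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical JDK-only illustration; this is NOT the Nux streaming path filter API.
class StreamingFilterSketch {

    /**
     * Streams over xml, builds a small DOM tree for each element located at
     * locationPath (e.g. "/PERIODIC_TABLE/ATOM"), evaluates xpath relative to
     * that element, and collects the non-empty string results.
     */
    static List<String> filter(Reader xml, String locationPath, String xpath) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(xpath);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(xml);

            List<String> results = new ArrayList<>();
            Deque<String> path = new ArrayDeque<>(); // names of the currently open elements
            Document doc = null;                     // holds the record currently being built
            Node current = null;                     // innermost open node of that record

            while (in.hasNext()) {
                int event = in.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(in.getLocalName());
                    if (current == null && ("/" + String.join("/", path)).equals(locationPath)) {
                        doc = builder.newDocument(); // start a fresh tree for this record
                        current = doc;
                    }
                    if (current != null) {
                        Element e = doc.createElement(in.getLocalName());
                        for (int i = 0; i < in.getAttributeCount(); i++) {
                            e.setAttribute(in.getAttributeLocalName(i), in.getAttributeValue(i));
                        }
                        current = current.appendChild(e);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                    current.appendChild(doc.createTextNode(in.getText()));
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                    if (current != null) {
                        Node parent = current.getParentNode();
                        if (parent == doc) {
                            // Record complete: query it, then drop the tree.
                            String value = expr.evaluate(current);
                            if (!value.isEmpty()) {
                                results.add(value);
                            }
                            doc = null;
                            current = null;
                        } else {
                            current = parent;
                        }
                    }
                }
            }
            in.close();
            return results;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not filter XML stream", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PERIODIC_TABLE>"
            + "<ATOM><NAME>Zinc</NAME><ATOMIC_WEIGHT>65.4</ATOMIC_WEIGHT></ATOM>"
            + "<ATOM><NAME>Carbon</NAME><ATOMIC_WEIGHT>12.0</ATOMIC_WEIGHT></ATOM>"
            + "<ATOM><NAME>Silver</NAME><ATOMIC_WEIGHT>107.9</ATOMIC_WEIGHT></ATOM>"
            + "</PERIODIC_TABLE>";
        // Names of all atoms heavier than 60, found one record at a time.
        System.out.println(filter(new StringReader(xml),
            "/PERIODIC_TABLE/ATOM", "NAME[../ATOMIC_WEIGHT > 60]"));
    }
}
```

The sketch ignores namespaces and supports only a fixed location path without wildcards; it is meant to show the shape of the technique, not to stand in for the Nux API referenced above.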

Motivation

Have you ever tried to do queries and/or transformations over XML data sources? Chances are that manual SAX/DOM processing was cumbersome at best, that XPath was not powerful or flexible enough, that XSLT was perhaps too complicated, and that most related APIs had a steep learning curve and contained quite a few bugs.

This is where the power and simplicity of XQuery come in. Nux provides seamless XQuery support for XOM, leveraging the standards compliance, efficiency and maturity of the Saxon engine, in combination with a robust, lean and mean adapter for XOM that Nux contributed to Saxon. Since XQuery is a superset of XPath 2.0, plain XPath expressions can also be used as queries. The implementation covers most of the features of the W3C Working Draft of 11 February 2005, and passes several exhaustive test suites.

Have you ever tried to build an XML system that is straightforward, works correctly and processes thousands or tens of thousands of small XML messages per second in non-trivial ways? Chances are you've encountered lots of non-obvious obstacles down that path. For that scenario, Nux couples the simplicity and correctness qualities of XOM with efficient and flexible pools and factories for XQueries, XSL Transforms, as well as document Builders that validate against various schema languages, including W3C XML Schemas (leveraging Xerces), RELAX NG, Schematron, etc. (leveraging MSV).
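
To make the pooling idea concrete, here is a minimal sketch of why a pool of compiled queries helps when the same few queries run against thousands of small messages: compilation happens once per query string and the compiled form is reused. It is an assumption-laden illustration using only the JDK's XPath 1.0 API; the class CompiledQueryPoolSketch and its methods are hypothetical and are not the Nux pool classes. Because JDK XPathExpression objects are not thread-safe, the sketch keeps a small LRU cache per thread.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical JDK-only illustration; this is NOT the Nux pool API.
class CompiledQueryPoolSketch {

    private final AtomicInteger compilations = new AtomicInteger();
    private final ThreadLocal<Map<String, XPathExpression>> cache;

    /** Creates a pool that keeps at most maxEntries compiled queries per thread. */
    CompiledQueryPoolSketch(final int maxEntries) {
        cache = ThreadLocal.withInitial(() ->
            // Access-ordered LinkedHashMap: the least recently used entry is evicted first.
            new LinkedHashMap<String, XPathExpression>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, XPathExpression> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    /** Returns the compiled form of query, compiling it only on a cache miss. */
    XPathExpression get(String query) {
        Map<String, XPathExpression> local = cache.get();
        XPathExpression expr = local.get(query);
        if (expr == null) {
            try {
                expr = XPathFactory.newInstance().newXPath().compile(query);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("invalid query: " + query, e);
            }
            compilations.incrementAndGet();
            local.put(query, expr);
        }
        return expr;
    }

    /** Evaluates query against one XML message and returns the result as a string. */
    String evaluate(String query, String xml) {
        try {
            return get(query).evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate: " + query, e);
        }
    }

    /** Number of compilations so far, across all threads (for demonstration only). */
    int compilationCount() {
        return compilations.get();
    }

    public static void main(String[] args) {
        CompiledQueryPoolSketch pool = new CompiledQueryPoolSketch(100);
        String query = "/PERIODIC_TABLE/ATOM[NAME = 'Zinc']/SYMBOL";
        String message =
            "<PERIODIC_TABLE><ATOM><NAME>Zinc</NAME><SYMBOL>Zn</SYMBOL></ATOM></PERIODIC_TABLE>";
        for (int i = 0; i < 1000; i++) {
            pool.evaluate(query, message); // the query is compiled on the first iteration only
        }
        System.out.println("symbol=" + pool.evaluate(query, message));
        System.out.println("compilations=" + pool.compilationCount());
    }
}
```

A per-thread cache trades some memory for lock-free lookups; a shared pool with check-out and check-in semantics would be the alternative when the number of threads is large.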

For particularly stringent performance requirements an option for lightning-fast binary XML serialization and deserialization is offered. Glue for integration with JAXB and for queries over ill-formed HTML is also provided.

Note for anthropologists: The Goddess Nux (Night) was one of the four original primeval Greek deities, who were the first entities to be formed out of Chaos, her siblings being Eros (Love), Gaea (Earth) and Erebos (Darkness). Nux vomica also refers to a tree native to the East Indies, as well as its nut-like seeds. In homeopathy, it is one of the most commonly prescribed remedies.

Example Usage

    import java.io.File;
    import nu.xom.Builder;
    import nu.xom.Document;
    import nu.xom.Node;
    import nux.xom.xquery.XQuery;
    import nux.xom.xquery.XQueryUtil;

    Document doc = new Builder().build(new File("/tmp/test.xml"));

    // find the atom named 'Zinc' in the periodic table:
    Node result = XQueryUtil.xquery(doc, "/PERIODIC_TABLE/ATOM[NAME = 'Zinc']").get(0);
    System.out.println("result=" + result.toXML());

    // equivalent via the more powerful underlying API:
    XQuery xquery = new XQuery("/PERIODIC_TABLE/ATOM[NAME = 'Zinc']", null);
    Node sameResult = xquery.execute(doc).next();

    // count the number of elements in a document tree
    int count = XQueryUtil.xquery(doc, "//*").size();
    System.out.println("count=" + count);

    Document doc = new Builder().build(new File("/tmp/test.xml"));
    Nodes results = XQueryUtil.xquery(doc, "//*:img/@src");
    //Nodes results = XQueryUtil.xquery(doc, "//*:img/@src[matches(., '.jpg')]");
    for (int i=0; i < results.size(); i++) {
        System.out.println("node "+i+": " + results.get(i).toXML());
        //System.out.println("node "+i+": " + XOMUtil.toPrettyXML(results.get(i)));
    }

    <bib>
      {
        for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
        where $b/publisher = "Addison-Wesley" and $b/@year > 1991
        return
          <book year="{ $b/@year }">
            { $b/title }
          </book>
      }
    </bib>

    for $i in doc("items.xml")//item_tuple
    let $b := doc("bids.xml")//bid_tuple[itemno = $i/itemno]
    where contains($i/description, "Bicycle")
    order by $i/itemno
    return
      <item_tuple>
        { $i/itemno }
        { $i/description }
        <high_bid>{ max($b/bid) }</high_bid>
      </item_tuple>

To get started, you can use XQueryCommand, which is a simple command line demo that runs a given XQuery against a set of files and prints the result sequence.

Querying Nasty HTML

If you'd like to query non-XML documents such as the typical HTML that lives out there, you can combine Nux with TagSoup, which is a "SAX-compliant parser that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short". TagSoup plugs into XOM and makes ill-formed HTML appear as well-formed XML. Just add tagsoup.jar to the classpath and try this:

    // find the links of all images in an ill-formed HTML document
    XMLReader parser = new org.ccil.cowan.tagsoup.Parser(); // tagsoup parser
    Document doc = new Builder(parser).build("http://www.yahoo.com");
    Nodes results = XQueryUtil.xquery(doc, "//*:img/@src");
    for (int i=0; i < results.size(); i++) {
        System.out.println("node "+i+": " + results.get(i).toXML());
        //System.out.println("node "+i+": " + XOMUtil.toPrettyXML(results.get(i)));
    }

FAQ

  • What about support for database integration and the XQJ standard?

    XQJ will be very useful, but it is still immature, in flux and without real implementations. Once XQJ stabilizes, and implementations of it become available, we will also support plugging in any XQJ implementation via a factory interface. But that day doesn't seem close yet.

Related Information

A GUI XQuery Editor helps to learn the query language, and to quickly try out queries during early development stages. Such editors include Oxygen (commercial but with free trial, including Eclipse plugin) and Stylus Studio (commercial but with free trial). Using such rapid prototyping GUIs before deploying into your Nux-based production application can speed up development and early testing. Incidentally, these GUIs also use the Saxon XQuery engine internally, just like Nux.

Easy-to-read tutorials and other material about XQuery include:


© 2003-2004, Lawrence Berkeley National Laboratory