
SAXON: Change History

This file describes changes for versions 6.0 and later. For changes prior to version 6.0, see changes5.html. For changes prior to version 5.0, see history.html.

Changes in version 6.4 (2001-07-03)

Documentation

I have given the API Guide a much needed overhaul, including improved descriptions of the APIs for invoking XPath expressions from Java code.

Defects cleared

The following errors were found in version 6.3, and have been cleared.

6.3/001 If the match pattern in an xsl:key definition matches both element and attribute nodes, only the attributes will actually be indexed. (Present in all previous releases.)

6.3/002 If a Writer is supplied to receive the output of a transformation (when using the JAXP 1.1 API), Saxon has no control over the output encoding. It is therefore possible that the value of the encoding attribute written to the XML declaration will bear no relation to the actual encoding of the output file. As a partial fix to this, Saxon now determines the encoding used by the Writer if it can (namely, if it is an OutputStreamWriter) and writes this encoding name to the XML declaration, ignoring any encoding that was requested in xsl:output or via the setOutputProperty() method. On my Windows configuration, this will generally result in the XML declaration saying encoding="Cp1252". The recommended circumvention to this problem is to supply an OutputStream for the output, rather than a Writer. (Present in all previous releases.)

6.3/003 It is not possible to call an external function that expects an argument of class com.icl.saxon.expr.FragmentValue (or any other subclass of NodeSetValue), even if the supplied argument is the correct class. [Fixed but not tested]. (Present in 6.3 only)

6.3/004 The following axis, starting at an attribute or namespace node, should include the descendants of the element that is the parent of the attribute or namespace node. It currently returns only the nodes that are on the following axis from the parent node. (Present in all previous releases.)

6.3/005 A NullPointerException occurs if a StreamSource is supplied without calling setSystemId(). (Present in 6.3 only)

6.3/006 A bug in the AElfred XML parser: if the DTD declares an element type as having element content, but an element of that type wrongly contains non-whitespace text, then AElfred simply ignores the offending text; it reports no error, and it doesn't report the text to the application. As a non-validating parser, AElfred should report the text content exactly as if the DTD declared the element as having mixed content. (Present in all previous releases.)

6.3/007 AElfred fails to detect and report a well-formedness error: specifically, when the source text contains the disallowed sequence ']]>' immediately after an entity reference such as '&lt;'. (Present in all previous releases.)

6.3/008 A keyword used as an operator (div, mod, and, or) cannot be used as a variable name within an XPath expression. (Present in all previous releases.)

6.3/009 When a Saxon tree is supplied as input to a transformation (as a DOMSource), and needs to be rebuilt in order to strip whitespace nodes, and when the target format is a standard tree rather than a tinytree, then a NullPointerException may occur when reading the children of the root node (after processing the children that exist). (Present in Saxon 6.3 only.)

6.3/010 The Ælfred parser, after reading an external entity, does not close the input file. It has been reported that on the Microsoft platform this can result in the operating system keeping the file locked indefinitely, preventing other processes updating it. The fix for this problem closes the input stream or reader even if this was supplied by a user-supplied entity resolver. (Present in all previous releases.)

6.3/011 The TemplatesHandler (which allows a stylesheet to be built using SAX events) does not work. (Present since JAXP support was introduced.)

6.3/012 The Ælfred XML parser, when invoked with http://xml.org/sax/features/namespace-prefixes set to true, does not report namespace declarations to the application as attributes on the startElement() call. This doesn't affect Saxon, because Saxon always sets this feature to false, but it may affect other applications using Ælfred. (Present since Saxon 6.3: a side effect of the fix for bug 6.2.2/011.)

6.3/013 Namespace aliasing (xsl:namespace-alias) on attribute names does not work. The new attribute name that is generated will have the local part of the attribute name overwritten with the local part of the containing element name. (Present in all previous releases.)

6.3/014 When calling an extension function that expects an argument declared as being of type java.lang.Object, a supplied string, number, or boolean is passed as an instance of an internal Saxon class, rather than being converted to a String, Double, or Boolean.
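The circumvention recommended under 6.3/002 above can be sketched as follows. This is a minimal illustration that uses only the standard JAXP interfaces, so it runs against whichever XSLT processor JAXP selects (not necessarily Saxon); the class EncodingDemo and its serialize() helper are invented for the example. Because the result is an OutputStream rather than a Writer, the processor performs the character encoding itself, so the encoding named in the XML declaration should match the bytes actually written.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of Saxon.
class EncodingDemo {

    /** Copies the XML through an identity transformation and returns the encoded bytes. */
    static byte[] serialize(String xml, String encoding) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // An OutputStream result leaves the character encoding to the processor,
            // so the XML declaration and the actual bytes agree.
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("<doc>caf\u00e9</doc>", "ISO-8859-1");
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Had a Writer been wrapped in the StreamResult instead, the bytes would be produced by the Writer's own encoder, which the processor cannot control.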

JAXP 1.1 support

Saxon no longer sets itself as the default DocumentBuilderFactory for use when building a DOM. This is because the Saxon DOM implementation, being read-only, is suitable only for specialized use.

Saxon still sets itself as both the default XSLT transformer and the default SAX2 ParserFactory.

FOP integration

Saxon's FOP integration has been updated to use FOP 0.19.0.

Two new attributes are available on the xsl:output and xsl:document elements, for use when method="saxon:fop":

Here fop: is the prefix of a namespace whose URI must be http://icl.com/saxon/fop

These two attributes have not been fully tested.

Internal API changes and code reorganisation

These changes are made partly to improve maintainability of the code, partly to reduce its size, and partly to enable the future support of a wider variety of data structures that the XPath implementation can access (for example, non-SAXON DOM structures, databases, etc). Some of the changes will affect Java applications, especially those that make intimate use of internal Saxon implementation classes.

The NodeInfo interface

The main change is a major simplification of the NodeInfo interface, greatly reducing the number of methods and subclasses that need to be implemented to support a new kind of tree structure, but hopefully without reducing the usability of the interface or the performance of its implementations.

The interface classes that are subclasses of NodeInfo have been eliminated (for example the old favorite ElementInfo). The only exception is DocumentInfo (representing the root node or the document as a whole). This reflects the fact that in the XPath data model, all methods are available on any kind of node. Tests that were previously written if (node instanceof TextInfo) should now be written if (node.getNodeType()==NodeInfo.TEXT). In other cases, simply replace the specific interface (for example ElementInfo) by the general class NodeInfo.

The NodeInfo interface, which is the main interface to Saxon's tree model, no longer extends the DOM Node interface. This means that methods such as getNextSibling() are no longer available on this interface. Navigation from a node should be done instead by creating an enumeration using one of the XPath axes, using the getEnumeration() method.

However, the two implementations of the NodeInfo interface, that is the standard tree and the tiny tree, continue to implement the DOM Node interface. To achieve this, the two implementation types (NodeImpl and TinyNodeImpl) both inherit from a new abstract class called AbstractNode. This class implements both the Saxon NodeInfo interface and the DOM Node interface; it also includes methods needed only for element, text, comment, or root nodes. (This is done to make these methods shared between the two tree implementations: it is not possible in Java for a class such as TextImpl to inherit both from NodeImpl containing the Saxon methods and from a generic AbstractTextImpl containing the DOM methods.)

The NodeInfo class now implements the JAXP Source interface, which means that any NodeInfo can be used directly to define the source of a transformation, with no need to wrap it in a DOMSource object. Note that if you supply the source tree in this way, it is your own responsibility to strip any unwanted whitespace nodes before XSLT processing begins. The xsl:strip-space and xsl:preserve-space instructions in the stylesheet will be ignored.

Saxon still uses DOM methods such as getNextSibling() to navigate the stylesheet tree, which is always implemented using the standard tree model. However, Saxon no longer relies on the source document providing DOM interfaces.

As well as the DOM methods, a number of other methods on the NodeInfo interface have been removed. Many of these were "shortcut" methods that weren't really needed, and which were the same in all implementations. In all cases there are alternatives available.

The getValue() method in the NodeInfo class has been renamed getStringValue(), to better reflect its meaning, and to avoid clashing with the getValue() method of the org.w3c.dom.Attr class.

The AxisEnumeration classes are now logically part of the tree implementation, so they are implemented differently for each tree structure. This allows the implementation to use the navigation mechanisms that are most efficient in each data structure.

The subclasses of Axis, which existed essentially to provide information about each axis, have been removed. Instead the Axis class itself provides this information in the form of a number of arrays, indexed by axis number. The Axis class has been moved to the com.icl.saxon.om package.

The unused utility methods in class com.icl.saxon.om.Navigator, for example isFirstInGroup() and getAncestor(), have been deleted. If you need these methods in your application, I suggest reconstructing them within your application code, based on the Saxon 6.3 source code.

The interface com.icl.saxon.om.ExtendedAttributes has been removed from the object model, as the preferred way of accessing all the attributes of an element is now to enumerate the attribute axis.

Multiple documents

In previous releases, certain information held within a document was required to be unique across all documents used within a single transformation: in particular, the document number, and the node sequence numbers. This potentially causes problems when the same source document is used in multiple transformations, perhaps running in parallel. The problems were previously avoided by rebuilding the document for each transformation, which is inefficient.

In Saxon 6.4, a document no longer contains a unique document number. The methods generateId() and getSequenceNumber() now generate numbers which are required to be unique only within a single document; making them globally unique is done by the calling code, with the aid of the document pool maintained by the Controller.

A tree implementation is no longer required to provide sequence numbers for the nodes. Instead, it is required to implement a compareOrder() method that determines the relative ordering of two nodes within the same tree. Comparison of nodes in separate trees is now done at a different level of the software.
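To illustrate how document order can be decided without stored sequence numbers, here is a minimal sketch over a toy tree. ToyNode is a stand-in invented for this example and is not a Saxon class; only the method name compareOrder() comes from the text above, and the signature and return convention used here are assumptions. Both nodes must belong to the same tree.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tree node for illustration only; not a Saxon class.
class ToyNode {
    final ToyNode parent;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(ToyNode parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    private int depth() {
        int d = 0;
        for (ToyNode n = parent; n != null; n = n.parent) {
            d++;
        }
        return d;
    }

    /**
     * Returns a negative number if this node precedes other in document order,
     * a positive number if it follows, and 0 if they are the same node.
     * Precondition (assumed): both nodes are in the same tree.
     */
    int compareOrder(ToyNode other) {
        if (this == other) {
            return 0;
        }
        ToyNode a = this;
        ToyNode b = other;
        int da = a.depth();
        int db = b.depth();
        // Lift the deeper node until both are at the same depth.
        while (da > db) { a = a.parent; da--; }
        while (db > da) { b = b.parent; db--; }
        if (a == b) {
            // One node is an ancestor of the other; the ancestor comes first.
            return this.depth() < other.depth() ? -1 : 1;
        }
        // Climb in step until the two nodes are siblings, then compare positions.
        while (a.parent != b.parent) { a = a.parent; b = b.parent; }
        List<ToyNode> siblings = a.parent.children;
        return Integer.compare(siblings.indexOf(a), siblings.indexOf(b));
    }
}
```

A real tree implementation would normally use whatever positional information it already holds (for example array offsets) rather than walking parent chains; the point is only that the comparison is local to one tree.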

The only extra data that a source document now contains to support Saxon transformations is:

The NodeHandler interface

NodeHandler was previously an abstract class in package com.icl.saxon.handlers; it is now an interface in package com.icl.saxon. This may affect user-written node handlers, used either in a Java (non-XSLT) application, or via the saxon:handler extension element.

There is an extra method requiresStackFrame() whose value is a boolean. You can generally return false. Return true only if the node handler maintains variables or parameters that can be accessed from XPath expressions - something that is not especially easy to do.

This change also means that any user-written TraceListener will need to be recompiled.

Sorting

I have made internal changes to the sorting routines to reduce the memory used, especially when the sort involves only a single sort key. The changes are unlikely to affect many users. However, some of the methods in internal classes such as SortedSelection have changed.

JDOM support

To illustrate the way that the new NodeInfo interface can be used to create adapters for other document formats, I have built an adapter for JDOM (see http://www.jdom.org/). Although the code for this is included in the main source tree, it is issued in a separate Jar file, saxon-jdom.jar. The code is still at beta quality. A sample application showing how to use Saxon with JDOM is provided. The JDOM interface requires JDK 1.2.

This facility allows a JDOM tree to be used as the input to an XSLT transformation, or as the target for XPath expressions issued from your Java code. You can direct the output to a JDOM tree by using JDOM's SAX driver as the SAXResult destination object for the transformation.

Saxon currently makes no attempt to merge adjacent text nodes in the JDOM tree: these can arise if the two text nodes are separated by an entity boundary or by a CDATA section boundary.

Using SAXON with JDOM is not likely to be especially efficient; it requires extra memory for the wrapper data structures, and some XPath navigation routes are quite inefficient because they are not supported directly in JDOM (for example, JDOM provides no direct way of getting from a node to its siblings). It is provided partly as an illustration of how to interface other data sources, and partly for users who already have data in JDOM format. It is particularly useful to enable XPath access to JDOM from Java applications.

Ælfred XML parser

I have reviewed the changes made by David Brownell in his version of the Ælfred XML parser (available as project xmlconf in www.sourceforge.net), and have incorporated those that are relevant into the Saxon version. This is basically all changes except those required to report validation errors. Most of the changes are very minor, but there are some enhancements in the handling of character encoding: if an input file is in an encoding that Ælfred itself does not understand, it now attempts to get the Java VM to decode it. The set of character encodings available in the Java VM is platform-dependent.

SQL extension functions

Following a suggestion from René Jansen, I have changed the sql:insert code so it now prepares the SQL statement only the first time it is executed, and reuses the prepared statement thereafter. Also, it can now handle columns that are not strings.

xsl:attribute and friends

The three instructions xsl:attribute, xsl:comment, and xsl:processing-instruction have been speeded up. Where the content of the instruction is a single text node, or an xsl:value-of instruction, Saxon now avoids the overhead of setting up a new output destination; instead of processing the content as a general template body, it evaluates it directly as an expression. Where this is not the case, a streamlined output method is used that avoids many of the overheads previously incurred.

Variables and parameters

Global variables and parameters are no longer evaluated if there is another variable or parameter with the same name and higher import precedence.

An xsl:variable element containing a single text node is now treated specially, bringing the performance close to that of a String variable.

Changes in version 6.3 (2001-05-03)

JAXP 1.1

Saxon now implements the javax.xml.parsers package in JAXP 1.1 as well as the javax.xml.transform package.

If you have the system property javax.xml.parsers.SAXParserFactory set to the value com.icl.saxon.aelfred.SAXParserFactoryImpl, then any call on JAXP 1.1 interfaces to get an XMLReader will select AElfred. Moreover, Saxon itself uses the JAXP 1.1 interfaces to get an XMLReader if none has been explicitly requested, so you can now determine the parser to be used by setting this system property. The default for this property, defined by a services file in saxon.jar, selects the AElfred parser.

Similarly, if you have the system property javax.xml.parsers.DocumentBuilderFactory set to the value com.icl.saxon.om.DocumentBuilderFactoryImpl, then any call on JAXP 1.1 interfaces to get a DOM Document builder will select the Saxon tinytree implementation. However, Saxon does not call JAXP interfaces to get a Document builder: it will always choose its own. Note that Saxon's DOM implementation is an immutable DOM: you can construct the DOM by parsing a source document, but you cannot build it or modify it through the DOM API methods.
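The JAXP 1.1 lookups described above can be sketched as follows. This minimal example uses only the standard javax.xml.parsers interfaces; which implementation is returned depends on the system properties and services files visible at run time, so without Saxon on the classpath it will simply return the platform defaults. The class JaxpLookup is invented for the example.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical demo class; not part of Saxon.
class JaxpLookup {

    /** Obtains an XMLReader through the factory named by javax.xml.parsers.SAXParserFactory. */
    static XMLReader newXmlReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Obtains a DocumentBuilder through the factory named by javax.xml.parsers.DocumentBuilderFactory. */
    static DocumentBuilder newDocumentBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the implementation classes actually selected in this environment.
        System.out.println(newXmlReader().getClass().getName());
        System.out.println(newDocumentBuilder().getClass().getName());
    }
}
```

Running it with the two system properties set to the Saxon class names given above (and saxon.jar on the classpath) should show the Saxon implementations instead.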

Saxon's Builder and Stripper classes have been moved to the package com.icl.saxon.om.

When a Saxon document is supplied as input to the transform() method (using a DOMSource object), in previous releases the tree was rebuilt. At this release the tree is used as is, provided that either (a) the stylesheet does not require whitespace nodes to be stripped, or (b) whitespace stripping has been disabled by calling the new Controller.disableWhitespaceStripping() method. In the cases where the tree does need to be rebuilt, a "fast path" routine has been introduced to do this: previously the same code was used as for a third-party DOM, which incurred unnecessary costs because there are so many different ways namespaces can be represented in a DOM.

When performing multiple transformations on a single source document, it is best to do the whitespace stripping once as a separate operation: this is made possible by a new method PreparedStyleSheet.stripWhitespace(), which uses the xsl:strip-space directives in a stylesheet to remove whitespace from a document (in fact, it returns a new document that is a copy of the original, with relevant whitespace nodes removed; if no whitespace stripping needs to be done, it returns the original document unchanged).

It is now possible to supply a Saxon document as the output of a transformation. (This didn't work at previous releases, though the restriction was undocumented.) The document must be empty, and the node supplied in the DOMResult object must be the document (i.e. root) node.

It is now possible to start a transformation at a node other than the root node, if the input is supplied in the form of a DOM (in a DOMSource object). Global variables are still evaluated with the root node as context node, and the entire tree is available to the transformation, but the first template rule applied is not, as is usual, the match="/" rule, but the rule that matches the supplied node. The DOM supplied as input must not contain CDATA or entity reference nodes that are parents or preceding siblings of the start node.

Extension Functions

Saxon's support for Java extension functions has been brought into line with the working draft XSLT 1.1 specification.

Polymorphic methods are now fully supported. If the relevant class has several methods (or constructors) with the same name, the one that is chosen is the one that gives a "best match" to the types of the supplied arguments, following the rules in the XSLT 1.1 draft. If there is no unique method that provides a best match according to these rules, an error is reported.

Methods that return void, null, char, or byte are now handled as described in the XSLT 1.1 working draft.

There is still a restriction that extension functions cannot construct a new DOM tree and return nodes from this tree by using DOM methods. They can only return existing nodes that were constructed by Saxon itself.

Methods that expect a node-set as input can now declare the argument type as com.icl.saxon.expr.NodeEnumeration, as an alternative to com.icl.saxon.expr.NodeSetValue. This is likely to be a bit more efficient. The enumeration will always be positioned at the start when the function is called, and its position on exit can be anywhere. It is also possible to return a NodeEnumeration as the result of a function. Again, the enumeration must be positioned at the start. Returning a NodeEnumeration is especially efficient if the result is then converted to a String or a Boolean.

The rules for spelling of external function names have been brought into line with XSLT 1.1. This may require stylesheet changes. For example, the function has-same-nodes() must now be spelt as "has-same-nodes()" or "hasSameNodes()"; it can no longer be spelt as "hassamenodes()" or as "HAS-SAME-NODES()". For backwards compatibility, the node-set() function may be spelt with or without the hyphen (or as "nodeSet()").
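The spelling rule can be illustrated with a small sketch. It covers only the hyphen-to-camelCase conversion suggested by the has-same-nodes() example; the full rules are those of the XSLT 1.1 working draft, and the class FunctionNames with its two methods is invented here, not taken from Saxon.

```java
// Hypothetical helper illustrating the naming rule; not Saxon's implementation.
class FunctionNames {

    /** Removes each hyphen and upper-cases the character that follows it. */
    static String toCamelCase(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        boolean upperNext = false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '-') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** True if the name used in the stylesheet is an accepted spelling of the Java method name. */
    static boolean matches(String stylesheetName, String javaMethodName) {
        return stylesheetName.equals(javaMethodName)
                || toCamelCase(stylesheetName).equals(javaMethodName);
    }

    public static void main(String[] args) {
        System.out.println(matches("has-same-nodes", "hasSameNodes"));
        System.out.println(matches("hassamenodes", "hasSameNodes"));
    }
}
```

Under this sketch the comparison is case-sensitive, which is why the all-lowercase and all-uppercase spellings no longer match.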

EXSLT

EXSLT is an initiative to define a standardized set of extension functions and extension elements that can be used across different XSLT processors.

Saxon now supports the EXSLT modules Common, Math, Sets, and Functions. The full list of extension functions is:

plus the following new elements:

These have considerable overlap with functions that have previously been provided in the Saxon namespace. The Saxon versions of the functions remain available, for the time being, but the EXSLT versions are preferred.

The saxon:function and saxon:return elements have been changed slightly to conform to the EXSLT rules. Specifically: saxon:return can now appear inside xsl:for-each, provided the xsl:for-each iterates at most once. There is now a check that saxon:return is not used inside the definition of a variable or inside another saxon:return. It is an error to instantiate more than one saxon:return within a function.

saxon:closure

Following a suggestion from Christian Nentwich, I have implemented a new extension function saxon:closure(), which forms a node-set by taking the transitive closure of a node-set expression. The function does NOT detect cycles.

Defects cleared

The following errors were found in version 6.2.2, and have been cleared. Many of these relate to incorrect handling of error cases, and reflect the fact that I have greatly increased the test coverage of error handling.

6.2.2/001 If the first argument of the key() function is not the name of a key defined in the stylesheet, a diagnostic dump is produced in place of a meaningful error message.
6.2.2/002 If the name attribute of the xsl:call-template instruction is not the name of a template defined in the stylesheet, a diagnostic dump is produced in place of a meaningful error message.
6.2.2/003 No error is reported when the use-attribute-sets attribute of xsl:attribute contains a circular reference. (Instead, the stack overflows). Note: the fix for this only detects the error at run-time if the attribute-set is actually used. Technically, the error should be detected at compile time, and reported even if the attribute set is never used.
6.2.2/004 No error is reported when the xsl:include or xsl:import element is non-empty.
6.2.2/005 A null pointer exception occurs if the href attribute of xsl:import or xsl:include is omitted.
6.2.2/006 No error is reported if a template name, variable name, or mode name does not conform to the lexical rules for a QName.
6.2.2/007 No error is reported if the xsl:key element is non-empty.
6.2.2/008 No error is reported if the xsl:attribute-set element has content other than xsl:attribute elements.
6.2.2/009 An ArrayIndexOutOfBounds exception occurs when attempting to get the children of the last node in the document, if the number of nodes in the document is 4000 times a power of two. Applies to the TinyTree model only. The problem occurred when using preview mode.
6.2.2/010 When the AElfred parser attempts to read a file using the http protocol, the encoding specified in the HTTP header should take precedence over the encoding specified in the XML document declaration. However, the parsing of the HTTP header is incorrect, so the encoding is typically identified as "=UTF-8" rather than "UTF-8". This results in an UnsupportedEncodingException.
6.2.2/011 There is an error in namespace handling in the AElfred parser. When a "real" attribute precedes a namespace declaration in an element start tag, and the QName of the element or of an attribute is the same as the QName of the parent element or one of its attributes, then the namespace URI assigned to the name may be based on the namespace declarations in force for the parent element rather than those for the child element.
6.2.2/012 There is a bug in the current version of JAXP 1.1: when a StreamSource is constructed from a File object, and the filename is of the form "/usr/file.xml", the resulting URL is "file:////usr/file.xml" rather than "file:///usr/file.xml". I have added code to Saxon's TransformerFactoryImpl to circumvent this problem by detecting the incorrect URL and patching it.
6.2.2/013 User-written message emitters don't work.
6.2.2/014 The integer value returned by getNodeType() on a root node is not consistent with the DOM specifications. Applications that call this method should be recompiled.
6.2.2/015 With xsl:output method="html" indent="yes", indentation should be suppressed for output elements that are nested within a <pre> element. It isn't.
6.2.2/016 When an invalid property is passed to the Transformer methods setOutputProperty() or setOutputProperties(), an IllegalArgumentException should be thrown. Instead, the value is silently ignored.
6.2.2/017 When getOutputProperties() is called on the Transformer interface, subsequent changes to the returned properties should have no effect. This isn't currently the case, as the method returns a reference to the internal property set, rather than making a copy.
6.2.2/018 When processing a document containing attributes with undeclared namespace prefixes, Saxon may crash with a NullPointerException after reporting the error.
6.2.2/019 On return from a call of xsl:apply-imports, the current template is not reset. This means that a second call on xsl:apply-imports will invoke the wrong template.
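The behaviour expected after the fixes for 6.2.2/016 and 6.2.2/017 can be checked with a short sketch. It uses only the standard JAXP Transformer interface and an identity transformer, so it exercises whichever processor JAXP selects; the class OutputPropertiesDemo and its method names are invented for the example.

```java
import java.util.Objects;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class; not part of Saxon.
class OutputPropertiesDemo {

    static Transformer newIdentityTransformer() {
        try {
            return TransformerFactory.newInstance().newTransformer();
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** 6.2.2/016: an unknown, non-namespaced property should be rejected. */
    static boolean rejectsUnknownProperty(Transformer t) {
        try {
            t.setOutputProperty("no-such-property", "x");
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    /** 6.2.2/017: changing the returned Properties should not affect the transformer. */
    static boolean returnsCopy(Transformer t) {
        String before = t.getOutputProperty(OutputKeys.METHOD);
        Properties copy = t.getOutputProperties();
        copy.setProperty(OutputKeys.METHOD, "text");
        String after = t.getOutputProperty(OutputKeys.METHOD);
        return Objects.equals(before, after);
    }

    public static void main(String[] args) {
        Transformer t = newIdentityTransformer();
        System.out.println("rejects unknown property: " + rejectsUnknownProperty(t));
        System.out.println("returns a copy: " + returnsCopy(t));
    }
}
```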

Changes in version 6.2.2 (2001-03-15)

Upgraded to the latest JAXP ("version 1.1 final release") dated 6 Feb 2001. Saxon now uses the JAXP binaries exactly as issued by Sun. Unfortunately the TransformerFactory issued by Sun invokes Xalan as the "platform default" XSLT processor. The saxon.jar file includes a META-INF file to override this, so there should be no problems unless you have other things on the classpath that conflict. If you want to be absolutely sure of loading Saxon rather than any other XSLT processor, set the system property javax.xml.transform.TransformerFactory to the value "com.icl.saxon.TransformerFactoryImpl", either from your application (by calling System.setProperty()), or from the command line (java -Djavax.xml.transform.TransformerFactory=com.icl.saxon.TransformerFactoryImpl classname).
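Setting the property from application code might look like the following sketch. The property name and the Saxon class name are those given above; the class FactorySelection and the fallback to the platform default when the requested class cannot be loaded are additions made for the example, so that it also runs where saxon.jar is not on the classpath.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

// Hypothetical demo class; not part of Saxon.
class FactorySelection {
    static final String KEY = "javax.xml.transform.TransformerFactory";

    /** Requests the named factory class, falling back to the previous setting if it cannot be loaded. */
    static TransformerFactory select(String className) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, className);
        try {
            return TransformerFactory.newInstance();
        } catch (TransformerFactoryConfigurationError e) {
            // The requested class is not available: restore the old setting and retry.
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
            return TransformerFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = select("com.icl.saxon.TransformerFactoryImpl");
        System.out.println(factory.getClass().getName());
    }
}
```

An application that must never run with a different processor would let the error propagate instead of falling back.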

Make sure you remove any older versions of jaxp.jar from your classpath to prevent any incompatibilities.

I have changed the packaging of the FOP integration, to reduce the problems this causes for people who want to rebuild Saxon or load it into a development environment such as IBM's Visual Age for Java. The FOP integration module, FOPEmitter, is now part of a separate package, com.icl.saxon.fop, and is not included in saxon.jar, but is in a separate JAR file, saxon-fop.jar. This must be on the class path if you want to use Saxon with FOP, but you can ignore it otherwise. There are no longer any compile-time references to FOPEmitter from the rest of the Saxon code, so you can recompile the product without first installing FOP, provided that you remove FOPEmitter from the source library first.

I have reinstated the ability to call Java extension functions using the namespace xmlns:ext="full.class.Name" as an alternative to xmlns:ext="java:full.class.Name". However, the "java:" form is preferred.
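The two accepted namespace forms can be reduced to a class name as in the sketch below. This is only an illustration of the rule stated above; the class ExtensionNamespace and its method are invented, and Saxon's own lookup may involve further steps.

```java
// Hypothetical helper; not Saxon's implementation.
class ExtensionNamespace {
    private static final String JAVA_SCHEME = "java:";

    /** Returns the Java class name denoted by an extension-function namespace URI. */
    static String classNameFor(String namespaceUri) {
        return namespaceUri.startsWith(JAVA_SCHEME)
                ? namespaceUri.substring(JAVA_SCHEME.length())
                : namespaceUri;
    }

    public static void main(String[] args) {
        // Both forms name the same class.
        System.out.println(classNameFor("java:java.util.Date"));
        System.out.println(classNameFor("java.util.Date"));
    }
}
```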

I added extension functions saxon:before() and saxon:after(), based on the BEFORE and AFTER operators defined in XQuery. These take two node-sets as arguments and return all the nodes in the first node-set that are before/after at least one node in the second node-set, in document order. This provides an alternative to saxon:leading(), e.g. saxon:before(*, s[1]) gets all the child elements that precede the first child <s> element.
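The selection rule for saxon:before() and saxon:after() can be sketched by modelling each node as its position in document order. The integer representation and the class BeforeAfter are assumptions made purely for illustration. A node in the first set precedes at least one node in the second set exactly when it precedes the last of them, and follows at least one exactly when it follows the first of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model: each Integer stands for a node's position in document order.
class BeforeAfter {

    /** Nodes of first that precede at least one node of second. */
    static List<Integer> before(List<Integer> first, List<Integer> second) {
        List<Integer> result = new ArrayList<>();
        if (second.isEmpty()) {
            return result;
        }
        int last = Collections.max(second);
        for (int node : first) {
            if (node < last) {
                result.add(node);
            }
        }
        return result;
    }

    /** Nodes of first that follow at least one node of second. */
    static List<Integer> after(List<Integer> first, List<Integer> second) {
        List<Integer> result = new ArrayList<>();
        if (second.isEmpty()) {
            return result;
        }
        int earliest = Collections.min(second);
        for (int node : first) {
            if (node > earliest) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> children = Arrays.asList(1, 2, 3, 4, 5);
        // Analogous to saxon:before(*, s[1]) where the first <s> child is at position 3.
        System.out.println(before(children, Arrays.asList(3)));
        System.out.println(after(children, Arrays.asList(3)));
    }
}
```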

A further refinement to class loading: if the loader returned by getContextClassLoader() fails to load a class, we now try to load the class using Class.forName(). This is all something of a black art: different things appear to work in different environments.
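The fallback order described above can be sketched as follows. The class ClassLoading and its load() method are invented for the example; only the sequence (context class loader first, then Class.forName()) is taken from the text.

```java
// Hypothetical helper; not Saxon's implementation.
class ClassLoading {

    /** Tries the thread's context class loader first, then falls back to Class.forName(). */
    static Class<?> load(String className) throws ClassNotFoundException {
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        if (contextLoader != null) {
            try {
                return contextLoader.loadClass(className);
            } catch (ClassNotFoundException e) {
                // Fall through to the second attempt.
            }
        }
        return Class.forName(className);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(load("java.util.ArrayList").getName());
    }
}
```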

I have re-instated saxon:output as a synonym of xsl:document. The reason for this is that some XSLT processors object to finding an xsl:document element in the stylesheet, even when running in forwards compatible mode. Using saxon:output is therefore more portable. Note, however, that the new saxon:output is not completely compatible with the old: attribute names have changed, especially "file" to "href".

Defects cleared

The following errors were reported for version 6.2.1, and have been cleared:

6.2.1/001 An error is reported if, in an XPath expression, one of the symbols "*", "div", "mod", "and", or "or" is used immediately after a comma (that is, as an argument in a function call after the first). The symbol is wrongly interpreted as a binary operator rather than a location path. (Present in all previous releases).
6.2.1/002 The expression select="@prefix:*", which should return all attributes in the given namespace, actually returns all attributes regardless of namespace.
6.2.1/003 When a user-specified trace listener is specified using the -TL option on the command line, line numbering should automatically be switched on; but the attempt to do so fails.
6.2.1/004 If the stylesheet contains more than one xsl:script element, Saxon may attempt to load the wrong Java class. This will usually result in no appropriate method being found.
6.2.1/005 In the message reporting an ambiguous template rule match, a pattern that is a union pattern with three or more components is displayed incorrectly as "null".
6.2.1/006 A null pointer exception occurs when the name of a system function is misspelt.
6.2.1/007 In setting up the SAX2 parser, Saxon fails to state that it requires both the "features/namespaces" and "features/namespace-prefixes" features to be on. A SAX2 XMLReader may therefore fail to supply Saxon with information about namespaces, causing the transformation to produce incorrect results.
6.2.1/008 A null pointer exception occurs when the -a option is used and the source document contains no suitable <?xml-stylesheet?> processing instruction.
6.2.1/009 The get/set OutputProperties() methods on the Templates and Transformer objects do not work as described in the TrAX interface. On the Templates object, getOutputProperties() returns only those values explicitly set in the stylesheet, not the XSLT-defined defaults. On the Transformer object, getOutputProperties() only returns properties that have been explicitly set using setOutputProperties().
6.2.1/010 In an XPath expression, Saxon reports no error when whitespace is used between a "$" sign and the following variable name. No space is allowed in this position.
6.2.1/011 In an XPath expression, Saxon reports no error when a colon is used between a function name or node-type name and the following left parenthesis, if it is separated from the function name by whitespace. For example, no error is reported for "true :()" or "node :()".

Changes in version 6.2.1 (2001-02-20)

Support for Running Saxon in an Applet: I have shamelessly copied the XSLTProcessorApplet module from Xalan, which was written to run any TrAX processor from a Java applet, and have adapted it to work with Saxon. The only changes were to remove a call on a Xalan error-handling routine, and to change the package name. I have also copied and adapted the Xalan sample application which shows how to incorporate this applet into an HTML page. To run a transformation using Saxon requires saxon.jar to be downloaded to the client. At 550Kb this is fairly substantial.

There are some sample applications using Saxon as an applet in the samples/applet folder.

It is now possible to specify the CharacterSet class to be used for a named output encoding by setting the system property, e.g. -D"encoding.EUC-JP"="EUC_JP"; the value of the property should be the name of a class that implements the PluggableCharacterSet interface.

Saxon has been modified to work with FOP 0_17_0; it no longer works with earlier versions of FOP. This has required some extensions to the Emitter interface, to cater for the fact that FOP now requires an OutputStream rather than a Writer as its output destination. Note also that FOP attempts to load Xerces as its default XML parser; if you want to use Saxon's AElfred parser instead, set the system property -Dorg.xml.sax.parser=com.icl.saxon.aelfred.SAXDriver. To run FOP, include the supplied JAR files fop.jar and w3c.jar on your classpath (FOP uses the DOM SVG package which is not included in saxon.jar).

Defects cleared

The following errors were reported for version 6.2, and have been cleared:

6.2/001 When no implementation of an extension element is available, a compile-time error is reported, whether or not the element is actually instantiated. (Circumvention: add the attribute xsl:version="99" to the extension element).
6.2/002 When no implementation of an extension function is available, a compile-time error is reported, whether or not the function is actually instantiated. (Circumvention: add the attribute xsl:version="99" to a literal result element enclosing the call on the offending function).
6.2/003 The sample extension for SQL provides no way of closing the database connection. With some configurations, this leads to updates being lost. I have therefore added another extension element, sql:close.
6.2/004 If a user-supplied URIResolver is registered with the TransformerFactory, it is not used when resolving the URI contained in the href pseudo-attribute of the xml-stylesheet processing instruction.
6.2/005 When the namespace attribute of xsl:element or xsl:attribute evaluates to an empty string, the specification states that the namespace of the resulting element or attribute should be null. Saxon wrongly generates a namespace declaration of the form xmlns:prefix="".
6.2/006 Under some circumstances using a local variable in an expression constructed using saxon:evaluate() or saxon:expression() fails, saying the variable has not been declared. The failure only occurs when the xsl:variable element declaring the variable is a sibling of the element containing the attribute containing the call on saxon:evaluate or saxon:expression. You can therefore circumvent the problem by wrapping the relevant element inside <xsl:if test="true()">.
6.2/007 If the context node is an attribute node or namespace node, the preceding and following axes (like preceding-sibling and following-sibling) are empty.
6.2/008 The output properties set using xsl:output in the stylesheet are not accessible using the getOutputProperty() method of the Transformer. (Circumvention: they are available from the getOutputProperty() of the Templates object).
6.2/009 Calling an external function that declares an argument of type org.w3c.dom.NodeList may fail with an exception, if the node-set supplied in the function call has not been fully evaluated (specifically, if it is a NodeSetIntent).
6.2/010 Saxon does not report an error when the stylesheet contains two conflicting definitions of the default decimal format; it simply uses the one that comes last.
6.2/011 Saxon reports inadequate diagnostics when an XML parsing failure occurs while looking for an xml-stylesheet processing instruction: specifically, if the failure is a "file not found" error that arises while resolving references to external entities or to the document's external DTD. The only output is the message "TrAX Transform Exception".

Changes in version 6.2 (2001-02-06)

Towards XSLT 1.1

The xsl:script element is now available. It is ignored unless the language is "java". This element can be used to identify the Java class implementing an extension function as defined in the XSLT 1.1 specification. The archive attribute can be used to specify a list of URLs to be searched, but only with a JVM that supports JDK 1.2 interfaces (i.e. not with the Microsoft JVM, and therefore not with Instant Saxon). NOTE: the rules for selecting a method within this class are unchanged. In particular, where there are several methods with the same name and number of arguments, it is not predictable which will be chosen. The native Saxon techniques for identifying a Java class will continue to be used if there is no xsl:script element for the relevant prefix, with one exception: the form xmlns:prefix="fully.qualified.ClassName" is no longer supported; use xmlns:prefix="java:fully.qualified.ClassName" instead.

The element name saxon:script can be used as a synonym of xsl:script. The advantage of using saxon:script is that other processors will ignore it. This allows you to define the way Saxon will implement an extension function which may be different from the way other processors implement it. This is especially useful if your stylesheet uses functions such as xx:intersection() which are now offered by several different XSLT processors. Note that the built-in Saxon extension functions are all implemented in the same way as user extension functions, in class com.icl.saxon.functions.Extensions; so you can use src="java:com.icl.saxon.functions.Extensions" to locate the Saxon implementation of these functions.

The Saxon class com.icl.saxon.Context now implements the org.w3c.xsl.XSLTContext interface, as defined in the XSLT 1.1 working draft. This can now be used as the first argument of a method that implements an extension function (but you can continue to use com.icl.saxon.Context if you prefer). A consequence of this change is that getContextNode() and getCurrentNode() now return a org.w3c.dom.Node rather than a com.icl.saxon.om.NodeInfo; if you want to use Saxon methods on the returned node, you will have to cast it to a NodeInfo. Note that although the getOwnerDocument() method of XSLTContext is implemented, the resulting document will not be updateable.

The xsl:apply-imports element may now take parameters, that is, it may have child xsl:with-param elements.

The xml:base attribute is implemented. This can be used to change the base URI of an element (in either the source document or the stylesheet) for the purposes of the document() function. A new extension function is provided (largely for diagnostic purposes): saxon:base-uri() returns the base URI of the context node. Note that the terms "base URI" and "system ID" have in the past been used synonymously. This has been tidied up. The System ID refers to the entity (i.e. file) in which an element was found, and is useful for diagnostics in conjunction with the line number. The Base URI defaults to the System ID, but may be changed using xml:base, and is used for resolving relative URIs appearing in calls to document() or to xsl:include and xsl:import.

If you supply your own URIResolver, you can use the base URI any way you like. For example, if the relative URI is the key of a record in a database, you could use the base URI to hold information identifying the database, e.g. the JDBC connection details.
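
A minimal sketch of such a resolver follows. It is an illustration only: the class name RecordResolver is invented, and an in-memory map stands in for the database (a real implementation might hold JDBC connection details instead). The base URI selects the "database" and the relative URI is the key of a record within it.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: the base URI names an in-memory "database",
// and the relative URI (href) is the key of a record within it.
final class RecordResolver implements URIResolver {
    private final Map<String, Map<String, String>> databases;

    RecordResolver(Map<String, Map<String, String>> databases) {
        this.databases = databases;
    }

    @Override
    public Source resolve(String href, String base) {
        Map<String, String> records = databases.get(base);
        String xml = (records == null) ? null : records.get(href);
        if (xml == null) {
            // Returning null asks the processor to resolve the URI itself.
            return null;
        }
        // Supply a system ID so that the returned document has a base URI of its own.
        return new StreamSource(new StringReader(xml), base + "/" + href);
    }
}
```

The resolver would be registered with TransformerFactory.setURIResolver() or Transformer.setURIResolver(), both of which are standard JAXP methods.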

Performance

I have changed the algorithm used for generate-id(). The existing algorithm was very inefficient, which was proving a problem with Muenchian grouping algorithms that rely on this function. It performed particularly badly when using the tinytree data structure with a large source document. The new algorithm is much faster, especially with the tinytree structure. It produces different results from the old algorithm, and is different for the two tree implementations.

Defects cleared

The following errors were reported for version 6.1, and have been cleared:

6.1/001 Tail recursion is invoked when it should not be, for example if an xsl:call-template instruction is issued from within a literal result element. Present since Saxon 5.3.
6.1/002 A null pointer exception occurs after reporting the absence of the select attribute on the xsl:value-of instruction. The same error occurs in a number of other cases where absent attributes are reported. Present since Saxon 6.1.
6.1/003 An ArrayIndexOutOfBounds exception occurs in method outputNamespaceNodes when processing a large source document using the tinytree model. Present since Saxon 6.0.
6.1/004 When running in forwards compatibility mode (i.e. when the version attribute on xsl:stylesheet is not 1.0 or 1.1), unknown XSL elements appearing as top-level elements should be ignored. Instead, an error is reported.
6.1/005 When the outermost element of the stylesheet does not declare the XSLT namespace (for example, because it declares the Microsoft WD-xsl namespace instead), no specific diagnostics are output, just the message "Transformation failed". Present since Saxon 6.1.
6.1/006 The first namespace node for an element (typically the XML namespace) has the same internal identifier as its parent element, which means that when a node-set containing a mixture of element and namespace nodes is constructed, one of these will be wrongly eliminated as a duplicate. The problem applies only to the tinytree model. Present since Saxon 6.0.
6.1/007 When loading secondary input documents using the StandardURIResolver, the AElfred parser may be used rather than the one nominated to the TransformerFactory. Present since Saxon 6.1.
6.1/008 The logic for using the current directory as the fallback for resolving relative URIs when no other base URI is available fails on UNIX systems where the current directory is returned with a trailing "/". Present since Saxon 6.1.
6.1/009 With the TrAX API, when the result of a transformation is a DOMResult, if no user-created DOM was specified using setNode(), the processor is supposed to create the DOM document itself. No attempt is made to do so, instead Saxon fails with a null pointer exception. Present since Saxon 6.1.
6.1/010 The XPath expression //abc:xyz returns no nodes. This happens with the tinytree model only, when there is a non-null namespace URI. Present since Saxon 6.0.
6.1/011 The call TransformerFactory#getTransformerHandler() (with no arguments), which should return an identity transformer packaged as a SAX ContentHandler, returns an object that is not useable. Present since Saxon 6.1.
6.1/012 Errors occur when several Transformers derived from the same Templates object are run concurrently in multiple threads. (The problem is that they share the same Stripper, and this is used to hold information specific to the transformation).
6.1/013 The SAX2 driver for the AElfred parser always reports the first two arguments of the endElement() call (the namespace URI and prefix) as empty strings. When the parser is used within Saxon this has surprisingly few ill-effects; the only ones I am aware of are (a) when an element with a non-null namespace is named in saxon:preview, and (b) when doing an identity transformation using the JAXP 1.1 interface. Present since Saxon 5.3.
6.1/014 No error is reported when xsl:copy-of is used as a top-level element. (At 6.1 the instruction is executed "successfully", placing its output at the start of the output file. At previous releases a NullPointerException occurs).
6.1/015 When xsl:copy-of is used to copy a result tree fragment, and a top-level element in the result tree fragment uses the default namespace (xmlns=""), but the result tree at that point uses the default namespace with a non-null URI (xmlns="xxx"), then no namespace undeclaration (xmlns="") is written to the result tree, causing the top-level element to be in the wrong namespace.
6.1/016 When xsl:document attempts to create not only the output file but the directory it is in, using a Java VM earlier than JDK 1.2 (but not the Microsoft Java VM), it crashes with the message "java.lang.NoSuchMethodError: java.io.File: method createNewFile()Z not found".
6.1/017 If a call is made within an XPath predicate to an extension function that uses context information, in particular the saxon:evaluate() extension function, the call may fail with a null pointer exception.
6.1/018 Errors in the [xsl:]exclude-result-prefixes and [xsl:]extension-element-prefixes attributes (for example, use of an undeclared namespace prefix) are poorly reported. In some cases the error triggers a null pointer exception, in others it is reported with an unhelpful message, and in some cases it is not reported at all.
6.1/019 With output method HTML, if elements are output as children of a script or style element, output escaping is switched on for that part of the script or style text that follows such an element. It should remain off for all the contents of the script or style element.
6.1/020 A null pointer exception occurs when reporting an ambiguous template rule match, when one of the matching patterns is a simple node test such as "node()".
6.1/021 If an unsupported encoding is requested, Saxon correctly reverts to UTF-8, but the encoding specified in the XML declaration (or the HTML META element) of the output file is the one that was requested, not UTF-8 as actually used.
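
The usage pattern affected by 6.1/012 can be sketched as follows: one Templates object is compiled once and shared, while each thread obtains its own Transformer from it. The sketch uses only standard JAXP classes; the class name, the stylesheet and the thread pool size are invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of Saxon.
final class SharedTemplatesDemo {
    // Counts the item elements in the source document and writes the count as text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms each document on a worker thread; results are in input order.
    static List<String> runConcurrently(List<String> documents) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Compile once; the Templates object is shared by all threads.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
            List<Future<String>> futures = new ArrayList<>();
            for (String xml : documents) {
                futures.add(pool.submit(() -> {
                    // A Transformer is not thread-safe: create one per task.
                    Transformer t = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```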

Changes in version 6.1 (2001-01-09)

Towards XSLT 1.1

The saxon:output element is renamed xsl:document, and its file attribute is renamed href. (At this stage, though, it still takes a filename rather than a URI). The next-in-chain attribute is renamed saxon:next-in-chain and is now available on both xsl:output and xsl:document. The href attribute is mandatory: if saxon:next-in-chain is also present, it determines the destination of the output of the chained stylesheet. The indent attribute must now be either "yes" or "no"; the previous option to specify the level of indentation is now replaced by saxon:indent-spaces="integer", on both xsl:output and xsl:document. The omit-meta-tag and character-representation attributes, similarly, are prefixed "saxon:" and are available on both elements.

The xsl:output element (like xsl:document) now allows all its attributes to be specified as attribute value templates.

A side-effect of this change is that xsl:output properties are now ignored when running in preview mode, because the properties cannot be evaluated until the source document is available.

The saxon:user-data attribute of saxon:output is removed. Instead, any number of user-defined attributes may be defined on both xsl:output and xsl:document. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. These attributes are interpreted as attribute value templates. The value of the attribute is inserted into the Properties object made available to the Emitter handling the output; they will be ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property will be the expanded name of the attribute in JAXP format, for example "{http://my-namespace/uri}local-name", and the value will be the value as given, after evaluation as an attribute value template.
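
The "{uri}local-name" form of the property name can be built and looked up as in the following sketch. It shows only the key format, using the standard javax.xml.namespace.QName class; the class and method names are invented, and how the Properties object reaches a user-written Emitter is not shown here.

```java
import java.util.Properties;
import javax.xml.namespace.QName;

// Hypothetical helper; not part of Saxon.
final class ExpandedNameDemo {

    // Builds the "{namespace-uri}local-name" key for a user-defined attribute.
    static String propertyKey(String namespaceUri, String localName) {
        return new QName(namespaceUri, localName).toString();
    }

    // Looks up a user-defined output property by namespace URI and local name.
    static String userProperty(Properties outputProperties,
                               String namespaceUri, String localName) {
        return outputProperties.getProperty(propertyKey(namespaceUri, localName));
    }
}
```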

The special provisions in XSLT 1.1 for defining what happens when you use xsl:document while the current output destination is a temporary tree are not yet implemented.

URI handling

The standard URI resolver now accepts URIs containing a fragment identifier. The fragment identifier must be the value of an ID attribute within the referenced XML document. The effect is to return a tree containing the subtree rooted at the element with that id. This facility works for URIs contained in the document() function and in xsl:include and xsl:import. If there is no element with the required ID, an empty tree is returned (i.e. a root node with no children).

As a result, embedded stylesheets are now working again. In fact, there is no special code to handle embedded stylesheets: anywhere a stylesheet module can be referenced by URI (including the command line, the xml-stylesheet processing instruction, and the href attribute of xsl:include and xsl:import), a URI containing a fragment identifier can be used, and this will select the relevant subtree in the same way as for any other XML document.

In response to complaints about Saxon incompatibility with Xalan, and in order to get the JAXP 1.1 example programs working, I have changed the behaviour of both the AElfred SAX2 driver, and the SAXON standard URI resolver, so that if no systemId is specified for a document, then relative URIs are interpreted relative to the user's current directory. Equally, if the base systemId specified for the document is a relative URI, this is expanded using the current directory as the base. Arguably this behaviour is non-compliant with the SAX2 specification, which states that the systemId must be an absolute URI, but it seems to be a useful convenience.
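
The idea of expanding a relative system ID against the current directory can be sketched with the standard java.io.File and java.net.URI classes. This is an illustration of the technique, not Saxon's actual code; the class and method names are invented. Note the handling of the trailing "/", without which URI resolution would drop the last segment of the directory path.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper; not part of Saxon.
final class CurrentDirectoryBase {

    // Expands a possibly relative system ID against the user's current directory.
    // An absolute URI is returned unchanged.
    static URI expand(String systemId) {
        URI base = new File(System.getProperty("user.dir")).toURI();
        // File.toURI() appends a trailing "/" only when the directory exists;
        // add one otherwise so that resolve() keeps the last path segment.
        if (!base.getPath().endsWith("/")) {
            base = URI.create(base.toString() + "/");
        }
        return base.resolve(systemId);
    }
}
```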

This means that every document, and every node, now has a base URI: it can never be null. A minor side-effect is that I have withdrawn the ability for saxon:node-set() to take a string (or number, or boolean) as an argument: it must now be a result tree fragment or an existing node-set. The reason is that there is no obvious way of constructing a base URI.

JAXP 1.1

Saxon 6.1 implements the new TrAX interface, now defined as part of JAXP 1.1: see JSR-63. Saxon implements the javax.xml.transform interfaces. Saxon does not implement (or use) the javax.xml.parsers interfaces.
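
For orientation, a minimal transformation through the javax.xml.transform interfaces might look like the sketch below. It is written against the standard JAXP API only, so it runs with whichever processor JAXP is configured to load; the class name and the stylesheet are invented for the example. The DOMResult is created without a node, leaving the processor to create the result document (compare defect 6.1/009 above).

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of Saxon.
final class TraxDemo {
    // Wraps the string value of the <in> element in an <out> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><out><xsl:value-of select='/in'/></out></xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the given XML text and returns the result as a DOM Document.
    static Document transformToDom(String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // No node is supplied, so the processor creates the Document itself.
            DOMResult result = new DOMResult();
            transformer.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```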

This has involved fairly extensive changes to the Java API for invoking Saxon. Some of the main implications are:

Effect on Java-only applications

I have tried to minimize the impact of the TrAX changes on Java-only applications, but inevitably some incompatible changes have crept in. The main ones are:

In future I want to align the Java-only processing model more closely with TrAX, so that the set of processing rules defining the transformation becomes another kind of Templates object.

Defects cleared

The following errors were reported for version 6.0.2, and have been cleared:

6.0.2/001 For the TinyTree tree model, the method getDocumentElement() always returns null.
6.0.2/002 A recurrence of 6.0.1/014: the same code was present in three different places, and only one of them was corrected.
6.0.2/003 When Saxon input is supplied as a DOM, CDATA section nodes and entity reference nodes are ignored: their contents are simply omitted from the input.
6.0.2/004 A null pointer exception is reported if the stylesheet contains a template rule whose match pattern is of the form id('abc'), and the source document contains no node with identifier "abc".
6.0.2/005 The method com.icl.saxon.tree.AttributeCollection#getLocalName returns the QName of the attribute, not the local part of the name. This causes the local-name() function when applied to a namespace-qualified attribute node to return the wrong result.
6.0.2/006 An attempt to access the last comment in the source document using xsl:value-of, xsl:copy, etc, will fail if the data part of the processing instruction is zero length. The failure occurs with the Microsoft JVM but not with JDK 1.3. Fails with the tinytree model only. (Present since 6.0; see also 6.0.1/008)

Changes in version 6.0.2 (2000-12-08)

Defects cleared

The following errors were reported for version 6.0.1, and have been cleared except where otherwise noted:

6.0.1/001 When a template is called recursively to obtain a default value for one of its own parameters (i.e. within <xsl:param>), the wrong result may be returned. This is because tail recursion is invoked when it should not be. (Bug also present in 5.5 and earlier releases).
6.0.1/002 An array bound exception will occur when processing a document with a stylesheet that uses more than 100 namespace URIs or namespace prefixes. (Present since 6.0)
6.0.1/003 When a key is defined with match="@*", nothing will be retrieved. The problem also applies to some other patterns that can match attributes, for example match=" name | @name ". (Possibly present in 5.5 and earlier releases - unconfirmed)
6.0.1/004 The extension functions saxon:set-user-data() and get-user-data() do not work correctly with the TinyTree model. They may also fail with the standard tree model if the context node is an attribute or namespace. This is because the code relies on a one-to-one mapping of XPath nodes to Java objects. (Present since 6.0)
6.0.1/005 Not a bug.
6.0.1/006 When attribute value templates are used in the attributes of xsl:sort, for example ascending="{$asc}", then the values used are those that apply the first time the sort occurs; if subsequent sorts have different values for the parameters, these are ignored. This is true even if the subsequent sort takes place in a later transformation using the same PreparedStyleSheet. (Also applies to 5.5 and earlier releases).
6.0.1/007 saxon:output and other Saxon extension elements do not allow the xsl:extension-element-prefixes attribute to appear on the extension element itself. (Present since 6.0)
6.0.1/008 An attempt to access the last processing instruction in the source document using xsl:value-of, xsl:copy, etc, will fail if the data part of the processing instruction is zero length. The failure occurs with the Microsoft JVM but not with JDK 1.3. (Present since 6.0)
6.0.1/009 Running a transformation using the Transformer.getInputContentHandler() method fails saying that the same NamePool must be used for the StyleSheet and the source document. (Present since 6.0)
6.0.1/010 The code that searches for an xml-stylesheet processing instruction displays unintended trace information on System.err.
6.0.1/011 When xsl:apply-imports is called and there is no explicit imported template rule to invoke, Saxon does a no-op; the correct action is to invoke the built-in template rule for the current node. (Bug present in all previous releases).
6.0.1/012 If the value attribute to xsl:number is not an integer, Saxon truncates it towards zero rather than rounding it as specified. (Bug present in all previous releases).
6.0.1/013 With the TinyTree model, selecting a namespace node using //e/namespace::n doesn't work. Selecting all namespace nodes using namespace::* is OK. (Present since 6.0)
6.0.1/014 An array bound check failure may occur in routine com.icl.saxon.tinytree.TinyElementImpl.makeAttributeNodeFS() when searching for the last attribute node in the document. (Present since 6.0)

Integration with FOP has been restored. Saxon now works with FOP version 0_15_0.

NamePools: I have changed the approach, so that instead of making a copy of the stylesheet name pool for each transformation, the name pool is now shared (which means its updating methods are now synchronized, to ensure thread-safety). This shouldn't affect most users, unless you are manipulating NamePools explicitly. It is still possible to have multiple name pools, but you now need to organise any copying yourself if this is what you want to do. For 99% of users, it should be possible to ignore NamePools entirely and just leave the system to use the single default name pool all the time.

The following changes are for conformance with the (imminent) XSLT 1.0 errata:

Changes in version 6.0.1 (2000-11-28)

Defects cleared

The following errors were reported for version 6.0, and have been cleared except where otherwise noted:

6.0/001 When xsl:copy-of is used to copy attributes with no namespace prefix, and the owning element has a default namespace declaration (xmlns="xyz"), then an invalid prefix is generated for the attributes.
6.0/002 The PreparedStyleSheet object is not serially reusable. A new NamePool needs to be allocated each time it is used.
6.0/003 A performance bug: in the match pattern row[id=1234] the predicate is not recognized as a boolean predicate, therefore the pattern matching code determines the position of the row relative to its siblings on the assumption that it needs this information. If there are a large number of <row> siblings this gives a severe performance hit.
6.0/004 The function-available() function returns false for a method that exists but that requires one or more arguments.
6.0/005 The element-available() function crashes (with a diagnostic print of the name pool contents) if the supplied name is one that is not used in the stylesheet and is not a known XSL or Saxon instruction.
6.0/006 With the TinyTree tree model, finding the descendants of a node that has neither descendants nor following-siblings produces incorrect results.
6.0/007 DTDGenerator won't compile: no name pool is supplied to RuleManager.
6.0/008 In the SQL sample application, the last row is not written to database. (This reported bug has not yet been investigated)

Other changes

Warning messages (issued typically when a node matches more than one template rule) are now limited in number: only the first 25 are displayed.

Changes in version 6.0 (2000-11-17)

In Saxon 5.5, I introduced a change that allows a result-tree-fragment to be implicitly converted to a node-set. I did this in anticipation of changes in XSLT 1.1, and to allow interoperability with MSXML3. However, Microsoft have now withdrawn this facility and conform fully to the XSLT 1.0 rules, so in order to protect Saxon's reputation for 100% conformance, I have decided to withdraw the facility too. It can still be used, however, if the stylesheet specifies version="1.1". For more details, see Conformance

Defects in version 5.5.1

The following errors are cleared in version 6.0:

5.5.1/001 When xsl:copy-of is used to make a copy of an element node that has no attributes or namespace declarations of its own, the namespace nodes inherited from its ancestor elements are not copied to the result tree. (Present since 5.5)
5.5.1/002 In some Java environments (ServletExec) the current method for dynamic loading of classes fails. The fix to this detects this failure and reverts to the simple pre-JDK 1.2 method.
5.5.1/003 When <xsl:namespace-alias> is used, Saxon uses the new (result-prefix) prefix and the new URI in the output. A careful reading of the spec suggests that it should use the old (stylesheet-prefix) prefix with the new URI. (The term "result-prefix" is thus a misnomer).
5.5.1/004 An ArrayIndexOutOfBounds exception occurs if the match pattern "@comment()" (or "@text()" or "@processing-instruction()") is used in an xsl:template rule. Such a pattern is meaningless (it will never match any nodes) but entirely legal.
5.5.1/005 Saxon does not report an error if two sibling <xsl:with-param> elements specify the same parameter name.
5.5.1/006 Where conflicting <xsl:strip-space> and <xsl:preserve-space> elements occur in the stylesheet, Saxon gives greater weight to the priority of the pattern than to its import precedence. So <xsl:strip-space elements="ns:item"> in an imported stylesheet will incorrectly override <xsl:preserve-space elements="ns:*"> in the importing stylesheet.
5.5.1/007 A null pointer exception can occur in the AElfred parser when attempting to access an XML file using a URL, if the resource accessed by the URL is found but its encoding is unknown.
5.5.1/008 A null pointer exception can occur when evaluating a variable reference within the arguments to an extension function that is called within the predicate of a filter expression.
5.5.1/009 When running in forwards-compatible mode, Saxon incorrectly rejects XSL elements that contain an attribute other than those defined in XSLT 1.0.
5.5.1/010 When xsl:copy is applied to an attribute, text node, comment, or processing instruction, the content of the xsl:copy element should be ignored. It isn't.
5.5.1/011 When output to a DOM Node is requested in the TrAX API, this is ignored if an output method is specified in an xsl:output element of the stylesheet. The output is sent to the standard output stream instead. The xsl:output element should be ignored.
5.5.1/012 When a top-level element such as xsl:output is used within a template, it is reported as an error. This happens even when processing in forwards-compatible mode (e.g. when version="1.1"). In this case fallback processing (xsl:fallback) should be invoked.
5.5.1/013 (Not yet fixed.) When the first argument to the document() function is a result tree fragment, Saxon takes the Base URI (for resolving the URI if it is relative) as if the argument were a string. The intention of the specification, though not clearly stated, is that the Base URI should be calculated as if the argument were a node-set. That is, if the argument is $tree and $tree is defined by <xsl:variable name="tree">doc.xml</xsl:variable>, then the Base URI should be that of the xsl:variable element, not that of the element containing the call on the document() function.

New XSLT facilities at version 6.0

Added support for two new output encodings on xsl:output: iso-8859-2 and cp1250.

Added two attributes to xsl:output (not yet available in saxon:output):

Added a new extension function saxon:showNodeSet(). It takes a single argument that is a node-set, produces a diagnostic print of the node-set on System.err, and returns an empty string.

Added an extension function saxon:getContext() to get the context object. Only really intended for diagnostic use.

Command line changes

Added an option to choose the tree implementation (see below): -ds for the standard tree, as used in previous releases, -dt for the "tinytree" which is new to this release. The tinytree is the default: it takes up less memory, is faster to build, and generally appears to perform better in most circumstances.

The -a option on the stylesheet, which causes the source document to be processed using the stylesheet identified from its xml-stylesheet processing instruction, now uses the same logic as the getAssociatedStylesheets() method in the TrAX interface. This means multiple (cascading) stylesheets are now supported. However, embedded stylesheets (identified by href="#id" in the xml-stylesheet processing instruction) are not supported at this release.

Java API changes

There have been a great many internal changes, but relatively few that impact directly on the high-level transformation API. In particular, if you only use TrAX interfaces, there are no changes. Otherwise, the main points to note are:

Internal Changes

These details should only affect you if you access intimate internal interfaces or use the Saxon source code.

There are two big changes to the internals of Saxon at this release: a new implementation of the tree structure, and a new system for handling names.

The tinytree implementation

I have introduced an alternative tree implementation (called "tinytree"). This is designed to reduce the number of Java objects created: the tree is sliced vertically rather than horizontally, so instead of having one Java object per node, there is one Java array for each property of the nodes, with an entry in the array for each node. The effect is to greatly reduce the Java memory management overheads. The existing tree structure remains available, and is always used for the stylesheet tree. It is also currently always used for the intermediate result tree created when saxon:output next-in-chain is used.
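
The "one array per property" idea can be illustrated with the sketch below. It is not Saxon's tinytree: the class name, the choice of properties (node kind, depth, name code) and the navigation methods are assumptions made for the example. Nodes are numbered in document order, and relationships are derived from the depth array rather than from object references.

```java
import java.util.Arrays;

// Illustrative only: field names and layout are assumptions, not Saxon's tinytree.
final class ArrayTree {
    static final byte ELEMENT = 1;
    static final byte TEXT = 3;

    // One array per node property; index = node number in document order.
    private byte[] kind = new byte[16];
    private int[] depth = new int[16];
    private int[] nameCode = new int[16];
    private int size = 0;

    // Appends a node in document order and returns its node number.
    int addNode(byte nodeKind, int nodeDepth, int code) {
        if (size == kind.length) {
            int newLength = size * 2;
            kind = Arrays.copyOf(kind, newLength);
            depth = Arrays.copyOf(depth, newLength);
            nameCode = Arrays.copyOf(nameCode, newLength);
        }
        kind[size] = nodeKind;
        depth[size] = nodeDepth;
        nameCode[size] = code;
        return size++;
    }

    int size() {
        return size;
    }

    // The parent is the nearest preceding node with a smaller depth (-1 for the root).
    int parentOf(int node) {
        for (int i = node - 1; i >= 0; i--) {
            if (depth[i] < depth[node]) {
                return i;
            }
        }
        return -1;
    }

    // Descendants are the run of following nodes that are deeper than this node.
    int countDescendants(int node) {
        int count = 0;
        for (int i = node + 1; i < size && depth[i] > depth[node]; i++) {
            count++;
        }
        return count;
    }
}
```

In this sketch no per-node objects are allocated at all, which is where the memory saving comes from; the cost is that navigation such as parentOf() becomes a scan rather than a pointer dereference.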

To select the standard tree structure, use -ds on the command line. To select the "tinytree" structure, use -dt. The default is -dt. You can also select the tree structure using a method on the Controller class.

The tinytree is smaller than the standard tree, as the name suggests, and it is also faster to build. However, it may be slower to navigate. So if you have a small document that is built once in memory and used repeatedly, the standard tree implementation is probably better. In other cases, however, the tinytree usually wins.

Name pools

I have made radical changes to the way names are managed. Previously, the NamePool object contained a pool of names, but its only real purpose was to avoid the memory overhead of storing each name many times. Now, Saxon takes advantage of the NamePool to avoid storing references to Name objects on the tree at all: instead it stores a "namecode": an integer which can be used to identify the name within the NamePool.

A namecode has 4 bits unused, 8 bits representing the prefix, and 20 bits acting as a pointer to an entry in the namepool containing the local name and namespace URI. Two names are therefore equal if the namecodes are the same in the bottom 20 bits. The value in these 20 bits is also referred to as the fingerprint of the name.
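
The bit layout can be made concrete with a little arithmetic. The sketch below assumes that the 8 prefix bits sit immediately above the 20-bit fingerprint, with the top 4 bits unused; the text above fixes only the position of the fingerprint, so the prefix position, the class name and the method names are assumptions for illustration.

```java
// Illustrative only: the prefix bit position and all names here are assumptions.
final class NameCodes {
    static final int FINGERPRINT_MASK = 0xFFFFF; // bottom 20 bits
    static final int PREFIX_SHIFT = 20;          // assumed: prefix bits follow the fingerprint
    static final int PREFIX_MASK = 0xFF;         // 8 bits

    // Combines a prefix index (0..255) and a fingerprint (0..0xFFFFF) into a namecode.
    static int makeNameCode(int prefixIndex, int fingerprint) {
        if ((prefixIndex & ~PREFIX_MASK) != 0 || (fingerprint & ~FINGERPRINT_MASK) != 0) {
            throw new IllegalArgumentException("value out of range");
        }
        return (prefixIndex << PREFIX_SHIFT) | fingerprint;
    }

    // Extracts the fingerprint: the part identifying local name and namespace URI.
    static int fingerprint(int nameCode) {
        return nameCode & FINGERPRINT_MASK;
    }

    // Extracts the prefix index.
    static int prefixIndex(int nameCode) {
        return (nameCode >>> PREFIX_SHIFT) & PREFIX_MASK;
    }

    // Two names are equal if their fingerprints match, whatever the prefix.
    static boolean sameName(int nameCodeA, int nameCodeB) {
        return fingerprint(nameCodeA) == fingerprint(nameCodeB);
    }
}
```

Under these assumptions, makeNameCode(3, 0x12345) gives 0x312345, and two namecodes that differ only in the prefix compare as the same name.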

All searching for objects by name is now done by comparing fingerprints; no string comparisons are involved. Fingerprints are used not only for matching names used in XPath expressions to refer to the source document, but also for all matching of names within a stylesheet, for example variable names, template names, mode names, key names, and decimal format names.
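
To show why integer comparison is enough, here is a minimal sketch of a pool that hands out one integer per distinct (namespace URI, local name) pair. The class NamePoolSketch and its methods are hypothetical and much simpler than the real NamePool; in particular, the sketch does not enforce the 20-bit limit and ignores prefixes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy pool: each distinct (URI, local name) pair gets one integer.
final class NamePoolSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String[]> entries = new ArrayList<>();

    // Returns the existing fingerprint for the name, allocating one if needed.
    synchronized int allocate(String uri, String localName) {
        String key = "{" + uri + "}" + localName; // lookup key for this sketch only
        Integer existing = index.get(key);
        if (existing != null) {
            return existing;
        }
        int fingerprint = entries.size();
        entries.add(new String[] { uri, localName });
        index.put(key, fingerprint);
        return fingerprint;
    }

    synchronized String getURI(int fingerprint) {
        return entries.get(fingerprint)[0];
    }

    synchronized String getLocalName(int fingerprint) {
        return entries.get(fingerprint)[1];
    }
}
```

The string lookup happens once, when a name is first added to the pool. After that, matching a name in an expression against a name on the tree is a single integer comparison.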

The name pool is also used for storing namespace declarations: each prefix/URI pair is allocated a namespace code, and all manipulation of namespace nodes in the tree is done using these integer codes.

A consequence of this is that all documents used in a transform must use the same NamePool. This has some implications for the Java API. With simple use of the API, you needn't worry about name pools: they will be taken care of automatically. However, if you are operating a continuously running service in which both source documents and stylesheets are cached in memory, you may need to exercise some care to specify the right NamePool when each document is built.

The model is further complicated by multi-threading. Rather than have synchronization problems with multiple threads updating the same NamePool, the NamePool used to build the stylesheet is copied (imported) into the NamePool used to build the source document, before parsing of the source document starts. When you use the transform() method to parse and transform an InputSource, this happens automatically. However, if you want to build the document yourself, and transform it using transformDocument() (which allows you to run more than one transformation on the same document), then you must manage the NamePool merging yourself. The system does include checks that the NamePools for the stylesheet and source document are compatible, though these are not completely foolproof.

The use of namecodes rather than String names has affected many internal interfaces, and some of these are interfaces that are also exposed externally. For example, the ParameterSet object which is used to pass parameters from a calling template to a called template can also be used to supply global parameters to the Transformer. The parameters in a parameter set are now identified by an integer fingerprint rather than a string name. You can get the integer fingerprint from the NamePool using the getFingerprint() method; alternatively use the TrAX method addParameter(), which still takes the name as a String.

The Emitter interface has also changed to use name codes; if you have written your own Emitter, the code will have to be modified.

Other changes

The classes and interfaces used in Saxon for manipulating collections of attributes now implement the SAX2 Attributes interface.

The standard XPath functions have been extensively revised. The main change, apart from tidying up the code, is that the functions are now responsible for evaluating their own arguments, which enables some optimisation, especially when the arguments are node-sets: they can now be evaluated using knowledge of the data type required. For example, the not() function now stops as soon as the first node in the argument node-set is found.

Some of the little-used methods on the NodeInfo interface have been moved as static methods to a separate helper class, com.icl.saxon.om.Navigator. This enables the code of these methods to be independent of the particular tree implementation.

The delayed evaluation of path expressions now works as follows: on the first two occasions that a path expression is evaluated, it navigates the source tree. On the third occasion, it saves the resulting node-set in memory. On subsequent uses, the result is retrieved from memory. This approach is designed to balance time against memory usage.
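
The policy just described can be sketched as a small wrapper around the code that navigates the tree. The class DelayedPathSketch is hypothetical and is not how Saxon structures its expression classes; the sketch is also not thread-safe.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: navigate on the first three uses, save the result on
// the third, and serve all later uses from memory.
final class DelayedPathSketch<T> {
    private final Supplier<List<T>> navigate; // walks the source tree
    private int evaluations = 0;
    private List<T> saved = null;

    DelayedPathSketch(Supplier<List<T>> navigate) {
        this.navigate = navigate;
    }

    List<T> evaluate() {
        if (saved != null) {
            return saved;                // fourth and later uses
        }
        List<T> result = navigate.get(); // first three uses navigate the tree
        evaluations++;
        if (evaluations >= 3) {
            saved = result;              // third use also keeps the result
        }
        return result;
    }
}
```

An expression that is evaluated only once or twice never pays the memory cost of a saved node-set, while one that is evaluated many times navigates the tree at most three times.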

The optimisation of "//name" as "/descendant::name" (which is possible when there are no predicates) wasn't working in 5.5 (or for a while before that), causing an unnecessary sort. This has been corrected. In addition, the first time "//name" is used for a particular document, the results are now saved, and all subsequent uses of "//name" for the same document retrieve the results from memory. This means that the traditional assumption that "//name" is inefficient may no longer always be true.

A Sequencer class has been introduced for allocating globally-unique sequence numbers. There are two such sequences, one for document numbers, and one for node numbers. By default, two sequencers are created when Saxon is loaded, and remain in use until it is unloaded. However, it is now possible to reset the sequence numbering if required, either to prevent running out of numbers in a long-running server, or to ensure repeatability of the value of generate-id(). The result of generate-id() depends on the document number, and you can restart the sequence of document numbers by calling controller.setDocumentSequencer(new com.icl.saxon.om.Sequencer()). It is the caller's responsibility to ensure that this does not cause two documents that are in use at the same time to have the same number. The node sequence number is used when sorting nodes into document order, and when eliminating duplicates in a union operation. You can similarly allocate a new sequence using controller.setNodeSequencer().
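
The effect of replacing the document sequencer can be illustrated with a toy model. SequencerSketch and DocumentNumberingSketch are hypothetical stand-ins for the Sequencer and Controller classes, and the identifier format produced by generateId() here is an assumption made for the example, not the format Saxon actually uses.

```java
// Hypothetical stand-in for a sequencer: hands out increasing numbers.
final class SequencerSketch {
    private int next = 0;

    synchronized int next() {
        return next++;
    }
}

// Hypothetical stand-in for the part of a controller that numbers documents.
final class DocumentNumberingSketch {
    private SequencerSketch documentSequencer = new SequencerSketch();

    // Replacing the sequencer restarts document numbering from zero.
    void setDocumentSequencer(SequencerSketch sequencer) {
        this.documentSequencer = sequencer;
    }

    int newDocumentNumber() {
        return documentSequencer.next();
    }

    // Assumed id format for illustration: depends on the document number.
    static String generateId(int documentNumber, int nodeNumber) {
        return "d" + documentNumber + "n" + nodeNumber;
    }
}
```

In this model, installing a fresh sequencer before each run makes the first document built in that run receive number 0 again, so identifiers derived from the document number repeat from run to run. As noted above, the caller must make sure that no two documents in use at the same time end up with the same number.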

Added an optimization for recursive processing of a node-set: the predicate "[position() > 1]" is now recognized and handled specially, allowing pipelined execution and reducing memory requirements.
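
One way to picture the pipelined handling is as an iterator that drops the first item and passes the rest straight through, so the node-set is never copied into memory. TailSketch and afterFirst() are hypothetical names, and this is a sketch of the general idea rather than Saxon's implementation.

```java
import java.util.Iterator;

// Hypothetical sketch: apply "[position() > 1]" to a stream of nodes
// without buffering them.
final class TailSketch {
    // Returns an iterator over every item except the first.
    static <T> Iterator<T> afterFirst(Iterator<T> source) {
        if (source.hasNext()) {
            source.next(); // discard the item at position 1
        }
        return source;     // the remaining items are delivered on demand
    }
}
```

A recursive template that processes the first node and then calls itself on the remainder can, with this approach, consume the node-set one item at a time.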

Removed getAttributeValue(Name) and replaced it with getAttributeValue(String uri, String localName). This is more efficient: in many cases it removes the need to construct the Name object and then take it apart. Attributes can also be found using the integer fingerprint of the name.

The Name class is no longer used for holding expanded names; it now serves merely as a container for a couple of static methods for name validation.

NameTest and its subclasses have been reorganised. There is a new class NodeTest which is a subclass of Pattern; it performs the test on node-type and node-name supporting a node-test in XPath. This test is context-free. As well as replacing the NameTest class, it also replaces NodeTypePattern and NamedNodePattern. The NodeTest is now used on a Step, and on an Axis, replacing the previous combination of a NameTest and a node type. These tests are also used in testing which nodes are candidates for whitespace stripping.

The interface between the Step and Axis classes and the expression parser has been much simplified.

Michael H. Kay
3 July 2001