XSLT Language Extensions
XSLT processor vendors are free to
add their own private extensions to the language. The XSLT
specification even defines how a stylesheet can test whether an extension element
or extension function is supported by the implementation.
In the stylesheet, certain
namespaces can be designated as extension namespaces with the extension-element-prefixes
attribute on the stylesheet element (or the xsl:extension-element-prefixes attribute on a literal result element). Elements in those namespaces are
treated as instructions and processed by the extension mechanism of the processor in use, rather than copied to the output.
If the stylesheet author wants to know if the processor
supports a certain extension element, the function element-available()
can be called with the element name as the parameter. If the processor supports
this element, the function should return true.
The same information can be retrieved about extension
functions using the function-available() function.
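As a minimal sketch of how these tests fit together, the stylesheet below guards an extension element and an extension function before using them. The namespace URI and the names ext:log and ext:timestamp are hypothetical and stand in for whatever your processor actually documents.

```xml
<?xml version="1.0"?>
<!-- The ext namespace, ext:log and ext:timestamp are hypothetical;
     substitute the extensions documented by your processor. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://example.com/hypothetical-extensions"
    extension-element-prefixes="ext">

  <xsl:template match="/">
    <report>
      <!-- Use the extension element only if the processor knows it -->
      <xsl:choose>
        <xsl:when test="element-available('ext:log')">
          <ext:log message="transformation started"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:comment>ext:log is not available</xsl:comment>
        </xsl:otherwise>
      </xsl:choose>

      <!-- The same pattern for an extension function -->
      <xsl:if test="function-available('ext:timestamp')">
        <generated><xsl:value-of select="ext:timestamp()"/></generated>
      </xsl:if>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Because both uses sit behind a test, a processor without these extensions should simply take the other branch instead of reporting an error.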
The IE5 Implementation
When Microsoft released Internet Explorer 5.0, it wanted to
ship with it an XML parser that conformed as much as possible to all
XML-related standards at that time. XSLT was then still a part of the
XSL working draft. The XSLT support in IE5 is based on the transformations
chapter in the working draft of December 1998. Microsoft did quite a good job, but
the specification moved on and was split in two, and by now the MSXML
implementation is a very weak and non-compliant version of the now final
recommendation of XSLT 1.0. This IE5 implementation of MSXML is version 2.0.
Microsoft has announced that it will support the full
specification in a future release. By the time this book is available, at least a
developer's preview should be out (called MSXML 2.6). This preview implements
the standards much better, but a lot still remains to be done. More information
can be found at http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp.
The new implementation will support both the W3C XSLT 1.0
recommendation and the older MSXML 2.0 dialect. Which implementation is
used depends on the namespace of the stylesheet elements. The MSXML 2.0
implementation uses the namespace http://www.w3.org/TR/WD-xsl, while XSLT 1.0 stylesheets use http://www.w3.org/1999/XSL/Transform.
In Appendix D, you can see for each element whether it is supported
in IE5 (the MSXML 2.0 library). Here we will try to give you an idea of what is unsupported, what is poorly supported,
and what works fine.
MSXML 2.0 does a good job on:
literal elements and attributes.
the element element, the attribute element, and the comment element.
the choose, when, and otherwise elements.
the for-each element.
the if element.
Some elements can be used in most cases, but fail to support
more complex uses or certain attributes. These include:
apply-templates: you cannot use the mode attribute.
template: you cannot use the name attribute or the mode
attribute. The priority rules are not implemented (see the section entitled 'What
if Several Templates Match?').
processing-instruction: is called pi
in MSXML 2.0.
stylesheet: IE5 does not support any
of the attributes for the stylesheet element. Note that the version
attribute is defined as required in XSLT.
value-of: the disable-output-escaping attribute is not supported. See below
for undocumented tricks to do this anyway.
The XPath expressions that can be used in many places in
XSLT are only partially implemented. Basically, only the shorthand notation is
supported. For details, see the XPath section earlier in this chapter.
The following elements are not supported in MSXML 2.0:
attribute-set (and the related attributes)
sort (MSXML 2.0 has implemented
attributes on some elements to allow sorting)
text (MSXML 2.0 has an undocumented
element that does more or less the same)
transform (which is the same thing as stylesheet)
and a whole bunch of top-level elements
Although this is a fairly long list, most of these
unimplemented elements are the kind you will rarely use anyway. Some of them,
however, are dearly missed.
Most of the specified additional functions that can
be used in XSLT are unsupported in IE5. At the same time, MSXML 2.0 features
some functions that can be very useful in overcoming these shortcomings.
Some standard functions are unsupported, but have an MSXML 2.0 counterpart:
generate-id(): MSXML 2.0 has a
function called uniqueID() that can do the same.
format-number(): MSXML 2.0 has a formatNumber() function that works
almost identically, except for localization using a decimal-format element.
current(): IE5 has a very powerful context() function. This can be used
to do the same. context(-1) is equivalent to current().
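To make these replacements concrete, here is a rough sketch in the MSXML 2.0 dialect. The source element names (catalog, item, review, title) and the format pattern are assumptions made for the illustration, and the script inside eval is assumed to run as JScript, the default in IE5. Treat it as a starting point to test in IE5, not as a reference.

```xml
<?xml version="1.0"?>
<!-- MSXML 2.0 (IE5) dialect. catalog, item, review and title are
     hypothetical source names used only for this illustration. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <xsl:template match="/">
    <div>
      <xsl:for-each select="catalog/item">
        <!-- uniqueID() standing in for generate-id() -->
        <a>
          <xsl:attribute name="name"><xsl:eval>uniqueID(this)</xsl:eval></xsl:attribute>
          <xsl:value-of select="title"/>
        </a>

        <!-- formatNumber() standing in for format-number() -->
        <span><xsl:eval>formatNumber(1234.5, "#,##0.00")</xsl:eval></span>

        <!-- context(-1) standing in for current(): select the reviews
             whose item attribute equals the id of the item being processed -->
        <xsl:for-each select="/catalog/review[@item = context(-1)/@id]">
          <p><xsl:value-of/></p>
        </xsl:for-each>
      </xsl:for-each>
    </div>
  </xsl:template>

</xsl:stylesheet>
```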
Tricks for using MSXML 2.0
Although MSXML 2.0 has some limitations compared to the full
XSLT specification, it is still a very useful transformation tool. When using
it, there are some problems that all developers stumble into. The developer
community has been looking for solutions and work-arounds for almost two years
now. These are a few of the most important ones.
Output Escaping Off
If you have an XML document containing a piece
of text that should appear literally in the output, you can run into trouble.
When writing the output, the XSLT processor will escape some characters (such as < and &) as XML
entities, to keep the output well-formed. That is fine, but sometimes we don't
care whether the output is well-formed; we just want that exact string to
appear in the output. The output is not supposed to be XML anyway; it might be
HTML. XSLT allows us to do so by using the disable-output-escaping attribute on the value-of
and text elements. IE5 does not support disable-output-escaping, but it does
allow the use of an undocumented attribute: no-entities='true' on the eval
element. We can use this to generate unescaped content, for example, from a source fragment such as:
<![CDATA[<BR>]]> Duynstee </Author>
with the following template:
This would generate this output:
Note that this is not well-formed XML, but that was exactly
what we were trying to do. It also means that we must be very careful when
using this feature. Note also that this feature is undocumented, so Microsoft
might remove it from future versions just when you least expect it.
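A minimal sketch of such a template, assuming the source contains Author elements whose text includes a literal <BR> inside a CDATA section, could look like this. It is a reconstruction for illustration and relies on the undocumented no-entities attribute described above.

```xml
<?xml version="1.0"?>
<!-- MSXML 2.0 (IE5) dialect. Illustration only: no-entities is
     undocumented and may behave differently in other MSXML versions. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <xsl:template match="/">
    <xsl:apply-templates select="//Author"/>
  </xsl:template>

  <xsl:template match="Author">
    <!-- "this" is the current Author node; its text property holds the
         string value, including the characters of the CDATA section -->
    <xsl:eval no-entities="true">this.text</xsl:eval>
  </xsl:template>

</xsl:stylesheet>
```

With the attribute in place, the intent is that the <BR> reaches the output as markup instead of being escaped to &lt;BR&gt;.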
IE5 does not support modes or calling templates by name, but it does allow something else: locally scoped
templates. These can be included as a child element of an apply-templates
element, and the processor will try to use such a template before any of the globally
scoped templates. Look at this sample:
The stylesheet has a template defined for use with Author
elements. It generates a b element with the Author
element's content in it. The root template performs two apply-templates
actions on all authors in the source document. The first one will match on the
template for Author
elements and output the following:
The second apply-templates element has a
template defined locally. This local template also matches the selected nodes,
so the second apply-templates
element will generate:
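The structure described above can be sketched as follows. The authors wrapper element and the i element produced by the local template are assumptions made for the illustration; only the global template that wraps each Author in a b element comes from the description.

```xml
<?xml version="1.0"?>
<!-- MSXML 2.0 (IE5) dialect. The authors wrapper and the i element
     are hypothetical choices for this illustration. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <xsl:template match="/">
    <authors>
      <!-- First pass: no local template, so the global Author template is used -->
      <xsl:apply-templates select="//Author"/>

      <!-- Second pass: the locally scoped template is tried first -->
      <xsl:apply-templates select="//Author">
        <xsl:template match="Author">
          <i><xsl:value-of/></i>
        </xsl:template>
      </xsl:apply-templates>
    </authors>
  </xsl:template>

  <!-- Globally scoped template for Author elements -->
  <xsl:template match="Author">
    <b><xsl:value-of/></b>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch every author is written twice, first wrapped in b by the global template and then wrapped in i by the local one, which gives roughly the effect that modes provide in XSLT 1.0.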
Let's have a look at some more examples to demonstrate the use of XSLT. In the last part of this section,
we will look at using XSLT to style an XML document in HTML. There will be more
examples there. Here we will cover examples that are not HTML-related, but
aimed at converting one XML dialect into another. This will be a very common
case in business-to-business e-commerce, where XML documents containing orders,
inventories, product descriptions, etc., are sent automatically and converted
on the fly to a format that is suitable for the target system.
Product Information Import
Think of a system that retrieves product descriptions from several
suppliers to present users in the organization with a coherent view of all
available products. Some of these suppliers will have their product range
available in an XML format. In an ideal world, an agreement could be made with
all suppliers about the format used for delivering the data. Unfortunately, in
the real world suppliers will not be willing to do that, and the user will have to
settle for what he can get. Some will conform to an industry standard but, in
the end, transformation from some other format to that which is required will
be necessary.
The format that can be natively imported by our application
looks like this:
234, Wood lane
The XML descriptions we receive from Clippers Inc. look like this:
<FullName>Solid quality nail clipper, San Juanito
We want to transform this delivered format into our native
format using XSLT. We could create a stylesheet for the transformation like this:
[Name = 'Clippers Inc.']"/>
Let's have a look at the sample
little by little. There is only one template, matching the root. This template
contains a framework for the output document. The Product element and its ID child
element are inserted as literals. The value of the ID element is taken from
the product-reference attribute in the source document. The same thing is done for the
name. We create a name
element with literals and insert a value from the source document in it. Note
that we chose to use the short name from the source and discard the long name.
The Product_category element is hard-coded. We expect only products in this
category from this supplier.
Now comes the hard part. The
supplier information is not provided in this case. Some suppliers provide it, some
do not. We could choose to hard-code the supplier information in the
stylesheet, but that would force us to update the stylesheet every time the
supplier changes its address or we get a new contact person. Instead, we decided to
store all supplier information in our own format in one file. While
transforming the document, the processor does a lookup in the supplier_lookup.xml document and copies a whole fragment from that document to
the destination document using copy-of.
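Pulling the walkthrough together, a stylesheet along these lines might look as follows. This is a sketch under assumptions, not the original listing: the source element names item and ShortName, the supplier element inside supplier_lookup.xml, and the category value are all hypothetical, while product-reference, Product, ID, name, Product_category and the Name predicate come from the description above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- The only template: it matches the root and holds the
       framework of the output document -->
  <xsl:template match="/">
    <Product>
      <!-- "item" and "ShortName" are assumed names in the supplier format -->
      <ID><xsl:value-of select="item/@product-reference"/></ID>
      <name><xsl:value-of select="item/ShortName"/></name>

      <!-- Hard-coded category; the value is a placeholder -->
      <Product_category>Personal care</Product_category>

      <!-- Copy the whole supplier fragment from our own lookup file;
           "supplier" is an assumed element name in that file -->
      <xsl:copy-of
          select="document('supplier_lookup.xml')//supplier[Name = 'Clippers Inc.']"/>
    </Product>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the supplier details in supplier_lookup.xml means a change of address only touches that file, while the stylesheet stays the same.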
Our second example is for a publishing company; all books are stored
in a giant XML document (in fact it is stored in a database, but this database
allows access to the data as if it were an XML document). A fragment of this
document looks like:
<title>Stranger in a strange land</title>
Note how the second book has several authors. To produce an
overview of the most successful authors, the publisher wants to transform this
huge books file to something like this:
Authors will be ranked by the total number of copies of
books sold, and this should also determine their position in the document. So,
the best selling author in the books document should be the highest on the
list. This can be accomplished by this stylesheet:
Some things in this stylesheet are
worthy of further comment. First, note how the sum() and count()
functions are used, both in the author
template for calculating the number of publications and total number sold for
each author, and in the sort element
within the apply-templates element. Note also
how the current() function is used to match the author-ref elements to the author
elements they refer to. An interesting detail is that, inside the sort element, current() refers to the node being sorted, that is, a node from the newly selected set rather than the node the surrounding template was processing.
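A stylesheet of the kind discussed here could be sketched as follows. The paths /publisher/books/book, author-ref/@ref, @id and sold follow the names used elsewhere in this example; the location /publisher/authors/author, the name child element and all output element names are assumptions made for the illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <bestsellers>
      <!-- /publisher/authors/author is an assumed location for the authors -->
      <xsl:apply-templates select="/publisher/authors/author">
        <!-- Inside sort, current() is the author being sorted -->
        <xsl:sort
            select="sum(/publisher/books/book[author-ref/@ref = current()/@id]/sold)"
            data-type="number" order="descending"/>
      </xsl:apply-templates>
    </bestsellers>
  </xsl:template>

  <xsl:template match="author">
    <!-- Output element names are hypothetical -->
    <author>
      <name><xsl:value-of select="name"/></name>
      <publications>
        <xsl:value-of
            select="count(/publisher/books/book[author-ref/@ref = current()/@id])"/>
      </publications>
      <total_sold>
        <xsl:value-of
            select="sum(/publisher/books/book[author-ref/@ref = current()/@id]/sold)"/>
      </total_sold>
    </author>
  </xsl:template>

</xsl:stylesheet>
```

Note that each of the three expressions scans all book elements again for every author, which is where the processing time goes on a large document.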
If the source document is large,
this stylesheet will probably take a long time to process. Many calculations
are done in counting and summing the nodes. In these counting actions, a lot of
searching is done on books that have an author-ref element with a certain ref attribute. We could also implement this using a key. If
the processor is optimized for using keys, this will speed things up
significantly (but I don't know of any such processor at the time of writing).
Even if it doesn't give us a performance gain (it still might in the future),
our code becomes somewhat cleaner. Then the stylesheet would look like this.
See if you can figure it out.
<xsl:value-of select="sum(key('books-by-author', @id)/sold)"/>
At the beginning of the document, we added an xsl:key
element. It is called 'books-by-author'. The key will give
us direct access to a set of nodes from the source document. With the match
attribute we specify which nodes we want to be able to access. In our case, we
want access through the key to all book elements in the document (match="/publisher/books/book").
With the use attribute we specify the key value we want to use to access a book
element. In this case, that is the ref attribute on the author-ref
child element(s) of the book (use="author-ref/@ref").
Now if we use the key() function anywhere in the
stylesheet like this:
key('books-by-author', 'rh')
it will return a node set containing all book
elements that have an author-ref child element with ref="rh".
Effectively these are all books by Robert Heinlein. Using this, we could
simplify some of the expressions in the stylesheet significantly.
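As a compact sketch of the key-based approach, the fragment below declares the key and uses it both for sorting and for the totals. The key declaration and the sum() expression follow the text above; the location /publisher/authors/author and the output element names are assumptions made for the illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every book under the ref value of each of its author-ref children -->
  <xsl:key name="books-by-author"
           match="/publisher/books/book"
           use="author-ref/@ref"/>

  <xsl:template match="/">
    <bestsellers>
      <!-- /publisher/authors/author is an assumed location for the authors -->
      <xsl:apply-templates select="/publisher/authors/author">
        <xsl:sort select="sum(key('books-by-author', @id)/sold)"
                  data-type="number" order="descending"/>
      </xsl:apply-templates>
    </bestsellers>
  </xsl:template>

  <xsl:template match="author">
    <!-- Output element names are hypothetical -->
    <author id="{@id}">
      <publications>
        <xsl:value-of select="count(key('books-by-author', @id))"/>
      </publications>
      <total_sold>
        <xsl:value-of select="sum(key('books-by-author', @id)/sold)"/>
      </total_sold>
    </author>
  </xsl:template>

</xsl:stylesheet>
```

A book with several authors is indexed once per author-ref child, so it is counted for each of its authors, and the long predicates with current() are no longer needed.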