XSLT as a Language
What are the most significant characteristics that distinguish XSLT from other languages? In this
section I shall pick three of the most striking features: the fact that it is
written in XML syntax, the fact that it is a language free of side-effects,
and the fact that processing is described as a set of independent
pattern-matching rules.
Use of XML Syntax
As we've seen, the use of SGML syntax for stylesheets was
proposed as long ago as 1994, and it seems that this idea gradually became the
accepted wisdom. It's difficult to trace exactly what the overriding arguments
were, and when you find yourself writing something like:
<xsl:variable name="y">
   <xsl:call-template name="f">
      <xsl:with-param name="x" select="$x"/>
   </xsl:call-template>
</xsl:variable>
to express what in other languages would be written as « y = f(x); », then you may find
yourself wondering how such a decision came to be made.
In fact, it could have been worse: in the very early drafts,
the syntax for writing what are now XPath expressions was also expressed in
XML, so instead of writing select="book/author/first-name"
you had to write something along the lines of:
<select>
   <path>
      <element type="book"/>
      <element type="author"/>
      <element type="first-name"/>
   </path>
</select>
The most obvious arguments for expressing XSLT stylesheets
in XML are perhaps:
There is already an XML parser in the browser, so it keeps the footprint
small if this can be re-used.
Everyone had got fed up with the
syntactic inconsistencies between HTML/XML and CSS, and didn't want the same
thing to happen again.
The syntax of DSSSL was widely seen as a barrier to its adoption; better to have a syntax that was
already familiar in the target community.
Many existing popular templating languages
are expressed as an outline of the output document with embedded instructions,
so this is a familiar concept.
All the lexical apparatus is reusable, for example Unicode support, character and entity
references, whitespace handling, namespaces.
It's occasionally useful to have a stylesheet as the input or output of a transformation (the
Microsoft XSL converter is an example), so it's a benefit if a stylesheet can
read and write other stylesheets.
Providing visual development tools
easily solves the inconvenience of having to type lots of angle brackets.
Like it or not, the XML-based syntax is now an intrinsic
feature of the language that has both benefits and drawbacks. It does require a
lot of typing, but in the end, the number of keystrokes has very little bearing
on the ease or difficulty of solving particular transformation problems.
No Side-effects
The idea that XSL should be a declarative language free of
side-effects appears repeatedly in the early statements about the goals and
design principles of the language, but no-one ever seems to explain why: what would be the user benefit?
A function or procedure in a programming language is said to
have side-effects if it makes changes to its environment: for example, if it
updates a global variable that another function or procedure can read, writes
messages to a log file, or prompts the user. If functions have
side-effects, it becomes important to call them the right number of times and
in the correct order. Functions that have no side-effects
(sometimes called pure functions) can be called any number of times and in any
order. It doesn't matter how many times you evaluate the area of a triangle,
you will always get the same answer; but if the function to calculate the area
has a side-effect such as changing the size of the triangle, or if you don't
know whether it has side-effects or not, then it becomes important to call it
once only.
I expand on this concept in the
section on Computational Stylesheets in Chapter 8, page 545.
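In XSLT terms, a named template with parameters plays the role of a pure function. The sketch below uses a hypothetical triangle-area template, echoing the example above, and calls it twice with the same arguments. Because the template has no side-effects, both calls must produce the same value, and it cannot matter which of the two variables the processor evaluates first.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical pure "function": the result depends only on the
       parameters, and calling it changes nothing else. -->
  <xsl:template name="triangle-area">
    <xsl:param name="base"/>
    <xsl:param name="height"/>
    <xsl:value-of select="$base * $height div 2"/>
  </xsl:template>

  <xsl:template match="/">
    <!-- Two calls with identical arguments; with no side-effects,
         their evaluation order is irrelevant to the result. -->
    <xsl:variable name="a1">
      <xsl:call-template name="triangle-area">
        <xsl:with-param name="base" select="6"/>
        <xsl:with-param name="height" select="4"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="a2">
      <xsl:call-template name="triangle-area">
        <xsl:with-param name="base" select="6"/>
        <xsl:with-param name="height" select="4"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- Compares the string values of the two results -->
    <xsl:value-of select="$a1 = $a2"/>
  </xsl:template>
</xsl:stylesheet>
```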
It is possible to find hints as to the reason why this was
considered desirable in the statements that the language should be equally
suitable for batch or interactive use, and that it should be capable of progressive
rendering. There is a concern that when you download a large XML
document, you won't be able to see anything on your screen until the last byte
has been received from the server. Equally, if a
small change were made to the XML document, it would be nice to be able to
determine the change needed to the screen display, without recalculating the
whole thing from scratch. If a language has side-effects, then the order of
execution of the statements in the language has to be defined, or the final
result becomes unpredictable. Without side-effects, the statements can be
executed in any order, which means it is possible, in principle, to process the
parts of a stylesheet selectively and independently.
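One place where XSLT 1.0 already exploits this freedom is in global variables: because they cannot have side-effects, a global variable may refer to one declared later in the stylesheet, and the processor works out the evaluation order from the dependencies. The sketch below assumes a poem-like input with stanza and line elements; the variable names are illustrative only.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Declared first, but depends on two variables declared after it.
       Declaration order does not fix evaluation order; only the
       dependencies between the variables do. -->
  <xsl:variable name="lines-per-stanza" select="$line-count div $stanza-count"/>
  <xsl:variable name="line-count" select="count(//line)"/>
  <xsl:variable name="stanza-count" select="count(//stanza)"/>

  <xsl:template match="/">
    <xsl:value-of select="$lines-per-stanza"/>
  </xsl:template>
</xsl:stylesheet>
```

A circular reference between global variables, by contrast, is an error, since no evaluation order could satisfy it.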
Whether XSLT has actually achieved these goals is somewhat
debatable. Certainly, determining which parts of the output document are
affected by a small change to one part of the input document is not easy, given
the flexibility of the expressions and patterns that are now permitted in the
language. Equally, all existing XSLT processors require the whole document to
be loaded into memory. However, it would be a mistake to expect too much too
soon. When E. F. Codd published the relational calculus in 1970, he made the
claim that a declarative language was desirable because it was possible to
optimize it, which was not possible with the navigational data access languages
in use at the time. In fact it took another fifteen years before relational
optimization techniques (and, to be fair, the price of hardware) reached the
point where large relational databases were commercially viable. But in the end
he was proved right, and the hope is that the same principle will also
eventually deliver similar benefits in the area of transformation and styling
languages.
What being free of side-effects means in practice is that you
cannot update the value of a variable. This restriction is something you may
find very frustrating at first, and a big price to pay for these rather remote
benefits. But as you get the feel of the language and learn to think about
using it the way it was designed to be used, rather than the way you are
familiar with from other languages, you will find you stop thinking about this
as a restriction. In fact, one of the benefits is that it eliminates a whole
class of bugs from your code! I shall come back to this subject in Chapter 8,
where I outline some of the common design patterns for XSLT stylesheets, and in
particular, describe how to use recursive code to handle situations where in
the past you would probably have used updatable variables to keep track of the
current state.
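As a foretaste of that technique, here is a minimal sketch, assuming a poem-like input with line elements, that totals the number of characters in all the lines. Where a procedural program would update a running-total variable inside a loop, this stylesheet passes the running total as a parameter to a recursive call. The template name total-length is hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="total-length">
      <xsl:with-param name="lines" select="//line"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Each call handles the first remaining line, then recurses on the rest
       with a new total. No variable is ever updated. -->
  <xsl:template name="total-length">
    <xsl:param name="lines"/>
    <xsl:param name="total" select="0"/>
    <xsl:choose>
      <xsl:when test="$lines">
        <xsl:call-template name="total-length">
          <xsl:with-param name="lines" select="$lines[position() > 1]"/>
          <xsl:with-param name="total" select="$total + string-length($lines[1])"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No lines left: the accumulated parameter is the answer -->
        <xsl:value-of select="$total"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The built-in sum() function cannot do this particular job in XSLT 1.0, because it adds up the numeric values of existing nodes rather than values computed from them, such as string lengths.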
Rule-based
The dominant feature of a typical XSLT stylesheet is that it
consists of a sequence of template rules, each of which describes how a
particular element type or other construct should be processed. The rules are
not arranged in any particular order; they don't have to match the order of the
input or the order of the output, and in fact there are very few clues as to
what ordering or nesting of elements the stylesheet author expects to encounter
in the source document. It is this that makes XSLT a declarative language: you say what output should
be produced when particular patterns occur in the input, as distinct from a
procedural program where you have to say what tasks to perform in what order.
This rule-based structure is very like CSS, but with the
major difference that both the patterns (the description of which nodes a rule
applies to) and the actions (the description of what happens when the rule is matched) are much richer
in functionality.
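To make the point about richer patterns concrete, here is a small sketch (separate from the poem example that follows) whose rule matches only those line elements whose text contains the word "heart", a condition that a CSS selector cannot express. The input is assumed to be a poem like the one below; the chosen word and the plain-text output are purely illustrative.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Process every line in the document, in document order -->
  <xsl:template match="/">
    <xsl:apply-templates select="//line"/>
  </xsl:template>

  <!-- Pattern that tests the element's content. A pattern with a predicate
       has default priority 0.5, so it wins over the plain "line" rule. -->
  <xsl:template match="line[contains(., 'heart')]">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- All other lines produce no output (default priority 0) -->
  <xsl:template match="line"/>
</xsl:stylesheet>
```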
Example: Displaying a Poem
Let's see how we can use the rule-based approach
to format a poem. Again, we haven't introduced all the concepts yet, so I
won't try to explain every detail of how this works, but it's useful to see
what the template rules actually look like in practice.
Input
Let's take this XML source as our poem. The source file
can be found on the web site for this book at http://www.wrox.com,
under the name poem.xml,
and the stylesheet is there as poem.xsl.
<poem>
   <author>Rupert Brooke</author>
   <date>1912</date>
   <title>Song</title>
   <stanza>
      <line>And suddenly the wind comes soft,</line>
      <line>And Spring is here again;</line>
      <line>And the hawthorn quickens with buds of green</line>
      <line>And my heart with buds of pain.</line>
   </stanza>
   <stanza>
      <line>My heart all Winter lay so numb,</line>
      <line>The earth so dead and frore,</line>
      <line>That I never thought the Spring would come again</line>
      <line>Or my heart wake any more.</line>
   </stanza>
   <stanza>
      <line>But Winter's broken and earth has woken,</line>
      <line>And the small birds cry again;</line>
      <line>And the hawthorn hedge puts forth its buds,</line>
      <line>And my heart puts forth its pain.</line>
   </stanza>
</poem>
Output
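We'll write a stylesheet such that this document appears in the browser with the title and author as centered headings, each stanza as a separate paragraph with every second line indented, and the date in italics at the end.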

Stylesheet
It starts with the standard header:
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0">
Now we'll write one template rule for each element type in
the source document. The rule for the <poem>
element creates the skeleton of the HTML output, defining the ordering of the
elements in the output (which doesn't have to be the same as the input
order). The <xsl:value-of>
instruction inserts the string value of the selected
element at this point in the output. The <xsl:apply-templates> instructions
cause the selected child elements to be processed, each using its own template rule.
<xsl:template match="poem">
   <html>
      <head>
         <title><xsl:value-of select="title"/></title>
      </head>
      <body>
         <xsl:apply-templates select="title"/>
         <xsl:apply-templates select="author"/>
         <xsl:apply-templates select="stanza"/>
         <xsl:apply-templates select="date"/>
      </body>
   </html>
</xsl:template>
The template rules for the <title>, <author>, and <date> elements are
very simple: they take the string value of the current element (selected by «select="."») and
wrap it in appropriate HTML tags to define its display style:
<xsl:template match="title">
   <div align="center"><h1><xsl:value-of select="."/></h1></div>
</xsl:template>
<xsl:template match="author">
   <div align="center"><h2>By <xsl:value-of select="."/></h2></div>
</xsl:template>
<xsl:template match="date">
   <p><i><xsl:value-of select="."/></i></p>
</xsl:template>
The template rule for the <stanza>
element puts each stanza into an
HTML paragraph, and then invokes processing of the lines within the stanza,
as defined by the template rule for lines:
<xsl:template match="stanza">
   <p><xsl:apply-templates select="line"/></p>
</xsl:template>
The rule for <line>
elements is a little more complex: if the position of the line within the
stanza is an even number, it precedes the line with two non-breaking-space
characters (&#xa0;).
The <xsl:if> instruction
tests a boolean condition, which in this case calls the position() function to determine the
position of the current line within the list of <line> elements being processed. It then outputs the contents of the
line, followed by an empty HTML <br>
element to end the line.
<xsl:template match="line">
   <xsl:if test="position() mod 2 = 0">&#xa0;&#xa0;</xsl:if>
   <xsl:value-of select="."/><br/>
</xsl:template>
And to finish off, we close the <xsl:stylesheet>
element:
</xsl:stylesheet>
Although template rules are a characteristic feature of the
XSLT language, we'll see that this is not the only way of writing a stylesheet.
In Chapter 8, I will describe four different design patterns for XSLT
stylesheets, only one of which makes extensive use of template rules. In fact,
the Hello World
stylesheet I presented earlier in this chapter doesn't make any real use of
template rules: it fits into the design pattern I call fill-in-the-blanks, because the stylesheet essentially contains the
fixed part of the output with embedded instructions saying where to get the data
to put in the variable parts.
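For contrast with the rule-based poem stylesheet, here is a sketch of the same poem rendered in the fill-in-the-blanks style. It is written as an XSLT 1.0 simplified stylesheet (a literal result element carrying an xsl:version attribute), so it contains no template rules at all. It is not the Hello World stylesheet mentioned above, and for brevity it omits the indentation of alternate lines.

```xml
<!-- The fixed HTML skeleton is written out literally; the embedded
     instructions say where each variable part comes from. -->
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head>
    <title><xsl:value-of select="/poem/title"/></title>
  </head>
  <body>
    <div align="center"><h1><xsl:value-of select="/poem/title"/></h1></div>
    <div align="center"><h2>By <xsl:value-of select="/poem/author"/></h2></div>
    <!-- Explicit iteration replaces the stanza and line template rules -->
    <xsl:for-each select="/poem/stanza">
      <p>
        <xsl:for-each select="line">
          <xsl:value-of select="."/><br/>
        </xsl:for-each>
      </p>
    </xsl:for-each>
    <p><i><xsl:value-of select="/poem/date"/></i></p>
  </body>
</html>
```

This style works well when the output has a fixed shape known in advance; the rule-based style copes better when the order and nesting of the input are less predictable.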