The History of XSL
Like most of the XML family of standards, XSLT was developed
by the World Wide Web Consortium (W3C), a coalition of companies orchestrated
by Tim Berners-Lee, the inventor of the web. There is an interesting page on
the history of XSL, and styling proposals generally, at http://www.w3.org/Style/History/.
Pre-history
HTML was originally conceived by Berners-Lee as a set of
tags to mark the logical structure of a document: headings, paragraphs, links,
quotes, code sections, and the like. Soon people wanted more control over how
the document looked: they wanted to achieve the same control over the
appearance of the delivered publication as they had with printing and paper. So
HTML acquired more and more tags and attributes to control presentation: fonts,
margins, tables, colors, and all the rest that followed. As it evolved, the
documents being published became more and more browser-dependent, and it was
seen that the original goals of simplicity and universality were starting to
slip away.
The remedy was widely seen as separation of content from
presentation. This was not a new concept; it had been worked out in detail
during the 1980s in the design of the Standard Generalized Markup Language
(SGML), whose architecture in turn was influenced by the elaborate (and
never implemented) work done in the ISO Open Document Architecture (ODA)
standards.
Just as XML was derived as a greatly simplified subset of
SGML, so XSLT has its origins in an SGML-based standard called DSSSL
(Document Style Semantics and Specification Language).
DSSSL (I pronounce it Dissel) was
developed primarily to fill the need for a standard device-independent language
to define the output rendition of SGML documents, particularly for high-quality
typographical presentation. SGML was around for a long time before DSSSL
appeared in the early 1990s, but until then the output side had been handled
using proprietary and often extremely expensive tools, geared towards driving
equally expensive phototypesetters, so that the technology was only really
taken up by the big publishing houses.
C. M. Sperberg-McQueen and Robert F. Goldstein presented an
influential paper at the WWW '94 conference in Chicago under the title A
Manifesto for Adding SGML Intelligence to the World-Wide Web. You can find it at:
http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Autools/sperberg-mcqueen/sperberg.html.
The authors presented a set of requirements for a
stylesheet language, which is as good a statement as any of the aims that the
XSL designers were trying to meet. As with other proposals from around that
time, the concept of a separate transformation language had not yet appeared,
and a great deal of the paper is devoted to the rendition capabilities of the
language. There are many formative ideas, however, including the concept of
fallback processing to cope with situations where particular features are not
available in the current environment.
It is worth quoting some extracts from the paper here:
Ideally, the style sheet language should be declarative, not
procedural, and should allow style sheets to exploit the structure of SGML
documents to the fullest. Styles must be able to vary with the structural
location of the element: paragraphs within notes may be formatted differently
from paragraphs in the main text. Styles must be able to vary with the
attribute values of the element in question: a quotation of type
"display" may need to be formatted differently from a quotation of
type "inline". They may even need to vary with the attribute values
of other elements: items in numbered lists will look different from
items in bulleted lists.
At the
same time, the language has to be reasonably easy to interpret in a procedural
way: implementing the style sheet language should not become the major
challenge in implementing a Web client.
The semantics should be additive: It should be possible for users to
create new style sheets by adding new specifications to some existing (possibly
standard) style sheet. This should not require copying the entire base style
sheet; instead, the user should be able to store locally just the user's own
changes to the standard style sheet, and they should be added in at browse
time. This is particularly important to support local modifications of standard
DTDs.
Syntactically,
the style sheet language must be very simple, preferably trivial to parse. One
obvious possibility: formulate the style sheet language as an SGML DTD, so that
each style sheet will be an SGML document. Since the browser already knows how
to parse SGML, no extra effort will be needed.
We
recommend strongly that a subset of DSSSL be used to formulate style sheets for
use on the World Wide Web; with the completion of the standards work on DSSSL,
there is no reason for any community to invent their own style-sheet language
from scratch. The full DSSSL standard may well be too demanding to implement in
its entirety, but even if that proves true, it provides only an argument for
defining a subset of DSSSL that must be supported, not an argument for rolling
our own. Unlike home-brew specifications, a subset of a standard comes with an
automatically predefined growth path. We expect to work on the formulation of a
usable, implementable subset of DSSSL for use in WWW style sheets, and invite
all interested parties to join in the effort.
In late 1995, a W3C-sponsored workshop on stylesheet
languages was held in Paris. In view of the subsequent role of James Clark as
editor of the XSLT Recommendation, it is interesting to read the notes of his
contribution on the goals of DSSSL, which can be found at http://www.w3.org/Style/951106_Workshop/report1.html#clark.
What follows is a few selected paragraphs from these
notes:
DSSSL
contains both a transformation language and a formatting language. Originally
the transformation was needed to make certain kinds of styles possible (such as
tables of contents). The query language now takes care of that, but the
transformation language survives because it is useful in its own right.
Both
simple and complex designs should be possible, and the styles should be
suitable for batch formatting as well as interactive applications. Existing
systems should be able to support DSSSL with only minimal changes (a DSSSL
parser is obviously needed).
The
language is strictly declarative, which is achieved by adopting a functional
subset of Scheme. Interactive style sheet editors must be possible.
A DSSSL
style sheet very precisely describes a function from SGML to a flow object
tree. It allows partial style sheets to be combined ('cascaded' as in CSS):
some rule may override some other rule, based on implicit and explicit
priorities, but there is no blending between conflicting styles.
James Clark closed his talk with the remark:
Creating
a good, extensible style language is hard!
One suspects that the effort of editing the XSLT
Recommendation didn't cause him to change his mind.
The First XSL Proposal
Following these early discussions, the W3C set up a
formal activity to create a stylesheet language proposal. The remit for this
group specified that it should be based on DSSSL.
As an output of this activity came the first formal proposal
for XSL, dated 27 August 1997. It can be found at http://www.w3.org/TR/NOTE-XSL.html.
There are eleven authors listed. They include five from
Microsoft, three from Inso Corporation, plus Paul Grosso of ArborText, James
Clark (who works for himself), and Henry Thompson of the University of
Edinburgh.
The section describing the purpose of the language is worth reading:
XSL is a stylesheet language
designed for the Web community. It provides functionality beyond CSS (e.g.
element reordering). We expect that CSS will be used to display
simply-structured XML documents and XSL will be used where more powerful
formatting capabilities are required or for formatting highly structured
information such as XML structured data or XML documents that contain
structured data.
Web authors create content at
three different levels of sophistication:
markup: relies solely on a declarative syntax
script: additionally uses code "snippets" for more complex
behaviors
program: uses a full programming language
XSL is intended to be
accessible to the "markup" level user by providing a declarative
solution to most data description and rendering requirements. Less common tasks
are accommodated through a graceful escape to a familiar scripting environment.
This approach is familiar to the Web publishing community as it is modeled
after the HTML/JavaScript environment.
The
powerful capabilities provided by XSL allow:
formatting of source elements based on ancestry/descendency,
position, and uniqueness
the creation of formatting constructs including generated text and
graphics
the definition of reusable formatting macros
writing-direction independent stylesheets
extensible
set of formatting objects
The authors then explained carefully why they had felt it
necessary to diverge from DSSSL, and described why a separate language from CSS
(Cascading Style Sheets) was thought necessary.
They then stated some design principles:
XSL should be straightforwardly usable over the Internet.
XSL should be expressed in XML syntax.
XSL should provide a declarative language to do all common
formatting tasks.
XSL should provide an "escape" into a scripting language
to accommodate more sophisticated formatting tasks and to allow for
extensibility and completeness.
XSL will be a subset of DSSSL with the proposed amendment. (As XSL
was no longer a subset of DSSSL, they cannily proposed amending DSSSL so it
would become a superset of XSL).
A mechanical mapping of a CSS stylesheet into an XSL stylesheet
should be possible.
XSL should be informed by user experience with the FOSI stylesheet
language.
The number of optional features in XSL should be kept to a minimum.
XSL stylesheets should be human-legible and reasonably clear.
The XSL design should be prepared quickly.
XSL stylesheets shall be easy to create.
Terseness in XSL markup is of minimal importance.
As a requirements statement, this doesn't rank among the
best. It doesn't read like the kind of list you get when you talk to users and
find out what they need. It's much more the kind of list designers write when
they know what they want to produce, including a few political concessions to
the people who might raise objections. But if you want to understand why XSLT
became the language it did, this list is certainly evidence of the thinking.
The language described in this first proposal contains many
of the key concepts of XSLT as it finally emerged, but the syntax is virtually
unrecognizable. It was already clear that the language should be based on
templates that handled nodes in the source document matching a defined pattern,
and that the language should be free of side-effects, to allow
"progressive rendering and handling of large documents". I'll explore
the significance of this requirement in more detail on page 34, and discuss its implications for the way stylesheets
are designed in Chapter 8. The basic idea is that if a stylesheet is expressed
as a collection of completely independent operations, each of which has no
external effect other than generating part of the output from its input (for
example, it cannot update global variables), then it becomes possible to
generate any part of the output independently if that particular part of the
input changes. Whether the XSLT language actually achieves this objective is
still an open question.
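The idea can be made concrete with a small sketch. It is written in Python rather than XSLT, and every name in it (render_title, render_para, TEMPLATES, transform, update) is hypothetical, chosen only for illustration; it also simplifies a source document to a flat list of (element name, text) pairs and does no escaping of the text. Under those assumptions, it shows that when each rule is a pure function of its input node, one part of the output can be regenerated on its own after the corresponding part of the input changes:

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a source node is just (element_name, text).
# Real XML trees are nested and far richer than this.
Node = Tuple[str, str]


def render_title(text: str) -> str:
    """Pure rule: output depends only on the input text."""
    return f"<h1>{text}</h1>"


def render_para(text: str) -> str:
    """Pure rule: no global state is read or written."""
    return f"<p>{text}</p>"


# Hypothetical "template rules": the element name stands in for a match pattern.
TEMPLATES: Dict[str, Callable[[str], str]] = {
    "title": render_title,
    "para": render_para,
}


def apply_template(node: Node) -> str:
    """Pick the rule that matches the node and apply it."""
    name, text = node
    return TEMPLATES[name](text)


def transform(doc: List[Node]) -> List[str]:
    """Render the whole document, one output fragment per source node."""
    return [apply_template(node) for node in doc]


def update(doc: List[Node], output: List[str], index: int,
           new_node: Node) -> Tuple[List[Node], List[str]]:
    """Replace one source node and re-render only its output fragment."""
    new_doc = list(doc)
    new_doc[index] = new_node
    new_output = list(output)
    new_output[index] = apply_template(new_node)
    return new_doc, new_output


source = [("title", "History"), ("para", "First draft."), ("para", "Second.")]
output = transform(source)

# Change one paragraph; only its fragment is recomputed.
source2, output2 = update(source, output, 1, ("para", "Revised."))

# Because the rules have no side effects, the incremental result
# matches a full re-run of the transformation.
assert output2 == transform(source2)
```

If any rule updated a shared variable (a running counter, say), the final assertion could no longer be relied on, because the result of one rule would depend on which other rules had already run.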
Microsoft shipped their first technology preview five months
after this proposal appeared, in January 1998.
To enable W3C to make an assessment of the proposal, Norman
Walsh produced a requirements summary, which was published in May 1998. It is
available at http://www.w3.org/TR/WD-XSLReq.
The bulk of his paper is given over to a long list of the
typographical features that the language should support, following the
tradition both before and since that the formatting side of the language gets a
lot more column inches than the transformation side. But as XSLT fans that need
not worry us: the success of standards has always been inversely proportional
to their length.
What Walsh has to say on the transformation aspects of the
language is particularly terse, and although he clearly had reasons for thinking
these features were necessary, it's a shame that he doesn't tell us why he put
these in and left others, such as sorting, grouping, and
totaling, out:
Ancestors, children, siblings,
attributes, content, disjunctions, negation, enumerations, computed select
based upon arbitrary query expressions.
Arithmetic Expressions;
arithmetic, simple boolean comparisons, boolean logic, substrings, string
concatenation.
Data Types: Scalar types, units
of measure, Flow Objects, XML Objects
Side effects: No global side
effects.
Standard Procedures: The
expression language should have a set of procedures that are built in to the
XSL language. These are still to be identified.
User Defined Functions: For
reuse. Parameterized, but not recursive.
Following this activity, the first Working Draft of XSL (not to
be confused with the Proposal) was published on 18 August 1998, and the
language started to take shape, gradually converging on the final form it took
in the 16 November 1999 Recommendation through a series of Working Drafts, each
of which made radical changes, but kept the original design principles intact.
So let's look now at the essential characteristics of XSLT
as a language.