Wattle Software - producers of XMLwriter XML editor
Professional XML


SAX 2.0

SAX 1.0 has been very widely implemented and has been in widespread use almost since the day the first draft appeared on 12 January 1998, a month earlier than the date of the final XML 1.0 recommendation. It has met user needs well, in spite of a few criticisms, some of which are hinted at in this chapter.

 

So it is perhaps unsurprising that the development of a successor, SAX 2.0, has been comparatively leisurely. Requirements were discussed on the XML-DEV mailing list during the early months of 1999, and an alpha version of a revised specification was published by David Megginson (though not widely advertised) on 1 June 1999. There has been little adverse comment, and it seems likely that the final specification of SAX 2.0 will be close to its current form, which can be found at http://www.megginson.com/SAX/SAX2/

 

Whether the specification will be widely implemented is another matter. Time will tell.

 

The way in which the original SAX interface has been extended is in itself quite interesting. A standard mechanism has been defined to allow the application to ask the parser to support particular features or to set particular properties; the parser in all cases has the option to refuse. The set of features and properties that can be requested is itself entirely open-ended. SAX2 defines a core set, but additional features and properties can be invented by anyone at any time. To make this possible, the features and properties are identified by a URI, in rather the same way as XML namespaces.

The Configurable Interface

The key new interface in SAX2 is named Configurable. A SAX2 parser must implement the org.xml.sax.Configurable interface as well as the org.xml.sax.Parser interface. The Configurable interface contains four methods:

 

getFeature(featureName)

Allows the application to ask the parser whether a particular feature is currently turned on.

 

setFeature(featureName, boolean)

Allows the application to request that the parser should turn a particular feature on or off.

 

getProperty(propertyName)

Allows the application to request the current value of some particular property.

 

setProperty(propertyName, object)

Allows the application to set some particular property to the supplied value.

 

In each case, if the parser does not recognize the feature or property name, it must throw a SAXNotRecognizedException. When that happens, the application generally cannot tell whether the parser supports the feature or not. If the parser recognizes the name of the feature or property, but cannot set it to the requested value, it must throw a SAXNotSupportedException.

 

To make this more concrete, consider one of the new core features, whose name is http://xml.org/sax/features/validation. This feature is provided to fix the problem in SAX 1.0 whereby an application has no way of discovering or controlling whether the parser is a validating one. With SAX 2.0, if this feature is on, the parser must validate the XML document; if it is off, it must not do so (in other words, the parse must succeed so long as the document is well-formed).

 

An application that explicitly requires a validating parser may call:

 

parser.setFeature("http://xml.org/sax/features/validation", true);

 

This is a core feature, so every SAX2 parser should recognize its name. A parser that can perform validation will return normally, while a parser that cannot perform validation will throw a SAXNotSupportedException.

 

Equally, an application that explicitly requires the parser not to do validation may call:

 

parser.setFeature("http://xml.org/sax/features/validation", false);

 

This time, a parser that insists on doing validation must respond to this request with a SAXNotSupportedException.

 

On the other hand, an application that simply wants to know whether the parser is performing validation or not may call:

 

if (parser.getFeature("http://xml.org/sax/features/validation")) ...
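
The three possible outcomes of a feature request can be tied together in a small sketch. The block below declares a local stand-in for the alpha Configurable interface with the four methods listed earlier, since that interface is not part of the standard Java libraries. The NonValidatingParser class, the requestFeature helper, and the example.org feature name are hypothetical, invented only for illustration; the two exception classes are the org.xml.sax ones that ship with the JDK.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

public class ConfigurableSketch {

    public static final String VALIDATION =
        "http://xml.org/sax/features/validation";

    /** Local stand-in (an assumption) for the alpha org.xml.sax.Configurable. */
    public interface Configurable {
        boolean getFeature(String featureName)
            throws SAXNotRecognizedException, SAXNotSupportedException;
        void setFeature(String featureName, boolean state)
            throws SAXNotRecognizedException, SAXNotSupportedException;
        Object getProperty(String propertyName)
            throws SAXNotRecognizedException, SAXNotSupportedException;
        void setProperty(String propertyName, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException;
    }

    /** Hypothetical parser: knows the validation feature but cannot validate. */
    public static class NonValidatingParser implements Configurable {
        public boolean getFeature(String featureName)
                throws SAXNotRecognizedException {
            if (!VALIDATION.equals(featureName)) {
                throw new SAXNotRecognizedException(featureName);
            }
            return false; // validation is always off in this parser
        }

        public void setFeature(String featureName, boolean state)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            if (!VALIDATION.equals(featureName)) {
                throw new SAXNotRecognizedException(featureName);
            }
            if (state) {
                // Name is recognized, but the requested value cannot be honoured.
                throw new SAXNotSupportedException(featureName);
            }
        }

        public Object getProperty(String propertyName)
                throws SAXNotRecognizedException {
            throw new SAXNotRecognizedException(propertyName);
        }

        public void setProperty(String propertyName, Object value)
                throws SAXNotRecognizedException {
            throw new SAXNotRecognizedException(propertyName);
        }
    }

    /** Hypothetical helper: reports which of the three outcomes occurred. */
    public static String requestFeature(Configurable parser, String name, boolean state) {
        try {
            parser.setFeature(name, state);
            return "accepted";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        }
    }

    public static void main(String[] args) {
        Configurable parser = new NonValidatingParser();
        System.out.println(requestFeature(parser, VALIDATION, true));   // not supported
        System.out.println(requestFeature(parser, VALIDATION, false));  // accepted
        System.out.println(requestFeature(parser,
            "http://example.org/features/unknown", true));              // not recognized
    }
}
```

An application that can work with or without validation would typically treat "not supported" as a signal to fall back, while one that requires validation would report the error and stop.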

 

Core Features and Properties

The following core features and properties are defined in SAX2. A feature is simply shorthand for a property whose value is a boolean.

 

| Name (prefixed http://xml.org/sax) | Value | Meaning |
|---|---|---|
| /features/validation | boolean | Perform validation |
| /features/external-general-entities | boolean | Expand general (i.e. parsed) external entities |
| /features/external-parameter-entities | boolean | Expand the external DTD subset and external parameter entities |
| /features/namespaces | boolean | Process namespace declarations. Element and attribute names with a prefix will have the prefix replaced by the URI of the namespace |
| /features/normalize-text | boolean | Normalize character data, by ensuring that all consecutive pieces of character data are passed in a single call of the characters() method |
| /features/use-locator | boolean | Supply the application with a Locator object by calling the setDocumentLocator() method |
| /properties/namespace-sep | String | Separator to be used between the URI and the local part of a name when the namespaces feature is enabled |
| /properties/dom-node | org.w3c.dom.Node | Read-only property: if the DOM for the source document exists in memory, this property identifies the DOM node relating to the current event |
| /properties/xml-string | String | Read-only property: a character string giving the XML representation of the current event |
| /handlers/DeclHandler | org.xml.sax.misc.DeclHandler | Set a handler to process element and attribute declarations encountered in the DTD |
| /handlers/LexicalHandler | org.xml.sax.misc.LexicalHandler | Set a handler to process lexical events. These include CDATA sections, entities, and comments |
| /handlers/NamespaceHandler | org.xml.sax.misc.NamespaceHandler | Set a handler to process namespace declarations |

 

The core properties in SAX2 thus include three new event-handling interfaces: DeclHandler, LexicalHandler, and NamespaceHandler. (Remember, however, that "core" simply means that every parser must recognize a request for these features and properties; it still has the right to refuse the request.)
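
To illustrate how the namespaces feature and the namespace-sep property fit together, here is a minimal sketch of the name expansion an application might expect to see. The ExpandedNames class, the expand method, the "^" separator, and the example.org URI are all hypothetical; the treatment of unprefixed names (left unchanged) is an assumption, since only prefixed names are described above.

```java
import java.util.HashMap;
import java.util.Map;

public class ExpandedNames {

    /**
     * Replaces the prefix of a name with its namespace URI, joining the URI
     * and the local part with the given separator.
     */
    public static String expand(String name, Map<String, String> prefixToUri,
                                String separator) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name; // assumption: names without a prefix are left alone
        }
        String prefix = name.substring(0, colon);
        String uri = prefixToUri.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefix);
        }
        return uri + separator + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> inScope = new HashMap<>();
        inScope.put("bk", "http://example.org/books"); // hypothetical namespace
        System.out.println(expand("bk:title", inScope, "^"));
        System.out.println(expand("title", inScope, "^"));
    }
}
```

The separator needs to be a string that cannot occur in a URI or in a local name, so that the application can split the expanded name again without ambiguity.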

 

The declaration handler, DeclHandler, meets the requirement for access to the structural definitions in the DTD. It provides access to element declarations in the simplest possible way, as a string that the application must parse.
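
As a rough idea of what "a string that the application must parse" involves, the sketch below pulls the child element names out of a content model string such as "(title, (para | list)+, note?)". The class and method names are hypothetical, the callback that would deliver the string is not shown, and the pattern only handles ASCII names, which is a simplification of the XML name rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentModelNames {

    // Matches name-like tokens; the optional '#' lets #PCDATA be recognized
    // as a keyword and skipped rather than mistaken for an element name.
    private static final Pattern TOKEN =
        Pattern.compile("#?[A-Za-z_:][A-Za-z0-9._:-]*");

    /** Returns the distinct element names mentioned in a content model. */
    public static List<String> childNames(String model) {
        List<String> names = new ArrayList<>();
        String trimmed = model.trim();
        if (trimmed.equals("EMPTY") || trimmed.equals("ANY")) {
            return names; // these models name no specific children
        }
        Matcher m = TOKEN.matcher(trimmed);
        while (m.find()) {
            String token = m.group();
            if (!token.startsWith("#") && !names.contains(token)) {
                names.add(token);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childNames("(title, (para | list)+, note?)"));
        System.out.println(childNames("(#PCDATA | em)*"));
    }
}
```

This only recovers which elements may appear, not their order or cardinality; an application that needs the full grammar would have to parse the parentheses and occurrence indicators as well.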

 

The lexical handler, LexicalHandler, meets the requirement for access to information that was suppressed in SAX 1.0 because it was considered to be of no interest to applications. This includes the boundaries of internal entities, the boundaries of CDATA sections, and the existence of comments. Many application writers asked for these features because they enable the application to minimize the changes made to a document as it is being copied. Comments are needed for other reasons as well: for example, the XSLT recommendation allows a stylesheet to say what should happen to comments in the source document, so an XSLT interpreter written using the SAX interface needs access to this information.
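
The copying use case can be sketched as follows. Note the swap: to be runnable, this example does not use the alpha names from this chapter but the org.xml.sax.ext.LexicalHandler interface (via DefaultHandler2) and the http://xml.org/sax/properties/lexical-handler property name from the SAX 2.0 API bundled with current Java runtimes, which differ from the alpha draft. The LexicalCopy and Copier names are hypothetical, and the copier ignores everything except elements, attributes, text, comments, and CDATA sections.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalCopy {

    /** Rebuilds the document text, preserving comments and CDATA boundaries. */
    static class Copier extends DefaultHandler2 {
        final StringBuilder out = new StringBuilder();
        private boolean inCdata = false;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length);
            out.append(inCdata ? text : escape(text)); // CDATA content is copied raw
        }

        @Override
        public void startCDATA() { inCdata = true; out.append("<![CDATA["); }

        @Override
        public void endCDATA() { inCdata = false; out.append("]]>"); }

        @Override
        public void comment(char[] ch, int start, int length) {
            out.append("<!--").append(ch, start, length).append("-->");
        }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        }
    }

    public static String copy(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            Copier copier = new Copier();
            reader.setContentHandler(copier);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", copier);
            reader.parse(new InputSource(new StringReader(xml)));
            return copier.out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copy("<a><!--note--><![CDATA[x<y]]></a>"));
    }
}
```

Without the lexical events, the same copier would silently drop the comment and would have to write the CDATA content as escaped text, which is exactly the kind of unnecessary change to the document that application writers wanted to avoid.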

 

The namespace handler, NamespaceHandler, meets more advanced namespace handling requirements than the namespaces feature. Whereas the namespaces feature simply expands element and attribute prefixes using the namespace definitions currently in force, a namespace handler allows the namespace definitions themselves to be processed as events in their own right. This is useful in several circumstances:

 

• Where the application uses prefixes in contexts other than element and attribute names (for example, it might use them in attribute values)

• Where the application needs to know the prefix that was used (for example, for use in error messages, or in attempting to copy parts of the original document)
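
The first of these cases can be sketched as a small scope tracker that is fed namespace declarations as events and later resolves a prefix found in an attribute value. The PrefixScopes class and its method names are hypothetical and do not reproduce the NamespaceHandler interface, whose methods are not listed here; the example.org URIs are placeholders.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class PrefixScopes {

    // For each prefix, a stack of URIs with the innermost declaration on top.
    private final Map<String, Deque<String>> bindings = new HashMap<>();

    /** Called when a namespace declaration comes into scope. */
    public void startDeclaration(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, k -> new ArrayDeque<>()).push(uri);
    }

    /** Called when the most recent declaration of the prefix goes out of scope. */
    public void endDeclaration(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    /**
     * Resolves the prefix of a value such as "t:string" to the URI currently
     * bound to it, or returns null if there is no prefix or no binding.
     */
    public String resolve(String prefixedValue) {
        int colon = prefixedValue.indexOf(':');
        if (colon < 0) {
            return null;
        }
        Deque<String> stack = bindings.get(prefixedValue.substring(0, colon));
        return (stack == null || stack.isEmpty()) ? null : stack.peek();
    }

    public static void main(String[] args) {
        PrefixScopes scopes = new PrefixScopes();
        scopes.startDeclaration("t", "http://example.org/types-v1");
        scopes.startDeclaration("t", "http://example.org/types-v2"); // nested redeclaration
        System.out.println(scopes.resolve("t:string"));
        scopes.endDeclaration("t");
        System.out.println(scopes.resolve("t:string"));
    }
}
```

Because the application sees the declarations themselves, it also keeps the original prefix available, which covers the second case (error messages and faithful copying).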

 

As remarked earlier, the SAX 2.0 specification cannot yet be regarded as stable, so even if you find a parser that supports it, use it with care.

Summary

We've presented some information about the origins of the SAX interface, which is implemented by a wide variety of parsers.

 

The thing that characterizes SAX, and that distinguishes it from the DOM interface, is that it is event-based. We discussed some of the factors that might cause you to use an event-based interface in preference to the DOM.

 

We discussed the structure of a simple SAX application, and the relationship between the three main classes: the application, the parser, and the document handler. We showed several examples of how to write SAX applications using these classes.

 

We presented some of the important design patterns for SAX applications, in particular, the filter or pipeline pattern, and the rule-based pattern.

 

Finally, we gave a preview of the features that are expected to appear in SAX 2.0 when it stabilizes.

 

We should end with a word of caution. All the examples shown in this chapter could be coded much more easily in XSLT, which we will discuss in Chapter 7. Of course that doesn't mean there is no need for SAX: Java applications can do many things that XSL stylesheets can't (for example, loading data into a relational database), and they will usually be much faster. But it's worth thinking twice about your problem before you rush into assuming that SAX is the answer, because in many cases an XSL approach, or a hybrid approach using XSL for preprocessing, may be preferable.

 

 


©1999 Wrox Press Limited, US and UK.


© Wattle Software 1998-2019. All rights reserved.