The Rule-Based Design Pattern
An alternative way of structuring a SAX application, which
again has the objective of separating functions and keeping the structure
modular and simple, is a rule-based approach.
In general, rule-based programs use an "Event-Condition-Action"
model: they contain a collection of rules of the form "if this event
occurs under these conditions, perform this action". Rule-based
programming can thus be seen as a natural extension of event-based programming.
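To make the Event-Condition-Action model concrete before turning to SAX, here is a minimal Java sketch. It is not from the original text, and the names `RuleSet` and `Rule` are hypothetical. Each rule pairs a condition, which tests an event, with an action, and the first rule whose condition matches is fired. This is only one simple conflict-resolution policy; XSLT, for example, chooses among matching templates by priority.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration of the Event-Condition-Action model:
// each Rule pairs a condition (a test on the event) with an action.
class RuleSet<E> {
    private static final class Rule<T> {
        final Predicate<T> condition;
        final Consumer<T> action;

        Rule(Predicate<T> condition, Consumer<T> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule<E>> rules = new ArrayList<>();

    // "If this event occurs under these conditions, perform this action."
    void addRule(Predicate<E> condition, Consumer<E> action) {
        rules.add(new Rule<>(condition, action));
    }

    // Fire the first rule whose condition matches; return false if none did.
    boolean fire(E event) {
        for (Rule<E> rule : rules) {
            if (rule.condition.test(event)) {
                rule.action.accept(event);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RuleSet<String> rules = new RuleSet<>();
        rules.addRule(e -> e.startsWith("start:"), e -> System.out.println("open " + e.substring(6)));
        rules.addRule(e -> e.startsWith("end:"), e -> System.out.println("close " + e.substring(4)));
        rules.fire("start:book");  // prints "open book"
        rules.fire("end:book");    // prints "close book"
    }
}
```

Adding behaviour means adding a rule rather than editing a chain of conditionals, which is the property the rest of this section exploits.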
The processing model of XSL (discussed in Chapter 9) can be
seen as an example of rule-based programming. Each XSL template constitutes one
rule: the event is the processing of a node in the source document, the
condition is the pattern that controls which template is activated, and the
action is the body of the template. We can use the same concepts in a SAX
application.
The diagram below illustrates the structure of a rule-based
SAX application. The input from the XML parser is fed into a switch, which
evaluates the events against the defined conditions and decides which action
to invoke. Each action is carried out by a processing module, each module being
designed to perform one specific task.

There are all sorts of ways conditions and actions could be
implemented, but we'll describe a very simple implementation, where the
condition is based only on element type.
Firstly, let's write the DocumentHandler. We'll call it Switcher because its job is to switch
processing to a piece of code that handles the specific element type.
Switcher maintains a set of rules in a Hashtable, indexed by element type.
The application can nominate an ElementHandler object to
process a particular element type. When the parser reports an element start tag,
the appropriate ElementHandler is looked up in the set of rules and called
to process the start tag. At the same time, the ElementHandler (or null, if no rule
exists for that element type) is pushed onto a stack, so that the same ElementHandler
can be used to process the end tag and any character data occurring immediately
within this element.
Here's the Switcher code:
import org.xml.sax.*;
import java.util.*;

/**
 * Switcher is a DocumentHandler that directs events to an appropriate
 * element handler based on the element type.
 */
public class Switcher extends HandlerBase
{
    private Hashtable rules = new Hashtable();
    private Stack stack = new Stack();

    /**
     * Define processing for an element type.
     */
    public void setElementHandler(String name, ElementHandler handler)
    {
        rules.put(name, handler);
    }

    /**
     * Start of an element. Decide what handler to use, and call it.
     */
    public void startElement (String name, AttributeList atts) throws SAXException
    {
        ElementHandler handler = (ElementHandler)rules.get(name);
        stack.push(handler);
        if (handler!=null)
        {
            handler.startElement(name, atts);
        }
    }

    /**
     * End of an element.
     */
    public void endElement (String name) throws SAXException
    {
        ElementHandler handler = (ElementHandler)stack.pop();
        if (handler!=null)
        {
            handler.endElement(name);
        }
    }

    /**
     * Character data.
     */
    public void characters (char[] ch, int start, int length) throws SAXException
    {
        ElementHandler handler = (ElementHandler)stack.peek();
        if (handler!=null)
        {
            handler.characters(ch, start, length);
        }
    }
}
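The Switcher above targets the SAX1 interfaces (HandlerBase, AttributeList), which current JDKs still ship but mark as deprecated. As a sketch only, here is the same dispatch-and-stack idea written against the SAX2 `DefaultHandler` API bundled with the JDK. The class name `Sax2Switcher` and its nested `Handler` interface are hypothetical, and because `ArrayDeque` cannot hold null, a do-nothing handler stands in for the null that the original pushes for unregistered elements.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 port of Switcher: rules map element names to handlers,
// and a stack remembers which handler owns the current element.
class Sax2Switcher extends DefaultHandler {
    interface Handler {
        default void start(String name, Attributes atts) {}
        default void end(String name) {}
        default void text(String text) {}
    }

    // Placeholder for elements with no rule (ArrayDeque rejects nulls).
    private static final Handler NONE = new Handler() {};

    private final Map<String, Handler> rules = new HashMap<>();
    private final Deque<Handler> stack = new ArrayDeque<>();

    void setElementHandler(String name, Handler handler) {
        rules.put(name, handler);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Handler h = rules.getOrDefault(qName, NONE);
        stack.push(h);              // remembered for the end tag and text
        h.start(qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop().end(qName);     // same handler that saw the start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            stack.peek().text(new String(ch, start, length));
        }
    }

    // Parses an XML string, routing its events through this switcher.
    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Sax2Switcher s = new Sax2Switcher();
        s.setElementHandler("title", new Handler() {
            public void text(String t) { out.append(t); }
        });
        s.parse("<book><title>Moby Dick</title><price>8.99</price></book>");
        System.out.println(out);  // prints "Moby Dick"
    }
}
```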
An ElementHandler is rather like a DocumentHandler, but it
only ever gets to process a subset of the events: element start and end, and
character data. So although we could use a DocumentHandler here, we've defined
a special class. This serves both as a definition of the interface and as a
superclass for real element handlers: good Java coding practice might suggest
using a separate Java interface, but this will do for now.
import org.xml.sax.*;

/**
 * ElementHandler is a class that processes the start and end tags and
 * character data for one element type. This class itself does nothing;
 * the real processing should be defined in a subclass.
 */
public class ElementHandler {

    /**
     * Start of an element
     */
    public void startElement (String name, AttributeList atts) throws SAXException {}

    /**
     * End of an element
     */
    public void endElement (String name) throws SAXException {}

    /**
     * Character data
     */
    public void characters (char[] ch, int start, int length) throws SAXException {}
}
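The separate-interface arrangement mentioned above could look like the following sketch. The names `ElementEvents`, `ElementAdapter` and `TextCollector` are illustrative, not from the book: the interface defines the contract Switcher would call, and a do-nothing adapter class lets each handler override only the events it cares about, the same pairing SAX itself uses with DocumentHandler and HandlerBase.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.SAXException;

// Hypothetical contract that Switcher would depend on.
interface ElementEvents {
    void startElement(String name, AttributeList atts) throws SAXException;
    void endElement(String name) throws SAXException;
    void characters(char[] ch, int start, int length) throws SAXException;
}

// Hypothetical adapter: no-op implementations of every event.
class ElementAdapter implements ElementEvents {
    public void startElement(String name, AttributeList atts) throws SAXException {}
    public void endElement(String name) throws SAXException {}
    public void characters(char[] ch, int start, int length) throws SAXException {}
}

// Example handler: overrides only characters(), inheriting no-op start/end.
class TextCollector extends ElementAdapter {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) throws SAXException {
        ElementEvents handler = new TextCollector();
        char[] data = "xxMoby Dickxx".toCharArray();
        handler.startElement("title", null);   // attributes unused here
        handler.characters(data, 2, 9);
        handler.endElement("title");
        System.out.println(((TextCollector) handler).text()); // prints "Moby Dick"
    }
}
```

With this split, Switcher's rules table would hold `ElementEvents` values, so a handler could also be any class that implements the interface without inheriting from the adapter.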
So far this is all completely general. We could use the Switcher and ElementHandler
classes with any kind of document, to do any kind of processing. Now let's
exploit them for a real application: we want to produce an HTML page showing
selected data from our list of books.
Here's an application that does it. We'll start with the
main control structure. This creates a Switcher and registers an
ElementHandler object (an instance of a subclass of ElementHandler) to process
each of several element types in the input XML document. It then creates a Parser,
nominates the Switcher as its DocumentHandler, and runs the parse.
import org.xml.sax.*;
import com.icl.saxon.ParserManager;

public class DisplayBookList
{
    public static void main (String args[]) throws Exception
    {
        (new DisplayBookList()).go(args[0]);
    }

    public void go(String input) throws Exception
    {
        Switcher s = new Switcher();
        s.setElementHandler("books", new BooklistHandler());
        s.setElementHandler("book", new BookHandler());
        s.setElementHandler("author", new AuthorHandler());
        s.setElementHandler("title", new TitleHandler());
        s.setElementHandler("price", new PriceHandler());
        s.setElementHandler("volume", new VolumeHandler());
        Parser p = ParserManager.makeParser();
        p.setDocumentHandler(s);
        p.parse(input);
    }

    //...rest of code goes in here...
}
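`com.icl.saxon.ParserManager` belongs to the early Saxon releases and is not part of the JDK. Under the assumption that you want to run on a current JDK instead, JAXP's `SAXParser` can still drive a SAX1 `HandlerBase` such as Switcher through its `parse(InputSource, HandlerBase)` overload. To keep the sketch self-contained, a tiny hypothetical `HandlerBase` subclass, `NameCollector`, stands in for Switcher here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical demo: runs a SAX1 HandlerBase through JAXP instead of
// com.icl.saxon.ParserManager.
class JaxpSax1Demo {
    // Records element names in document order (a stand-in for Switcher).
    static class NameCollector extends HandlerBase {
        final StringBuilder names = new StringBuilder();

        @Override
        public void startElement(String name, AttributeList atts) {
            names.append(name).append(' ');
        }
    }

    static String elementNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameCollector handler = new NameCollector();
            // SAX1-style entry point: the handler receives DocumentHandler events
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.names.toString().trim();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints "books book title"
        System.out.println(elementNames("<books><book><title>T</title></book></books>"));
    }
}
```

In `go()`, the equivalent change would be to replace the three Parser lines with `SAXParserFactory.newInstance().newSAXParser().parse(input, s);`, which uses the `parse(String uri, HandlerBase hb)` overload.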
The actual element handlers can be defined as inner classes
within the DisplayBookList class: this is useful
because it lets them all read and update the same fields of the enclosing DisplayBookList object.
The ElementHandler for the outermost element, "books",
causes a skeletal HTML page to be created:
private class BooklistHandler extends ElementHandler
{
    public void startElement(String name, AttributeList atts)
    {
        System.out.println("<html>");
        System.out.println("<head><title>Book List</title></head>");
        System.out.println("<body><h1>A List of Books</h1>");
        System.out.println("<table>");
        System.out.println("<tr><th>Author</th><th>Title</th><th>Price</th></tr>");
    }

    public void endElement(String name)
    {
        System.out.println("</table></body></html>");
    }
}
The ElementHandler for the repeated "book" element
starts and ends a row in the generated HTML table, and initializes some
variables to hold the data:
private String author;
private String title;
private String price;
private boolean inVolume;

private class BookHandler extends ElementHandler
{
    public void startElement(String name, AttributeList atts)
    {
        author = "";
        title = "";
        price = "";
        inVolume = false;
    }

    public void endElement(String name)
    {
        System.out.println("<tr><td>" + author + "</td>");
        System.out.println("<td>" + title + "</td>");
        System.out.println("<td>" + price + "</td></tr>");
    }
}
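BookHandler writes the collected text straight into the HTML. If an author, title or price contained `<`, a browser would read it as the start of a tag, and a bare `&` may be taken as the start of a character reference. A minimal escaping helper, a hypothetical addition not in the original code, could be applied to each value before printing, for example `"<td>" + escapeHtml(title) + "</td>"`.

```java
// Hypothetical helper, not part of the original DisplayBookList.
class HtmlEscape {
    // Replaces the characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "<td>Fish &amp; Chips &lt;Vol. 1&gt;</td>"
        System.out.println("<td>" + escapeHtml("Fish & Chips <Vol. 1>") + "</td>");
    }
}
```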
Finally, the element handlers for the fields within the <book> element update the instance
variables holding the data. We're being careless about performance here in the
interests of clarity: it would be better to use StringBuffers rather than
Strings for these variables, since each call to characters() builds a new String.
private class AuthorHandler extends ElementHandler
{
    public void characters (char[] chars, int start, int len)
    {
        author = author + new String(chars, start, len);
    }
}

private class TitleHandler extends ElementHandler
{
    public void characters (char[] chars, int start, int len)
    {
        if (!inVolume)
        {
            title = title + new String(chars, start, len);
        }
    }
}

private class PriceHandler extends ElementHandler
{
    public void characters (char[] chars, int start, int len)
    {
        if (!inVolume)
        {
            price = price + new String(chars, start, len);
        }
    }
}

private class VolumeHandler extends ElementHandler
{
    public void startElement(String name, AttributeList atts)
    {
        inVolume = true;
    }

    public void endElement(String name)
    {
        inVolume = false;
    }
}
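The handlers above build a new String on every characters() call, which the text flags as wasteful. Here is a sketch of the buffered alternative; it uses StringBuilder, the unsynchronized counterpart of the StringBuffer the text mentions, and the class name `BufferedTextHandler` is hypothetical. Its method names mirror ElementHandler, minus the AttributeList parameter, so that it needs no SAX types. The `main` method feeds the text in two chunks because a parser is free to split one run of character data across several characters() calls.

```java
// Hypothetical buffered variant of AuthorHandler/TitleHandler.
class BufferedTextHandler {
    private final StringBuilder buffer = new StringBuilder();
    private String value = "";

    // Start of the element: empty the buffer instead of allocating "".
    void startElement(String name) {
        buffer.setLength(0);
    }

    // Each chunk is appended in place; no intermediate Strings are built.
    void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    // End of the element: create the String once.
    void endElement(String name) {
        value = buffer.toString();
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        BufferedTextHandler author = new BufferedTextHandler();
        char[] doc = "Herman Melville".toCharArray();
        author.startElement("author");
        author.characters(doc, 0, 7);   // "Herman "
        author.characters(doc, 7, 8);   // "Melville"
        author.endElement("author");
        System.out.println(author.value());  // prints "Herman Melville"
    }
}
```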
The inVolume flag records whether the parser is currently
inside a <volume> element; while it is set, the character data of any <title> or
<price> element is ignored, so that only the book-level title and price are collected.
Once you've put all this together (the full code can be found in the
download for the book at http://www.wrox.com)
you can run this on a sample XML file with a command like this:
>java DisplayBookList file:///c:/data/books2.xml
The following output should then appear:
<html>
<head><title>Book List</title></head>
<body><h1>A List of Books</h1>
<table>
<tr><th>Author</th><th>Title</th><th>Price</th></tr>
<tr><td>Nigel Rees</td>
<td>Sayings of the Century</td>
<td>8.95</td></tr>
<tr><td>Evelyn Waugh</td>
<td>Sword of Honour</td>
<td>12.99</td></tr>
<tr><td>Herman Melville</td>
<td>Moby Dick</td>
<td>8.99</td></tr>
<tr><td>J. R. R. Tolkien</td>
<td>The Lord of the Rings</td>
<td>22.99</td></tr>
</table></body></html>
You can elaborate on this design pattern as much as you
like. Possible enhancements include:
- Providing element handlers with access to a stack containing details of their context
- Selecting element handlers based on conditions other than just the element name
- Using element handlers as part of a pipeline, by allowing them to fire events into another DocumentHandler
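As one possible take on the first two enhancements, here is a SAX2 sketch (the class `ContextSwitcher` and its `on` method are hypothetical, not from the book). The switcher keeps a stack of open element names and lets a rule be registered either for a bare element name or for a parent/child pair such as `volume/title`; the more specific rule wins. Only character data is dispatched, to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical: rules keyed by "parent/child" take precedence over rules
// keyed by the element name alone.
class ContextSwitcher extends DefaultHandler {
    private static final Consumer<String> IGNORE = text -> {};

    private final Map<String, Consumer<String>> rules = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();            // open element names
    private final Deque<Consumer<String>> owners = new ArrayDeque<>(); // handler per open element

    void on(String pattern, Consumer<String> textAction) {
        rules.put(pattern, textAction);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        String parent = path.peek();  // null for the document element
        Consumer<String> action = (parent == null) ? null : rules.get(parent + "/" + qName);
        if (action == null) {
            action = rules.getOrDefault(qName, IGNORE);
        }
        path.push(qName);
        owners.push(action);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        path.pop();
        owners.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!owners.isEmpty()) {
            owners.peek().accept(new String(ch, start, length));
        }
    }

    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder titles = new StringBuilder();
        StringBuilder volumes = new StringBuilder();
        ContextSwitcher s = new ContextSwitcher();
        s.on("title", t -> titles.append(t));
        s.on("volume/title", t -> volumes.append(t));
        s.parse("<book><title>Set</title><volume><title>Part 1</title></volume></book>");
        System.out.println(titles + " | " + volumes);  // prints "Set | Part 1"
    }
}
```

With a rule like `volume/title`, the book-list application would no longer need the inVolume flag for titles, because the context test moves out of the handlers and into the switch.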
The advantage of this design pattern is that it avoids a
great deal of if-then-else programming. It removes the need to change the
DocumentHandler to add conditional logic every time a new element type is
introduced. Instead, all you need to do is to register another element handler.