The Origins of SAX
The history of SAX is unusually well documented, because all
the discussion took place on the public XML-DEV mailing list, whose archives
are available at http://www.lists.ic.ac.uk/hypermail/xml-dev/.
David Megginson has also summarized its history at http://www.megginson.com/SAX/history.html.
The process started late in 1997 as a result of pressure
from XML users such as Peter Murray-Rust, who was developing XML applications
and struggling with the needless incompatibility of different parsers.
Suppliers of early XML parsers, including Tim Bray, David Megginson, and James
Clark, contributed to the discussion, and many other members of the list
commented on the various drafts. David Megginson devised a process, rather in
the spirit of the original Internet "Request for Comments", whereby
comments and suggestions could be handled promptly yet fairly, and he
eventually declared the specification frozen on 11 May 1998.
One of the major reasons for the success of SAX was that
along with the initial specification, Megginson supplied front-end drivers for
several popular XML parsers, including his own Ælfred, Tim Bray's Lark, and
Microsoft's MSXML. Once SAX was established in this way, other parser writers
such as IBM, Sun, and ORACLE were quick to incorporate native SAX interfaces
into their own parsers, to enable existing applications to run with their
products.
The definitive SAX specification is written in terms of Java
interfaces. It has been adapted to other languages, though the only one we know
of that is actively supported is an interface for the Python language, produced
by Lars Marius Garshol (see http://www.stud.ifi.uio.no/~larsga/download/python/xml/saxlib.html). Of
course, the Java interfaces can be used from other languages that interoperate
with Java, for example by using Microsoft's Java VM that interfaces Java to
COM. In this chapter, however, we'll stick to the original Java.
The Structure of SAX
SAX is structured as a number of Java interfaces.
It's very important to understand the difference between an interface and a
class:
- An interface says what methods there
  are, and what kind of parameters they expect. It is purely a specification; it
  doesn't provide any code to execute when the methods are called. But it is a
  concrete specification, not just a scrap of paper, and the Java compiler will
  check that a class that claims to implement an interface does so correctly.
- A class provides executable methods,
  including public methods that can be called by the code in other classes.
- A class may implement one or more
  interfaces. In many cases SAX specifies several interfaces which could
  theoretically be implemented by separate classes, but which in practice are
  often implemented in combination by a single class. To implement an interface,
  a class must supply code for each of the methods defined in the interface.
- Several classes may implement the same
  interface. Of course this is the whole point of the SAX exercise – there are
  lots of implementations of the SAX Parser interface for you to choose from, and
  because they all implement the same interface, your application doesn't care
  which one it is using.
Some of the interfaces in SAX are implemented by classes
within the parser, and some must be implemented by classes within the
application. There are some classes supplied with SAX itself, though you don't
have to use these. And there are some classes (such as the error handling
classes), which the parser must provide, but which your application can
override if it wishes.
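The distinction between an interface and the classes that implement it can be sketched in a few lines of plain Java. This is only an illustration: TagListener, CountingListener, NamingListener, and InterfaceDemo are hypothetical names invented here and are not part of SAX.

```java
// Hypothetical names throughout: none of these types belongs to SAX.
interface TagListener {
    // Specification only: the interface supplies no method body.
    void tagSeen(String name);
}

// One implementation: remembers how many tags it was told about.
class CountingListener implements TagListener {
    private int count = 0;

    public void tagSeen(String name) {
        count++;
    }

    public int getCount() {
        return count;
    }
}

// A second implementation of the same interface: collects the names.
class NamingListener implements TagListener {
    private final StringBuffer names = new StringBuffer();

    public void tagSeen(String name) {
        names.append(name).append(' ');
    }

    public String getNames() {
        return names.toString().trim();
    }
}

class InterfaceDemo {
    // The caller is written against the interface only, so it works
    // with any class that implements TagListener.
    static void announce(TagListener listener) {
        listener.tagSeen("book");
        listener.tagSeen("title");
    }

    public static void main(String[] args) {
        CountingListener counter = new CountingListener();
        NamingListener namer = new NamingListener();
        announce(counter);
        announce(namer);
        System.out.println(counter.getCount());
        System.out.println(namer.getNames());
    }
}
```

The announce() method plays the role that a SAX parser plays towards a document handler: it knows only the interface, never the concrete class behind it.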
The Basic Structure
The components of a simple SAX application are shown in the
diagram below.

In the diagram:
- The Application is the "main program": the code that you write to start the
  whole process off.
- The Document Handler is code that you write to process the contents of the document.
- The Parser is an XML Parser that conforms to the SAX standard.
The job of the application is to create a parser (more
technically, to instantiate a class that implements the org.xml.sax.Parser
interface); to create a document handler (by instantiating a class that
implements the org.xml.sax.DocumentHandler
interface); to tell the parser what document handler to use (by calling the
parser's setDocumentHandler() method); and to
tell the parser to start processing a particular input document (by calling the
parse() method of the parser).
The job of the parser is to notify the document handler of
all the interesting things it finds in the document, such as element start tags
and end tags.
The job of the document handler is to process these
notifications to achieve whatever the application requires.
A Simple Example
Let's look at a very simple application: one that simply
counts how many <book> elements there are in
the supplied XML file (shown later).
In this example we will simplify the structure shown in the
diagram above by using the same class to act as both the application and the
document handler. The reason we can
do this is that one Java class can implement several interfaces, so it can
perform several roles at once.
The first thing the application must do is to create a
parser:
import org.xml.sax.*;
...
Parser p = new com.jclark.xml.sax.Driver();
This is the only time you need to say which particular SAX
parser you are using. We have chosen the xp
parser produced by James Clark, and available from http://www.jclark.com.
Like any other Java class you use, of course, it must be on the Java classpath.
The chosen parser must implement the SAX Parser interface org.xml.sax.Parser (if it doesn't, Java will
complain loudly), so it can be assigned to a variable of type Parser. Because of the import statement at
the top, Parser is actually a shorthand for org.xml.sax.Parser.
So you need to know the relevant class name of your chosen
parser. Oddly, many of the available SAX parsers don't advertise their parser
class name in bright lights. So here is a list of some of the more popular
parsers, with the class name you need to use to instantiate them. (Note however
that this may change with later versions of the products.)
| Product | Details |
| Ælfred | from: http://www.microstar.com/aelfred.html; parser class: com.microstar.xml.SAXDriver |
| Datachannel DXP | from: http://www.datachannel.com/products/xjparser.html; parser class: com.datachannel.xml.sax.SAXDriver |
| IBM xml4j | from: http://alphaworks.ibm.com/tech/xml4j; parser class (non-validating): com.ibm.xml.parsers.SAXParser; parser class (validating): com.ibm.xml.parsers.ValidatingSAXParser |
| Oracle | from: http://www.oracle.com (requires TechNet registration); parser class: oracle.xml.parser.v2.SAXParser |
| Sun Project X | from: http://java.sun.com/products/xml/; parser class (non-validating): com.sun.xml.parser.Parser; parser class (validating): com.sun.xml.parser.ValidatingParser |
| xp | from: http://www.jclark.com/xp; parser class: com.jclark.xml.sax.Driver |
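If you would rather not compile a parser class name into your program, the SAX 1.0 distribution includes a helper class, org.xml.sax.helpers.ParserFactory, whose makeParser(String) method instantiates a parser from its class name. The following is a minimal sketch rather than a recommended design. The class ParserByName is hypothetical, and the default class name used here (org.xml.sax.helpers.XMLReaderAdapter, an adapter bundled with current JDKs) is an assumption chosen only so that the sketch runs without installing one of the parsers listed above.

```java
import org.xml.sax.Parser;
import org.xml.sax.helpers.ParserFactory;

// Hypothetical helper class, not part of SAX.
class ParserByName {
    // Assumption: this JDK-bundled adapter stands in for whichever class
    // name from the table applies to the parser you have installed.
    static final String DEFAULT_PARSER_CLASS = "org.xml.sax.helpers.XMLReaderAdapter";

    // Instantiate a SAX parser from its class name, so that the name can
    // come from a command-line argument or a configuration file.
    static Parser create(String className) throws Exception {
        return ParserFactory.makeParser(className);
    }

    public static void main(String[] args) throws Exception {
        String className = args.length > 0 ? args[0] : DEFAULT_PARSER_CLASS;
        Parser p = create(className);
        System.out.println("Using parser class " + p.getClass().getName());
    }
}
```

With this arrangement, switching to another parser means passing a different class name, for example one taken from the table above, provided that parser is on the classpath.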
So, you've created a parser. Now you can
start telling it what to do.
First you need to tell the parser what document handler to call
when events occur. This can be any class that implements the SAX org.xml.sax.DocumentHandler interface. The
simplest and most common approach is to make your application itself act as the
document handler.
DocumentHandler
itself is an interface defined in SAX. You could make your application program
implement this interface directly, in which case you would have to provide code
for all the different methods required by that interface. In our example,
however, we want to ignore most of the events, so it would be rather tedious to
define lots of methods that do nothing. Fortunately SAX supplies an implementation
of DocumentHandler that does nothing, HandlerBase, and we can make our application
extend this, so it inherits all the "do nothing" methods. Let's do
this:
import org.xml.sax.*;
...
public class BookCounter extends HandlerBase
{
    public void countBooks()
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
    }
}
The call on setDocumentHandler()
tells the parser that "this" class (your application program) is to
receive notification of events. This class is an implementation of org.xml.sax.DocumentHandler, because it
inherits from org.xml.sax.HandlerBase, which in turn implements DocumentHandler.
The parser is now almost ready to go; all it needs is a
document to parse, and the Java main method
that lets it operate as a standalone program. Let's give it a file to parse
first:
import org.xml.sax.*;
...
public class BookCounter extends HandlerBase
{
    public void countBooks() throws Exception
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
        p.parse("file:///C:/data/books.xml");
    }
}
Note that the argument to parse()
is a URL, supplied as a string. We'll show you later how to supply a filename
rather than a URL. Because parse() can throw checked exceptions (an IOException
if the input cannot be read, or a SAXException if parsing fails), we must also
add "throws Exception" to the countBooks method so that these errors are
passed on to its caller.
We need to make one more addition to get the program to run
as a standalone application: the Java main
method. In the main method we create an instance of
the class, with new BookCounter(), and then call the
object's countBooks method. The main method is also declared with
"throws Exception", so any error is passed on to the Java runtime, which
reports it. Our code should then look like this:
import org.xml.sax.*;
...
public class BookCounter extends HandlerBase
{
    public static void main (String args[]) throws Exception
    {
        (new BookCounter()).countBooks();
    }
    public void countBooks() throws Exception
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
        p.parse("file:///C:/data/books.xml");
    }
}
The program can now be run: it will parse the document and
run to completion (assuming, of course, that the document is there to be
parsed).
The only snag is that the program currently produces no
output. To make it useful, we need to add a method that counts the <book> start tags as they are
notified, and another that prints the number of books counted at the end of the
document. These methods make use of the instance variable count.
The final version of the application is shown below. You can
find it on our web site on the pages for this book at http://www.wrox.com/ in the code for this
chapter.
import org.xml.sax.*;
public class BookCounter extends HandlerBase
{
    // Number of <book> start tags seen so far
    private int count = 0;

    public static void main (String args[]) throws Exception
    {
        (new BookCounter()).countBooks();
    }

    public void countBooks() throws Exception
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
        p.parse("file:///c:/data/books.xml");
    }

    // Called by the parser for every element start tag
    public void startElement(String name, AttributeList atts) throws SAXException
    {
        if (name.equals("book"))
            count++;
    }

    // Called by the parser once, at the end of the document
    public void endDocument() throws SAXException
    {
        System.out.println("There are " + count + " books");
    }
}
You can now run this application from the command line, with
a command of the form:
java BookCounter
and it will print the number of <book>
elements in the supplied XML file. Suppose the file c:\data\books.xml
contains the following data (available for download with the code for the
chapter from http://www.wrox.com):
<?xml version="1.0"?>
<books>
    <book category="reference">
        <author>Nigel Rees</author>
        <title>Sayings of the Century</title>
        <price>8.95</price>
    </book>
    <book category="fiction">
        <author>Evelyn Waugh</author>
        <title>Sword of Honour</title>
        <price>12.99</price>
    </book>
    <book category="fiction">
        <author>Herman Melville</author>
        <title>Moby Dick</title>
        <price>8.99</price>
    </book>
</books>
Then the output displayed at the terminal will be:
>java BookCounter
There are 3 books
The DocumentHandler Interface
As the example above shows, the main work in a SAX
application is done in a class that implements the DocumentHandler
interface. Usually we'll be interested in rather more of the events than in the
simple example above, so let's look at the other methods that make up this
interface.
Document Events
First, there's a pair of methods that mark the start and end
of document processing:
- startDocument()
- endDocument()
These two methods take no parameters and return no result.
In fact, you can usually get by without them, since anything you want to do at
the start can generally be done before you call parse(), and anything you want to do at the
end can be done when parse()
returns. However, in a more complex application you may want to make the
application that calls parse()
a different class from the DocumentHandler,
and in this case these two methods are useful for initializing variables and
tidying up at the end.
Note that a SAX parser (a single instance of a class that implements the Parser
interface) should only be used to parse one XML document at a time. Once it has
finished, you can use it again to parse another document. If you want to parse
several documents concurrently, you need to create one parser instance for each.
You'll almost certainly want to apply the same one-document-per-instance rule to
a DocumentHandler, because there's nothing in the event information that tells
you what document the event came from.
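As a rough sketch of how these two methods can be used, the handler below resets its state in startDocument() and marks completion in endDocument(), so that one handler and one parser can be reused for several documents, one after another. ElementTally and its methods are hypothetical names. The parser class, org.xml.sax.helpers.XMLReaderAdapter, is an assumption: it is an adapter bundled with current JDKs, used here only so the sketch runs without one of the parsers named earlier.

```java
import java.io.StringReader;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical handler: counts elements, resetting itself per document.
class ElementTally extends HandlerBase {
    private int elements;
    private boolean finished;

    public void startDocument() {
        // Initialize per-document state here rather than in the caller.
        elements = 0;
        finished = false;
    }

    public void startElement(String name, AttributeList atts) {
        elements++;
    }

    public void endDocument() {
        // Tidy up once the whole document has been seen.
        finished = true;
    }

    int getElements() {
        return elements;
    }

    boolean isFinished() {
        return finished;
    }

    // Parse one document held in a string and return its element count.
    static int tally(Parser p, ElementTally handler, String xml) throws Exception {
        p.setDocumentHandler(handler);
        p.parse(new InputSource(new StringReader(xml)));
        return handler.getElements();
    }

    public static void main(String[] args) throws Exception {
        Parser p = new XMLReaderAdapter();  // assumption: JDK-bundled SAX 1.0 adapter
        ElementTally handler = new ElementTally();
        // The same parser and handler are reused, one document at a time.
        System.out.println(tally(p, handler, "<a><b/><c/></a>"));
        System.out.println(tally(p, handler, "<x/>"));
    }
}
```

Because the count is reset in startDocument(), the second result is not contaminated by the first document.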
Element Events
As with document events, there is a pair of methods that is
called to mark the start and end tags of each element in the document:
- startElement(String name, AttributeList attList)
- endElement(String name)
The name
is the name that appears in the start and end tag of the element.
If the document uses the abbreviated syntax for an empty
element (that is, "<tag/>"),
the parser will notify both a start and end tag, exactly as if you had written
"<tag></tag>". This
is because XML defines these two constructs as equivalent, so your application
shouldn't need to know which was used.
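The equivalence of the two forms is easy to check for yourself. The sketch below records the element events for a document and can be run against both spellings of an empty element. EventTrace is a hypothetical class, and the use of org.xml.sax.helpers.XMLReaderAdapter (an adapter bundled with current JDKs) as the parser is an assumption made so the example is self-contained.

```java
import java.io.StringReader;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical handler that records element events as text.
class EventTrace extends HandlerBase {
    private final StringBuffer log = new StringBuffer();

    public void startElement(String name, AttributeList atts) {
        log.append("start:").append(name).append(' ');
    }

    public void endElement(String name) {
        log.append("end:").append(name).append(' ');
    }

    // Parse a document held in a string and return the recorded events.
    static String trace(String xml) throws Exception {
        Parser p = new XMLReaderAdapter();  // assumption: JDK-bundled SAX 1.0 adapter
        EventTrace handler = new EventTrace();
        p.setDocumentHandler(handler);
        p.parse(new InputSource(new StringReader(xml)));
        return handler.log.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Both spellings of the empty element should yield the same events.
        System.out.println(trace("<list><item/></list>"));
        System.out.println(trace("<list><item></item></list>"));
    }
}
```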
The attributes appearing in the start tag are bundled
together into an object of type AttributeList
and handed to the application all at once. This is a departure from the
event-based model, in which you might expect each attribute to be notified as
it occurs. AttributeList is
another interface defined by SAX. It's up to the parser to define a class that
implements this interface: all the application needs to know is the methods it
can call to get details of individual attributes. The most useful one is:
- getValue(String name)
which returns the value of the named attribute as a String, if it is present, or null if it is
absent.
One thing to remember about the AttributeList is that it's only valid for the
duration of the startElement()
method. Once your method returns control to the parser, it can (and often does)
overwrite the AttributeList with
different information. If you want to keep attribute information for later use,
you'll need to make a copy. One convenient way to do this is to use the SAX
"helper" class AttributeListImpl:
this allows you to create another AttributeList as a private copy of the one
you were given.
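A minimal sketch of taking such a copy follows. It uses the AttributeListImpl constructor that accepts an existing AttributeList. AttributeKeeper and firstBook() are hypothetical names, and the parser class (org.xml.sax.helpers.XMLReaderAdapter, bundled with current JDKs) is an assumption made so the example can run on its own.

```java
import java.io.StringReader;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.helpers.AttributeListImpl;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical handler that keeps the attributes of the first <book>.
class AttributeKeeper extends HandlerBase {
    private AttributeList saved;

    public void startElement(String name, AttributeList atts) {
        if (saved == null && name.equals("book")) {
            // Copy now: the parser may reuse atts after this method returns.
            saved = new AttributeListImpl(atts);
        }
    }

    // Parse a document held in a string and return the saved copy
    // (null if the document contains no <book> element).
    static AttributeList firstBook(String xml) throws Exception {
        Parser p = new XMLReaderAdapter();  // assumption: JDK-bundled SAX 1.0 adapter
        AttributeKeeper handler = new AttributeKeeper();
        p.setDocumentHandler(handler);
        p.parse(new InputSource(new StringReader(xml)));
        return handler.saved;
    }

    public static void main(String[] args) throws Exception {
        AttributeList atts = firstBook(
            "<books><book category=\"reference\"/><book category=\"fiction\"/></books>");
        // The copy is still usable after parsing has finished.
        System.out.println(atts.getValue("category"));
    }
}
```

Storing the atts reference itself, without the copy, would leave the handler holding an object whose contents the parser is free to change.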
Character Data
Character data appearing in the XML document is usually
reported to the application using the method
- characters(char[] chars, int start, int len)
This interface was defined for efficiency rather than
convenience. If you want to handle the character data as a String, you can
easily construct one by writing:
String s = new String(chars, start, len);
The parser could have constructed this String for you, but
creating new objects can be expensive in Java, so instead it just gives you a
reference to its internal buffer, together with the position and length of the
characters within it.
One advantage of using Java for XML processing is that Java
and XML both use the Unicode character set as standard. The characters passed
in the chars array are always native Java
Unicode characters, regardless of the character encoding used in the original
source document. This means you never need to worry about how the characters
were encoded.
One important point to remember is that the parser is
allowed to break up character data however it likes, and pass it to you one
piece at a time. This means that if you are looking for "gold" in
your document, the following code is wrong:
public void characters(char[] chars, int start, int len) throws SAXException
{
    String s = new String(chars, start, len);
    if (s.indexOf("gold") >= 0) ...
}
Why? Because the string "gold" might appear in
your document, but be notified to your application in two or more calls of the characters() method. In theory, there could
be four separate calls, one for the "g", one for the "o",
one for the "l", and one for the "d".
The worst aspect of this problem is that you will probably
not discover your program is wrong during testing, because in practice parsers
very rarely split the text in this way. They might split it, for example, only
if the text happens to straddle a 4096-byte boundary (if there is some reason
the memory should happen to be limited in this way at the time), and this might
not happen until after months of successful running. Be warned.
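One possible way to avoid this trap is to accumulate the text in a buffer and search it only when the end tag arrives. The sketch below does this, and its main method imitates a parser that splits "gold" across two calls, since real parsers rarely oblige during testing. GoldFinder and the element name "metal" are hypothetical. Note that this simple version clears the buffer at every start tag, so it only suits elements that contain character data and no child elements.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;

// Hypothetical handler that looks for "gold" in element content.
class GoldFinder extends HandlerBase {
    private final StringBuffer content = new StringBuffer();
    private boolean found = false;

    public void startElement(String name, AttributeList atts) {
        content.setLength(0);  // start collecting afresh for this element
    }

    public void characters(char[] chars, int start, int len) {
        // Only collect here; the text may arrive in several pieces.
        content.append(chars, start, len);
    }

    public void endElement(String name) {
        // The complete text of the element is now available.
        if (content.toString().indexOf("gold") >= 0) {
            found = true;
        }
        content.setLength(0);
    }

    boolean isFound() {
        return found;
    }

    public static void main(String[] args) {
        GoldFinder handler = new GoldFinder();
        // Imitate a parser that delivers "gold" in two separate calls.
        handler.startElement("metal", null);
        handler.characters("go".toCharArray(), 0, 2);
        handler.characters("ld".toCharArray(), 0, 2);
        handler.endElement("metal");
        System.out.println(handler.isFound());
    }
}
```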
There is one circumstance in which parsers are obliged to
split the text, and that is when external entities are used. The SAX
specification is quite explicit that a single call on characters()
may not contain text from two different external entities.
If you want to do anything with character data other than
simply copying it unconditionally to an output file, you are probably
interested in knowing what element it belongs to. Unfortunately the SAX
interface doesn't give you this information directly. If you need such
contextual information, your application will have to maintain a data structure
that retains some memory of previous events. The most common is a stack. In the
next section we will show how you can use some simple data structures both to
assemble character data supplied piecemeal by the parser, and to determine what
element it is part of.
There is a second method for reporting character data,
namely
- ignorableWhitespace(char[] chars, int start, int len)
This interface can be used to report what the SAX
specification rather loosely refers to as "ignorable white space". If
the DTD defines an element with "element content" (that is, the
element can have children but cannot contain PCDATA), then XML permits the
child elements to be separated by spaces, tabs, and newlines, even though
"real" character data is not allowed. This white space is probably
insignificant, so a SAX application will almost invariably ignore it: which you
can do simply by having an ignorableWhitespace()
method that does nothing. The only time you might want to do anything else is
if your application is copying the data unchanged to an output file.
The XML specification allows a parser to ignore information
in the external DTD, however. A non-validating parser will not necessarily
distinguish between an element with element content and one with mixed content.
In this case the ignorable white space is likely to be reported via the
ordinary characters() interface. Unfortunately
there is no way within a SAX application of telling whether the parser is a
validating one or not, so a portable application must be prepared for either.
This is another limitation that is remedied in SAX 2.0.
Processing Instructions
There is one more kind of event that parsers report, namely
processing instructions. You probably won't meet these very often: they are the
instructions that can appear anywhere in an XML document between the symbols
"<?" and "?>". A processing instruction has a
name (called a target),
and arbitrary character data (instructions for the target application
concerned).
Processing instructions are notified to the DocumentHandler using the method:
- processingInstruction(String name, String data)
By convention, you should ignore any processing instruction
(or copy it unchanged) unless you recognize its name.
Note that the XML declaration at the start
of a document may look like a processing instruction, but it is not a true
processing instruction, and is not reported to the application via this
interface – indeed, it is not reported at all.
Processing instructions are often written to look like
element start tags, with a sequence of keyword="value" attributes.
This syntax, however, is purely an application convention, and is not defined
by the XML standard. So SAX doesn't recognize it; the contents of the
processing instruction data are passed over in an amorphous lump.
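As an illustration of both points (ignore targets you don't recognize, and decode the data yourself), here is a sketch of a handler that looks for an xml-stylesheet processing instruction and extracts a keyword="value" pair from its data. StylesheetFinder and pseudoAttribute() are hypothetical names. The extraction is a deliberate simplification that handles only double-quoted values, and the parser class (org.xml.sax.helpers.XMLReaderAdapter, bundled with current JDKs) is an assumption.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical handler that looks for an xml-stylesheet instruction.
class StylesheetFinder extends HandlerBase {
    private String stylesheetData;  // raw data of the recognized instruction
    private int seen = 0;           // how many instructions were reported

    public void processingInstruction(String name, String data) {
        seen++;
        // Act only on a target we recognize; ignore all others.
        if (name.equals("xml-stylesheet")) {
            stylesheetData = data;
        }
    }

    int getSeen() {
        return seen;
    }

    String getHref() {
        return pseudoAttribute(stylesheetData, "href");
    }

    // Application-level convention, not SAX: pull name="value" out of the
    // data. Simplification: only double-quoted values are handled.
    static String pseudoAttribute(String data, String name) {
        if (data == null) {
            return null;
        }
        Pattern p = Pattern.compile(
            "(?:^|\\s)" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(data);
        return m.find() ? m.group(1) : null;
    }

    // Parse a document held in a string and return the handler used.
    static StylesheetFinder find(String xml) throws Exception {
        Parser p = new XMLReaderAdapter();  // assumption: JDK-bundled SAX 1.0 adapter
        StylesheetFinder handler = new StylesheetFinder();
        p.setDocumentHandler(handler);
        p.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        StylesheetFinder f = find("<?xml version=\"1.0\"?>"
            + "<?xml-stylesheet href=\"style.xsl\"?><?other thing?><doc/>");
        System.out.println(f.getSeen() + " instructions, href=" + f.getHref());
    }
}
```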
Error Handling
We've glossed over error handling so far, but as always, it
needs careful thought in a real production application.
There are three main kinds of errors that can occur:
- Failure to open the XML input file, or
  another file that it refers to, for example the DTD or another external entity.
  In this case the parser will throw an IOException (input/output exception), and
  it is up to your application to handle it.
- XML errors detected by the parser,
  including well-formedness errors and validity errors. These are handled by
  calling an error handler which your application can supply, as described below.
- Errors detected by the application:
  for example, an invalid date or number in an attribute. You handle these by
  throwing an exception in the DocumentHandler
  method that detects the error.
Handling XML errors
The SAX specification defines three levels of error
severity, based on the terminology used in the XML standard itself. These are:
| Fatal errors | These usually mean the XML is not well-formed. The parser will call the registered error handler if there is one; if not, it will throw a SAXParseException. In most cases a parser will stop after the first fatal error it finds. |
| Errors | These usually mean the XML is well-formed but not valid. The parser will call the registered error handler if there is one; if not, it will ignore the error. |
| Warnings | These mean that the XML is correct, but there is some condition that the parser considers it useful to report. For example this might be a violation of one of the "interoperability" rules: input that is correct XML but not correct SGML. The parser will call the registered error handler if there is one; if not, it will ignore the error. |
The application can register an error handler using the
parser's setErrorHandler() method. An error
handler contains three methods, fatalError(),
error(), and warning(),
reflecting the three different error severities. If you don't want to define
all three, you can make an error handler that inherits from HandlerBase: this contains versions of all
three methods that take the same action as if no error handler were registered.
The parameter to the error handling method, in all three
cases, is a SAXParseException object. You
probably think of Java Exceptions as things that are thrown and caught when
errors occur; but in fact an Exception is a regular Java object and can be
passed as a parameter to methods just like any other: it might never be thrown
at all. The SAXParseException contains
information about the error, including where in the source XML file it
occurred. The most common thing for an error handler method to do is to extract
this information to construct an error message, which can be written to a
suitable destination: for example, a web server log file.
The other useful thing the error handling method can do is
to throw an exception: usually, but not necessarily, the exception that the
parser supplied as a parameter. If you do this, the parse will typically be
aborted, and the top-level application will see the same exception thrown by
the parse() method. It then has another
opportunity to output diagnostics. Whether you generate a fatal error message
from within the error handler, or do it by letting the top-level application
catch the exception, is entirely up to you.
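The following sketch shows one way an error handler along these lines might look. It builds a message from the information in the SAXParseException for all three severities, and rethrows the exception for fatal errors so that the parse is abandoned. ReportingErrorHandler and check() are hypothetical names, the message layout is just one choice, and the parser class (org.xml.sax.helpers.XMLReaderAdapter, bundled with current JDKs) is an assumption.

```java
import java.io.StringReader;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical error handler that collects a report of what went wrong.
class ReportingErrorHandler implements ErrorHandler {
    private final StringBuffer report = new StringBuffer();

    // Build one line of the report from the exception's location details.
    private static String describe(String level, SAXParseException e) {
        return level + " in " + e.getSystemId()
            + " at line " + e.getLineNumber()
            + " column " + e.getColumnNumber()
            + ": " + e.getMessage();
    }

    public void warning(SAXParseException e) {
        report.append(describe("Warning", e)).append('\n');
    }

    public void error(SAXParseException e) {
        report.append(describe("Error", e)).append('\n');
    }

    public void fatalError(SAXParseException e) throws SAXException {
        report.append(describe("Fatal error", e)).append('\n');
        throw e;  // abort the parse; the caller of parse() sees this exception
    }

    String getReport() {
        return report.toString();
    }

    // Parse a document held in a string; return the report (empty if clean).
    static String check(String xml) throws Exception {
        Parser p = new XMLReaderAdapter();  // assumption: JDK-bundled SAX 1.0 adapter
        ReportingErrorHandler handler = new ReportingErrorHandler();
        p.setErrorHandler(handler);
        try {
            p.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by the handler; nothing more to do here.
        }
        return handler.getReport();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(check("<a><b></a>"));  // not well-formed
    }
}
```

In a server application the report would more likely go to a log file than to the console.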
Application-Detected Errors
When your application detects an error within a DocumentHandler method (for example, a badly
formatted date), the method should throw a SAXException
containing an appropriate message to explain the problem. After this, the
parser deals with the situation exactly as if it had detected the error itself.
Typically, it doesn't attempt to catch the exception, but exits immediately
from the parse() method with the same
exception, which the top-level application can then catch.
Identifying Where the Error Occurred
When the parser detects an XML syntax error, it will supply
details of the error in a SAXParseException
object. This object will include details of the URL, line, and column where the
error occurred (a line number on its own is not much use, because the error may
be in some external entity not in the main document). When you catch the SAXParseException in your application, you
can extract this information and display it so the user can locate the error.
If the problem with the XML file is detected at application
level (for example, an invalid date), it is equally important to tell the user
where the problem was found, but this time you can't rely on the SAXParseException to locate it. Instead, SAX
defines a Locator interface. The SAX
specification doesn't insist that parsers supply a Locator,
but most parsers do.
One of the methods defined by the DocumentHandler interface is the setDocumentLocator() method. If the parser maintains
location information it will call this method to tell the document handler where to
find the Locator object. At any subsequent
time while your document handler is processing an event it can ask the Locator object for details of the current
coordinates in the source document. There are three coordinates:
- The URL of the document or external entity currently being processed
- The line number within that URL
- The column number within that line
This is of course exactly the same information
that you can get from a SAXParseException
object, and in fact one of the things you can do very easily when your
application detects an error is to throw a SAXParseException that takes the coordinates directly from the Locator object. Just write:
if ( [data is not valid] )
{
    throw new SAXParseException("Invalid data", locator);
}
Why wasn't the location information simply included in the
events passed to the document handler,
such as startElement()? The reason is
efficiency: most applications only want location information if something goes
wrong, so there should be minimal overhead incurred when it is not needed.
Supplying location information with each call from the parser to the document
handler would be unnecessarily expensive.
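Putting these pieces together, a handler that reports an application-level error with its location might be sketched as follows. YearChecker, validate(), and the "year" attribute are hypothetical, invented for this illustration. The parser class (org.xml.sax.helpers.XMLReaderAdapter, bundled with current JDKs) is an assumption, and the fallback to a plain SAXException covers parsers that supply no Locator.

```java
import java.io.StringReader;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.XMLReaderAdapter;

// Hypothetical handler: checks that any "year" attribute is an integer.
class YearChecker extends HandlerBase {
    private Locator locator;

    public void setDocumentLocator(Locator loc) {
        locator = loc;  // remember where to ask for the current position
    }

    public void startElement(String name, AttributeList atts) throws SAXException {
        String year = atts.getValue("year");
        if (year == null) {
            return;
        }
        try {
            Integer.parseInt(year);
        } catch (NumberFormatException err) {
            String message = "Invalid year: " + year;
            if (locator != null) {
                // Take the coordinates directly from the Locator.
                throw new SAXParseException(message, locator);
            }
            throw new SAXException(message);  // parser supplied no Locator
        }
    }

    // Returns null if the document is acceptable, otherwise a description.
    static String validate(String xml) throws Exception {
        Parser p = new XMLReaderAdapter();  // assumption: JDK-bundled SAX 1.0 adapter
        p.setDocumentHandler(new YearChecker());
        try {
            p.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<books>\n<book year=\"19x9\"/>\n</books>"));
    }
}
```

The catch clause for SAXParseException must come before the one for SAXException, because the former is a subclass of the latter.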
Another Example: Using Character Data and Attributes
After this excursion into the world of error handling, let's
develop a slightly more complex example SAX application.
The task this time is for the application to print the
average price of fiction books in the catalog. We'll use the same data file (books.xml) as in our previous example.
We are interested only in those <book>
elements that have the attribute category="fiction",
and for these we are interested only in the contents of the <price> child element. We add up the
prices, count the books, and at the end divide the total price by the number of
books.
Here's our first version of the application:
import org.xml.sax.*;
public class AveragePrice extends HandlerBase
{
    private int count = 0;
    private boolean isFiction = false;
    private double totalPrice = 0.0;
    private StringBuffer content = new StringBuffer();

    public void determineAveragePrice() throws Exception
    {
        Parser p = new com.jclark.xml.sax.Driver();
        p.setDocumentHandler(this);
        p.parse("file:///c:/data/books.xml");
    }

    public void startElement(String name, AttributeList atts) throws SAXException
    {
        if (name.equals("book"))
        {
            String category = atts.getValue("category");
            isFiction = (category!=null && category.equals("fiction"));
            if (isFiction) count++;
        }
        content.setLength(0);
    }

    public void characters(char[] chars, int start, int len) throws SAXException
    {
        content.append(chars, start, len);
    }

    public void endElement(String name) throws SAXException
    {
        if (name.equals("price") && isFiction)
        {
            try
            {
                double price = new Double(content.toString()).doubleValue();
                totalPrice += price;
            }
            catch (java.lang.NumberFormatException err)
            {
                throw new SAXException("Price is not numeric");
            }
        }
        content.setLength(0);
    }

    public void endDocument() throws SAXException
    {
        System.out.println("The average price of fiction books is " +
            totalPrice / count);
    }

    public static void main (String args[]) throws java.lang.Exception
    {
        try
        {
            (new AveragePrice()).determineAveragePrice();
        }
        catch (SAXException err)
        {
            System.err.println("Parsing failed: " + err.getMessage());
        }
    }
}
There are three main points to note in this code:
- The application needs to maintain one
  piece of context, namely whether the current book is fiction or not. It uses an
  instance variable to remember this, setting isFiction
  to true when a start tag for a fiction book is encountered, and to false when a
  start tag for a non-fiction book is read.
- See how the character content is
  accumulated in a Java StringBuffer and is
  not actually processed until the endElement() event is notified. This kills two birds with one stone: it solves
  the problem that the content of a single element might be broken up and
  notified piecemeal; at the same time, it means that when we handle the data, we
  know which element we are dealing with. The StringBuffer is emptied whenever a start or end tag is read, which means that
  when the application gets to the end tag of a PCDATA element (one that contains
  character data only) the buffer will contain the character data of that
  element.
- The application needs to do something
  sensible when the price of a book is not a valid number. (Until XML Schemas
  become standardized, we can't rely on the parser to do this piece of validation
  for us: DTDs provide no way of restricting the data type of character data
  within an element.) This condition is detected by the fact that the Java
  constructor Double(String s), which converts a
  String to a number, reports an exception. The relevant code catches this
  exception, and reports a SAXException
  describing the problem. This will cause the parsing to be terminated with an
  appropriate error message.
When the code is run on our example XML file it produces the
following output:
>java AveragePrice
The average price of fiction books is 10.99
But the program isn't yet perfect.
Firstly, it can easily fail if the structure of the input
document is not as expected. For example, it will give wrong answers if the <price> element occurs other than in a
<book>, or if there is a <book> with no <price>,
or if a <price> element has its own
child elements. Such things might happen because there is no DTD, or because a
non-validating parser is used that doesn't check the DTD, or because a document
is submitted that uses a different DTD from that expected, or because the DTD
has been enhanced since the program was written.
Secondly, the diagnostics when errors are detected are
rather unfriendly. The user will be told that a price is not numeric, but there
may be hundreds of books in the list: it would be more helpful to say which
one. Even more helpful would be to report all the errors in a single run, so
that the user doesn't have to run the program once to find and correct each
separate error. (Actually, most XML parsers will only report one syntax error
in a single run, so there's a limit to what we can achieve here.)
In the next section we'll look at how to maintain more
information about element context, which is necessary if we're to do more
thorough validation. Before that, we'll make one improvement in the area of
error handling. We'll use the Locator
object to determine where in the source document the error occurred, and report
it accordingly.
In order to show what happens clearly we've switched from
James Clark's xp parser to IBM Alphaworks' xml4j, which provides clearer
messages. Here is the revised program.
This version of the application can also be found on our web
site at http://www.wrox.com.
import org.xml.sax.*;
public class AveragePrice extends HandlerBase
{
    private int count = 0;
    private boolean isFiction = false;
    private double totalPrice = 0.0;
    private StringBuffer content = new StringBuffer();
    private Locator locator;

    public void determineAveragePrice() throws Exception
    {
        Parser p = new com.ibm.xml.parsers.SAXParser();
        p.setDocumentHandler(this);
        p.parse("file:///c:/data/books.xml");
    }

    public void setDocumentLocator(Locator loc)
    {
        locator = loc;
    }

    public void startElement(String name, AttributeList atts) throws SAXException
    {
        if (name.equals("book"))
        {
            String category = atts.getValue("category");
            isFiction = (category!=null && category.equals("fiction"));
            if (isFiction) count++;
        }
        content.setLength(0);
    }

    public void characters(char[] chars, int start, int len) throws SAXException
    {
        content.append(chars, start, len);
    }

    public void endElement(String name) throws SAXException
    {
        if (name.equals("price") && isFiction)
        {
            try
            {
                double price = new Double(content.toString()).doubleValue();
                totalPrice += price;
            }
            catch (java.lang.NumberFormatException err)
            {
                if (locator!=null)
                {
                    System.err.println("Error in " + locator.getSystemId() +
                        " at line " + locator.getLineNumber() +
                        " column " + locator.getColumnNumber());
                }
                throw new SAXException("Price is not numeric", err);
            }
        }
        content.setLength(0);
    }

    public void endDocument() throws SAXException
    {
        System.out.println("The average price of fiction books is " +
            totalPrice / count);
    }

    public static void main (String args[]) throws java.lang.Exception
    {
        try
        {
            (new AveragePrice()).determineAveragePrice();
        }
        catch (SAXException err)
        {
            System.err.println("Parsing failed: " + err.getMessage());
        }
    }
}
This version of the code improves the diagnostics with very
little extra effort. The revised application does three things:
- It keeps a note of the Locator object supplied by the parser.
- When an error occurs, it uses the Locator object to print information about the location of the error before
  generating the SAXException. Note that the application
  has to allow for the case where there is no Locator,
  because SAX doesn't require the parser to supply one.
q<