Metadata in Network Applications

We've hinted at some of the uses of metadata as they might apply to our philosophy. When we discussed namespaces, we used the appearance of names from foreign namespaces as a clue to the meaning of an unknown XML vocabulary. Let's take a direct look at some of the specific ways we can use metadata to improve our cooperative applications.

Validating Documents

First and foremost, we can validate documents using schemas. This is a mixed blessing. Validation lets us enforce the rules of a vocabulary rigorously. Sometimes, though, we can improve the reliability of our applications by relaxing unimportant syntactic rules. If XML Data or XML DCD come to be recommendations of the W3C, we might be able to do both. An open model as defined by XML Data or XML DCD would allow us to enforce the rules that are important to us while admitting other content in a flexible way. Each metadata effort offers some interesting features that, if adopted as a recommendation and implemented in a parser component, could help us apply our five principles. The XML metadata world is full of tantalizing possibilities and short on fulfillment. However, at the time of writing, a working group of the W3C was producing a schema language for XML using an XML syntax; more information can be found at http://www.w3.org/XML/Activity#schema-wg.

Searching for Useful Data

Strong typing of data and content-organizing features, like the group element's order attribute in the XML Schema preview, help client applications search XML documents written in an unfamiliar vocabulary for data on which they can operate. A client that operates on text strings knows to skip numeric types. A calculation-oriented application would seek out numeric types. Groups indicate some association between elements, so a client would logically view such elements as parts of a whole: a data series, a group of alternatives, or properties of the containing parent. A client application working with metadata can make useful suggestions to a human user. If the application requires some numeric inputs, it would present the numeric types found in the document to the user. Based on the order attribute, the user interface could be reconfigured: a single selection list box for one, a group of mandatory inputs for seq, or a multiple selection list box for many. For example, in the XML Schema experiment at the end of this chapter, we will encounter a schema intended for building SQL queries. The schema will use the value seq with the order attribute and the enumerated values "GT GE LT LE EQ" to represent the operators >, >=, <, <=, and = used in SQL WHERE clauses.
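As a rough sketch of how such a scan might look (these lines are not part of the book's sample code, and the helper name and the list of numeric type names are our own assumptions), a calculation-oriented client could walk an XML Data schema and collect only the element definitions whose dt:type marks them as numeric:

// Rough sketch, not from the sample code: collect the names of ElementType
// definitions whose dt:type marks them as numeric, so a calculation-oriented
// client can offer only those elements to the user. The list of type names
// is an assumed subset of the XML Data datatypes.
function FindNumericElements(schemaParser)
{
   var numericTypes = " int i4 i8 float r4 r8 number ";
   var found = new Array();
   var defs = schemaParser.getElementsByTagName("ElementType");
   for (var i = 0; i < defs.length; i++)
   {
      var dtType = defs.item(i).attributes.getNamedItem("dt:type");
      if (dtType != null && numericTypes.indexOf(" " + dtType.text + " ") != -1)
         found[found.length] =
            defs.item(i).attributes.getNamedItem("name").nodeValue;
   }
   return found;
}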

XML is a great advance over native data formats because it explicitly tags and labels each item of content. Metadata extends this by providing information about the structure, types, and relationships of the marked-up data. We've assumed that software clients will increasingly encounter unfamiliar vocabularies as networks grow decentralized. Once an XML metadata standard reaches recommendation status, client applications will find parsing the schema document as useful as parsing the data document.

Learning Vocabulary Structure

A validating parser will only tell us when the data we use violates the rules of the vocabulary in question. The ability to parse a metadata document and discover the structure of the vocabulary enables us to avoid errors in the first place. Generally speaking, the metadata proposals we have seen in this chapter do not result in schema documents that proceed in a top-down fashion like our typical XML data document. In fact, since we usually like to define the component parts of larger structures before defining the overall structure, schema documents will usually be organized bottom-up. Once we have a root element definition, however, we can use the metadata definition to find its components. We can then either walk the parse tree or use the XSL pattern matching syntax to extract each of those components in turn. While this may not be terribly efficient from a programming point of view, keep in mind we're learning the structure that will be applied to all documents written to the vocabulary specified in the schema.

A certain amount of digging to learn the structure of the vocabulary will mean we can operate on data in a format we've never seen before. If we are using the specialization scheme presented in the last chapter, a client receiving a vocabulary more specialized than it desired can discover the structure of the specialized parts of the new vocabulary. This is a very powerful capability.

Building Queries

A common task in our network experience will be composing queries to select data. We established the convention of using directory entries to indicate the vocabulary of our services back in Chapter Two. It is likely that the vocabulary for the queries will contain elements drawn from the response vocabulary. If, for example, we were searching for a person, we would expect to provide a name for which to search. The name would certainly be part of the XML sent in response to the query. An application that understands the name element (and we are assuming a client understands the vocabulary it requires) could reasonably request input from the user for this element. The application, however, might not understand the vocabulary used to query the service. That vocabulary might change based on the SQL query used to implement the search. Some services might permit the submission of batches of unrelated queries, while others would accept only one query based on one alternative at a time, e.g., searching by name or by age, but not both in one request.

Once a metadata recommendation is released by the W3C, we might reasonably extend our directory schema to include an entry specifying the URL of the query schema for a particular service. A client searching for a given data vocabulary would locate a server, as we do now, but would first retrieve and parse the query schema for that service. With that in hand, a user or programmatic interface (for human and software agents, respectively) could be created dynamically. The client application would package the input data according to the query schema and transmit that in its request. This would give us considerably more flexibility than we presently have. Right now, we assume knowledge of both the response and query vocabularies. With a metadata capability, we could loosen our requirement for knowledge of the query vocabulary considerably. As we saw in the preceding section, we could also loosen the requirement to understand the response vocabulary to some extent. The degree to which we could loosen the requirement would of course depend on how much metadata is included in the schema. RDF and XML Data are at one extremely flexible end of the spectrum, with more limited proposals like XML DCD at the other.

Experiments in Metadata

It is now time to get to work and try some experiments. Using the technology preview in MSXML, let's see what we can implement. We'll first see how XML Schema can be used to validate a document, then develop the ability to extract metadata from a schema, and finally try to build a dynamic query builder at the proof of concept level.

It is well worth repeating that the metadata capabilities we're going to use are based on notes submitted to the W3C, not published recommendations. We are exploring, hoping to find the scope of future capabilities. It seems fairly certain that we will eventually see a metadata recommendation; it is certain that the syntax and capabilities of components implementing that recommendation will be a dramatic change from what we present here. Nevertheless, these experiments have value. Not only do they show us a path forward into a future in which we can communicate more effectively between clients and services, but the final metadata standards that are implemented in commercial parsers will likely be similar in spirit to what we will see here.

Validating a Document

We can control whether the parser performs validation with the validateOnParse property. If this property is true, which is the default value, the parser will perform validation when it parses a document. Errors, as we know, turn up in the parseError object property.
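The following minimal sketch shows the pattern (it is not taken from the book's sample pages; the ProgID used to create the parser and the file name being loaded are assumptions):

// Minimal sketch, assuming the parser is created via the Microsoft.XMLDOM
// ProgID and that EmployeeQuery.xml is the document to check. validateOnParse
// is true by default; any violation is reported through parseError.
var parser = new ActiveXObject("Microsoft.XMLDOM");
parser.async = false;              // load synchronously for this example
parser.validateOnParse = true;     // the default, shown here for emphasis
parser.load("EmployeeQuery.xml");
if (parser.parseError.errorCode != 0)
   alert("Error on line " + parser.parseError.line + ": " +
         parser.parseError.reason);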

To validate with a DTD, we need to provide a DOCTYPE declaration naming the DTD for the document. The technology preview, however, also allows MSXML to validate a document using an XML Schema. An attribute named xmlns, declaring a namespace, is treated specially. The parser will download the resource named by the attribute value unless the URI bears the urn prefix or a DOCTYPE declaration has been provided for the document. So, the resource named by this attribute must be an XML Schema file.
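As an illustration of this behavior (again, a sketch rather than the book's own sample; the ProgID, the schema file name, and the inline document are assumptions, and the exact URI form may differ in later parser releases), the root element below names our query schema in its xmlns attribute, so the technology preview parser should fetch that schema and validate the document against it:

// Hedged illustration of schema-based validation in the technology preview:
// the xmlns attribute on the root element names a schema file (the file name
// is assumed), so the parser downloads it and validates the document.
var parser = new ActiveXObject("Microsoft.XMLDOM");
parser.async = false;
parser.validateOnParse = true;
parser.loadXML(
   "<EMPLOYEEQUERY xmlns='EmployeeQuery.xml'>" +
   "<BYNAME><NAME>Smith</NAME></BYNAME>" +
   "</EMPLOYEEQUERY>");
if (parser.parseError.errorCode != 0)
   alert("Schema validation failed: " + parser.parseError.reason);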

Retrieving Metadata from an XML Schema

Having a schema encoded as an XML document means we can use our existing tools and experience to pick the schema apart and discover its structure and content constraints. This will usually be used to guide the construction of a document written according to an unfamiliar schema. It could also be used to look for overlapping structures in two schemas. For example, if we suspect two schemas are talking about the same topic, we might compare their structures. Two group elements composed of the same number and types of elements, regardless of name, would strongly suggest similarity of the encoded content. Similar constraints found in datatype schema elements would also be clues. Of course, the dt:type attributes would be of additional assistance to us in comparing two schemas.
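As a hedged sketch of what such a structural comparison might look like (these helper functions and their names are our own assumptions, not part of the book's sample code), we might compare two group definitions by the number and datatypes of the elements they reference:

// Rough sketch, not from the sample code: two groups whose member elements
// have the same number and the same dt:type (or content) values suggest that
// the groups describe similar content, whatever the element names.
function SimilarGroups(schemaA, groupA, schemaB, groupB)
{
   var typesA = MemberTypes(schemaA, groupA);
   var typesB = MemberTypes(schemaB, groupB);
   if (typesA.length != typesB.length)
      return false;
   typesA.sort();
   typesB.sort();
   for (var i = 0; i < typesA.length; i++)
      if (typesA[i] != typesB[i])
         return false;
   return true;
}

// Collect the dt:type (falling back to the content model) of every
// ElementType referenced by the element children of a group node.
function MemberTypes(schemaParser, groupNode)
{
   var types = new Array();
   for (var i = 0; i < groupNode.childNodes.length; i++)
   {
      var child = groupNode.childNodes.item(i);
      if (child.nodeName != "element")
         continue;
      var def = schemaParser.documentElement.selectSingleNode(
         "//ElementType[@name='" +
         child.attributes.getNamedItem("type").nodeValue + "']");
      if (def == null)
         continue;
      var dtType = def.attributes.getNamedItem("dt:type");
      types[types.length] = (dtType != null) ? dtType.text :
         def.attributes.getNamedItem("content").text;
   }
   return types;
}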

We're about to embark on an experiment. It is not an uncommon task to format a query for a service as an XML document. The service extracts search criteria from the query document, performs a database query, and returns the results as an XML document written in the vocabulary specified in the directory. So far, we've assumed that the users of a vocabulary would understand it and also understand the required query vocabulary. Is it possible though, to parse a query vocabulary schema document and dynamically generate a query document builder? That is precisely what we shall try in our experiment.

The Query Interface Experiment

Suppose we have a service that generates documents in the EMPLOYEE vocabulary in response to queries from clients. It might be the front end to a database of employee information. The queries are documents written to an XML schema. We'll assume the client application has obtained the URL for this schema document from the directory. For our purposes, we'll allow ourselves to directly input the URL in a Web page. When the user presses a button labeled Create Form, we want the script in the page to dynamically generate a user interface for composing a query for the service. The user interface should present us with the structure from the schema and allow us to input search criteria in the appropriate places. Once the user has finished entering values, he can click another button: Complete Query. The button handler script will compose a query according to the schema using the input parameters from the form and show us the XML in an alert box. This is our experimental page before anything is generated:

The source code for the experiment is found in the file QueryBuilder.html. It can be downloaded from our site http://www.wrox.com/ or run from http://webdev.wrox.co.uk/books/2270/.

Query Schema

We need a schema for queries against the Employee service. We'll follow a couple of loose conventions. Since schemas can contain any number of top level element definitions, we need some way of specifying the root element name in our query vocabulary. Let's follow the DTD practice of naming the schema for the root node. This convention is solely for the purposes of our experiment. If this were a production application, we could store the name of the root in the directory or ask the client application to supply it. We could also adopt a convention of simply appending the word Query to the name of the response vocabulary. A more important convention is to use the names of elements in the response vocabulary to name elements in the query vocabulary. For example, if we wish to search by the employee's name, we shall have to supply a name against which to search. The element supplying this should be called NAME since that is the matching element name in the Employee vocabulary.

For reasons of simplicity, which we shall go into later, let's allow batched queries. That is, we will allow the user to provide parameters for searches by name, by hire date, by manager, and by department in a single request document. If all parameters were provided, multiple queries would be generated. Here is the schema we shall use:

<Schema name="EMPLOYEEQUERY"
   xmlns="urn:schemas-microsoft-com:xml-data"
   xmlns:dt="urn:schemas-microsoft-com:datatypes">

   <ElementType name="NAME" content="textOnly"/>
   <ElementType name="BYNAME" content="eltOnly">
      <description>Search by employee name</description>
      <element type="NAME"/>
   </ElementType>

   <ElementType name="HIREDATE" content="textOnly" dt:type="date"/>
   <ElementType name="OPERATOR" content="textOnly" dt:type="enumeration"
         dt:values="GT GE LT LE EQ">
      <description>Comparison operator for search</description>
   </ElementType>

   <ElementType name="BYHIRE" content="eltOnly">
      <description>Search with respect to hire date specified</description>
      <element type="HIREDATE"/>
      <element type="OPERATOR"/>
   </ElementType>

   <ElementType name="MANAGER" content="textOnly"/>
   <ElementType name="BYMANAGER" content="eltOnly">
      <description>Search by employee's manager's name</description>
      <element type="MANAGER"/>
   </ElementType>

   <ElementType name="DEPARTMENT" content="textOnly"/>
   <ElementType name="BYDEPARTMENT" content="eltOnly">
      <element type="DEPARTMENT"/>
   </ElementType>

   <ElementType name="EMPLOYEEQUERY" content="eltOnly">
      <description>Query for Employee Service</description>
      <group order="many">
         <element type="BYNAME"/>
         <element type="BYHIRE"/>
         <element type="BYMANAGER"/>
         <element type="BYDEPARTMENT"/>
      </group>
   </ElementType>

</Schema>

This slightly formidable schema simply says that a document rooted with the EMPLOYEEQUERY element can contain any mix of BYNAME, BYHIRE, BYMANAGER, and BYDEPARTMENT elements. Each of these provides the parameters for an appropriate search. For every search type except BYHIRE, we simply provide the string for which to search. Searching by hire date requires the additional specification of an operator to tell us whether the search should look before or after the input date.

User Interface Concerns

This schema is not what one would normally expect. One would normally specify an order attribute of one on the group so that the user would be asked to select which type of search he wished to perform. This would, however, add a substantial degree of complexity to our query form builder. In a production system, we would have to compose multiple pages or replace the form on our page as the user selected a search type. Since this is a proof of concept, we'll make the simplifying assumption of allowing any or all of the search types to be specified at once. That way, our query builder can simply generate a user input form that captures the entire structure. Note, however, this isn't exactly what many means. While the form matching this query schema will have one section for BYNAME, one for BYHIRE, and so forth, a document could have more than one of these types and be valid. Again, since we are simply exploring the feasibility of this concept, we'll leave the fine points for later work.

Given that, how shall we map schema element definitions to form elements?

Retrieving the Structure

The core of our query builder is the ability to traverse a schema document so that we start with the root element of the vocabulary and recursively retrieve the definitions of its contained elements. In this way, we learn the structure of the vocabulary from the top down. Since the definitions can be in any order, we should use the selectSingleNode() method to search the parse tree for the definitions we require. First, let's get the ElementType element corresponding to the root element of our query vocabulary (in our example, EMPLOYEEQUERY):

var nameNode = schema.documentElement.attributes.getNamedItem("name");

if (nameNode != null)
{
   queryRootName = nameNode.nodeValue;
   var rootDef = schema.documentElement.selectSingleNode(
      "//ElementType[@name='" + queryRootName + "']");
   if (rootDef != null)
      TraverseSchema(schema, rootDef, query, null, qform);
   else
      alert("ElementType named " + queryRootName +
            " not found. Requires root element name.");
}
else
   alert("Name not set -- cannot determine root element");

The root element of the schema file is, of course, <Schema>. In the informal convention we established, the name attribute should provide the name of the root element in our query vocabulary. If the name attribute is not found, our convention definitely isn't being observed, so we fail with an error message. If it is found, the line

queryRootName = nameNode.nodeValue;

will give us the name of the root node, which in this case will be EMPLOYEEQUERY. Now we use the parser to search for the ElementType element whose name attribute matches the name we just retrieved. Since each element type definition appears exactly once, we can use selectSingleNode(). If the search turns up empty, the convention isn't being followed: the author of the schema provided a name that does not match an ElementType definition. However, if it returns a node, we can begin to descend through the schema parse tree.

To do this, we use the recursive TraverseSchema() function:

function TraverseSchema(schemaParser, schemaNode, docParser, docNode, qform)
{
   var currentNode;

   switch (schemaNode.nodeName)
   {
   case "ElementType":
      BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
      currentNode = BuildDoc(docParser, schemaNode, docNode);
      for (var nk = 0; nk < schemaNode.childNodes.length; nk++)
         TraverseSchema(schemaParser, schemaNode.childNodes(nk), docParser,
                        currentNode, qform);
      break;

   case "group":
      var orderType = schemaNode.attributes.getNamedItem("order");
      if (orderType == null || orderType.text == "many" || orderType.text == "seq")
      {
         for (var nj = 0; nj < schemaNode.childNodes.length; nj++)
            TraverseSchema(schemaParser, schemaNode.childNodes.item(nj), docParser,
                           docNode, qform);
      }
      break;

   case "attribute":
      break;

   case "datatype":
      break;

   case "description":
      BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
      break;

   case "element":
      var elementDef = schemaParser.documentElement.selectSingleNode(
         "//ElementType[@name='" +
         schemaNode.attributes.getNamedItem("type").nodeValue + "']");
      if (elementDef != null)
         TraverseSchema(schemaParser, elementDef, docParser, docNode, qform);
      break;

   case "AttributeType":
      nInputItemCount++;
      BuildQuery(schemaParser, docParser, schemaNode, null, qform);
      BuildDoc(docParser, schemaNode, docNode);
      break;
   }
}

The switch statement provides the appropriate processing for each type of schema element we will encounter (recall we're inside the one and only Schema element). ElementType, group, and element schema elements are the only elements that provide structural information. The datatype, attribute, and AttributeType elements may provide attribute and constraint information, but that is not of interest to us at the moment.

An ElementType element will either define PCDATA or will contain element content. For this reason, we call TraverseSchema() on each child element. The same is true of group. An <element> tag's type attribute refers to a corresponding ElementType element, so we need to do a search for that element to find its definition. The name we need to search for is specified in the current node's type attribute and will be found in the target's name attribute:

var elementDef = schemaParser.documentElement.selectSingleNode("//ElementType[@name='" + 
	schemaNode.attributes.getNamedItem("type").nodeValue +
	"']");

If it is not found, there is a problem and we will be unable to go deeper into that particular subtree. Normally, however, we should find a match, in which case we submit it to TraverseSchema() to continue processing.

If you run this code in a debugger using our example schema, you will see that we properly traverse the schema. Of course, we haven't generated any output. The output we are looking for is an HTML form for eliciting user input, so let's add some code to TraverseSchema() to generate that form.

User Interface Issues

Some items are going to require input from the user. Specifically, we will need inputs whenever an element can have text or mixed content, and whenever an attribute other than the datatype namespace attributes (e.g., dt:type) is defined. We will make some simplifying assumptions to keep the example from becoming unwieldy. We will generate an input element in the page when we encounter text and mixed content elements. If we detect an enumeration datatype, we will create a single selection listbox and populate it with the individual items in the dt:values attribute.

There is another, critical simplifying assumption. Groups present no problem if the order is seq. If the order is many, we can present a form with all the grouped elements, i.e., as if the order had been seq. This will generate a document that obeys the schema, but we cannot generate all the combinations permitted under the many order attribute value. Similarly, one presents some difficulty. To resolve these situations, we would require some input from the user. We could obtain this with dialog boxes if we implemented the query builder as a Java applet or as an ActiveX control. For our purposes, it is enough to handle seq and many the same way and omit support for one.

We are also going to omit support for AttributeType. We could handle this in a manner similar to how we will handle ElementType, so we're going to leave this as an exercise for production. The datatype element will be supported to the extent of assigning the proper dt:type attribute to elements when the definition occurs in a datatype element. The other attributes could be used in a production system to support field level validation.

Now that we've decided what we won't do, let's decide how we're going to generate the user interface for our supported elements. We've built the essential navigational features in the preceding section. We will again traverse the schema in a top-down approach, but we will create the required user input form elements and the shell of an XML document that follows the schema as we go. The shell consists of the tags for the finished query document, but not the text values that the user will provide. The shell document then becomes our guide when it comes time to retrieve user inputs. We will traverse the shell document and fill each tag with a value from a similarly named HTML input element. Here's what the finished page looks like after generating a page for the EMPLOYEEQUERY schema:

This isn't the nicest user interface page we've ever seen, but remember that it was built entirely without manual interference. The contents of this form derive entirely from the schema. If we provided the URL for a different XML schema, the form would change. Let's see how we got from simply traversing the schema document to generating an HTML form.

Generating the User Interface

This necessitates some changes to the button handler for creating the form:

var nameNode = schema.documentElement.attributes.getNamedItem("name");
if (nameNode != null)
{
   queryRootName = nameNode.nodeValue;
   var rootDef = schema.documentElement.selectSingleNode(
      "//ElementType[@name='" + queryRootName + "']");
   if (rootDef != null)
      TraverseSchema(schema, rootDef, query, null, qform);
   else
      alert("ElementType named " + queryRootName +
            " not found. Requires root element name.");
}
else
   alert("Name not set -- cannot determine root element");

Notice that we have some new parameters in TraverseSchema(). In addition to the schema parser (the schema variable), we pass in an instance of MSXML for building the shell document (the query variable), a parameter for the current shell document node (null at the moment), and the <DIV> where we will be generating the user interface, qform. Now we turn our attention to TraverseSchema(). The actions on ElementType are typical. In addition to the schema traversal issues, we have issues related to creating HTML form elements and issues related to building the shell document. We handle these in the functions BuildQuery() and BuildDoc(), respectively.

case "ElementType":
   BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
   currentNode = BuildDoc(docParser, schemaNode, docNode);
   for (var nk = 0; nk < schemaNode.childNodes.length; nk++)
      TraverseSchema(schemaParser, schemaNode.childNodes(nk), docParser,
                     currentNode, qform);
   break;

BuildQuery() doesn't change anything in terms of the way we traverse the schema. BuildDoc(), however, is going to create new nodes in the shell document, so we will want to keep track of the current node in the shell document processing. This is handled with the currentNode variable. BuildQuery() takes both parsers, the current schema node, the current shell document node, and the <DIV> as parameters.

function BuildQuery(parserSchema, parserDoc, nodeSchema, nodeDoc, qform)
{
   var str, eltType, oEnums;

   switch (nodeSchema.nodeName)
   {
   case "ElementType":
      var eltName = nodeSchema.attributes.getNamedItem("name").nodeValue;
      qform.insertAdjacentHTML("beforeEnd", eltName);
      var contentType = nodeSchema.attributes.getNamedItem("content").text;
      switch (contentType)
      {
      case "empty":
      case "eltOnly":
      case "eltonly":
         qform.insertAdjacentHTML("beforeEnd", "<p/>");
         break;
      case "textOnly":
      case "textonly":
      case "mixed":
         eltType = nodeSchema.attributes.getNamedItem("dt:type");
         if (eltType != null && eltType.text == "enumeration")
         {
            oEnums = nodeSchema.attributes.getNamedItem("dt:values");
            if (oEnums != null)
               PopulateEnumeration(eltName + nInputItemCount, oEnums.text, qform);
         }
         else
         {
            str = "&nbsp;<input size='40' name='" + eltName +
                  nInputItemCount + "'><p/>";
            qform.insertAdjacentHTML("beforeEnd", str);
         }
         nInputItemCount++;
         break;
      }
      break;

   case "description":
      qform.insertAdjacentHTML("beforeEnd", nodeSchema.text + "<p/>");
      break;
   }
}

In all cases, we are going to write out the name of the element as a label for the user:

var eltName = nodeSchema.attributes.getNamedItem("name").nodeValue;
qform.insertAdjacentHTML("beforeEnd", eltName);
var contentType = nodeSchema.attributes.getNamedItem("content").text;

If the element is empty or contains only element content, we simply write out a paragraph HTML element to provide some formatting. Things get interesting when we reach mixed and text content elements, however. We need to provide an input element and concern ourselves with whether this is a free or an enumerated value:

      case "textOnly":
      case "textonly":
      case "mixed":
         eltType = nodeSchema.attributes.getNamedItem("dt:type");
         if (eltType != null && eltType.text == "enumeration")
         {
            oEnums = nodeSchema.attributes.getNamedItem("dt:values");
            if (oEnums != null)
               PopulateEnumeration(eltName + nInputItemCount, oEnums.text, qform);
         }
         else
         {
            str = "&nbsp;<input size='40' name='" + eltName +
                  nInputItemCount + "'><p/>";
            qform.insertAdjacentHTML("beforeEnd", str);
         }
         nInputItemCount++;
         break;
      }
      break;

Note the global variable nInputItemCount. We need to generate unique identifiers for all the form elements we generate. We'll need to recreate these when we go back and pick up the user's inputs, so we've come up with the following rule:

A form element's name is the name of the element with the value of nInputItemCount appended.
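For instance, if nInputItemCount is 4 when the HIREDATE element definition is processed, the generated input element is named HIREDATE4.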

As long as we increment this variable when traversing the shell in the same order as when we traversed the schema, we'll be able to retrieve the values of the form elements we generate. Here's how PopulateEnumeration() works:

function PopulateEnumeration(sName, sVals, qform)
{
   var str, nStart, nFinish, nValsLength, sOptVal;

   // Write out the start of the list box
   str = "&nbsp;<select type='select-one' id='" + sName + "'>";
   if (sVals != null)
   {
      nValsLength = sVals.length;
      nStart = 0;
      nFinish = sVals.indexOf(" ");

      while (nStart < nValsLength && nFinish != -1)
      {
         sOptVal = sVals.substring(nStart, nFinish);
         str += "<OPTION value='" + sOptVal + "'>" + sOptVal + "</OPTION>";
         nStart = nFinish + 1;
         nFinish = sVals.indexOf(" ", nStart);
      }
      if (nStart < nValsLength)
      {
         sOptVal = sVals.substring(nStart, nValsLength);
         str += "<OPTION value='" + sOptVal + "'>" + sOptVal + "</OPTION>";
      }
   }
   str += "</SELECT><p/>";
   qform.insertAdjacentHTML("beforeEnd", str);
}

We always generate a single-selection listbox (type='select-one'). Parsing the enumeration values creates the <OPTION> elements. We know these are delimited by spaces, so the parsing can be accomplished with the JavaScript String.indexOf() and String.substring() methods.

That takes care of handling the form building side of ElementType schema nodes. TraverseSchema() also takes care of building the shell document:

currentNode = BuildDoc(docParser, schemaNode, docNode);

BuildDoc() looks like this:

function BuildDoc(docParser, schNode, docNode)
{
   var newNode;

   switch (schNode.nodeName)
   {
   case "ElementType":
      var eltName = schNode.attributes.getNamedItem("name").nodeValue;
      newNode = docParser.createElement(eltName);
      if (docNode == null)
         docParser.documentElement = newNode;
      else
         docNode.appendChild(newNode);
      var contentType = schNode.attributes.getNamedItem("content").text;
      if (contentType == "mixed" || contentType == "textonly" ||
          contentType == "textOnly")
      {
         newNode.appendChild(docParser.createTextNode(""));
      }
      var eltType = schNode.attributes.getNamedItem("dt:type");
      if (eltType != null)
      {
         var newAttr = docParser.createAttribute("dt:type");
         newAttr.nodeValue = eltType.nodeValue;
         newNode.attributes.setNamedItem(newAttr);
      }
      break;

   case "AttributeType":
      break;
   }
   return newNode;
}

Whenever we encounter an ElementType node in the schema, we want to create an element in the shell document that takes its name from the name attribute of the ElementType element in the schema. If the current document node is null, we have an empty document. In that case, the newly created node becomes the documentElement of the DOM tree. Otherwise, the new node is appended as a child of the current node.

It is important to insert a placeholder for the user's inputs while we are generating the shell document. If we waited until later, the new PCDATA elements would continually change the number of child nodes which the mixed content nodes have. Also, if we insert the placeholder in its parent tag now, we can use this as a guide when it comes time to retrieve the user's inputs. Since both mixed content and text nodes require PCDATA at some point, we create a text node and append it to the current node without changing the current node:

newNode.appendChild(docParser.createTextNode(""));

While processing an ElementType schema element, we may also encounter a dt:type attribute. We want to add this to our shell document element to facilitate typed processing by the service that receives the query we are building.

var eltType = schNode.attributes.getNamedItem("dt:type");
if (eltType != null)
{
   var newAttr = docParser.createAttribute("dt:type");
   newAttr.nodeValue = eltType.nodeValue;
   newNode.attributes.setNamedItem(newAttr);
}

That's the entire picture for TraverseSchema() and ElementType schema elements. There is one other small change to TraverseSchema() from the simple traversal case, and that is our support for description elements. Our labels are the element names. We hope that the schema author chose descriptive names, but we will provide one more bit of assistance to our users. By writing out the contents of the description element, we give the user a bit of text to help her figure out what the schema element means:

case "description":
   BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
   break;

In BuildQuery() we saw these lines:

case "description":
   qform.insertAdjacentHTML("beforeEnd", nodeSchema.text + "<p/>");
   break;

Simply put, BuildQuery() retrieves the text the schema author entered and places it on a line of its own.

Generating a Query in the Vocabulary

After our user has clicked on the Create Form button, he has a user interface for providing element values, and we have a shell query document. When the user clicks the Complete Query button, we want to complete the shell document with PCDATA values we retrieve from the user interface. Our button handler simply resets the nInputItemCount variable and calls a document traversal function. After traversing the document, it displays the finished XML document in an alert box.

function OnComplete()
{
   var root = query.documentElement;
   nInputItemCount = 0;

   // Depends on having built a shell query document via the form builder
   if (root != null)
   {
      // Recurse through the tree populating inputs, then display XML
      TraverseForm(root);
      alert(query.xml);
   }
   else
      alert("You must have built a form prior to selecting this option.");
}

The traversal function has only two tasks to perform. First, when a text node is encountered in the shell, it must go out to the user interface and retrieve the value of the corresponding HTML form element. Next, regardless of the node type encountered, it must keep the recursion going to complete the document traversal, which it does by calling itself with each child node of the current node.

function TraverseForm(node)
{
   switch (node.nodeType)
   {
   case PCDATA:
      var str = node.parentNode.nodeName + nInputItemCount++;
      node.nodeValue = document.all(str).value;
      break;
   }

   for (var ni = 0; ni < node.childNodes.length; ni++)
      TraverseForm(node.childNodes(ni));
}

We can verify that this works by running the query builder against the schema found in employeequery.xml and providing the form inputs seen in the user interface illustration:


It isn't pretty, but it's all there.

Further Work

This proof of concept clearly omits some features that are mandatory for a production system. AttributeType schema elements must be supported, and all the attributes of datatype elements need to be handled. Some sort of mechanism for handling the order attribute of group elements should be provided. Ideally this would come in two forms: a dialog-based mechanism for interaction with human users and a programmatic approach for software clients. Better formatting of form elements would be nice. It would be useful to have some mechanism for showing human users a comparison of the known (e.g., EMPLOYEE) schema for the desired service and the unknown (e.g., EMPLOYEEQUERY) schema in order to assist them in determining what to input into the various form elements.
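As a starting point for the first of those items, here is one hedged possibility (an assumption on our part, not code from the book's sample) for filling in the empty AttributeType case in BuildDoc(): create an attribute named after the definition and attach it, empty, to the current shell document node as a placeholder for the user's input. BuildQuery() would then need to generate a matching form element, following the same naming rule we used for elements.

   // Hypothetical handling of AttributeType in BuildDoc() -- not part of the
   // book's sample: add an empty attribute to the current shell document node
   // as a placeholder for the user's input.
   case "AttributeType":
      var attrName = schNode.attributes.getNamedItem("name").nodeValue;
      if (docNode != null)
      {
         var placeholder = docParser.createAttribute(attrName);
         placeholder.nodeValue = "";
         docNode.attributes.setNamedItem(placeholder);
      }
      break;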

Some of these items will be influenced by the specific nature of the client application. Others will stem from the syntax of whatever metadata proposal reaches W3C Recommendation status. An appreciation of the issues of traversing an XML schema and mapping it into something useful to a user without manual intervention is what is important at this stage in the evolution of XML metadata.

What We Should Do Now

Metadata clearly has great potential for cooperative applications roaming a wide and loosely knit network. Metadata does the following for XML and its use in implementing our principles:

  • It makes service data self-descriptive in a rich way
  • It explains extensions to data by clarifying and describing what is in the extensions
  • It helps our services support degradation by identifying overlapping and generalized elements

Of course, as has been noted, all the proposals we saw and worked with in this chapter, save namespaces and RDF, are still W3C Notes, not finished recommendations. For this reason, we shouldn't be too quick to adopt metadata techniques into our toolkit. As we finish this chapter, it is appropriate to reflect on what we know, where metadata is going, and what we should do now to prepare our applications for its eventual inclusion. After all, our fifth principle said in part that services should support extension. Metadata not only supports this, it will be an extension of what we can do now! Here are the things we should be doing:

  • Encourage vocabulary authors to develop and document formal metadata for their work
  • Consider how you might use metadata in your organization
  • Keep abreast of metadata developments at the W3C and software vendors

First, you should encourage vocabulary authors in your organization to develop and document formal metadata for their work. This can be in DTD or other schema form, but the important thing is to capture this knowledge while it is still explicit. Additionally, the effort will force authors to think through the ramifications of their vocabularies. This will result in better, more descriptive vocabularies. When a metadata recommendation is available, the documentation in hand can be translated into models according to the published recommendation. In the meantime, DTDs or XML syntax schemas can be used during development to validate test documents. Failing to do this may result in unintended changes to the vocabulary that will have to be supported as legacy features.

Next consider how you might use metadata in your organization. A closed intranet will have less need for metadata than an application based on an extranet or the public Internet. At most, a query building capability will be needed. Vocabulary discovery is less likely to be required; you should be able to obtain metadata information internally for important service vocabularies. Developers of applications and components that will work in a wider environment will want to consider their metadata discovery needs. A component builder will want to consider searching for useful information in unknown vocabularies. An extranet application might need to learn the structure of specialized vocabularies or search for overlap by examining the structure and content of element and attribute definitions.

Finally, architects of network applications and systems should keep abreast of developments in the W3C Metadata Activity and XML parser vendors. You will want to know the syntax and capabilities of whichever metadata model shapes the final recommendation (or recommendations; there is no guarantee the W3C won't support several in order to address various needs, and vendors may pick and choose what they will implement). You will also want to be ready with a parser or metadata tool that supports the final recommendation.

The reader is reminded to check http://www.w3.org/TR/ frequently to obtain the latest status of W3C efforts.

Summary

In this chapter we completed our examination of XML as a data transmission format by taking up the topic of XML metadata. We defined metadata as "data about data". We considered the current state of metadata by considering XML Namespaces, a working draft tangentially related to metadata proper. We looked at the future of metadata in the Web development world by examining the major submissions before the W3C Metadata Activity. These are:

  • Resource Description Framework (RDF)
  • Meta Content Format (MCF)
  • XML Data
  • XML Document Content Description (DCD)

We then considered some future uses for metadata in the context of our development philosophy. Having done so, we demonstrated a prototype query builder. To do this we had to learn the syntax supported in Microsoft's XML Schema technology preview.

Metadata offers enormous power and flexibility that will help make networked applications more cooperative and robust in the face of unexpected data formats. It is a natural continuation of our use of tagged data (XML) for capturing our service data.


©1998 Wrox Press Limited, US and UK.