(Posted 13/12/2005)
Two of the exciting new features of the Java 2 Platform, Standard Edition
(J2SE) 5.0 release, code-named Tiger, are the XML validation package,
javax.xml.validation, and the XPath library, javax.xml.xpath.
Before the Tiger release, the Java API for XML Processing (JAXP) SAXParser
and DocumentBuilder classes were the primary instruments of XML validation
in Java technology.
The new Validation API, however, decouples the validation of an XML document
from the parsing of the document. Among other things, this allows Java
technology to support multiple schema languages. Let's take a closer look
at XML validation first.
XML Validation
The simplest way to validate an XML document is to use
a Validator object. This object will perform a validation
against the Schema object from which the Validator
was created. Schema objects are typically created from SchemaFactory
objects. The static newInstance() method allows you to create
a SchemaFactory for a given schema language. The following
code demonstrates this:
SchemaFactory factory =
SchemaFactory.newInstance(
XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema =
factory.newSchema(new File("mySchema.xsd"));
Validator validator = schema.newValidator();
Calling the validate() method on the Validator
object performs the actual validation. This method requires a
javax.xml.transform.Source object, for which you can use either a
SAXSource or a DOMSource, depending on your preference.
DocumentBuilder parser =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document =
parser.parse(new File("myXMLDocument.xml"));
validator.validate(new DOMSource(document));
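The validate() call also accepts a SAXSource, in which case the document is
read as a stream of SAX events instead of from a DOM tree. The following is a
minimal sketch rather than code from the original article: the class name
SaxSourceValidation, the isValid helper, and the one-element in-memory schema
are assumptions chosen so that the example runs without any files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SaxSourceValidation {
    // Hypothetical schema used only for this sketch: a single <day> element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='day' type='xs:int'/>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against XSD, false otherwise. */
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Wrap the character stream in a SAXSource instead of a DOMSource.
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false; // invalid (or malformed) document
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<day>21</day>"));          // true
        System.out.println(isValid("<day>twenty-one</day>"));  // false
    }
}
```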
Here is a simple source example that shows how to validate
an XML document using a World
Wide Web Consortium (W3C) XML Schema, sometimes referred to as WXS.
try {
// Parse an XML document into a DOM tree.
DocumentBuilder parser =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document =
parser.parse(new File("myXMLDocument.xml"));
//Create a SchemaFactory capable of understanding
// WXS schemas.
SchemaFactory factory =
SchemaFactory.newInstance(
XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Load a WXS schema, represented by a Schema instance.
Source schemaFile =
new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);
// Create a Validator object, which can be used to validate
// an instance document.
Validator validator = schema.newValidator();
// Validate the DOM tree.
validator.validate(new DOMSource(document));
} catch (ParserConfigurationException e) {
// exception handling
} catch (SAXException e) {
// exception handling - document not valid!
} catch (IOException e) {
// exception handling
}
Note that the newInstance() method takes a constant that indicates which
schema language the factory should understand. Currently, the only schema
language that implementations are required to support is the W3C XML Schema.
This is an object-oriented schema language that provides a type system for
constraining the character data of an XML document. WXS is maintained by the
W3C and is a W3C Recommendation (that is, a ratified W3C standard
specification).
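Because only W3C XML Schema support is guaranteed, a program that wants to use
another schema language can probe for it first. The sketch below relies on the
documented behavior that SchemaFactory.newInstance() throws an
IllegalArgumentException when no implementation is available; the class name
SchemaLanguageCheck and the isSupported helper are illustrative assumptions.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

public class SchemaLanguageCheck {
    /** Returns true if a SchemaFactory exists for the given schema language. */
    public static boolean isSupported(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // No implementation of this schema language was found.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("W3C XML Schema: "
            + isSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        // The result for RELAX NG depends on what is on the classpath.
        System.out.println("RELAX NG: "
            + isSupported(XMLConstants.RELAXNG_NS_URI));
    }
}
```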
Let's run this source code on the following XML file:
<?xml version="1.0"?>
<birthdate>
<month>January</month>
<day>21</day>
<year>1983</year>
</birthdate>
In addition, let's include the following W3C XML Schema
document as our XML validation schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2001/xml.xsd" />
<xs:element name="birthdate">
<xs:complexType>
<xs:sequence>
<xs:element name="month" type="xs:string" />
<xs:element name="day" type="xs:int" />
<xs:element name="year" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
If the validation is successful, the program will run
without incident. However, let's introduce a spelling error by renaming
the month element to amonth.
At this point, the Validator will throw
a SAXException, the first few lines of which are shown here:
ERROR: 'cvc-complex-type.2.4.a: Invalid content was
found starting with element 'amonth'. One of '{"":month}'
is expected.'
org.xml.sax.SAXParseException: cvc-complex-type.2.4.a:
Invalid content was found starting with element 'amonth'.
One of '{"":month}' is expected.
at ...(Util.java:109)
at ...(ErrorHandlerAdaptor.java:104)
...
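The failure can be reproduced without any files. The following sketch is an
illustration under stated assumptions, not the article's source code: it keeps
the birthdate schema in a string (leaving out the xs:import so that nothing is
fetched over the network), makes the DOM parser namespace-aware as a
precaution, and uses a hypothetical firstError helper that returns the
validator's message instead of printing a stack trace.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class BirthdateValidation {
    // The birthdate schema from the article, minus the xs:import element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='birthdate'><xs:complexType><xs:sequence>"
      + "<xs:element name='month' type='xs:string'/>"
      + "<xs:element name='day' type='xs:int'/>"
      + "<xs:element name='year' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null if the document is valid, else the first error message. */
    public static String firstError(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: namespace-aware DOM
            Document document = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new DOMSource(document));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<birthdate><month>January</month>"
            + "<day>21</day><year>1983</year></birthdate>";
        String bad = good.replace("month>", "amonth>"); // misspell the tag
        System.out.println(firstError(good)); // null
        System.out.println(firstError(bad));  // a cvc-complex-type message
    }
}
```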
Understanding XML Schema
All implementations of SchemaFactory are
required to support the W3C XML Schema. If you're not familiar with W3C
XML Schema, here's a quick summary.
XML schemas contain definitions that are either simple
or complex types. At the highest level, a complex type contains
other elements, while a simple type does not. (These types differ
in other ways as well, but this article will not attempt to explain all
the differences.) As an example, let's create a schema that defines a
fullname element that must consist of a firstname,
a middlename, and a lastname element, in that
order.
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="fullname">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="middlename" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
At the root of every XML schema is, appropriately, a
schema element. The schema declaration above includes an
xmlns:xs attribute that indicates that the elements and data
types used in the schema come from the "http://www.w3.org/2001/XMLSchema"
namespace.
Elements and Attributes With Simple Types
Elements and attributes with simple types do not declare
other elements or attributes inside them. Instead, they declare only "text"
of several different types. This can be one of the types included in the
XML schema definition, or it can be a custom type that you can define
yourself. You can also add restrictions to a data type in order to limit
its content, and you can require the data to match a defined pattern.
Here are some examples of simple elements:
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="birthdate" type="xs:date"/>
These are some of the more common data types used with
XML schema:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
Simple elements can also have a default value or a fixed
value set. A default value is automatically assigned to the element when
no other value is specified. A fixed value is also automatically assigned
to the element and cannot be overridden. For example:
<xs:element name="firstname" type="xs:string"
default="joe"/>
<xs:element name="firstname" type="xs:string"
fixed="unknown"/>
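To see how a fixed value behaves during validation, consider the sketch below.
It is only an illustration: the class name FixedValueDemo and the isValid
helper are assumptions, and the schema contains nothing but the fixed
firstname declaration shown above. An empty element is accepted (it takes the
fixed value), while any other content is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FixedValueDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='firstname' type='xs:string' fixed='unknown'/>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against XSD. */
    public static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<firstname>unknown</firstname>")); // true
        System.out.println(isValid("<firstname/>"));                   // true
        System.out.println(isValid("<firstname>joe</firstname>"));     // false
    }
}
```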
Much as you would define an element, you can define
attributes in XML schema using the name, type, default, and fixed modifiers.
Attributes are optional by default, but you can employ the use
attribute to require their presence.
<xs:attribute name="lang" type="xs:string"
use="optional"/>
<xs:attribute name="lang" type="xs:string"
use="required"/>
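The effect of use="required" can be checked with a short program. This is a
sketch under assumptions: the note element that carries the lang attribute is
hypothetical (the article does not say which element owns the attribute), as
are the class name RequiredAttributeDemo and the isValid helper.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RequiredAttributeDemo {
    // Hypothetical <note> element with a required lang attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note'><xs:complexType>"
      + "<xs:attribute name='lang' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against XSD. */
    public static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note lang='en'/>")); // true
        System.out.println(isValid("<note/>"));           // false
    }
}
```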
Elements With Complex Types
A complex element is an XML element that contains
other elements, attributes, or both. Look at this complex XML element,
fullname, which contains only the child elements firstname, middlename,
and lastname:
<fullname>
<firstname>Robert</firstname>
<middlename>Franklin</middlename>
<lastname>Collins</lastname>
</fullname>
You can define this using XML schema in a couple of
ways. First, the fullname element can be declared directly
by naming the element, as shown below. Notice that the child elements
-- firstname, middlename, and lastname -- are surrounded by the
sequence indicator. This means that the child elements must appear in
the same order as they are declared: firstname first, middlename second,
and lastname third.
<xs:element name="fullname">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="middlename" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
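The ordering rule imposed by the sequence indicator can be observed directly.
In this sketch, which assumes an in-memory copy of the fullname declaration
and uses a hypothetical SequenceOrderDemo class with an isValid helper, the
same three children are accepted in declaration order and rejected when two of
them are swapped.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SequenceOrderDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='fullname'><xs:complexType><xs:sequence>"
      + "<xs:element name='firstname' type='xs:string'/>"
      + "<xs:element name='middlename' type='xs:string'/>"
      + "<xs:element name='lastname' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against XSD. */
    public static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String inOrder = "<fullname><firstname>Robert</firstname>"
            + "<middlename>Franklin</middlename>"
            + "<lastname>Collins</lastname></fullname>";
        String swapped = "<fullname><lastname>Collins</lastname>"
            + "<middlename>Franklin</middlename>"
            + "<firstname>Robert</firstname></fullname>";
        System.out.println(isValid(inOrder)); // true
        System.out.println(isValid(swapped)); // false
    }
}
```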
Second, we can have the fullname element
use an attribute called type, which refers to the name of
another complex type to use. Here, we've essentially made the complex
type stand on its own, and we're referencing it from within the fullname
element:
<xs:element name="fullname" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="middlename" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
The benefit here is that several elements can refer
to the same complex type. You can also base a complex type element on
an existing complex type and add some elements using an extension, like
this:
<xs:element name="contact" type="fullpersoninfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="middlename" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="fullpersoninfo">
<xs:complexContent>
<xs:extension base="personinfo">
<xs:sequence>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
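An extended type requires the base type's elements first, followed by the
added ones. The sketch below checks this with an in-memory copy of the
personinfo and fullpersoninfo declarations; the class name ExtensionDemo, the
isValid helper, and the sample data values are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ExtensionDemo {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='contact' type='fullpersoninfo'/>"
      + "<xs:complexType name='personinfo'><xs:sequence>"
      + "<xs:element name='firstname' type='xs:string'/>"
      + "<xs:element name='middlename' type='xs:string'/>"
      + "<xs:element name='lastname' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:complexType name='fullpersoninfo'><xs:complexContent>"
      + "<xs:extension base='personinfo'><xs:sequence>"
      + "<xs:element name='address' type='xs:string'/>"
      + "<xs:element name='city' type='xs:string'/>"
      + "<xs:element name='country' type='xs:string'/>"
      + "</xs:sequence></xs:extension>"
      + "</xs:complexContent></xs:complexType>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against XSD. */
    public static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "<firstname>Robert</firstname>"
            + "<middlename>Franklin</middlename><lastname>Collins</lastname>";
        String extra = "<address>1 Main St</address>"
            + "<city>Springfield</city><country>USA</country>";
        // Base elements plus the extension's elements: valid.
        System.out.println(isValid("<contact>" + base + extra + "</contact>"));
        // Base elements only: invalid, because address is missing.
        System.out.println(isValid("<contact>" + base + "</contact>"));
    }
}
```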
W3C XML Schema has other useful features, as Table 1
indicates.
Table 1. Indicators
Indicator | Function
all | Specifies that each of the child elements must appear once but can appear in any order
choice | Specifies that any one of the alternatives can occur
sequence | Specifies that the child elements must appear in a specific order
@maxOccurs | Specifies the maximum number of times an element can occur
@minOccurs | Specifies the minimum number of times an element can occur
group | Used to define related sets of elements
attributeGroup | Used to define related sets of attributes
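Two of the indicators in Table 1 can be tried out with a small program. The
following is a sketch with a hypothetical schema: a person element whose
children are grouped with all, and a phones element whose phone child is
limited by minOccurs and maxOccurs. The class name IndicatorDemo and the
isValid helper are likewise assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class IndicatorDemo {
    // Hypothetical schema with two possible root elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='person'><xs:complexType><xs:all>"
      + "<xs:element name='firstname' type='xs:string'/>"
      + "<xs:element name='lastname' type='xs:string'/>"
      + "</xs:all></xs:complexType></xs:element>"
      + "<xs:element name='phones'><xs:complexType><xs:sequence>"
      + "<xs:element name='phone' type='xs:string'"
      + " minOccurs='1' maxOccurs='2'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against XSD. */
    public static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(
                new SAXSource(new InputSource(new StringReader(xml))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // all: children may appear in any order.
        System.out.println(isValid("<person><lastname>Collins</lastname>"
            + "<firstname>Robert</firstname></person>"));      // true
        // maxOccurs='2': a third phone element is rejected.
        System.out.println(isValid("<phones><phone>1</phone>"
            + "<phone>2</phone><phone>3</phone></phones>"));   // false
        // minOccurs='1': at least one phone element is required.
        System.out.println(isValid("<phones/>"));              // false
    }
}
```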
This is just a brief introduction to the features of
XML schema, and it only touches on advanced features such as restrictions
and extensions. For more information, see the home
page for the W3C XML Schema.
Evaluating XPath Expressions in JDK 1.5
Another new package added to the XML arsenal of the
JDK 1.5 release is javax.xml.xpath. This package provides
an API for evaluating expressions based on the XML
Path Language (XPath) version 1.0. XPath allows you to select nodes
from an XML document object model (DOM) tree. XPath also provides rules
for converting a node to a boolean, double, or string value. The Javadocs
offer more information: "XPath started in life in 1999 as a supplement
to the XSLT and XPointer languages, but has more recently become popular
as a stand-alone language, as a single XPath expression can be used to
replace many lines of DOM API code."
Getting to Know XPath
Let's quickly look at XPath expressions and at how they
are useful. The following is an example of a simple XPath expression:
book/author
This is known as a location path. It selects
all author elements that are the children of a book
element, where book is a child of the current context node.
For example, if the current context node is the library element,
then the XPath expression book/author selects
both author elements below:
<library>
<book>
<author name="Author A"/>
<author name="Author B"/>
</book>
</library>
The context node can be any node inside an XML DOM tree,
including the root node.
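In the XPath API, the context node is simply the object passed to evaluate().
The following sketch evaluates book/author with the library element as the
context node; the class name ContextNodeDemo and the authorNames helper are
illustrative assumptions, and the XML is kept in a string so that the example
is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContextNodeDemo {
    static final String XML = "<library><book>"
        + "<author name='Author A'/><author name='Author B'/>"
        + "</book></library>";

    /** Returns the name attribute of each author selected by book/author. */
    public static List<String> authorNames() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            // Use the <library> element as the context node.
            Element library = document.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList authors = (NodeList) xpath.evaluate(
                "book/author", library, XPathConstants.NODESET);
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < authors.getLength(); i++) {
                names.add(((Element) authors.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(authorNames()); // [Author A, Author B]
    }
}
```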
Note that author must be a direct child
of book. A special location path operator, //,
selects nodes at any depth in an XML document.
For example, the following selects all author elements below
the context node:
.//author
(Written without the leading dot, //author searches the entire
document, regardless of the context node.)
Table 2 lists some other useful XPath operators.
Table 2. Some XPath Operators
Location Path | Description
../author | Selects all author elements that are the children of the context node's parent
* | Selects all child elements of the context node
*/author | Selects all author element grandchildren of the current context node
/book/author | Selects all author elements that are children of book elements, which are in turn children of the root node of the document
./book/author | Selects all author elements that are children of book elements, which are in turn children of the current context node
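The location paths in Table 2 can be compared by counting the nodes that each
one selects. The sketch below is an illustration only: the sample document
(which adds a hypothetical author child directly under library), the class
name LocationPathDemo, and the count helper are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocationPathDemo {
    static final String XML = "<library>"
        + "<author name='Editor'/>"
        + "<book><author name='Author A'/><author name='Author B'/></book>"
        + "</library>";

    /**
     * Counts the nodes selected by the path, using the first element with
     * the given tag name as the context node.
     */
    public static int count(String path, String contextElement) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context =
                document.getElementsByTagName(contextElement).item(0);
            NodeList result = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, context, XPathConstants.NODESET);
            return result.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("*", "library"));             // 2
        System.out.println(count("*/author", "library"));      // 2
        System.out.println(count("./book/author", "library")); // 2
        // The root element is library, not book, so nothing matches.
        System.out.println(count("/book/author", "library"));  // 0
        // From book, the parent is library, which has one author child.
        System.out.println(count("../author", "book"));        // 1
    }
}
```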
In addition to elements, XPath location paths may also
target attributes, text, comments, and processing-instruction nodes inside
a DOM tree. Table 3 gives some usage examples.
Table 3. Usage Examples for XPath Location Paths
Location Path | Description
author/@name | Selects the attribute name of the author element.
author/node() | Selects all child nodes of the author element, whatever their type (element, text, comment, or processing instruction).
author/text() | Selects the text nodes of the author element. No distinction is made between escaped and nonescaped character data.
author/comment() | Selects all comment nodes contained in the author element.
author/processing-instruction() | Selects all processing-instruction nodes contained in the author element.
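The paths in Table 3 can be exercised against a single element that carries
every kind of node. This is a sketch under assumptions: the sample author
element, its comment and processing instruction, the class name NodeTypeDemo,
and the two helpers are all invented for illustration. The document node is
used as the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTypeDemo {
    // One attribute, one text node, one comment, one processing instruction.
    static final String XML =
        "<author name='Author A'>Some text<!--a note--><?render bold?></author>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    /** Evaluates the path against the document and returns a string. */
    public static String stringValue(String path) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(path, parse());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the number of nodes that the path selects. */
    public static int nodeCount(String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, parse(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringValue("author/@name"));     // Author A
        System.out.println(stringValue("author/text()"));    // Some text
        System.out.println(stringValue("author/comment()")); // a note
        System.out.println(
            stringValue("author/processing-instruction()")); // bold
        System.out.println(nodeCount("author/node()"));      // 3
    }
}
```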
XPath predicates also allow for refining the nodes selected
by an XPath location path. Predicates take the form [expression].
The following example selects all foo elements that contain
an include attribute with the value of true:
//foo[@include='true']
Predicates may be appended to each other to further
refine an expression, for example:
//foo[@include='true'][@class='bar']
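The narrowing effect of chained predicates shows up when the matches are
counted. In the sketch below, the three-element sample document, the class
name PredicateDemo, and the count helper are assumptions made for
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PredicateDemo {
    static final String XML = "<root>"
        + "<foo include='true' class='bar'/>"
        + "<foo include='true' class='baz'/>"
        + "<foo include='false' class='bar'/>"
        + "</root>";

    /** Returns the number of nodes that the expression selects. */
    public static int count(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//foo"));                                // 3
        System.out.println(count("//foo[@include='true']"));               // 2
        System.out.println(count("//foo[@include='true'][@class='bar']")); // 1
    }
}
```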
Using the XPath API
The following example demonstrates using the XPath API
to select one or more nodes from an XML document:
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/birthdate/year";
InputSource inputSource =
new InputSource("myXMLDocument.xml");
// XPathConstants.NODESET results are returned as an org.w3c.dom.NodeList.
NodeList nodes = (NodeList) xpath.evaluate(expression,
inputSource, XPathConstants.NODESET);
Note that the XPath API allows the selected nodes to
be converted to other data types, including Boolean, Number,
and String objects. The return type is specified by a QName
parameter in the method call used to evaluate the expression, which is
either a call to XPath.evaluate(), as shown above
(the third parameter), or a call to XPathExpression.evaluate() on a
compiled expression. The allowed QName values are specified as constants
in the XPathConstants class:
XPathConstants.NODESET
XPathConstants.NODE
XPathConstants.STRING
XPathConstants.BOOLEAN
XPathConstants.NUMBER
When a Boolean return type is requested,
Boolean.TRUE is returned if one or more nodes were selected.
Otherwise, Boolean.FALSE is returned. The String
return type is a convenience for retrieving the character data from a
text node, attribute node, comment node, or processing-instruction node.
When used on an element node, the concatenated value of its descendant
text nodes is returned. Finally, the Number return type attempts to convert
the text of a node into a double data type.
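The Boolean and Number conversions can be observed with a compiled
XPathExpression. The following is a minimal sketch, assuming an in-memory copy
of the birthdate document; the class name ReturnTypeDemo and the eval helper
are hypothetical. Note that text which is not numeric converts to NaN.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReturnTypeDemo {
    static final String XML = "<birthdate><month>January</month>"
        + "<day>21</day><year>1983</year></birthdate>";

    /** Compiles the expression and evaluates it with the given return type. */
    public static Object eval(String expression, QName returnType) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPathExpression compiled =
                XPathFactory.newInstance().newXPath().compile(expression);
            return compiled.evaluate(document, returnType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A node was selected, so the Boolean result is true.
        System.out.println(eval("/birthdate/year", XPathConstants.BOOLEAN));
        // No week element exists, so the Boolean result is false.
        System.out.println(eval("/birthdate/week", XPathConstants.BOOLEAN));
        // "1983" converts to the double 1983.0.
        System.out.println(eval("/birthdate/year", XPathConstants.NUMBER));
        // "January" is not numeric, so the result is NaN.
        System.out.println(eval("/birthdate/month", XPathConstants.NUMBER));
    }
}
```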
For the XML document presented at the beginning of this
article, you can use the following XPath API code to select the year
element as a node, a string, and a number:
try {
// Parse the XML as a W3C document.
DocumentBuilder builder =
DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
Document document =
builder.parse(new File("myXMLDocument.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/birthdate/year";
// First, obtain the element as a node.
Node birthdateNode = (Node)
xpath.evaluate(expression, document,
XPathConstants.NODE);
System.out.println("Node is: " + birthdateNode);
// Next, obtain the element as a String.
String birthdateString = (String)
xpath.evaluate(expression, document,
XPathConstants.STRING);
System.out.println("String is: " + birthdateString);
// Finally, obtain the element as a Number (Double).
Double birthdateDouble = (Double)
xpath.evaluate(expression, document,
XPathConstants.NUMBER);
System.out.println("Double is: " + birthdateDouble);
} catch (ParserConfigurationException e) {
System.err.println(
"ParserConfigurationException caught...");
e.printStackTrace();
} catch (XPathExpressionException e) {
System.err.println(
"XPathExpressionException caught...");
e.printStackTrace();
} catch (SAXException e) {
System.err.println(
"SAXException caught...");
e.printStackTrace();
} catch (IOException e) {
System.err.println(
"IOException caught...");
e.printStackTrace();
}
When you run the example, you should see the following
output:
Node is: [year: null]
String is: 1983
Double is: 1983.0
Source Code
You can download the source code for these examples
here.
A NetBeans IDE project file is included as part of the source code.
For More Information
(Source: java.sun.com)