(Post 30/06/2006)
Validate DOM in memory
The Transformation APIs also allow a transformed result
to be obtained as a DOM object. The DOM object in memory can be validated
against a schema. This can be done as follows:
DOMResult dr = new DOMResult();
t.transform(xml, dr);
// Wrap the transformed tree in a DOMSource and validate it in memory.
DOMSource ds = new DOMSource(dr.getNode());
schema.newValidator().validate(ds);
As this shows, the Validation APIs can be combined with the Transformation
APIs to do complex things easily. This approach also boosts performance
because it avoids parsing the XML a second time when validating a
transformed XML document.
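The steps above can be put together in a small self-contained sketch. The class name, the one-element schema, and the use of an identity transformer in place of a real stylesheet are assumptions made for illustration only; they are not part of the original sample.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class TransformThenValidate {
    // Hypothetical one-element schema, used only for this illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='greeting' type='xs:string'/>"
      + "</xs:schema>";

    /** Returns true if the transformed DOM tree is valid against XSD. */
    static boolean transformAndValidate(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // An identity transformer stands in for a real stylesheet.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            DOMResult dr = new DOMResult();
            t.transform(new StreamSource(new StringReader(xml)), dr);
            // Validate the in-memory tree; the XML is not parsed again.
            schema.newValidator().validate(new DOMSource(dr.getNode()));
            return true;
        } catch (SAXException e) {
            return false; // the validator reported an error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transformAndValidate("<greeting>hi</greeting>"));
        System.out.println(transformAndValidate("<other/>"));
    }
}
```

A Validator with no ErrorHandler set throws on the first validation error, which is why the sketch treats a SAXException as "not valid".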
Validate a JDOM Document
The ValidatorHandler can be used to validate various object models, such
as JDOM, against one or more schemas. In fact, any object model (XOM,
DOM4J, and so on) that can be built on top of a SAX stream or can emit SAX
events can be used with the Schema Validation Framework to validate an
XML document against a schema. This is possible because ValidatorHandler
can validate a SAX stream.
Let's see how a JDOM document can be validated against schema(s):
// vh is a ValidatorHandler obtained from schema.newValidatorHandler().
SAXOutputter so = new SAXOutputter(vh);
so.output(jdomDocument);
It is that simple. JDOM has a way to output a JDOM document as a stream
of SAX events. SAXOutputter fires SAX events that are validated by
ValidatorHandler. Any error encountered is reported through the
ErrorHandler set on the ValidatorHandler.
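The same idea works for any source of SAX events. The sketch below avoids the JDOM dependency by replaying a W3C DOM tree as SAX events through an identity Transformer and a SAXResult; this is a stand-in for SAXOutputter, not the article's own sample. The class name and the schema are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class SaxStreamValidation {
    // Hypothetical schema for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='greeting' type='xs:string'/>"
      + "</xs:schema>";

    /** Streams a DOM tree into a ValidatorHandler; returns the error messages. */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            ValidatorHandler vh = schema.newValidatorHandler();
            // Collect validation errors instead of aborting on the first one.
            vh.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the validator needs namespace-aware events
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // The identity transform emits the tree as SAX events into vh.
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(vh));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }
}
```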
Obtain Schema Type Information
ValidatorHandler can give access to TypeInfoProvider, which can be queried
to access the type information determined by the validator. This object
is dynamic in nature and returns the type information of the current
element or attribute assessed by the ValidatorHandler during validation
of the XML document.
This interface allows an application to know three things:
· Whether the attribute is declared as an ID type
· Whether the attribute was declared in the original XML document or was
  added by Validator during validation
· Which type, as declared in the schema, is associated with the current
  element or attribute
Type information is returned as an org.w3c.dom.TypeInfo object, which is
defined as part of DOM L3. The TypeInfo object returned is immutable, and
the caller can keep references to the obtained TypeInfo object longer than
the callback scope. The methods of this interface may only be called from
within the startElement event of the ContentHandler that the application
sets on the ValidatorHandler. For example, look at the section of the code
below. (Note: For clarity, only part of the code is shown here. For the
complete source, look at the sample SchemaTypeInformation.java, which can
be downloaded from here.)
ValidatorHandler vh = schema.newValidatorHandler();
vh.setErrorHandler(eh);
vh.setContentHandler(new MyContentHandler(
    vh.getTypeInfoProvider()));
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader reader = spf.newSAXParser().getXMLReader();
reader.setContentHandler(vh);
reader.parse(<XML Document>);
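The MyContentHandler class is not shown in the excerpt above. A minimal sketch of such a handler, which queries the TypeInfoProvider inside startElement, might look like the following. The class name, the schema, and the recorded "element:type" strings are assumptions for illustration and are not taken from SchemaTypeInformation.java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class TypeInfoDemo {
    // Hypothetical schema: a single element of type xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** Returns "localName:typeName" for every element the validator sees. */
    static List<String> elementTypes(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            ValidatorHandler vh = schema.newValidatorHandler();
            TypeInfoProvider tip = vh.getTypeInfoProvider();
            vh.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    // Only valid while the validator is inside this callback.
                    TypeInfo ti = tip.getElementTypeInfo();
                    seen.add(local + ":" + (ti == null ? "unknown" : ti.getTypeName()));
                }
            });
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(vh);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Because TypeInfo objects are immutable, the handler could also store the objects themselves rather than their names.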
Ensure Data Security
Validating an XML document against an untrusted schema
could have serious consequences, as validation may modify the actual data
by adding default attributes and possibly corrupting the data. Validation
against an untrusted schema may also mean that an incoming instance document
might not conform to your business's constraints or rules.
With the new Validation APIs, getting a Schema instance is the first step
before being able to validate an instance document, and it is the
application that determines how to create the Schema instance. Validation
using the Schema instance makes sure that an incoming instance document
is not validated against any other (untrusted) schema(s) but only against
the schema(s) from which the instance is created. If the instance XML
document has elements or attributes that belong to a different
targetNamespace and are not part of the javax.xml.validation.Schema
representation, an error will be thrown. This approach protects you from
accidental mistakes and malicious documents.
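As a rough illustration of this behavior, the sketch below compiles a Schema from a source the application controls and then offers it a document from another namespace that carries its own schemaLocation hint. The namespaces, the schema, and the class name are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class TrustedSchemaOnly {
    // Hypothetical trusted schema with its own target namespace.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:trusted' elementFormDefault='qualified'>"
      + "<xs:element name='order' type='xs:string'/>"
      + "</xs:schema>";

    /** Returns true only if the document is valid against the trusted schema. */
    static boolean accepts(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // no declaration in the trusted schema, or invalid content
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<order xmlns='urn:example:trusted'>42</order>"));
        // This document points at a schema of its own choosing; it is rejected
        // because the root element is not declared in the trusted schema.
        System.out.println(accepts(
            "<order xmlns='urn:example:other'"
          + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
          + " xsi:schemaLocation='urn:example:other other.xsd'>42</order>"));
    }
}
```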
Reusing a Parser Instance
Is it possible to use the same parser instance to parse multiple XML
documents? Before JAXP 1.3 this was not clear, and the behavior was
implementation dependent. JAXP 1.3 has added the new function reset() on
SAXParser, DocumentBuilder, and Transformer. This guarantees that the
same instance can be reused. The reset function improves overall
performance by saving the resources and time needed to create new
instances and by reducing garbage collection time. Let's see how the
reset() function can be used.
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setSchema(schema);
SAXParser saxParser = spf.newSAXParser();
for (int i = 0; i < n; i++) {
    saxParser.parse(new File(args[i]), myHandler);
    saxParser.reset();
}
The same function has also been added to the newly designed
javax.xml.validation.Validator, as well as to javax.xml.xpath.XPath.
Applications are encouraged to reuse the parser, transformer, validator,
and XPath instance by calling reset() when processing multiple XML
documents. Note that reset() sets the instance back to factory settings.
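A self-contained variant of the loop above, using in-memory documents instead of files, could look like this. The class name and the element-counting handler are hypothetical and only serve to show that one SAXParser instance handles several documents.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ParserReuse {
    /** Parses every document with one SAXParser and counts its elements. */
    static int[] countElements(String... docs) {
        int[] counts = new int[docs.length];
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            SAXParser parser = spf.newSAXParser(); // created once
            for (int i = 0; i < docs.length; i++) {
                final int[] n = {0};
                parser.parse(new InputSource(new StringReader(docs[i])),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String local,
                                                 String qName, Attributes atts) {
                            n[0]++;
                        }
                    });
                counts[i] = n[0];
                parser.reset(); // back to factory settings before the next document
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }
}
```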
XPath Support
Accessing XML is made simple using XPath: A single XPath
expression can be used to replace many lines of DOM
API code. JAXP 1.3 has defined XPath
APIs that conform to the XPath 1.0 specification and provide object-model-neutral
APIs for the evaluation of XPath expressions and access to the evaluation
environment. Though current APIs conform to XPath 1.0, the APIs have been
designed with future XPath 2.0 support in mind.
To use JAXP 1.3 XPath APIs, the first step is to get an instance of
XPathFactory. Though the default model is W3C DOM, it can be changed by
specifying the object model URI:
// Default object model (W3C DOM):
XPathFactory factory = XPathFactory.newInstance();
// Or, alternatively, a specific object model:
XPathFactory factory = XPathFactory.newInstance(
    <OBJECT MODEL URI>);
Evaluate the XPath Expression
XPathFactory is used to create XPath objects. The XPath interface provides
access to the XPath evaluation environment and expressions. XPath has
overloaded the evaluate() function, which can return the result by
evaluating an XPath expression based on the return type set by the
application. For example, look at the following XML document:
<Books>
  <Book>
    <Author> Author1 </Author>
    <Name> Name1 </Name>
    <ISBN> ISBN1 </ISBN>
  </Book>
  <Book>
    <Author> Author2 </Author>
    <Name> Name2 </Name>
    <ISBN> ISBN2 </ISBN>
  </Book>
</Books>
The following code evaluates the XPath expression and prints the name of
every Book element in the XML document:
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/Books/Book/Name/text()";
// A NODESET result is returned as an org.w3c.dom.NodeList.
NodeList nameNodes = (NodeList) xpath.evaluate(expression,
    new InputSource("Books.xml"), XPathConstants.NODESET);
// Print all the names of the books.
for (int i = 0; i < nameNodes.getLength(); i++) {
    System.out.println("Book name " + (i + 1) + " is " +
        nameNodes.item(i).getNodeValue());
}
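NODESET is only one of the return types that evaluate() accepts. As a brief sketch, the same kind of document can be queried for a number, a string, or a boolean by passing a different XPathConstants value. The class name, the helper methods, and the in-memory copy of the Books document (without the padding spaces) are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathReturnTypes {
    // In-memory stand-in for Books.xml.
    static final String BOOKS =
        "<Books>"
      + "<Book><Author>Author1</Author><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
      + "<Book><Author>Author2</Author><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
      + "</Books>";

    private static Object eval(String expr, QName returnType) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr,
                new InputSource(new StringReader(BOOKS)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** NUMBER results come back as java.lang.Double. */
    static double bookCount() {
        return (Double) eval("count(/Books/Book)", XPathConstants.NUMBER);
    }

    /** STRING results are the string value of the first selected node. */
    static String firstBookName() {
        return (String) eval("/Books/Book[1]/Name", XPathConstants.STRING);
    }

    /** BOOLEAN results come back as java.lang.Boolean. */
    static boolean hasIsbn2() {
        return (Boolean) eval("/Books/Book/ISBN = 'ISBN2'", XPathConstants.BOOLEAN);
    }
}
```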
Evaluate With Context Specified
XPath is also capable of evaluating an expression based on the context
set by the application. The following example sets the Document node as
the context for evaluation:
DocumentBuilderFactory dbf =
    DocumentBuilderFactory.newInstance();
DocumentBuilder db =
    dbf.newDocumentBuilder();
Document d = db.parse(new File("Books.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
String exp = "/Books/Book";
NodeList books = (NodeList) xpath.evaluate(exp,
    d, XPathConstants.NODESET);
With a reference to a Book element, a relative XPath expression can now
be written to select the Name element as follows:
String expression = "Name";
// Evaluate the relative expression with the first Book as context.
Node name = (Node) xpath.evaluate(expression, books.item(0),
    XPathConstants.NODE);
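Putting the two steps together, a self-contained sketch might select every Book first and then evaluate the relative expression against each one. The class name and the in-memory document are hypothetical stand-ins for Books.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativeXPath {
    // In-memory stand-in for Books.xml.
    static final String BOOKS =
        "<Books>"
      + "<Book><Author>Author1</Author><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
      + "<Book><Author>Author2</Author><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
      + "</Books>";

    /** Evaluates the relative expression "Name" once per Book element. */
    static List<String> namesPerBook() {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document d = db.parse(new InputSource(new StringReader(BOOKS)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Absolute expression, Document node as context.
            NodeList books = (NodeList) xpath.evaluate("/Books/Book", d,
                XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                // Relative expression, the current Book element as context.
                Node name = (Node) xpath.evaluate("Name", books.item(i),
                    XPathConstants.NODE);
                names.add(name.getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```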
NamespaceContext
XPath Evaluation
What happens if the XML document is namespace aware?
Look at the following XML document, in which the first Book
element is in the publisher1
domain and the second in the publisher2
domain:
<Books>
  <Book xmlns="www.publisher1.com">
    <Author>Author1</Author>
    <Name>Name1</Name>
    <ISBN>ISBN1</ISBN>
  </Book>
  <Book xmlns="www.publisher2.com">
    <Author>Author2</Author>
    <Name>Name2</Name>
    <ISBN>ISBN2</ISBN>
    <Cover>Hard</Cover>
  </Book>
</Books>
In this case, the XPath expression /Books/Book/Name/text() won't give any
result because the Book and Name elements are in a namespace and the
expression is not fully qualified. You can use an expression such as
/Books/p1:Book/p1:Name with a p1 prefix. However, you must set a
NamespaceContext on the XPath instance so that the p1 prefix can be
resolved. In the following sample, a NamespaceContext capable of
resolving p1 is set on the XPath instance. Note that the two Book
elements are in different namespaces, so the expression would result in
only one node.
XPath xpath = XPathFactory.newInstance().newXPath();
String exp = "/Books/p1:Book/p1:Name";
xpath.setNamespaceContext(new MyNamespaceContext());
InputSource is = new InputSource("Books.xml");
NodeList nn = (NodeList) xpath.evaluate(exp,
    is, XPathConstants.NODESET);
// Print the count.
System.out.println("Node count = " + nn.getLength());
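The MyNamespaceContext class is not shown in the sample. One possible minimal implementation, which knows only the p1 prefix, is sketched below together with a small driver; the driver class name and the in-memory document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical context that resolves only the p1 prefix. */
final class MyNamespaceContext implements NamespaceContext {
    private static final String P1_URI = "www.publisher1.com";

    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        return "p1".equals(prefix) ? P1_URI : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String uri) {
        return P1_URI.equals(uri) ? "p1" : null;
    }

    public Iterator<String> getPrefixes(String uri) {
        String prefix = getPrefix(uri);
        return (prefix == null
            ? Collections.<String>emptyList()
            : Collections.singletonList(prefix)).iterator();
    }
}

final class NamespaceXPath {
    // In-memory stand-in for the namespace-aware Books.xml.
    static final String BOOKS =
        "<Books>"
      + "<Book xmlns='www.publisher1.com'><Name>Name1</Name></Book>"
      + "<Book xmlns='www.publisher2.com'><Name>Name2</Name></Book>"
      + "</Books>";

    /** Counts the Name elements that live in the p1 namespace. */
    static int countPublisher1Names() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed XPath steps
            Document d = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(BOOKS)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MyNamespaceContext());
            NodeList nn = (NodeList) xpath.evaluate("/Books/p1:Book/p1:Name",
                d, XPathConstants.NODESET);
            return nn.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```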
XPathVariableResolver
The XPath specification allows variables to be used in XPath expressions.
XPathVariableResolver is defined to provide access to the set of
user-defined XPath variables. Here is an example of an XPath expression
using a variable:
// The j prefix must be resolvable through a NamespaceContext set on xpath.
String exp = "/Books/j:Book[j:Name=$bookName]";
xpath.setXPathVariableResolver(
    new SimpleXPathVariableResolver());
InputSource is = new InputSource("Books.xml");
Node n = (Node) xpath.evaluate(exp,
    is, XPathConstants.NODE);
System.out.println("Node name is " + n.getNodeName());
A SimpleXPathVariableResolver can implement the resolveVariable()
function as follows. (Note: For clarity, only the relevant code is shown
here.)
    public Object resolveVariable(javax.xml.namespace.QName qName)
    {
        if (qName.getLocalPart().equals("bookName"))
            return "Name1";
        ....
    }
}
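For a complete, runnable variant, the sketch below uses a map-backed resolver rather than the hard-coded one above. MapVariableResolver, VariableDemo, and the in-memory document without namespaces are hypothetical names and simplifications, not the article's SimpleXPathVariableResolver.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that looks variables up by local name. */
final class MapVariableResolver implements XPathVariableResolver {
    private final Map<String, Object> values = new HashMap<>();

    MapVariableResolver bind(String name, Object value) {
        values.put(name, value);
        return this;
    }

    @Override
    public Object resolveVariable(QName qName) {
        // Returning null tells the XPath engine the variable is unknown.
        return values.get(qName.getLocalPart());
    }
}

final class VariableDemo {
    // In-memory stand-in for Books.xml, without namespaces for simplicity.
    static final String BOOKS =
        "<Books>"
      + "<Book><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
      + "<Book><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
      + "</Books>";

    /** Returns the ISBN of the book whose Name equals $bookName. */
    static String isbnFor(String bookName) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(
                new MapVariableResolver().bind("bookName", bookName));
            return xpath.evaluate("/Books/Book[Name=$bookName]/ISBN",
                new InputSource(new StringReader(BOOKS)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the value through a variable, instead of concatenating it into the expression string, also keeps quoting problems out of the XPath expression.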
XML Schema Data Types
JAXP 1.3 has introduced new data types in the Java platform, in the
javax.xml.datatype package, that directly map to some of the XML schema
data types, thus bringing XML schema data type support directly into the
Java platform.
The DatatypeFactory has functions to create different types of data
types, for example, xs:date, xs:dateTime, xs:duration, and so on. The
javax.xml.datatype.XMLGregorianCalendar takes care of many W3C XML Schema
1.0 date and time data types, specifically, dateTime, time, date,
gYearMonth, gMonthDay, gYear, gMonth, and gDay defined in this XML
namespace:
http://www.w3.org/2001/XMLSchema
These data types are normatively defined in W3C XML Schema
1.0, Part 2, Section 3.2.7-14.
The data type javax.xml.datatype.Duration is an immutable representation
of a time span as defined in the W3C XML Schema 1.0 specification. A
Duration object represents a period of Gregorian time, which consists of
six fields (years, months, days, hours, minutes, and seconds) as well as
a sign field (+ or -).
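As a small illustration of these types, the sketch below parses an xs:dateTime and an xs:duration from their lexical forms and adds one to the other. The class and method names are hypothetical.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

final class DatatypeDemo {
    /** Adds an xs:duration to an xs:dateTime and returns the lexical result. */
    static String addDuration(String dateTime, String duration) {
        try {
            DatatypeFactory df = DatatypeFactory.newInstance();
            XMLGregorianCalendar cal = df.newXMLGregorianCalendar(dateTime);
            Duration d = df.newDuration(duration);
            cal.add(d); // XMLGregorianCalendar is mutable; Duration is not
            return cal.toXMLFormat();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the day field of a duration such as "P1DT12H". */
    static int daysOf(String duration) {
        try {
            return DatatypeFactory.newInstance().newDuration(duration).getDays();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One day after 30 June rolls over into July.
        System.out.println(addDuration("2006-06-30T10:00:00", "P1D"));
    }
}
```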
Table 1 shows the mapping of XML schema data types to Java platform data
types. Table 2 shows the mapping of XPath data types to Java platform
data types.
Table 2. XPath and Java Platform Data Type Mapping
XPath Data Type | Java Platform Data Type
xdt:dayTimeDuration | Duration
xdt:yearMonthDuration | Duration
These data types have a rich set of functions introduced
to perform basic operations over data types, for example, addition, subtraction,
and multiplication.
Also, there are ways to get the lexical representation of a particular
data type, as defined in XML Schema 1.0, Part 2, Section 3.2.[7-14].1,
Lexical Representation. There is no need to understand the complexities
of XML schema data types such as what types of operations are allowed on
a data type, how to write a lexical representation, and so on. The
javax.xml.datatype APIs have defined a rich set of functions to make it
easy for you.
XInclude Support
JAXP 1.3 has also defined support for XInclude. SAXParserFactory or
DocumentBuilderFactory must be configured to make it XInclude aware. Do
this by calling setXIncludeAware(true) on the factory.
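A minimal sketch of this configuration follows. It writes two small temporary files so that the include can be resolved relative to the including document; the file names, element names, and class name are made up for the example, and the behavior shown assumes the JDK's built-in parser.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XIncludeDemo {
    /** Parses a document with one xi:include and counts the included elements. */
    static int includedParts() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            Files.write(dir.resolve("part.xml"),
                "<part>included</part>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("main.xml");
            Files.write(main,
                ("<doc xmlns:xi='http://www.w3.org/2001/XInclude'>"
               + "<xi:include href='part.xml'/>"
               + "</doc>").getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);  // XInclude elements are namespace-qualified
            dbf.setXIncludeAware(true);   // replace xi:include with the referenced content
            Document doc = dbf.newDocumentBuilder().parse(main.toFile());
            return doc.getElementsByTagName("part").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```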
Security Enhancements
JAXP 1.3 has defined a security feature:
http://javax.xml.XMLConstants/feature/secure-processing
When set to true, this operates the parser in a secure manner and
instructs the implementation to process XML securely and avoid conditions
such as denial-of-service attacks. Examples include restricting the
number of entities that can be expanded, the number of attributes an
element can have, and the XML schema constructs that would consume large
amounts of resources, such as large values for minOccurs and maxOccurs.
If XML processing is limited for security reasons, it will be reported by
a call to the registered ErrorHandler.fatalError().
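The sketch below turns the feature on through the XMLConstants.FEATURE_SECURE_PROCESSING constant, which holds the URI above, and then feeds the parser a document with deeply nested entities. The exact limits are implementation specific; the example assumes the JDK's built-in parser and its default entity-expansion limit, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SecureProcessingDemo {
    /** Builds a document whose nested entities expand to about a million characters. */
    static String entityBomb() {
        StringBuilder xml = new StringBuilder("<!DOCTYPE r [<!ENTITY e0 'x'>");
        for (int level = 1; level <= 6; level++) {
            xml.append("<!ENTITY e").append(level).append(" '");
            for (int j = 0; j < 10; j++) {
                xml.append("&e").append(level - 1).append(';');
            }
            xml.append("'>");
        }
        return xml.append("]><r>&e6;</r>").toString();
    }

    /** Returns true if the secure parser refuses to finish parsing the document. */
    static boolean rejectsEntityBomb() {
        SAXParser parser;
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            parser = spf.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("cannot configure parser", e);
        }
        try {
            // DefaultHandler.fatalError() rethrows, so a limit violation
            // surfaces here as a SAXException.
            parser.parse(new InputSource(new StringReader(entityBomb())),
                new DefaultHandler());
            return false;
        } catch (SAXException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```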
Summary
This article has introduced you to some of the new features
in JAXP 1.3. You have seen the benefits of the Schema Validation Framework
and seen how it can be used to improve the performance of schema validation.
Developers working with applications that use JAXP 1.2 schema properties
to validate XML documents against schemas should upgrade to JAXP 1.3 and
use this framework. Remember to reuse the parser instance by calling the
reset() method to improve performance.
New object-model-neutral XPath APIs bring
XPath support and can work with different object models. XML schema data
type support is brought directly into the Java platform with the introduction
of new data types. Security features introduced in JAXP 1.3 can help protect
the application from denial-of-service attacks. Also, JAXP 1.3 provides
complete support for the latest standards: XML 1.1, DOM L3, XInclude,
and SAX 2.0.2. These are enough reasons to upgrade to JAXP 1.3, and the
implementation is available for downloading from java.net.
Neeraj Bajaj