Easy and Efficient XML Processing: Upgrade to JAXP 1.3 - Part 3

(Posted 30/06/2006)

Validate DOM in memory

The Transformation APIs also allow a transformed result to be obtained as a DOM object. The DOM object in memory can be validated against a schema. This can be done as follows:

DOMResult dr = new DOMResult();
t.transform(xml, dr);
// Wrap the transformed tree and validate it in memory
DOMSource ds = new DOMSource(dr.getNode());
schema.newValidator().validate(ds);

So you see that the Validation APIs can be used with the Transformation APIs to do complex things easily. This approach also boosts performance because it avoids the step of parsing the XML again when validating a transformed XML document.
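To make the fragment above concrete, here is a minimal, self-contained sketch. The class name, the inline schema, and the sample documents are assumptions made for illustration, and an identity transformer stands in for a real stylesheet.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidateDomInMemory {
    // Hypothetical schema: a single <greeting> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='greeting' type='xs:string'/>"
      + "</xs:schema>";

    /** Transforms the XML into a DOM tree, then validates that tree without reparsing. */
    static boolean transformAndValidate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

        // Identity transformer; a real application would load a stylesheet here.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DOMResult dr = new DOMResult();
        t.transform(new StreamSource(new StringReader(xml)), dr);

        try {
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new DOMSource(dr.getNode()));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAndValidate("<greeting>hello</greeting>"));
        System.out.println(transformAndValidate("<farewell>bye</farewell>"));
    }
}
```

The first call should report a valid document and the second an invalid one, because the schema declares no farewell element.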

Validate a JDOM Document

The ValidatorHandler can be used to validate various object models, such as JDOM, against the schema(s). In fact, any object model (XOM, DOM4J, and so on) that can be built on top of a SAX stream or can emit SAX events can be used with the Schema Validation Framework to validate an XML document against a schema. This is possible because ValidatorHandler can validate a SAX stream.

Let's see how a JDOM document can be validated against schema(s):

SAXOutputter so = new SAXOutputter(vh);

so.output(jdomDocument);

It is that simple. JDOM can output a JDOM document as a stream of SAX events. SAXOutputter fires SAX events that are validated by the ValidatorHandler. Any error encountered is reported through the ErrorHandler set on the ValidatorHandler.

Obtain Schema Type Information

ValidatorHandler can give access to TypeInfoProvider, which can be queried to access the type information determined by the validator. This object is dynamic in nature and returns the type information of the current element or attribute assessed by the ValidatorHandler during validation of the XML document. This interface allows an application to know three things:

· Whether the attribute is declared as an ID type

· Whether the attribute was declared in the original XML document or was added by Validator during validation

· The type information, as declared in the schema, of the current element or attribute

Type information is returned as an org.w3c.dom.TypeInfo object, which is defined as part of DOM L3. The TypeInfo object returned is immutable, and the caller can keep references to the obtained TypeInfo object longer than the callback scope. The methods of this interface may only be called by the startElement event of the ContentHandler that the application sets on the ValidatorHandler. For example, look at the section of the code below. (Note: For clarity, only part of the code is shown here. For the complete source, look at the sample SchemaTypeInformation.java, which can be downloaded from here.)

ValidatorHandler vh = schema.newValidatorHandler();

vh.setErrorHandler(eh);

vh.setContentHandler(new MyContentHandler(
			vh.getTypeInfoProvider()));

SAXParserFactory spf = SAXParserFactory.newInstance();

spf.setNamespaceAware(true);

XMLReader reader = spf.newSAXParser().getXMLReader();

reader.setContentHandler(vh);

reader.parse(<XML Document>);
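As a rough, self-contained sketch of the same pattern (this is not the article's SchemaTypeInformation.java sample), the code below records the schema type name reported for each element. The class names, the inline schema, and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SchemaTypeInfoSketch {
    // Hypothetical schema: a <count> element declared as xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** Content handler that asks the provider for the type of each element as it starts. */
    static class TypeRecorder extends DefaultHandler {
        private final TypeInfoProvider provider;
        final Map<String, String> types = new LinkedHashMap<>();

        TypeRecorder(TypeInfoProvider provider) {
            this.provider = provider;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            // Only valid while the ValidatorHandler is delivering this callback.
            TypeInfo info = provider.getElementTypeInfo();
            types.put(qName, info == null ? null : info.getTypeName());
        }
    }

    /** Parses and validates the document, returning element name -> schema type name. */
    static Map<String, String> elementTypes(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vh = schema.newValidatorHandler();
        TypeRecorder recorder = new TypeRecorder(vh.getTypeInfoProvider());
        vh.setContentHandler(recorder);

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(vh);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementTypes("<count>5</count>"));
    }
}
```

The same provider also exposes isIdAttribute(int) and isSpecified(int) for the attributes of the current element, which correspond to the first two items in the list above.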

Ensure Data Security

Validating an XML document against an untrusted schema could have serious consequences, as validation may modify the actual data by adding default attributes and possibly corrupting the data. Validation against an untrusted schema may also mean that an incoming instance document might not conform to your business's constraints or rules.

With the new Validation APIs, getting a Schema instance is the first step before being able to validate an instance document, and it is the application that determines how to create the Schema instance. Validation using the Schema instance makes sure that an incoming instance document is not validated against any other (untrusted) schema(s) but only against the schema(s) from which the instance is created. If the instance XML document has elements or attributes that refer to schema(s) from a different targetNamespace and are not part of javax.xml.validation.Schema representation, an error will be thrown. This approach protects you from accidental mistakes and malicious documents.
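The following sketch illustrates the point under simple assumptions: the application builds its Schema from a source it trusts, and a document whose root element belongs to a different namespace is rejected. The class name, the urn:example namespaces, and the inline schema are invented for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TrustedSchemaOnly {
    // Hypothetical trusted schema owned by the application.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:orders' elementFormDefault='qualified'>"
      + "<xs:element name='order' type='xs:string'/>"
      + "</xs:schema>";

    /** Returns true only if the document is valid against the application's own schema. */
    static boolean isAccepted(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // No declaration exists in the trusted schema for this content.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isAccepted("<order xmlns='urn:example:orders'>42</order>"));
        System.out.println(isAccepted("<order xmlns='urn:example:other'>42</order>"));
    }
}
```

Only the first document should be accepted; the second uses a namespace that is not part of the Schema instance.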

Reusing a Parser Instance

Is it possible to use the same parser instance to parse multiple XML documents? This was not clear, and the behavior was implementation dependent. JAXP 1.3 has added the new function reset() on SAXParser, DocumentBuilder, and Transformer. This guarantees that the same instance can be reused. The reset function improves the overall performance by saving resources, time associated with creating memory instances, and garbage collection time. Let's see how the reset() function can be used.

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setSchema(schema);
SAXParser saxParser = spf.newSAXParser();
for (int i = 0; i < n; i++) {
    saxParser.parse(new File(args[i]), myHandler);
    // Restore the original configuration before the next document
    saxParser.reset();
}

The same function has also been added to newly designed javax.xml.validation.Validator, as well as to javax.xml.xpath.XPath. Applications are encouraged to reuse the parser, transformer, validator and XPath instance by calling reset() when processing multiple XML documents. Note that reset() sets the instance back to factory settings.
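Here is a small runnable sketch of parser reuse. It parses in-memory documents rather than files, and the class and handler names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReuseParser {
    /** Counts the elements seen in one document. */
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses every document with one SAXParser instance, resetting it between documents. */
    static int[] countElements(String... docs) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SAXParser parser = spf.newSAXParser();

        int[] counts = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(docs[i])), handler);
            counts[i] = handler.count;
            // Back to the state the parser had when the factory created it.
            parser.reset();
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countElements("<a><b/></a>", "<x/>");
        System.out.println(counts[0] + ", " + counts[1]);
    }
}
```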

XPath Support

Accessing XML is made simple using XPath: A single XPath expression can be used to replace many lines of DOM API code. JAXP 1.3 has defined XPath APIs that conform to the XPath 1.0 specification and provide object-model-neutral APIs for the evaluation of XPath expressions and access to the evaluation environment. Though current APIs conform to XPath 1.0, the APIs have been designed with future XPath 2.0 support in mind.

To use JAXP 1.3 XPath APIs, the first step is to get the instance of XPathFactory. Though the default model is W3C DOM, it can be changed by specifying the object model URI:

// Default object model (W3C DOM)
XPathFactory factory = XPathFactory.newInstance();

// Or, for a specific object model
XPathFactory factory = XPathFactory.newInstance(
			  <OBJECT MODEL URI>);

Evaluate the XPath Expression

XPathFactory is used to create XPath objects. The XPath interface provides access to the XPath evaluation environment and expressions. XPath has overloaded the evaluate() function, which can return the result by evaluating an XPath expression based on the return type set by the application. For example, look at the following XML document:

<Books>

<Book>

     <Author> Author1 </Author>

     <Name> Name1 </Name>

     <ISBN> ISBN1 </ISBN>

</Book>

<Book>

     <Author> Author2 </Author>

     <Name> Name2 </Name>

     <ISBN> ISBN2 </ISBN>

</Book>

</Books>

Following is the working code to evaluate the XPath expression and print the names of all the books in the XML document:

XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/Books/Book/Name/text()";
NodeList nameNodes = (NodeList) xpath.evaluate(expression,
    new InputSource("Books.xml"), XPathConstants.NODESET);

// Print all the names of the books
for (int i = 0; i < nameNodes.getLength(); i++) {
    System.out.println("Book name " + (i + 1) + " is " +
        nameNodes.item(i).getNodeValue());
}
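For readers who want to run the example without a Books.xml file, here is a self-contained sketch that evaluates the same expression against the sample document held in a string. The class and method names are assumptions; note that a NODESET result is returned as an org.w3c.dom.NodeList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookNames {
    // The sample document from the article, inlined as a string.
    static final String BOOKS_XML =
        "<Books>"
      + "<Book><Author> Author1 </Author><Name> Name1 </Name><ISBN> ISBN1 </ISBN></Book>"
      + "<Book><Author> Author2 </Author><Name> Name2 </Name><ISBN> ISBN2 </ISBN></Book>"
      + "</Books>";

    /** Returns the text of every Name element, with surrounding whitespace trimmed. */
    static List<String> bookNames() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nameNodes = (NodeList) xpath.evaluate(
                "/Books/Book/Name/text()",
                new InputSource(new StringReader(BOOKS_XML)),
                XPathConstants.NODESET);

        List<String> names = new ArrayList<>();
        for (int i = 0; i < nameNodes.getLength(); i++) {
            names.add(nameNodes.item(i).getNodeValue().trim());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<String> names = bookNames();
        for (int i = 0; i < names.size(); i++) {
            System.out.println("Book name " + (i + 1) + " is " + names.get(i));
        }
    }
}
```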

Evaluate With Context Specified

XPath is also capable of evaluating an expression based on the context set by the application. The following example sets the Document node as the context for evaluation:

DocumentBuilderFactory dbf = 
	DocumentBuilderFactory.newInstance();

DocumentBuilder db = 
	dbf.newDocumentBuilder();

Document d = db.parse(new File("Books.xml"));

XPath xpath = XPathFactory.newInstance().newXPath();

String exp = "/Books/Book";

NodeList books = (NodeList) xpath.evaluate(exp,
			  d, XPathConstants.NODESET);

With a reference to a Book element, a relative XPath expression can now be written to select the Name element as follows:

String expression = "Name";
Node name = (Node) xpath.evaluate(expression, books.item(0),
			  XPathConstants.NODE);
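Put together, context-based evaluation might look like the following sketch. It parses an inline copy of the sample document (an assumption, so that no file is needed), selects the Book elements from the Document node, and then evaluates a relative expression against one Book node. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPath {
    static final String BOOKS_XML =
        "<Books>"
      + "<Book><Author>Author1</Author><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
      + "<Book><Author>Author2</Author><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
      + "</Books>";

    /** Returns the Name of the book at the given zero-based index. */
    static String nameOfBook(int index) throws Exception {
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(BOOKS_XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Absolute expression, evaluated with the Document node as context.
        NodeList books = (NodeList) xpath.evaluate("/Books/Book", d, XPathConstants.NODESET);

        // Relative expression, evaluated with one Book element as context.
        Node name = (Node) xpath.evaluate("Name", books.item(index), XPathConstants.NODE);
        return name.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(nameOfBook(0));
        System.out.println(nameOfBook(1));
    }
}
```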

NamespaceContext XPath Evaluation

What happens if the XML document is namespace aware? Look at the following XML document, in which the first Book element is in the publisher1 domain and the second in the publisher2 domain:

<Books >

<Book xmlns="www.publisher1.com">

     <Author>Author1</Author>

     <Name>Name1</Name>

     <ISBN>ISBN1</ISBN>

</Book>

<Book xmlns="www.publisher2.com">

     <Author>Author2</Author>

     <Name>Name2</Name>

     <ISBN>ISBN2</ISBN>

     <Cover>Hard</Cover>

</Book>

</Books>

In this case, the XPath expression /Books/Book/Name/text() won't give any result because the expression does not qualify the element names with their namespace. You can use an expression such as /Books/p1:Book/p1:Name with a p1 prefix. However, you must set a NamespaceContext on the XPath instance so that the p1 prefix can be resolved. In the following sample, a NamespaceContext capable of resolving p1 is set on the XPath instance. Note that the two Book elements are in different namespaces, so the expression would result in only one node.

XPath xpath = XPathFactory.newInstance().newXPath();
String exp = "/Books/p1:Book/p1:Name";
xpath.setNamespaceContext(new MyNamespaceContext());
InputSource is = new InputSource("Books.xml");
NodeList nn = (NodeList) xpath.evaluate(exp,
			  is, XPathConstants.NODESET);

// Print the count.
System.out.println("Node count = " + nn.getLength());
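The sample relies on a MyNamespaceContext class that is not shown. One possible minimal implementation, which is an assumption rather than the article's own code, binds only the p1 prefix. The sketch below also parses an inline copy of the document with a namespace-aware DocumentBuilder so that it can run on its own.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceXPath {
    static final String BOOKS_XML =
        "<Books>"
      + "<Book xmlns='www.publisher1.com'><Name>Name1</Name></Book>"
      + "<Book xmlns='www.publisher2.com'><Name>Name2</Name></Book>"
      + "</Books>";

    /** Hypothetical context that resolves only the p1 prefix. */
    static class MyNamespaceContext implements NamespaceContext {
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "p1".equals(prefix) ? "www.publisher1.com" : XMLConstants.NULL_NS_URI;
        }

        public String getPrefix(String uri) {
            return "www.publisher1.com".equals(uri) ? "p1" : null;
        }

        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return (prefix == null
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(prefix)).iterator();
        }
    }

    /** Counts the nodes selected by the expression in the namespace-aware document. */
    static int count(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(BOOKS_XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new MyNamespaceContext());
        NodeList nn = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nn.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Qualified:   " + count("/Books/p1:Book/p1:Name"));
        System.out.println("Unqualified: " + count("/Books/Book/Name"));
    }
}
```

The qualified expression should select one node and the unqualified one none, which matches the behavior described above.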

XPathVariableResolver

The XPath specification allows variables to be used in XPath expressions. XPathVariableResolver is defined to provide access to the set of user-defined XPath variables. Here is an example of an XPath expression using a variable:

String exp = "/Books/j:Book[j:Name=$bookName]";

xpath.setXPathVariableResolver(
	new SimpleXPathVariableResolver());

InputSource is = new InputSource("Books.xml");

Node n = (Node) xpath.evaluate(exp, 
	is, XPathConstants.NODE);

System.out.println("Node name is " + n.getNodeName());

A SimpleXPathVariableResolver can implement the resolveVariable() function as follows. (Note: For clarity, only the relevant code is shown here.)

public Object resolveVariable(javax.xml.namespace.QName qName) {
    if (qName.getLocalPart().equals("bookName"))
        return "Name1";
    ....
}
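A complete, runnable variant might look like the sketch below. To keep it short, it uses a document without namespaces and supplies the resolver as a lambda; the class name, the inline document, and the choice to return the ISBN are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class VariableXPath {
    static final String BOOKS_XML =
        "<Books>"
      + "<Book><Name>Name1</Name><ISBN>ISBN1</ISBN></Book>"
      + "<Book><Name>Name2</Name><ISBN>ISBN2</ISBN></Book>"
      + "</Books>";

    /** Looks up the ISBN of the book whose Name equals the $bookName variable. */
    static String isbnFor(String bookName) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Resolve $bookName to the caller's value; other variables stay unresolved.
        xpath.setXPathVariableResolver(qName ->
                "bookName".equals(qName.getLocalPart()) ? bookName : null);

        // Without a return type, evaluate() returns the string value of the result.
        return xpath.evaluate("/Books/Book[Name=$bookName]/ISBN",
                new InputSource(new StringReader(BOOKS_XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isbnFor("Name1"));
        System.out.println(isbnFor("Name2"));
    }
}
```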

XML Schema Data Types

JAXP 1.3 has introduced new data types in the Java platform, in the javax.xml.datatype package, that directly map to some of the XML schema data types, thus bringing XML schema data type support directly into the Java platform.

The DatatypeFactory has functions to create different types of data types -- for example, xs:date, xs:dateTime, xs:duration, and so on. The javax.xml.datatype.XMLGregorianCalendar takes care of many W3C XML Schema 1.0 date and time data types, specifically, dateTime, time, date, gYearMonth, gMonthDay, gYear, gMonth, and gDay defined in this XML namespace:

http://www.w3.org/2001/XMLSchema

These data types are normatively defined in W3C XML Schema 1.0, Part 2, Section 3.2.7-14.

The data type javax.xml.datatype.Duration is an immutable representation of a time span as defined in the W3C XML Schema 1.0 specification. A Duration object represents a period of Gregorian time, which consists of six fields (years, months, days, hours, minutes, and seconds) as well as a sign field (+ or -).

Table 1 shows the mapping of XML schema data types to Java platform data types. Table 2 shows the mapping of XPath data types and Java Platform data types.

Table 1. XML Schema and Java Platform Data Type Mapping
W3C XML Schema Data Type	Java Platform Data Type
`xs:date`	`XMLGregorianCalendar`
`xs:dateTime`	`XMLGregorianCalendar`
`xs:duration`	`Duration`
`xs:gDay`	`XMLGregorianCalendar`
`xs:gMonth`	`XMLGregorianCalendar`
`xs:gMonthDay`	`XMLGregorianCalendar`
`xs:gYear`	`XMLGregorianCalendar`
`xs:gYearMonth`	`XMLGregorianCalendar`
`xs:time`	`XMLGregorianCalendar`

Table 2. XPath and Java Platform Data Type Mapping
XPath Data Type	Java Platform Data Type
`xdt:dayTimeDuration`	`Duration`
`xdt:yearMonthDuration`	`Duration`

These data types have a rich set of functions introduced to perform basic operations over data types, for example, addition, subtraction, and multiplication.

You can also obtain the lexical representation of a particular data type, as defined in XML Schema 1.0, Part 2, Section 3.2.[7-14].1, Lexical Representation. There is no need to understand the complexities of XML schema data types, such as which operations are allowed on a data type or how to write a lexical representation. The javax.xml.datatype APIs define a rich set of functions to make this easy for you.
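As a brief illustration of these data types, the sketch below creates an xs:dateTime and an xs:duration from their lexical forms, adds one to the other, and prints the result in lexical form again. The class name and the sample values are assumptions made for illustration.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

class DatatypeSketch {
    /** Adds an xs:duration to an xs:dateTime and returns the resulting lexical form. */
    static String addDuration(String dateTime, String duration)
            throws DatatypeConfigurationException {
        DatatypeFactory df = DatatypeFactory.newInstance();
        XMLGregorianCalendar cal = df.newXMLGregorianCalendar(dateTime);
        Duration d = df.newDuration(duration);

        // XMLGregorianCalendar is mutable: add() changes the calendar in place.
        cal.add(d);
        return cal.toXMLFormat();
    }

    public static void main(String[] args) throws Exception {
        // One day and two hours after 30 June 2006, 10:15.
        System.out.println(addDuration("2006-06-30T10:15:00", "P1DT2H"));
    }
}
```

Under these inputs the result rolls over into July, so the printed value should be 2006-07-01T12:15:00.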

XInclude Support

JAXP 1.3 has also defined support for XInclude. SAXParserFactory or DocumentBuilderFactory must be configured to be XInclude aware. Do this by calling setXIncludeAware(true) on the factory.
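The following sketch shows one way to try this out. It writes two small files to a temporary directory (the file names and contents are assumptions made for illustration), parses the main document with an XInclude-aware DocumentBuilderFactory, and counts the elements that were pulled in. The factory is also made namespace aware, since XInclude processing depends on namespaces.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeSketch {
    /** Parses a document containing an xi:include and counts the included chapter elements. */
    static int countIncludedChapters() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-demo");
        Path chapter = dir.resolve("chapter.xml");
        Path book = dir.resolve("book.xml");
        try {
            Files.write(chapter,
                    "<chapter>Included text</chapter>".getBytes(StandardCharsets.UTF_8));
            Files.write(book,
                    ("<book xmlns:xi='http://www.w3.org/2001/XInclude'>"
                   + "<xi:include href='chapter.xml'/>"
                   + "</book>").getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setXIncludeAware(true);

            // Parsing from a File gives the parser a base URI for the relative href.
            Document doc = dbf.newDocumentBuilder().parse(book.toFile());
            return doc.getElementsByTagName("chapter").getLength();
        } finally {
            Files.deleteIfExists(chapter);
            Files.deleteIfExists(book);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Included chapters: " + countIncludedChapters());
    }
}
```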

Security Enhancements

JAXP 1.3 has defined a security feature:

http://javax.xml.XMLConstants/feature/secure-processing

When set to true, this feature operates the parser in a secure manner and instructs the implementation to process XML securely and avoid conditions such as denial-of-service attacks. Examples include restricting the number of entities that can be expanded, the number of attributes an element can have, and the XML schema constructs that would consume large amounts of resources, such as large values for minOccurs and maxOccurs. If XML processing is limited for security reasons, it will be reported by a call to the registered ErrorHandler.fatalError().
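As an illustration, the sketch below turns the feature on for a DocumentBuilderFactory and feeds it a document with nested entity definitions that expand to more than a million entity references. The class and helper names are assumptions, and the exact limits are implementation dependent, so treat this as a demonstration rather than a specification of behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SecureProcessingSketch {
    /** Builds a document whose entity at each level expands to ten of the level below. */
    static String entityBomb(int levels) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE bomb [<!ENTITY e0 'x'>");
        for (int i = 1; i <= levels; i++) {
            sb.append("<!ENTITY e").append(i).append(" '");
            for (int j = 0; j < 10; j++) {
                sb.append("&e").append(i - 1).append(';');
            }
            sb.append("'>");
        }
        return sb.append("]><bomb>&e").append(levels).append(";</bomb>").toString();
    }

    /** Returns true if the securely configured parser refuses to process the document. */
    static boolean isRejected(String xml) throws ParserConfigurationException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors, so they surface as SAXException.
        db.setErrorHandler(new DefaultHandler());
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Ordinary document rejected: " + isRejected("<ok/>"));
        System.out.println("Entity bomb rejected:       " + isRejected(entityBomb(6)));
    }
}
```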

Summary

This article has introduced you to some of the new features in JAXP 1.3. You have seen the benefits of the Schema Validation Framework and how it can be used to improve the performance of schema validation. Developers whose applications use JAXP 1.2 schema properties to validate XML documents against schemas should upgrade to JAXP 1.3 and use this framework. Remember to reuse the parser instance by calling the reset() method to improve performance.

New object-model-neutral XPath APIs bring XPath support and can work with different object models. XML schema data type support is brought directly into the Java platform with the introduction of new data types. Security features introduced in JAXP 1.3 can help protect the application from denial-of-service attacks. Also, JAXP 1.3 provides complete support for the latest standards: XML 1.1, DOM L3, XInclude, and SAX 2.0.2. These are enough reasons to upgrade to JAXP 1.3, and the implementation is available for downloading from java.net.

Neeraj Bajaj
