(Post 27/06/2006)
Figure 2. Set Compiled Schema on DocumentBuilder/SAXParserFactory
Validate XML Using Compiled Schema
Before we look at this approach, let's look at how we
have been doing schema validation using the schema properties that were
defined in JAXP 1.2:
http://java.sun.com/xml/jaxp/properties/schemaLanguage
http://java.sun.com/xml/jaxp/properties/schemaSource
Here is an example showing how these two properties are used in JAXP 1.3:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(true);
SAXParser sp = spf.newSAXParser();
sp.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
    "http://www.w3.org/2001/XMLSchema");
sp.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
    "mySchema.xsd");
sp.parse(<XML Document>, <DefaultHandler>);
The user sets the schemaLanguage and/or the schemaSource
property on SAXParser and sets validation to true on the factory.
Generally, a business application defines a set of schemas containing
the business rules against which XML documents must be validated. To accomplish
this, an application sets the schema using the schemaSource
property or relies on the xsi:schemaLocation
attribute in the instance document to specify the schema location(s).
This approach works well, but there is a tremendous performance
penalty: The specified schemas are loaded again and again for every XML
document that needs to be validated! However, with the new Validation
APIs, an application needs to parse a set of schemas only once. See Figure
2.
After the Compile Schema step, do the following:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setSchema(schema);
SAXParser saxParser = spf.newSAXParser();
saxParser.parse(new File("instance.xml"), myHandler);
Just set the Schema instance on the factory and you are done.
There is no need to set the validation to true and no need
to set the schemaLanguage or schemaSource property.
Validation of XML documents is done against the compiled schema set on
the factory. You will be amazed by the performance gain using this approach.
Try it yourself.
Run the sample ComparePerformance.java,
which can be downloaded from here.
Performance gain largely depends on the ratio of the size of the XML schema
to the size of the XML document. Larger ratios lead to a larger performance
gain. Look at the Reusing
a Parser Instance section to further improve the performance.
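The compile-once approach can be sketched end to end as follows. This is a minimal illustration, not the article's ComparePerformance.java sample: the class name, the in-memory schema, and the sample documents are all hypothetical, and the factory is made namespace-aware as an extra precaution that the snippet above does not include.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CompiledSchemaSketch {
    // Hypothetical schema: a single <note> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    /** Counts the validation errors reported during one parse. */
    static class CountingHandler extends DefaultHandler {
        int errors;
        @Override public void error(SAXParseException e) { errors++; }
    }

    /** Compiles the schema once and sets it on a reusable factory. */
    static SAXParserFactory newFactory() throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // assumption: added here for safety
        spf.setSchema(schema);       // no setValidating(true), no schema properties
        return spf;
    }

    /** Parses one document with a parser from the shared factory. */
    static int countErrors(SAXParserFactory spf, String xml) throws Exception {
        CountingHandler handler = new CountingHandler();
        SAXParser parser = spf.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory spf = newFactory(); // schema is parsed only here
        String[] documents = { "<note>hello</note>", "<memo>hello</memo>" };
        for (String xml : documents) {
            System.out.println(xml + " -> valid: " + (countErrors(spf, xml) == 0));
        }
    }
}
```

The schema is parsed in newFactory() only; every later document reuses the compiled Schema held by the factory.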
Note that it is an error to use either of the following
properties:
http://java.sun.com/xml/jaxp/properties/schemaLanguage
http://java.sun.com/xml/jaxp/properties/schemaSource
in conjunction with a non-null Schema object. Such a configuration causes a SAXException when those properties are set
on a SAXParser; with a DocumentBuilderFactory, it causes a ParserConfigurationException when newDocumentBuilder() is invoked.
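The restriction can be checked with a small probe like the one below. It is a sketch under assumptions: the class name and the in-memory schema are hypothetical, and the result depends on the parser implementation honoring the rule described above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LegacyPropertySketch {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";

    // Hypothetical schema, used only so that a non-null Schema exists.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    /** Returns true if setting the JAXP 1.2 property is rejected. */
    static boolean legacyPropertyRejected() throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setSchema(schema); // compiled schema already set on the factory
        SAXParser parser = spf.newSAXParser();
        try {
            // Mixing the old property with a Schema object is the error case.
            parser.setProperty(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("legacy property rejected: " + legacyPropertyRejected());
    }
}
```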
Validate a SAXSource or DOMSource
As we mentioned earlier, there has been a fundamental shift
in XML parsing and validation. XML validation is now considered a process
independent from XML parsing. Once you have the Schema instance
loaded into memory, you can do many things. You can create a ValidatorHandler
that can validate a SAX stream, or create a stand-alone Validator (see Figure
3). A stand-alone Validator can validate a SAXSource,
a DOMSource, or an XML document against any schema. In fact,
a Validator still works if the SAX stream
or DOM object comes from a different implementation.
Figure 3. Validate a SAXSource or DOMSource Using a Validator
To receive any errors during the validation, an ErrorHandler
should be registered with the Validator. Let's look at some
working code. (Note: For clarity, only a section of code is shown here.
For the complete source, look at the sample Validate.java,
which can be downloaded here.)
Validator validator = schema.newValidator();
validator.setErrorHandler(new ErrorHandlerImpl());
validator.validate(new StreamSource(<XML Document>));
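For readers without the download at hand, here is a self-contained sketch of the same steps. It is not the article's Validate.java: the class name, the CollectingErrorHandler (a stand-in for ErrorHandlerImpl), and the one-element schema are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class ValidatorSketch {
    // Hypothetical schema: <quantity> must hold a positive integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    /** Records warnings and errors instead of aborting on the first one. */
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();
        public void warning(SAXParseException e) { messages.add("warning: " + e.getMessage()); }
        public void error(SAXParseException e) { messages.add("error: " + e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    /** Validates one document and returns the collected messages. */
    static List<String> validate(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<quantity>3</quantity>"));
        System.out.println(validate("<quantity>many</quantity>"));
    }
}
```

Because the handler records errors rather than rethrowing them, validation runs to the end of the document and reports every problem it finds.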
Validator can also be used to validate the
instance document or DOM object in memory, with the augmented
result sent to DOMResult.
Document document = <DOM object>;
validator.validate(new DOMSource(document), new DOMResult());
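A fuller sketch of in-memory validation follows. The class name and schema are hypothetical. No ErrorHandler is registered here, so the first validation error surfaces as a SAXException from validate().

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomValidationSketch {
    // Hypothetical schema: <quantity> must hold a positive integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    /** Builds a DOM, validates it, and returns the root name of the result tree. */
    static String validatedRootName(String xml) throws Exception {
        // Parsing is a separate step; the parser itself does no validation.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document document = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();

        // Validate the in-memory tree; the result is written to a DOMResult.
        DOMResult result = new DOMResult();
        validator.validate(new DOMSource(document), result);

        Node node = result.getNode();
        Element root = (node instanceof Document)
            ? ((Document) node).getDocumentElement()
            : (Element) node;
        return root.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validated root: " + validatedRootName("<quantity>3</quantity>"));
    }
}
```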
The Validation APIs can validate a SAX stream
and work in conjunction with Transformation APIs to achieve pipeline processing,
as we will see in the next section.
Validate XML After Transformation
Transformation APIs are used to transform one XML document
into another by applying a style sheet. There are times when we need to
validate the transformed XML document against a schema. Should we feed
that XML document to a parser and then use the schema feature to do the
schema validation? No. The new Validation APIs give you the power to validate
the transformed XML document against a different schema by allowing the
application to create a pipeline and pass the output of a transformer
to the Validation APIs to validate against the desired schema. It doesn't
matter if the output of the transformation is a SAX stream
or a DOM in memory.
Validate a SAX Stream
The following code snippet shows you how to use the specially
designed javax.xml.validation.ValidatorHandler to validate
a SAX stream. In the downloadable
source, look at the sample ValidateSAXStream.java for
more detail. Also look at the sample TransformerValidationHandler.java,
which shows how to chain the output of Transformer to ValidatorHandler.
Here is a section of the code:
String language = XMLConstants.W3C_XML_SCHEMA_NS_URI;
SchemaFactory sf = SchemaFactory.newInstance(language);
Schema schema = sf.newSchema(new File(<SCHEMA>));
ValidatorHandler vh = schema.newValidatorHandler();
vh.setErrorHandler(new ErrorHandlerImpl());
vh.setContentHandler(new ApplicationContentHandler());
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource ss = new StreamSource(<STYLESHEET>);
Transformer t = tf.newTransformer(ss);
StreamSource xml = new StreamSource(<XML DOCUMENT>);
t.transform(xml, new SAXResult(vh));
Figure 4 shows the whole flow, with an XML document and
a style sheet given as input to a Transformer and a SAX
stream as the output. We take advantage of the modular approach of doing
validation independent from parsing. The ValidatorHandler
is a special handler that is capable of working directly with a SAX
stream. It validates the stream and passes it to the application.
Figure 4. Validating a SAX Stream
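The pipeline described above can be tried with the self-contained sketch below. It is not the article's TransformerValidationHandler.java: the class name, the style sheet, the schema, and the input documents are all hypothetical, and a plain DefaultHandler stands in for the application's content handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class TransformThenValidateSketch {
    // Hypothetical schema for the transformed output.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    // Hypothetical style sheet: turns <order><qty>N</qty></order> into <quantity>N</quantity>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/order'>"
      + "<quantity><xsl:value-of select='qty'/></quantity>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the input and validates the resulting SAX stream. */
    static List<String> transformAndValidate(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));

        final List<String> problems = new ArrayList<>();
        ValidatorHandler vh = schema.newValidatorHandler();
        vh.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        // Validated events are forwarded to the application's handler.
        vh.setContentHandler(new DefaultHandler());

        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        // The transformer's SAX output goes straight into the validator.
        t.transform(new StreamSource(new StringReader(xml)), new SAXResult(vh));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAndValidate("<order><qty>3</qty></order>"));
        System.out.println(transformAndValidate("<order><qty>many</qty></order>"));
    }
}
```

The transformed document is never serialized and reparsed; the ValidatorHandler checks the SAX events as the Transformer emits them.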
Neeraj Bajaj