(Post 22/06/2006) In this
article, Neeraj Bajaj discusses how the Java API for XML Processing (JAXP)
1.3, with its validation APIs and XPath APIs, improves XML schema datatype
support, adds security, and increases performance.
This article explains some of the new concepts and important features introduced in the Java API for XML Processing (JAXP) 1.3, which was developed under JSR 206 with performance and ease of use in mind. The new Validation Framework gives applications that work with XML schemas much more control over validation and can improve performance significantly. The XPath APIs provide access to the XPath evaluation environment. JAXP 1.3 also brings richer XML Schema datatype support to the Java platform by defining new data types that map to data types defined in the W3C XML Schema Part 2: Datatypes specification.
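As a small illustration of that datatype support, the sketch below uses `javax.xml.datatype.DatatypeFactory` to parse an `xs:dateTime` value, add an `xs:duration` to it, and return the result. The class name `SchemaDatatypesDemo` and the sample values are illustrative assumptions, not part of the article's downloadable samples.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical demo class: shows JAXP 1.3 types that map to XML Schema datatypes.
final class SchemaDatatypesDemo {

    // Parses an xs:dateTime lexical value, adds an xs:duration to it and
    // returns the resulting calendar.
    static XMLGregorianCalendar addDuration(String dateTime, String duration) {
        try {
            DatatypeFactory factory = DatatypeFactory.newInstance();
            XMLGregorianCalendar cal = factory.newXMLGregorianCalendar(dateTime);
            Duration d = factory.newDuration(duration);
            cal.add(d); // modifies cal in place
            return cal;
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory implementation found", e);
        }
    }

    public static void main(String[] args) {
        // The date and duration are arbitrary illustration values.
        XMLGregorianCalendar result = addDuration("2006-06-22T10:00:00Z", "P1DT2H");
        System.out.println(result.toXMLFormat());
    }
}
```

The arithmetic follows the XML Schema rules for adding durations to dates, so month lengths and time zones are handled by the library rather than by application code.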
Keeping pace with the evolution of XML standards, JAXP 1.3 also adds complete support for the following standards: XML 1.1, Document Object Model (DOM) Level 3, XInclude, and Simple API for XML (SAX) 2.0.2. All of this is already part of the latest release of the Java 2 Platform, Standard Edition (J2SE) 5.0, code-named Tiger. If you are using J2SE 1.3 or 1.4, you can download a stand-alone, stable implementation of JAXP 1.3 from java.net.
This article concentrates mainly on the work done as part of the JSR 206 effort and explains the new Schema Validation Framework concepts, with working code and diagrams. All the samples are available for download. The major new features are described in the sections that follow, beginning with the Schema Validation Framework.
Schema Validation Framework
JAXP 1.3 introduces a new, schema-independent Validation Framework (called the Validation APIs). This framework gives applications that work with XML schemas much more control and makes possible things that could not be done before. It also marks a fundamental shift in the way XML processing and validation are performed. Validation used to be considered an integral part of XML parsing, and previous versions of JAXP supported validation as a feature of an XML parser: a SAXParser or DocumentBuilder instance.
The new Validation APIs decouple validation of an instance document from parsing, so that validation becomes an independent process. This approach has several advantages. Applications that rely heavily on XML schemas can greatly improve schema-validation performance, because a schema can be compiled once and then reused for many validations. Perhaps more importantly, many previously unsolvable problems can now be solved in an efficient, easy, and secure way. Let's look at what you can do with the new Schema Validation Framework.
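To make the decoupling concrete, here is a minimal sketch in which parsing and validation are two separate steps: a non-validating DocumentBuilder produces a DOM tree, and a Validator then checks that tree against a compiled Schema. The class name `ParseThenValidate` and the one-element inline schema are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ParseThenValidate {

    // Hypothetical schema used only for this illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:schema>";

    // Step 1: parse only. The document must be well-formed; nothing is validated here.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // schema validation expects a namespace-aware DOM
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed document", e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 2: validate the already-parsed DOM tree as a separate operation.
    // For brevity the schema is compiled on every call; a real application
    // would compile it once and reuse the Schema object.
    static boolean isValid(Document doc) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validate() throws SAXException on the first error.
            validator.validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(parse("<price>9.99</price>")));
        System.out.println(isValid(parse("<price>cheap</price>")));
    }
}
```

Because the DOM tree exists before validation starts, the same tree could be validated against several schemas, or modified and validated again, without reparsing.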
Validate XML Against Any Schema
Although JAXP 1.3 requires support only for the W3C XML Schema language, you can easily plug in support for other schema languages, such as RELAX NG. The Validation APIs provide a pluggability layer through which applications can supply specialized validation libraries that support additional schema languages. This is achieved with the SchemaFactory class, which locates implementations for a schema language at run time. The first step is to specify the schema language to be used and obtain a concrete factory implementation:
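// schemaLanguage is the namespace URI that identifies the schema language, for example
// XMLConstants.W3C_XML_SCHEMA_NS_URI (W3C XML Schema) or XMLConstants.RELAXNG_NS_URI (RELAX NG)
SchemaFactory sf = SchemaFactory.newInstance(schemaLanguage);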
If this method returns successfully, an implementation that supports the specified schema language is available; otherwise, it throws an IllegalArgumentException. Obtaining a SchemaFactory implementation is the entry point to the Validation APIs. This step goes through the pluggability mechanism that has long been at the core of JAXP. If the schema language URI is supplied at run time (for example, from configuration), an application can switch between W3C XML Schema and RELAX NG validation without changing a single line of code.
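The pluggability lookup can be probed directly. This sketch, using a hypothetical `SchemaLanguageProbe` class, treats the IllegalArgumentException thrown by `SchemaFactory.newInstance` as "no implementation available". Whether RELAX NG is reported as supported depends entirely on which validation libraries are on the classpath; a plain JDK is assumed to ship only the W3C XML Schema implementation.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class SchemaLanguageProbe {

    // Returns true if some SchemaFactory implementation reachable through
    // the JAXP lookup mechanism supports the given schema language URI.
    static boolean isSupported(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // newInstance() throws this when no implementation can be found
            return false;
        }
    }

    public static void main(String[] args) {
        // The language URI can come from outside the code, e.g. a command-line argument.
        String language = args.length > 0 ? args[0] : XMLConstants.W3C_XML_SCHEMA_NS_URI;
        System.out.println(language + " supported: " + isSupported(language));
        System.out.println("RELAX NG supported: " + isSupported(XMLConstants.RELAXNG_NS_URI));
    }
}
```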
Compile Schema
With the new Validation APIs, an application can parse only the schema, checking its syntax and semantics against the constraints that the schema language imposes. This is useful when you are writing a schema and want to make sure that it conforms to the specification. The SchemaFactory class does this job: it loads the schemas and prepares them in a special form, represented as a javax.xml.validation.Schema object, that can be used to validate instance documents against the schema. If a schema includes or imports other schemas, those schemas are loaded as well.
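A schema-only check needs no instance document at all. In the sketch below (hypothetical class `SchemaSyntaxCheck`), compilation itself is the check: with no ErrorHandler registered, `newSchema` throws a SAXException for the first error, such as a reference to a type that does not exist.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaSyntaxCheck {

    // Compiles a schema on its own, with no instance document involved.
    // Returns an empty Optional if the schema is correct, otherwise the
    // message of the first error reported while compiling it.
    static Optional<String> firstError(String xsd) {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // With no ErrorHandler registered, newSchema() throws on the first error.
            sf.newSchema(new StreamSource(new StringReader(xsd)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) {
        // 'xs:decimall' is a deliberate typo, so the type cannot be resolved.
        String broken = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='price' type='xs:decimall'/>"
            + "</xs:schema>";
        System.out.println(firstError(broken).orElse("schema is correct"));
    }
}
```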
When reading a schema, a SchemaFactory may need to resolve resources and can encounter errors. As Figure 1 indicates, an LSResourceResolver and an ErrorHandler can be registered on the SchemaFactory. The ErrorHandler is used to report any errors encountered during schema compilation. The LSResourceResolver is used to customize how resources are resolved. This interface was introduced as part of DOM Level 3 Load and Save. Functionally, it plays the same role as the SAX EntityResolver, except that it also provides information about the namespace of the resource being resolved, for example, the targetNamespace of a W3C XML schema.
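The namespace argument is what makes it possible to resolve an imported schema by its target namespace rather than by file name. The following is a sketch under stated assumptions: the class names `NamespaceResolver` and `ImportDemo`, the in-memory schemas, and the `urn:example:types` namespace are all hypothetical, and the LSInput is obtained from a DOM Level 3 LS implementation located through `DOMImplementationRegistry`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical resolver: serves imported schemas from memory, keyed by the
// namespace URI that the SchemaFactory passes to resolveResource().
final class NamespaceResolver implements LSResourceResolver {
    private final Map<String, String> schemasByNamespace;
    private final DOMImplementationLS domLS;
    final List<String> requestedNamespaces = new ArrayList<>();

    NamespaceResolver(Map<String, String> schemasByNamespace) {
        this.schemasByNamespace = schemasByNamespace;
        try {
            this.domLS = (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        requestedNamespaces.add(namespaceURI);
        String xsd = schemasByNamespace.get(namespaceURI);
        if (xsd == null) {
            return null; // null tells the factory to use its default resolution
        }
        LSInput input = domLS.createLSInput();
        input.setStringData(xsd);
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        return input;
    }
}

final class ImportDemo {
    // Hypothetical schemas: the main schema imports urn:example:types.
    static final String TYPES_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:types'>"
        + "<xs:simpleType name='Code'><xs:restriction base='xs:string'>"
        + "<xs:length value='3'/></xs:restriction></xs:simpleType>"
        + "</xs:schema>";

    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:t='urn:example:types'>"
        + "<xs:import namespace='urn:example:types' schemaLocation='types.xsd'/>"
        + "<xs:element name='code' type='t:Code'/>"
        + "</xs:schema>";

    static Schema compile(NamespaceResolver resolver) {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setResourceResolver(resolver);
        try {
            return sf.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
        } catch (SAXException e) {
            // For example, t:Code cannot be resolved if the import was not served.
            throw new IllegalStateException(e);
        }
    }
}
```

If the resolver returned null for the import, the factory would look for `types.xsd` on its own, and compilation would most likely fail because `t:Code` could not be resolved.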
Here is a code sample that shows how SchemaFactory can be used to compile a schema and obtain a Schema object:
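import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

String language = XMLConstants.W3C_XML_SCHEMA_NS_URI;
SchemaFactory factory = SchemaFactory.newInstance(language);
// MyErrorHandler and MyLSResourceResolver are application-defined classes
factory.setErrorHandler(new MyErrorHandler());
factory.setResourceResolver(new MyLSResourceResolver());
StreamSource ss = new StreamSource(new File("mySchema.xsd"));
// newSchema() throws SAXException if the schema cannot be compiled
Schema schema = factory.newSchema(ss);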
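The sample leaves MyErrorHandler to the application. One possible implementation, shown here as our own sketch rather than the article's class, records warnings and recoverable errors so that a single compilation reports every problem it finds, and stops only on fatal errors. The helper `problemsIn` and the inline schemas used with it are hypothetical. Whether `newSchema` still throws after recoverable errors were reported may vary between implementations, so the helper handles both outcomes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// One possible MyErrorHandler: records warnings and errors instead of
// stopping at the first one, so a single run reports every problem found.
final class MyErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning (line " + e.getLineNumber() + "): " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable: record it and let schema compilation continue.
        problems.add("error (line " + e.getLineNumber() + "): " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Not recoverable (e.g. the schema is not well-formed XML): record and stop.
        problems.add("fatal (line " + e.getLineNumber() + "): " + e.getMessage());
        throw e;
    }

    // Compiles the given schema and returns every problem reported for it.
    static List<String> problemsIn(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        MyErrorHandler handler = new MyErrorHandler();
        factory.setErrorHandler(handler);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // Normally already recorded by the handler; keep the message otherwise.
            if (handler.problems.isEmpty()) {
                handler.problems.add(String.valueOf(e.getMessage()));
            }
        }
        return handler.problems;
    }
}
```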
A Schema object is an immutable in-memory representation of a schema. A Schema instance can be shared by many different parser instances, even if they are running in different threads. You can therefore write applications so that the same set of schemas is parsed only once and the same Schema instance is passed to different parser instances.
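A minimal sketch of that reuse, assuming a hypothetical `SharedSchemaDemo` class, an inline schema, and a fixed pool of four threads: the Schema is compiled once and shared, while each task creates its own Validator, because a Validator, unlike a Schema, is not thread-safe. It uses `java.util.concurrent` and lambda syntax (Java 8 or later).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SharedSchemaDemo {

    // Hypothetical schema used only for this illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:schema>";

    // Compile once; the resulting Schema is immutable and can be shared.
    static Schema compile() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Validates the documents concurrently against one shared Schema and
    // returns how many of them are valid.
    static int countValid(Schema schema, List<String> documents) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String doc : documents) {
                results.add(pool.submit(() -> {
                    // One Validator per task: Validator is not thread-safe.
                    Validator validator = schema.newValidator();
                    try {
                        validator.validate(new StreamSource(new StringReader(doc)));
                        return true;
                    } catch (SAXException e) {
                        return false;
                    }
                }));
            }
            int valid = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    valid++;
                }
            }
            return valid;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Schema schema = compile();
        List<String> docs = List.of("<price>1.50</price>", "<price>abc</price>");
        System.out.println("valid documents: " + countValid(schema, docs));
    }
}
```

Creating a Validator from an already compiled Schema is cheap compared with recompiling the schema, which is where the performance gain of this design comes from.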
Neeraj Bajaj