(Post 25/10/2005)
Historically, the word markup has been used to describe
annotation or other marks within a text intended to instruct a compositor
or typist how a particular passage should be printed or laid out. Examples
include wavy underlining to indicate boldface, special symbols for passages
to be omitted or printed in a particular font and so forth. As the formatting
and printing of texts was automated, the term was extended to cover all
sorts of special markup codes inserted into electronic texts to govern
formatting, printing, or other processing.
Generalizing from that sense, we define markup, or (synonymously)
encoding, as any means of making explicit an interpretation of a text.
Encoding a text for computer processing is, in principle, like transcribing
a manuscript from scriptio continua, a process of making explicit what
is conjectural or implicit, a process of directing the user as to how
the content of the text should be interpreted.
By markup language we mean a set of markup conventions
used together for encoding texts. A markup language must specify what
markup is allowed, what markup is required, how markup is to be distinguished
from text, and what the markup means.
Historically, markup was used to refer to:
- The process of marking manuscript copy for typesetting with directions
for use of type fonts and sizes, spacing, indentation, etc. (from
the Chicago Manual of Style, the bible of most publishers).
- Electronic markup, originally the internal, sometimes invisible
codes in documents which described the formatting.
- In WYSIWYG systems, the system inserts the codes. In earlier systems
such as WordStar, the markup was visible on the screen.
Markup can be classified as one of two
types:
- Procedural markup, which is concerned with the appearance of text:
its font, spacing, etc.
- Descriptive or declarative markup, which is concerned with the structure
or function of the tagged item.
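To make the two types concrete, here is a minimal Python sketch (standard library only). The same warning is marked up once procedurally and once descriptively, and a small style table turns the descriptive version into presentation. The `warning` element and the `STYLES` table are invented for illustration; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Procedural markup: the tags say how the text should look.
procedural = "<p><b><i>Do not open the case.</i></b></p>"

# Descriptive markup: the tag says what the text *is*.
descriptive = "<warning>Do not open the case.</warning>"

# Hypothetical style table mapping descriptive element names to
# presentation. Changing the look means editing this table, not the text.
STYLES = {"warning": ("<b><i>", "</i></b>")}

def render(xml_text: str) -> str:
    """Render a single descriptive element using the style table."""
    elem = ET.fromstring(xml_text)
    start, end = STYLES[elem.tag]
    return f"<p>{start}{elem.text}{end}</p>"

print(render(descriptive) == procedural)  # True
```

The point of the design is that a document marked up descriptively can be restyled (or indexed, or read aloud) without touching the text itself.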
Markup languages let you use your
information for applications beyond traditional publishing. For example:
- World Wide Web home pages
- Information databases
- Diagnostic/expert systems
- Electronic mail
- Hypermedia and hypertext documents
- Database publishing
- CD-ROM publishing
- Interactive Electronic Technical Manuals
(IETMs)
- Electronic review
Widely used markup languages include SGML,
HTML, XML and, most recently, WML.
SGML
The encoding scheme defined by the TEI Guidelines for
Electronic Text Encoding and Interchange is formulated as an
application of a system known as the Standard Generalized Markup Language
(SGML).
SGML is an international standard for the description
of marked-up electronic text. More exactly, SGML is a metalanguage, that
is, a means of formally describing a language, in this case, a markup
language.
There are three characteristics of SGML which distinguish
it from other markup languages: its emphasis on descriptive rather than
procedural markup; its document type concept; and its independence of
any one system for representing the script in which a text is written.
SGML is the basis of two essential Internet standards:
- HTML, the language of web pages;
- XML, the new solution for electronic documents and electronic commerce.
HTML
Short for HyperText Markup Language, the authoring language
used to create documents on the World Wide Web. HTML is defined as an
application of SGML, although browsers do not process it as strict SGML.
HTML defines the structure and layout of a Web document
by using a variety of tags and attributes. The correct structure for an
HTML document starts with <HTML><HEAD>(information about the document,
such as its <TITLE>)</HEAD><BODY> and ends with </BODY></HTML>.
All the information you'd like to include in your Web page goes
between the <BODY> and </BODY> tags.
XHTML 1.0 is the current W3C
Recommendation
W3C produces what are known as "Recommendations"
for HTML. These are specifications, developed by W3C working groups, and
then voted in by Members of the Consortium. A W3C Recommendation indicates
that consensus has been reached among the Consortium Members that a specification
is appropriate for widespread use.
XHTML 1.0 is W3C's recommendation for the latest version
of HTML, following on from earlier work on HTML 4.01, HTML 4.0, HTML 3.2
and HTML 2.0. XHTML 1.0 is a reformulation of HTML 4.01 in XML, and
combines the familiar elements of HTML 4 with XML's stricter, more
extensible syntax.
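Because XHTML is XML, an XHTML page must be well-formed: every element closed and every attribute value quoted. A rough Python sketch of the difference, using made-up sample pages: Python's XML parser accepts the XHTML version and rejects the HTML 4-style one, while the lenient `html.parser` module takes the HTML without complaint, much as browsers do. This only checks well-formedness; validating against an XHTML DTD is a separate step that Python's standard library does not perform.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Every element closed, <br /> written as an empty element: well-formed XML.
xhtml_doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>First line<br />second line</p></body>
</html>"""

# Typical HTML 4 habits: <p> left open, bare <br>.
html_doc = ("<html><head><title>Example</title></head>"
            "<body><p>First line<br>second line</body></html>")

def is_well_formed_xml(text: str) -> bool:
    """True if the XML parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

class TagCounter(HTMLParser):
    """Lenient HTML parser: counts start tags and never rejects input."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1

root = ET.fromstring(xhtml_doc)
print(root.tag == XHTML_NS + "html")  # True
print(is_well_formed_xml(xhtml_doc))  # True
print(is_well_formed_xml(html_doc))   # False
counter = TagCounter()
counter.feed(html_doc)
print(counter.count)                  # 6
```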
Three "flavors" of XHTML:
XHTML 1.0 is specified in three "flavors".
XHTML Transitional -
Most people writing Web pages for the general public to access will want
to use this flavor of XHTML. The idea is to take advantage of XHTML features,
including style sheets, but nonetheless to make small adjustments to your
mark-up for the benefit of those viewing your pages with older browsers
which can't understand style sheets. These adjustments include using BODY
with the bgcolor, text and link attributes.
XHTML Strict - Use
this when you want really clean structural mark-up, free of any tags associated
with layout. Use this together with W3C's Cascading Style Sheet language
(CSS) to get the font, color, and layout effects you want.
XHTML Frameset - Use
this when you want to use HTML Frames to partition the browser window
into two or more frames.
XHTML markup must conform to the markup standards defined
in the XHTML DTD for the chosen flavor.
When applied to Net devices, XHTML is divided into modules (a process
known as modularization), so that XHTML pages can be read on many different
platforms.
A device designer, using standard building blocks, will
specify which elements are supported. Content creators will then target
these building blocks--or modules.
Because these modules conform to shared W3C specifications, XHTML's
extensibility is meant to keep layout and presentation true-to-form on
any platform.
Dynamic HTML (DHTML), the combination of HTML, style sheets and scripting,
is the next big thing on the Internet. With Dynamic HTML, you can layer multiple
images on top of one another, precisely control the layout of your page,
add new interactivity and much more!
The next phase of work on HTML will seek to complete
the transition to XML, with continued work on modularization, work on
XML Schemas for XHTML, and registering an Internet Media Type for XHTML
following the guidelines set out by the W3C-IETF liaison group studying
Internet Media types for applications of XML.
XML
XML, or eXtensible Markup Language, is a recommendation
from the World Wide Web Consortium (W3C) issued in early 1998. It is a
language designed to deliver structured information over the web more
effectively than current languages used for web publishing, namely HTML.
XML separates the content of a document from its presentation and provides
a common format for transferring data across the World Wide Web or a company
intranet. The result is a technology that makes data available regardless
of the proprietary systems involved. Many analysts and visionaries
predict that XML will surpass HTML as the "lingua franca" of
the Internet.
XML represents a capacity for sharing information that
didn't exist before. Virtually any kind of data can be encapsulated in
XML, moved across networks, processed automatically, and published dynamically.
Ultimately, XML opens a new world of possibilities for sharing, managing
and publishing information on the web.
XML allows you to tag your documents with meaningful
tags such as <productname>, <chapter>, and <title>.
You can leverage these tags in your search technology to allow users to
search for words only in chapter titles, for example, or to search for
all product names.
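As a rough illustration, the Python sketch below searches only chapter titles and collects product names from a small, made-up manual. It uses `xml.etree.ElementTree` from the standard library; the document and the helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical manual marked up with meaningful tags.
manual = """<manual>
  <chapter>
    <title>Installing the Widget</title>
    <para>Unpack the <productname>Widget 3000</productname> carefully.</para>
  </chapter>
  <chapter>
    <title>Cleaning</title>
    <para>Use a dry cloth; the <productname>Widget 3000</productname>
    and the <productname>Widget Mini</productname> are not waterproof.</para>
  </chapter>
</manual>"""

root = ET.fromstring(manual)

def search_chapter_titles(root, word):
    """Return chapter titles containing `word` (case-insensitive).
    Text in <para> elements is ignored, even if it contains the word."""
    return [t.text for t in root.findall("chapter/title")
            if word.lower() in t.text.lower()]

def all_product_names(root):
    """Return each distinct product name, in document order."""
    names = []
    for p in root.iter("productname"):
        if p.text not in names:
            names.append(p.text)
    return names

print(search_chapter_titles(root, "widget"))  # ['Installing the Widget']
print(all_product_names(root))                # ['Widget 3000', 'Widget Mini']
```

A plain-text search for "widget" would also hit the second chapter's body; the tags are what let the search stay inside titles.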
cXML
Commerce XML is a new set of document type definitions
(DTDs) for the XML specification. cXML works as a meta-language that defines
the information needed to describe a product. It will be used to standardize
the exchange of catalog content and to define request/response processes
for secure electronic transactions over the Internet. These processes include
purchase orders, change orders, acknowledgments, status updates, ship
notifications and payment transactions.
cXML began as a collaborative effort among 40+ companies
looking to reduce the costs of online business. This standardized methodology
will allow participating companies--and others who implement the cXML
framework--to constantly improve and streamline electronic commerce.
Some queries on SGML, HTML and
XML.....
Aren't XML, SGML, and HTML all the
same thing?
Not quite. SGML is the 'mother tongue', used for describing
thousands of different document types in many fields of human activity,
from transcriptions of ancient Irish manuscripts to the technical documentation
for stealth bombers, and from patients' clinical records to musical notation.
HTML is just one of these document types, the one most
frequently used on the Web. It defines a simple, fixed type of document
with markup designed for a common class of office or technical report,
with headings, paragraphs, lists, illustrations, etc., and some provision
for hypertext and multimedia.
XML is an abbreviated version of SGML, to make it easier
for you to define your own document types, and to make it easier for programmers
to write programs to handle them. It omits the more complex and less-used
parts of SGML in return for the benefits of being easier to write applications
for, easier to understand, and more suited to delivery and interoperability
over the Web. But it is still SGML, and XML files may still be parsed
and validated the same as any other SGML file.
What is the difference between SGML/XML
and C or C++?
C and C++ (and other languages like Fortran, or Pascal,
or Basic, or Java or dozens more) are programming languages with which
you specify calculations, actions, and decisions to be carried out.
SGML and XML are markup specification languages with
which you can design ways of describing information, usually for storage,
transmission, or processing by a program.
On its own, a file of SGML or XML text (including HTML)
doesn't do anything: you have to run a program to do something with it.
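For example, here is a small Python sketch of a program doing something with an XML file: it reads a hypothetical order document and adds up the total. The element and attribute names (`order`, `item`, `quantity`, `unitprice`) are made up for this example and are not taken from cXML or any other standard.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order document; the markup only describes the data.
order_xml = """<order id="1001">
  <item sku="A-1" quantity="2" unitprice="9.50"/>
  <item sku="B-7" quantity="1" unitprice="24.00"/>
</order>"""

def order_total(xml_text: str) -> Decimal:
    """Sum quantity * unitprice over every <item> in the order.
    Decimal avoids binary floating-point rounding for money."""
    root = ET.fromstring(xml_text)
    return sum(
        (int(i.get("quantity")) * Decimal(i.get("unitprice"))
         for i in root.iter("item")),
        Decimal("0"),
    )

print(order_total(order_xml))  # 43.00
```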
Why not just carry on extending
HTML?
HTML is already overburdened with dozens of interesting
but incompatible inventions from different manufacturers, because it provides
only one way of describing your information.
XML allows groups of people or organizations to create
their own customized markup applications for exchanging information in
their domain (music, chemistry, electronics, hill-walking, finance, surfing,
petroleum geology, linguistics, cooking, knitting, stellar cartography,
history, engineering, rabbit-keeping, mathematics, et cætera ad infinitum).
HTML is at the limit of its usefulness as a way of describing
information, and while it will continue to play an important role for
the content it currently represents, many new applications require a more
robust and flexible infrastructure.
Do I have to know HTML or SGML before
I learn XML?
No, but it would be useful because a lot of terminology
and practice is in common between SGML, HTML, and XML.
Be aware that ‘knowing HTML’ is not the same as ‘understanding
SGML’. Although HTML was written as an SGML application, browsers ignore
large parts of SGML (which is why so many useful things don't work),
so just because something works a certain way in an HTML browser
does not mean it's correct.
Why should I use XML instead of
HTML?
Authors and providers can design their own document types
using XML, instead of being stuck with HTML. Document types can be explicitly
tailored to an audience, so the cumbersome fudging that has to take place
with HTML to achieve special effects can become a thing of the past: authors
and designers are free to invent their own markup elements.
Information content can be richer and easier to use,
because the hypertext linking abilities of XML are much greater than those
of HTML.
XML can provide more and better facilities for browser
presentation and performance, using CSS (Cascading Style Sheets) and XSL
(Extensible Stylesheet Language).
It removes many of the underlying complexities of SGML
in favour of a more flexible model, so writing programs to handle XML
is much easier than doing the same for full SGML.
Information will be more accessible and reusable, because
the more flexible markup of XML can be used by any XML software instead
of being restricted to specific manufacturers as has become the case with
HTML.
Does XML replace HTML?
No. XML itself does not replace HTML: instead, it provides
an alternative which allows you to define your own set of markup elements.
HTML is expected to remain in common use for some time to come, and Document
Type Definitions for HTML are available in XML versions as well as in
original SGML. XML is designed to make the writing of DTDs much simpler
than with full SGML.
WML
Wireless Application Protocol (WAP) is a result of continuous
work to define an industry-wide standard for developing applications over
wireless communication networks. WML (Wireless Markup Language) is a markup
language based on XML, and is intended for use in specifying content and
user interface for narrowband devices, including cellular phones and pagers.
WML is designed with the constraints of small narrowband devices in mind.
The official WML specification is developed and maintained by the WAP
Forum, an industry-wide consortium founded by Nokia, Phone.com, Motorola,
and Ericsson.
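Since WML is XML, a WML "deck" of "cards" can be read with any XML parser. A minimal Python sketch, assuming a made-up two-card deck: the `wml`, `card`, `p` and `a` elements and the WML 1.1 DOCTYPE come from WML itself, while the card ids, titles and text are invented.

```python
import xml.etree.ElementTree as ET

# A minimal WML 1.1 deck: one <wml> root holding several <card>s.
deck = """<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card id="home" title="Welcome">
    <p>Hello! <a href="#news">News</a></p>
  </card>
  <card id="news" title="News">
    <p>No news today.</p>
  </card>
</wml>"""

# Parsed as ordinary XML; the external DTD is not fetched or validated.
root = ET.fromstring(deck)
for card in root.findall("card"):
    print(card.get("id"), "-", card.get("title"))
# home - Welcome
# news - News
```

A phone downloads the whole deck in one round-trip and then moves between cards locally, which is one way WML tries to cope with the server round-trips and bandwidth limits mentioned here.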
WML offers software developers an entirely new, exciting
platform on which to deploy their applications. With this new platform,
however, comes a host of tradeoffs and challenges. A new wrinkle
will be added to the design process as things like server round-trips,
bandwidth, and display sizes become issues to contend with. While it may
take several iterations for developers and vendors to get their product
offerings right, there is no doubt that WAP opens the door to a new era
in application development and deployment.
Other markup languages used for specific applications include:
- VRML: Virtual Reality Modeling Language
- MathML: Mathematical Markup Language
- CML: Chemical Markup Language
- FpML: Financial Products Markup Language
- FML: Forms Markup Language
W3C
Short for World Wide Web Consortium, an international
consortium of companies involved with the Internet and the Web.
The W3C was founded in 1994 by Tim Berners-Lee, the original architect
of the World Wide Web. The organization's purpose is to develop open standards
so that the Web evolves in a single direction rather than being splintered
among competing factions. The W3C is the chief standards body for HTTP
and HTML.
DTD
Short for document type definition, a type of file associated
with SGML and XML documents that defines which markup tags may be used,
how they may be nested, and which attributes they may carry; the
application presenting the document then decides what the tags mean.
The HTML DTD, part of the HTML specification that Web browsers follow,
is one example of a DTD. XML promises to expand the formatting capabilities
of Web documents by supporting additional DTDs.
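To show the idea of a DTD declaring what markup is allowed and required, here is a hand-rolled Python sketch. It is not a DTD validator (Python's standard library does not include one); a hypothetical `CONTENT_MODELS` table stands in for declarations such as `<!ELEMENT memo (to, from, body)>`, and only exact child sequences are checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD, for illustration only. It mirrors
#   <!ELEMENT memo (to, from, body)>
# by listing, for each element, the exact sequence of children it must have.
# Real DTDs also allow choices, repetition (*, +, ?) and attribute rules.
CONTENT_MODELS = {
    "memo": ["to", "from", "body"],
    "to": [], "from": [], "body": [],  # text-only elements
}

def validate(elem) -> list:
    """Return a list of error messages (empty if the element is valid)."""
    if elem.tag not in CONTENT_MODELS:
        return [f"<{elem.tag}> is not declared"]
    errors = []
    children = [c.tag for c in elem]
    expected = CONTENT_MODELS[elem.tag]
    if children != expected:
        errors.append(f"<{elem.tag}> has {children}, expected {expected}")
    for child in elem:
        errors.extend(validate(child))
    return errors

good = ET.fromstring("<memo><to>Ann</to><from>Bob</from><body>Hi</body></memo>")
bad = ET.fromstring("<memo><from>Bob</from><body>Hi</body></memo>")
print(validate(good))  # []
print(validate(bad))   # ["<memo> has ['from', 'body'], expected ['to', 'from', 'body']"]
```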