Markup Languages  
 

(Post 25/10/2005)

Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. Examples include wavy underlining to indicate boldface, special symbols for passages to be omitted or printed in a particular font and so forth. As the formatting and printing of texts was automated, the term was extended to cover all sorts of special markup codes inserted into electronic texts to govern formatting, printing, or other processing.

Generalizing from that sense, we define markup, or (synonymously) encoding, as any means of making explicit an interpretation of a text. Encoding a text for computer processing is, in principle, like transcribing a manuscript from scriptio continua: a process of making explicit what is conjectural or implicit, and of directing the user as to how the content of the text should be interpreted.

By markup language we mean a set of markup conventions used together for encoding texts. A markup language must specify what markup is allowed, what markup is required, how markup is to be distinguished from text, and what the markup means.

Historically, markup was used to refer to:

  • The process of marking manuscript copy for typesetting with directions for the use of type fonts and sizes, spacing, indentation, etc. (from the Chicago Manual of Style, the bible of most publishers).
  • In electronic documents, the internal, sometimes invisible codes that describe the formatting.
  • In WYSIWYG systems, the codes that the system itself inserts. In early WYSIWYG systems such as WordStar, the markup was visible on the screen.

Markup can be classified as one of two types:

  • Procedural markup, which is concerned with the appearance of text: its font, spacing, etc.
  • Descriptive (or declarative) markup, which is concerned with the structure or function of the tagged item.
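
The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the element names (heading, para) and both style tables are invented for the example. The source text is marked up descriptively, and the procedural decisions (which font, which decoration) live in a separate table that can be swapped without touching the text.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: the tags say what each item IS, not how it looks.
# The element names are hypothetical.
SOURCE = "<doc><heading>Markup</heading><para>Tags describe structure.</para></doc>"

# Two hypothetical style tables holding the procedural (appearance) decisions.
PRINT_STYLE = {"heading": "[bold 14pt] {}", "para": "[roman 10pt] {}"}
SCREEN_STYLE = {"heading": "== {} ==", "para": "{}"}

def render(source, style):
    """Apply a style table to each child element of the document root."""
    root = ET.fromstring(source)
    return [style[child.tag].format(child.text) for child in root]

print_lines = render(SOURCE, PRINT_STYLE)
screen_lines = render(SOURCE, SCREEN_STYLE)
print(print_lines)
print(screen_lines)
```

The same marked-up text produces two different presentations, which is the practical benefit usually claimed for descriptive markup.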

Markup languages permit you to use your information for applications beyond traditional publishing. For example:

  • World Wide Web home pages
  • Information databases
  • Diagnostic/expert systems
  • Electronic mail
  • Hypermedia and hypertext documents
  • Database publishing
  • CD-ROM publishing
  • Interactive Electronic Technical Manuals (IETMs)
  • Electronic review

The markup languages in common use are SGML, HTML, and XML, with WML being the most recent.

SGML

The encoding scheme defined by the TEI Guidelines for Electronic Text Encoding and Interchange is formulated as an application of a system known as the Standard Generalized Markup Language (SGML).

SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a metalanguage, that is, a means of formally describing a language, in this case, a markup language.

There are three characteristics of SGML which distinguish it from other markup languages: its emphasis on descriptive rather than procedural markup; its document type concept; and its independence of any one system for representing the script in which a text is written.

SGML is the basis of two essential Internet standards:

  • HTML, the language of web pages;
  • XML, the new solution for electronic documents and electronic commerce.

HTML

Short for HyperText Markup Language, the authoring language used to create documents on the World Wide Web. HTML was defined as an application of SGML, although Web browsers implement only part of SGML.

HTML defines the structure and layout of a Web document by using a variety of tags and attributes. A correctly structured HTML document starts with <HTML><HEAD>(information about the document, such as its title)</HEAD><BODY> and ends with </BODY></HTML>. All the information you'd like to show on your Web page goes between the <BODY> and </BODY> tags.
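
As a rough illustration of that skeleton, the following Python sketch feeds a minimal page to the standard library's html.parser and records which tags open and which text falls inside BODY. The page content and the OutlineParser class are made up for the example.

```python
from html.parser import HTMLParser

# A minimal page following the HTML/HEAD/BODY skeleton (content is invented).
PAGE = """<HTML>
<HEAD><TITLE>My first page</TITLE></HEAD>
<BODY>
<P>Everything the reader sees goes here.</P>
</BODY>
</HTML>"""

class OutlineParser(HTMLParser):
    """Records start tags in order and collects the text found inside BODY."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.in_body = False
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)  # html.parser reports tag names in lower case
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Keep only non-blank text that appears between <BODY> and </BODY>.
        if self.in_body and data.strip():
            self.body_text.append(data.strip())

parser = OutlineParser()
parser.feed(PAGE)
parser.close()
print(parser.start_tags)
print(parser.body_text)
```

The title text is not collected, because it sits in HEAD rather than BODY.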

XHTML 1.0 is the current W3C Recommendation

W3C produces what are known as "Recommendations" for HTML. These are specifications, developed by W3C working groups, and then voted in by Members of the Consortium. A W3C Recommendation indicates that consensus has been reached among the Consortium Members that a specification is appropriate for widespread use.

XHTML 1.0 is W3C's recommendation for the latest version of HTML, following on from earlier work on HTML 4.01, HTML 4.0, HTML 3.2 and HTML 2.0. With a wealth of features, XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML4 with the power of XML.
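
One practical consequence of reformulating HTML in XML is that an XHTML page can be read by any ordinary XML parser. The sketch below, which uses Python's standard xml.etree.ElementTree and two invented sample pages, shows an XHTML page parsing cleanly while a page written with common HTML 4 habits (unclosed P and BR) is rejected as not well-formed. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# XHTML: lower-case tags, every element closed, empty elements written as <br />.
xhtml_page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>First line<br />second line</p></body>"
    "</html>"
)

# HTML 4 habits: <P> and <BR> left unclosed, which an XML parser will not accept.
html4_page = "<HTML><BODY><P>First line<BR>second line</BODY></HTML>"

def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(xhtml_page)
title = root.find("x:head/x:title", {"x": XHTML_NS}).text

print(is_well_formed(xhtml_page), is_well_formed(html4_page), title)
```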

Three "flavors" of XHTML:

XHTML 1.0 is specified in three "flavors".

XHTML Transitional - Most people writing Web pages for the general public to access will want to use this flavor of XHTML 1.0. The idea is to take advantage of XHTML features including style sheets but nonetheless to make small adjustments to your mark-up for the benefit of those viewing your pages with older browsers which can't understand style sheets. These include using BODY with bgcolor, text and link attributes.

XHTML Strict - Use this when you want really clean structural mark-up, free of any tags associated with layout. Use this together with W3C's Cascading Style Sheet language (CSS) to get the font, color, and layout effects you want.

XHTML Frameset - Use this when you want to use HTML Frames to partition the browser window into two or more frames.

XHTML markup must conform to the markup rules defined in one of the XHTML DTDs.

When applied to Net devices, XHTML must go through a modularization process. This enables XHTML pages to be read by many different platforms.

A device designer, using standard building blocks, will specify which elements are supported. Content creators will then target these building blocks--or modules.

Because these modules conform to certain standards, XHTML's extensibility ensures that layout and presentation stay true-to-form over any platform.

Dynamic HTML (DHTML) combines HTML with style sheets and scripting. With Dynamic HTML, you can layer multiple images on top of one another, precisely control the layout of your page, and add new interactivity.

The next phase of work on HTML will seek to complete the transition to XML, with continued work on modularization, work on XML Schemas for XHTML, and registering an Internet Media Type for XHTML following the guidelines set out by the W3C-IETF liaison group studying Internet Media types for applications of XML.

XML

XML, or eXtensible Markup Language, is a recommendation from the World Wide Web Consortium (W3C) issued in early 1998. It is a language designed to deliver structured information over the web more effectively than current languages used for web publishing, namely HTML. XML separates the content of a document from its presentation and provides a common format for transferring data across the World Wide Web or a company intranet. The result is a technology that makes data available regardless of the proprietary systems involved. Innumerable analysts and visionaries predict that XML will surpass HTML as the "lingua franca" of the Internet.

XML represents a capacity for sharing information that didn't exist before. Virtually any kind of data can be encapsulated in XML, moved across networks, processed automatically, and published dynamically. Ultimately, XML opens a new world of possibilities for sharing, managing and publishing information on the web.

XML allows you to tag your documents with meaningful tags such as <productname>, <chapter>, and <title>. You can leverage these tags in your search technology to allow users to search for words only in chapter titles, for example, or to search for all product names.
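
A minimal sketch of that kind of tag-aware search, using Python's standard xml.etree.ElementTree. The sample manual, its contents, and the helper function name are assumptions made up for the example; only the tag names come from the paragraph above.

```python
import xml.etree.ElementTree as ET

# A small invented document using the tags mentioned above.
MANUAL = """<manual>
  <productname>WidgetPro</productname>
  <chapter><title>Installing WidgetPro</title><para>Unpack the box.</para></chapter>
  <chapter><title>Troubleshooting</title><para>Installing again often helps.</para></chapter>
</manual>"""

root = ET.fromstring(MANUAL)

def chapter_titles_containing(root, word):
    """Search for a word in chapter titles only, ignoring all other text."""
    return [
        title.text
        for title in root.findall("./chapter/title")
        if word.lower() in title.text.lower()
    ]

# Every product name in the document, wherever it occurs.
product_names = [p.text for p in root.iter("productname")]
hits = chapter_titles_containing(root, "installing")

print(product_names)
print(hits)
```

The word "Installing" also occurs in the body of the second chapter, but that chapter is not returned because the search is restricted to titles.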

cXML

Commerce XML (cXML) is a new set of document type definitions (DTDs) for the XML specification. cXML works as a meta-language that defines necessary information about a product. It will be used to standardize the exchange of catalog content and to define request/response processes for secure electronic transactions over the Internet. The processes include purchase orders, change orders, acknowledgments, status updates, ship notifications and payment transactions.

cXML began as a collaborative effort among 40+ companies looking to reduce the costs of online business. This standardized methodology will allow participating companies--and others who implement the cXML framework--to constantly improve and streamline electronic commerce.

Some queries on SGML, HTML and XML

Aren't XML, SGML, and HTML all the same thing?

Not quite. SGML is the 'mother tongue', used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients' clinical records to musical notation.

HTML is just one of these document types, the one most frequently used on the Web. It defines a simple, fixed type of document with markup designed for a common class of office or technical report, with headings, paragraphs, lists, illustrations, etc., and some provision for hypertext and multimedia.

XML is an abbreviated version of SGML, designed to make it easier for you to define your own document types, and to make it easier for programmers to write programs to handle them. It omits the more complex and less-used parts of SGML in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. But it is still SGML, and XML files may still be parsed and validated the same as any other SGML file.

What is the difference between SGML/XML and C or C++?

C and C++ (and other languages like Fortran, or Pascal, or Basic, or Java or dozens more) are programming languages with which you specify calculations, actions, and decisions to be carried out.

SGML and XML are markup specification languages with which you can design ways of describing information, usually for storage, transmission, or processing by a program.

On its own, a file of SGML or XML text (including HTML) doesn't do anything: you have to run a program to do something with it.
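
The distinction can be illustrated with a short sketch, written here in Python rather than C. The order document, its element and attribute names, and the function are all invented for the example. The XML only describes the data, and nothing is calculated until a program reads it.

```python
import xml.etree.ElementTree as ET

# Markup: a description of information. On its own this string does nothing.
ORDER = """<order>
  <item price="2.50" quantity="4">pencil</item>
  <item price="12.00" quantity="1">notebook</item>
</order>"""

# Program: the calculations and decisions are spelled out here, not in the markup.
def order_total(xml_text):
    """Parse the order and add up price times quantity for every item."""
    root = ET.fromstring(xml_text)
    return sum(
        float(item.get("price")) * int(item.get("quantity"))
        for item in root.iter("item")
    )

total = order_total(ORDER)
print(total)
```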

Why not just carry on extending HTML?

HTML is already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information.

XML allows groups of people or organizations to create their own customized markup applications for exchanging information in their domain (music, chemistry, electronics, hill-walking, finance, surfing, petroleum geology, linguistics, cooking, knitting, stellar cartography, history, engineering, rabbit-keeping, mathematics, et cætera ad infinitum).

HTML is at the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure.

Do I have to know HTML or SGML before I learn XML?

No, but it would be useful because a lot of terminology and practice is in common between SGML, HTML, and XML.

Be aware that ‘knowing HTML’ is not the same as ‘understanding SGML’. Although HTML was written as an SGML application, browsers ignore large parts of SGML (which is why so many useful things don't work), so just because something is done a certain way in HTML in an HTML browser does not mean it's correct.

Why should I use XML instead of HTML?

Authors and providers can design their own document types using XML, instead of being stuck with HTML. Document types can be explicitly tailored to an audience, so the cumbersome fudging that has to take place with HTML to achieve special effects can become a thing of the past: authors and designers are free to invent their own markup elements.

Information content can be richer and easier to use, because the hypertext linking abilities of XML are much greater than those of HTML.

XML can provide more and better facilities for browser presentation and performance, using CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language).

It removes many of the underlying complexities of SGML in favour of a more flexible model, so writing programs to handle XML is much easier than doing the same for full SGML.

Information will be more accessible and reusable, because the more flexible markup of XML can be used by any XML software instead of being restricted to specific manufacturers as has become the case with HTML.

Does XML replace HTML?

No. XML itself does not replace HTML: instead, it provides an alternative which allows you to define your own set of markup elements. HTML is expected to remain in common use for some time to come, and Document Type Definitions for HTML are available in XML versions as well as in original SGML. XML is designed to make the writing of DTDs much simpler than with full SGML.

WML

Wireless Application Protocol (WAP) is a result of continuous work to define an industry wide standard for developing applications over wireless communication networks. WML (Wireless Markup Language) is a markup language based on XML, and is intended for use in specifying content and user interface for narrowband devices, including cellular phones and pagers. WML is designed with the constraints of small narrowband devices in mind. The official WML specification is developed and maintained by the WAP Forum, an industry-wide consortium founded by Nokia, Phone.com, Motorola, and Ericsson.

WML offers software developers an entirely new, exciting platform on which to deploy their applications. With this new platform, however, comes a host of tradeoffs and challenges. A new wrinkle will be added to the design process as things like server round-trips, bandwidth, and display sizes become issues to contend with. While it may take several iterations for developers and vendors to get their product offerings right, there is no doubt that WAP opens the door to a new era in application development and deployment.
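
Because WML is based on XML, a WML page can be inspected with ordinary XML tools. The following is a small sketch only: it assumes the usual WML arrangement of a wml "deck" containing card elements, and the card ids, titles, and text are invented. It lists the cards a device would be able to show.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical WML deck: one <wml> element holding two <card> screens.
DECK = """<wml>
  <card id="home" title="Welcome">
    <p>Hello from a small screen.</p>
  </card>
  <card id="news" title="News">
    <p>Headlines go here.</p>
  </card>
</wml>"""

deck = ET.fromstring(DECK)

# Collect (id, title) for each card; one deck is fetched in a single round-trip.
cards = [(card.get("id"), card.get("title")) for card in deck.findall("card")]
print(cards)
```

Delivering several small cards in one deck is one way content can be shaped around the round-trip and display-size constraints mentioned above.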

Other markup languages used for specific applications include:

  • VRML - Virtual Reality Modeling Language
  • MathML - Mathematical Markup Language
  • CML - Chemical Markup Language
  • FpML - Financial Products Markup Language
  • FML - Forms Markup Language

W3C

Short for World Wide Web Consortium, an international consortium of companies involved with the Internet and the Web. The W3C was founded in 1994 by Tim Berners-Lee, the original architect of the World Wide Web. The organization's purpose is to develop open standards so that the Web evolves in a single direction rather than being splintered among competing factions. The W3C is the chief standards body for HTTP and HTML.

DTD

Short for document type definition, a type of file associated with SGML and XML documents that defines which markup tags and attributes a document may contain and how they may be combined, so that an application processing the document knows what to expect. The DTDs published with the HTML specification, which define the elements allowed in Web pages, are one example. XML promises to expand the capabilities of Web documents by supporting additional DTDs.
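
To make the idea concrete, here is a small sketch in Python. The note document and its DTD are invented for the example. The DTD appears inside the document as an internal subset, and because the XML parser in Python's standard library does not validate against DTDs, the check itself is a hand-written stand-in (REQUIRED_CHILDREN) that mirrors the single content model "note (to, from, body)". A real validating parser would read that rule from the DTD directly.

```python
import xml.etree.ElementTree as ET

# A document carrying its own DTD. The DTD says a note contains exactly
# a <to>, a <from>, and a <body>, in that order.
NOTE = """<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to><from>Author</from><body>Hello</body></note>"""

# Hand-written mirror of the content model above (an assumption for this
# sketch; the standard-library parser does not enforce the DTD itself).
REQUIRED_CHILDREN = {"note": ["to", "from", "body"]}

def follows_model(xml_text):
    """Return True if the root element's children match the declared order."""
    root = ET.fromstring(xml_text)
    expected = REQUIRED_CHILDREN.get(root.tag)
    return expected is not None and [child.tag for child in root] == expected

valid = follows_model(NOTE)
invalid = follows_model("<note><to>Reader</to><body>Hello</body></note>")
print(valid, invalid)
```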


 
 

 
     
 