
XML: A Primer

PCQ Bureau

If you keep an eye on Web development technologies and trends, you will have heard about XML. The eXtensible Markup Language (XML) has taken the Web-development world by storm. This piece will introduce you to the technology and how it’s implemented. Next time we’ll take a look at writing some XML and using it. I’ll use comparisons with HTML wherever possible to help you understand the concept.


What’s XML?

One of the first things to realize about XML is that it’s NOT a language per se. Although called a "markup language", it’s actually a "meta language", which means that XML has only syntax (the grammar) but no predefined keywords, programming constructs, etc. It defines "rules" that let you structure other markup languages. HTML, on the other hand, is a markup language with a fairly well-defined set of keywords.

So what does XML do? XML lets you define the structure of documents rather than how they look (which is what HTML does). This structure definition makes it easy for the document to be reused and viewed in different ways. By separating the content and structure of the document from its look and feel, you can make your document available to different viewers without having to worry about creating a version for each. A typical example is the (very painful) requirement that Web developers make sure their HTML pages look the same on every major browser. With XML, all they need to create is the document structure, and they can let the browser worry about formatting the document. Yes, I know this sounds great, but the bad news is that no current browser is capable of doing that yet. Version 5 of both IE and Netscape promises to provide XML parsers. (Incidentally, IE 4 already has two parsers, one in C++ and the other in Java.)


What does "structure of data" mean?

I have been talking about the "structure" of data all along. What does this really mean? To understand the concept let’s try it out with an example in HTML and XML. Our document here will contain data about some popular movies.








Videos

Name                  Type     Released     Actors
Independence Day      Sci-fi   12/10/1997   Jeff Goldblum, Will Smith
Tomorrow Never Dies   Action   1/10/1998    Pierce Brosnan, Teri Hatcher









The HTML document (above) contains the names of two movies, with details including the type (or genre), date of release and main actors. But notice the problem in this document. It’s very simple to view, but the document itself doesn’t define what the data is, or how the values relate to each other.


Now, the same data in XML could look like:
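A minimal sketch of such a document, wrapped in a few lines of Python so that it can be parsed and checked. The movie data comes from the table above, and the element names videos, video, title, genre and actor follow the names the text uses. The element name released is an assumption made for this illustration, as is the choice of Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# The "released" element name is a hypothetical choice for this sketch;
# the movie data itself is taken from the HTML table.
VIDEOS_XML = """<?xml version="1.0"?>
<videos>
  <video>
    <title>Independence Day</title>
    <genre>Sci-fi</genre>
    <released>12/10/1997</released>
    <actor>Jeff Goldblum</actor>
    <actor>Will Smith</actor>
  </video>
  <video>
    <title>Tomorrow Never Dies</title>
    <genre>Action</genre>
    <released>1/10/1998</released>
    <actor>Pierce Brosnan</actor>
    <actor>Teri Hatcher</actor>
  </video>
</videos>
"""


def describe(video):
    """Return one video element's data as a plain dictionary."""
    return {
        "title": video.findtext("title"),
        "genre": video.findtext("genre"),
        "released": video.findtext("released"),
        "actors": [actor.text for actor in video.findall("actor")],
    }


# Parse the text; the root element is the single top-level <videos> element.
root = ET.fromstring(VIDEOS_XML)
catalog = [describe(video) for video in root.findall("video")]

for entry in catalog:
    print(entry)
```

Because every value sits inside a named element, a program can pick out the title or the actors directly, without guessing from column positions.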

Let’s dissect this document. The first line defines the version of XML we are using. Next we define the top-level element (called the "root" element) under which all the data is stored. Here the root element is defined to contain videos. For each video, we create a separate block containing the associated data. Just by looking at the structure you can understand what the document contains. If we had removed the header row of the table in the HTML version, one could not know what the data was about.

XML also makes it easy to add qualifiers to the data. For example, in the XML above, we can add a "category" attribute to the genre element (Action, say) to give more detail, or a "sex" attribute to each of the actor elements (Pierce Brosnan and Teri Hatcher, for instance).
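A short sketch of how such attributes might look and how a program could read them. The attribute names category and sex come from the text, but the attribute values used here ("spy thriller", "male", "female") are assumptions made purely for illustration, as is the use of Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Attribute values below are illustrative assumptions, not taken from the article.
VIDEO_XML = """<video>
  <title>Tomorrow Never Dies</title>
  <genre category="spy thriller">Action</genre>
  <actor sex="male">Pierce Brosnan</actor>
  <actor sex="female">Teri Hatcher</actor>
</video>"""

video = ET.fromstring(VIDEO_XML)

# An attribute qualifies an element without changing its text content.
genre = video.find("genre")
print(genre.text, "/", genre.get("category"))

# Index the actors by the value of their "sex" attribute.
actors_by_sex = {actor.get("sex"): actor.text for actor in video.findall("actor")}
print(actors_by_sex)
```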

The data becomes both human and machine readable and understandable, which was one of the foremost requirements of the XML specification.

Rules for creating XML documents


An XML document that meets certain criteria is said to be well formed, and one that meets further criteria is also valid.

A well-formed document is one that follows these rules:

  1. It contains one or more elements.
  2. There’s one and only one root element for the whole document.
  3. Every start-tag must have a corresponding end-tag.
  4. If the start-tag of an element is within the content of another element, the end-tag must also be within the content of that same element.

The last two rules are the most important. Unlike HTML, which has a lot of standalone tags (<br>, for instance) that don’t have any end-tags, every tag in XML MUST have an end-tag, even if there is no content within. But XML does give a shorthand notation to ease this. For example, if you wish to have an element "rating", which could be empty in some cases, it could be written as <rating></rating> or just <rating/>. Also, elements must be nested so that each element ends within the element where it started. For example, <video><title>Antz</video></title> would be wrong, as <title> is not properly nested within <video>. The correct form is <video><title>Antz</title></video>.
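The well-formedness rules can be tried out with any XML parser, since a parser must reject a document that breaks them. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are hypothetical, chosen to exercise each rule in turn.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the first three obey the rules, the last three do not.
samples = {
    "nested correctly": "<video><title>Antz</title></video>",
    "empty element, long form": "<video><rating></rating></video>",
    "empty element, shorthand": "<video><rating/></video>",
    "crossed tags": "<video><title>Antz</video></title>",
    "missing end-tag": "<video><title>Antz</video>",
    "two root elements": "<video/><video/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```

Note that this checks well-formedness only. ElementTree is a non-validating parser, so it says nothing about whether a document is valid against a DTD.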

If all the above rules are followed, you end up with a well-formed XML document. But it becomes a valid document only if it has a Document Type Definition (DTD), referenced through a document type declaration, and conforms to it. A DTD is a set of constraints defined for the grammar of the document. It contains a list of valid elements and their logical structure. This makes sure that others who view, use or edit the XML document maintain its original grammar. Interestingly, HTML also has DTDs, except that hardly anyone bothers to use them, as the syntax of HTML is predefined and well known. But in the case of XML, who is to know what the elements you create represent? To add a DTD to the XML, add a <!DOCTYPE> declaration just after the <?xml version="1.0"?> line. We’ll look at creating DTDs next month.
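As a rough preview of where the declaration sits, here is a small sketch with the DTD written inline, inside the <!DOCTYPE> declaration itself. The content model is an assumption made for illustration (a videos element holding any number of video elements, each with a title and an optional empty rating). Python's xml.etree.ElementTree is a non-validating parser, so it reads the declaration but does not enforce it. The sketch shows placement only, not validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: the content model below is an assumption for illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE videos [
  <!ELEMENT videos (video*)>
  <!ELEMENT video (title, rating?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT rating EMPTY>
]>
<videos>
  <video><title>Antz</title><rating/></video>
</videos>
"""

# The parser accepts the declaration and then reads the document as usual.
root = ET.fromstring(DOCUMENT)
titles = [video.findtext("title") for video in root.findall("video")]
print(root.tag, titles)
```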

How do you "use" XML documents?

There are many ways of accessing and using the data in XML documents. XML is "parsed" by an XML processor and the results are sent to the application that requested it. A validating XML processor uses the DTD to "recognize" elements as it works through the XML document. To format the output from an XML processor, the application can use the Extensible Stylesheet Language (XSL). This is a W3C specification that plays a role similar to that of Cascading Style Sheets (CSS) for HTML. Using different XSL sheets for the same XML lets the data be formatted in many different ways. This is a very powerful way of sharing data.
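The idea of one set of data with several presentations can be sketched without XSL at all. In the hedged example below, two plain Python functions stand in for two stylesheets: they are not XSL, and the function names as_text and as_html are hypothetical. The point is only that the XML is parsed once and then formatted in different ways.

```python
import xml.etree.ElementTree as ET
from html import escape

VIDEOS_XML = """<videos>
  <video><title>Independence Day</title><genre>Sci-fi</genre></video>
  <video><title>Tomorrow Never Dies</title><genre>Action</genre></video>
</videos>"""


def as_text(root):
    """Format the videos as plain text, one per line."""
    return "\n".join(
        f"{v.findtext('title')} ({v.findtext('genre')})"
        for v in root.findall("video")
    )


def as_html(root):
    """Format the same videos as an HTML bulleted list."""
    items = "".join(
        f"<li>{escape(v.findtext('title'))}: {escape(v.findtext('genre'))}</li>"
        for v in root.findall("video")
    )
    return f"<ul>{items}</ul>"


# Parse once, present twice.
root = ET.fromstring(VIDEOS_XML)
text_view = as_text(root)
html_view = as_html(root)
print(text_view)
print(html_view)
```

A real XSL stylesheet would express the same mapping declaratively, so that the formatting can be swapped without touching either the data or the program.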

The future of XML

Creating a standard way to share data all over the world is one of the driving forces of the Net. Already there are many different applications of the XML meta language. These include MathML (for math-related documents), CML (for chemistry-related documents), VML (for vector drawings), SMIL (Synchronized Multimedia Integration Language) and more.

To understand how XML works, we’ll do a bit of coding next month. We’ll create a valid XML document, a DTD, an XSL sheet and a JavaScript program to display the data. Till then, browse over to the W3C page on XML and see what the excitement is all about.
















