
XML: A Primer

PCQ Bureau

04 Apr 1999 09:27 IST

If you keep an eye on Web development technologies and trends, you will have heard about XML. The eXtensible Markup Language (XML) has taken the Web-development world by storm. This piece will introduce you to the technology and how it’s implemented. Next time we’ll take a look at writing some XML and using it. I’ll use comparisons with HTML wherever possible to help you understand the concept.

What’s XML?

One of the first things to realize about XML is that it’s NOT a language per se. Although called a "markup language", it’s actually a "meta language", which means that XML has only syntax (the grammar) but no keywords, programming constructs, etc. It defines "rules" that let you structure other markup languages. HTML, on the other hand, is a markup language with a fairly well-defined set of keywords.

So what does XML do? XML lets you define the structure of documents rather than how they look (which is what HTML does). This structure definition makes it easy for the document to be reused and viewed in different ways. By separating the content and structure of the document from its look and feel, you can make your document available to different viewers without having to worry about creating a version for each. A typical example is the (very painful) requirement that Web developers make sure their HTML pages look the same on every major browser. With XML, all they need to create is the document structure, and they can let the browser worry about formatting the document. Yes, I know this sounds great, but the bad news is that no current browser is capable of doing that yet. Version 5 of both IE and Netscape promises to provide XML parsers. (Incidentally, IE 4 already has two parsers, one in C++ and the other in Java.)

What does "structure of data" mean?

I have been talking about the "structure" of data all along. What does this really mean? To understand the concept, let’s try it out with an example in HTML and XML. Our document here will contain data about some popular movies.

Videos

Name	Type	Released	Actors
Independence Day	Sci-fi	12/10/1997	Jeff Goldblum, Will Smith
Tomorrow Never Dies	Action	1/10/1998	Pierce Brosnan, Teri Hatcher

The HTML document (above) contains the names of two movies, with details including the type (or genre), date of release and main actors. But notice the problem in this document. It’s very simple to view, but the document itself doesn’t define what the data is, or how the values relate to each other.

Now, the same data in XML could look like:

Let’s dissect this document. The first line declares the version of XML we are using. Next we define the top-level element (called the "root" element) under which all the data is stored. Here we define the root element as containing videos. For each video, we create a separate block containing the associated data. Just by looking at the structure you can understand what the document contains. If we removed the header row of the table in the HTML version, a reader could not know what the data was about.
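Because the structure is explicit, a program can pull the same facts out of the document by name. The following is only a sketch in Python, using the standard library's xml.etree.ElementTree module. The element names videos, video, title, genre and actor come from this article; the released element and the exact layout are assumptions made for illustration, not the original listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the videos document.
# <released> is an assumed element name; the values come from the HTML table.
VIDEOS_XML = """<?xml version="1.0"?>
<videos>
  <video>
    <title>Independence Day</title>
    <genre>Sci-fi</genre>
    <released>12/10/1997</released>
    <actor>Jeff Goldblum</actor>
    <actor>Will Smith</actor>
  </video>
  <video>
    <title>Tomorrow Never Dies</title>
    <genre>Action</genre>
    <released>1/10/1998</released>
    <actor>Pierce Brosnan</actor>
    <actor>Teri Hatcher</actor>
  </video>
</videos>"""


def list_videos(xml_text):
    """Return one dict per <video> element, keyed by the element names."""
    root = ET.fromstring(xml_text)  # root is the <videos> element
    videos = []
    for video in root.findall("video"):
        videos.append({
            "title": video.findtext("title"),
            "genre": video.findtext("genre"),
            "released": video.findtext("released"),
            "actors": [actor.text for actor in video.findall("actor")],
        })
    return videos


if __name__ == "__main__":
    for entry in list_videos(VIDEOS_XML):
        print(entry["title"], "-", entry["genre"], "-", ", ".join(entry["actors"]))
```

Note that the code never relies on the position of a value, only on the name of the element that holds it, which is exactly what the HTML table cannot offer.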

XML also makes it easy to add qualifiers to the data. For example, in the XML above, we can add the attribute "category" to the genre element to give more detail, like this: <genre category="...">Action</genre>. Or add the attribute "sex" to the actor elements:

<actor sex="...">Pierce Brosnan</actor>

<actor sex="...">Teri Hatcher</actor>

The data becomes both human and machine readable and understandable, which was one of the foremost requirements of the XML specification.
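To see what "machine readable" buys you, here is a small Python sketch that filters on the attributes described above. The attribute names category and sex are from this article; the attribute values ("spy", "male", "female") and the helper function names are assumptions chosen purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the attribute values are assumed for illustration.
VIDEO_XML = """<video>
  <title>Tomorrow Never Dies</title>
  <genre category="spy">Action</genre>
  <actor sex="male">Pierce Brosnan</actor>
  <actor sex="female">Teri Hatcher</actor>
</video>"""


def actors_by_sex(xml_text, sex):
    """Return the names of <actor> elements whose sex attribute equals `sex`."""
    video = ET.fromstring(xml_text)
    return [actor.text for actor in video.findall("actor") if actor.get("sex") == sex]


def genre_category(xml_text):
    """Return the category attribute of <genre>, or None when it is absent."""
    genre = ET.fromstring(xml_text).find("genre")
    return None if genre is None else genre.get("category")


if __name__ == "__main__":
    print(actors_by_sex(VIDEO_XML, "female"))
    print(genre_category(VIDEO_XML))
```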

Rules for creating XML documents

An XML document that meets certain criteria is said to be well formed; one that meets further criteria is also valid.

A well-formed document is one that follows these rules:

It contains one or more elements.

There’s one and only one root element for the whole document.

Every start-tag must be matched by a corresponding end-tag.

If the start-tag of an element is within the content of another element, the end-tag must also be within the content of that same element.

The last two rules are the most important. Unlike HTML, which has a lot of standalone tags like <br>, <hr>, and <img>, which don’t have any end-tags, every tag in XML MUST have an end-tag, even if there is no content within. But XML does give a shorthand notation to ease this. For example, if you wish to have an element "rating", which could be empty in some cases, it could be written as <rating></rating> or just <rating/>. Also, elements must be nested so that each element ends inside the element in which it started. For example, <video><title>Antz</video></title> would be wrong, as <title> is not properly nested within <video> like this: <video><title>Antz</title></video>.
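The well-formedness rules above are easy to try out, because any XML parser must reject a document that breaks them. The sketch below uses Python's standard xml.etree.ElementTree parser as one such check; the function name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Each sample exercises one of the rules discussed in the text.
samples = {
    "properly nested": "<video><title>Antz</title></video>",
    "badly nested": "<video><title>Antz</video></title>",
    "missing end-tag": "<video><rating></video>",
    "empty-element shorthand": "<video><rating/></video>",
    "two root elements": "<video/><video/>",
}

if __name__ == "__main__":
    for label, text in samples.items():
        print(label, "->", is_well_formed(text))
```

This only tests well-formedness. Checking a document against a DTD needs a validating parser, which the standard module used here is not.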

If all the above rules are followed, you end up with a well-formed XML document. But it becomes a valid document only if it has a Document Type Definition (DTD) and conforms to it. A DTD is a set of constraints on the grammar of the document. It lists the valid elements and their logical structure. This makes sure that others who view, use or edit the XML document maintain its original grammar. Interestingly, HTML also has DTDs, except that no one bothers to use them, as the syntax of HTML is predefined and well known. But in the case of XML, who is to know what the elements you create represent? To add a DTD to the XML, add a <!DOCTYPE ...> declaration just after the <?xml version="1.0"?> line. We’ll look at creating DTDs next month.

How do you "use" XML documents?

There are many ways of accessing and using the data in XML documents. XML is "parsed" by an XML processor and the results are sent to the application that requested them. When a DTD is present, the XML processor can use it to "recognize" elements as it works through the XML document. To format the output from an XML processor, the application can use the Extensible Stylesheet Language (XSL). This is a W3C specification that plays a role similar to that of Cascading Style Sheets (CSS) for HTML. Using different XSL sheets for the same XML lets the data be formatted in many different ways. This is a very powerful way of sharing data.
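The idea of one document with many presentations can be illustrated without any XSL at all. In the Python sketch below, two plain functions stand in for two stylesheets: the XML is parsed once and then rendered as plain text and as an HTML table. This is not XSL, only an analogy for it, and the sample data and function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data, trimmed to the title and genre of each video.
VIDEOS_XML = """<videos>
  <video><title>Independence Day</title><genre>Sci-fi</genre></video>
  <video><title>Tomorrow Never Dies</title><genre>Action</genre></video>
</videos>"""


def parse_videos(xml_text):
    """Parse once and return a list of (title, genre) pairs."""
    root = ET.fromstring(xml_text)
    return [(v.findtext("title"), v.findtext("genre")) for v in root.findall("video")]


def as_text(videos):
    """First 'stylesheet': render the data as plain text lines."""
    return "\n".join(f"{title} ({genre})" for title, genre in videos)


def as_html_table(videos):
    """Second 'stylesheet': render the same data as an HTML table."""
    rows = "".join(
        f"<tr><td>{escape(title)}</td><td>{escape(genre)}</td></tr>"
        for title, genre in videos
    )
    return f"<table>{rows}</table>"


if __name__ == "__main__":
    videos = parse_videos(VIDEOS_XML)
    print(as_text(videos))
    print(as_html_table(videos))
```

The data is written once; only the rendering function changes, which is the same separation that XSL sheets provide for XML.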

The future of XML

Creating a standard way to share data all over the world is one of the driving forces of the Net. Already there are many different applications of the XML meta language. These include MathML (for math-related documents), CML (for chemistry-related documents), VML (for vector drawings), SMIL (Synchronized Multimedia Integration Language) and more.

To understand how XML works, we’ll do a bit of coding next month. We’ll create a valid XML document, a DTD, an XSL sheet and a JavaScript program to display the data. Till then, browse over to the W3C page on XML and see what the excitement is all about.
