If you keep an eye on Web development technologies and
trends, you have probably heard about XML. The eXtensible Markup Language (XML) has taken the
Web-development world by storm. This piece introduces the technology and how
it’s implemented. Next time we’ll take a look at writing some XML and using it.
I’ll use comparisons with HTML wherever possible to help you understand the concept.
What’s XML?
One of the first things to realize about XML is that
it’s NOT a language per se. Although it is called a "markup language",
it’s actually a "meta-language", which means that XML has only a syntax (the
grammar) but no predefined keywords or tags of its own. Instead, it defines
"rules" that let you structure other markup languages. HTML, on the other hand,
is a markup language with a fairly well-defined set of keywords.
So what does XML do? XML lets you define the structure of
documents rather than how they look (which is what HTML does). This structure definition
makes it easy for the document to be reused and viewed in different ways. By separating
the content and structure of the document from its look and feel, you can make your
document available to different viewers without having to worry about creating a version
for each. A typical example is the (very painful) need for Web developers
to make sure their HTML pages look the same in every major browser. With XML, all
they need to create is the document structure, leaving the browser to worry about formatting
the document. Yes, I know this sounds great, but the bad news is that no current browser
is capable of doing that yet. Version 5 of both IE and Netscape promises to provide XML
parsers. (Incidentally, IE 4 already has two parsers, one in C++ and the other in Java.)
What does "structure of data" mean?
I have been talking about the "structure" of data
all along. What does this really mean? To understand the concept let’s try it out
with an example in HTML and XML. Our document here will contain data about some popular
movies.
Name | Type | Released | Actors |
---|---|---|---|
Independence Day | Sci-fi | 12/10/1997 | Jeff Goldblum, Will Smith |
Tomorrow Never Dies | Action | 1/10/1998 | Pierce Brosnan, Teri Hatcher |
The HTML document (above) contains names of two
movies, with details including the type (or genre), date of release and main actors. But
notice the problem in this document. It’s very simple to view, but the document
itself doesn’t define what the data is, or how the values relate to each other.
Now, the same data in XML could look like:
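A minimal sketch of such a document follows, held in a Python string so that it can be parsed and checked. The element names (videos, video, name, genre, released, actors, actor) are assumptions chosen to mirror the table columns, and Python's standard xml.etree.ElementTree module is used here only as a convenient XML parser.

```python
import xml.etree.ElementTree as ET

# Element names are illustrative assumptions that mirror the table columns.
VIDEOS_XML = """<?xml version="1.0"?>
<videos>
  <video>
    <name>Independence Day</name>
    <genre>Sci-fi</genre>
    <released>12/10/1997</released>
    <actors>
      <actor>Jeff Goldblum</actor>
      <actor>Will Smith</actor>
    </actors>
  </video>
  <video>
    <name>Tomorrow Never Dies</name>
    <genre>Action</genre>
    <released>1/10/1998</released>
    <actors>
      <actor>Pierce Brosnan</actor>
      <actor>Teri Hatcher</actor>
    </actors>
  </video>
</videos>
"""

root = ET.fromstring(VIDEOS_XML)

# Each <video> block names its own fields, so no header row is needed
# to tell a title from a genre or an actor.
titles = [video.findtext("name") for video in root.findall("video")]
print(root.tag, titles)
```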
Let’s dissect this document. The first line declares
the version of XML we are using. Next comes the top-level element (called the
"root" element), under which all the data is stored. Here the root
element holds the videos. Each video gets its own block
containing the associated data. Just by looking at the structure you can understand what
the document contains. Had we removed the header row of the table in the HTML version,
a reader could not know what the data was about.
XML also makes it easy to add qualifiers to the data. For
example, in the XML above, we can add a "category" attribute to the genre
element to give more detail. Values can also be broken down into smaller
elements, such as the separate parts of an actor’s name.
The data becomes both human and machine readable and understandable,
which was one of the foremost requirements of the XML specification.
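As a rough illustration of qualifying data, the sketch below adds a "category" attribute to a genre element and splits an actor's name into child elements. Only the attribute name "category" comes from the text; its value and the first/last element names are hypothetical, and Python's xml.etree.ElementTree is used simply to read the values back.

```python
import xml.etree.ElementTree as ET

# "category" is named in the text; its value and the <first>/<last>
# element names are hypothetical.
VIDEO_XML = """<video>
  <name>Tomorrow Never Dies</name>
  <genre category="spy thriller">Action</genre>
  <actor>
    <first>Pierce</first>
    <last>Brosnan</last>
  </actor>
</video>"""

video = ET.fromstring(VIDEO_XML)
genre = video.find("genre")

# The attribute qualifies the element without changing its text content.
print(genre.text, genre.get("category"))

# Child elements let a program pick out one part of a value directly.
print(video.findtext("actor/last"))
```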
Rules for creating XML documents
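XML documents meeting certain criteria can be well formed
and valid.
A well-formed document is one that follows these rules:
1. A single root element contains all the other elements of the document.
2. Every start-tag has a matching end-tag.
3. If the start-tag of an element is within another element, the end-tag must also be within the same element.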
The last two rules are the most important. Unlike HTML,
which has a lot of standalone tags like <BR>, <HR>, and <IMG> that
don’t have any end-tags, every tag in XML MUST have an end-tag, even if there is no
content within. But XML does give a shorthand notation to ease this. For example, if you
wish to have an element "rating", which could be empty in some cases, it could
be written as <rating/> instead of <rating></rating>. Elements must also
be nested so that each element ends where it started. For example,
<b><i>text</b></i> would be wrong, as the "i" element starts inside "b" but ends
outside it; the correct form is <b><i>text</i></b>.
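The end-tag and nesting rules can be checked mechanically. Here is a small sketch that asks Python's standard xml.etree.ElementTree parser whether a few fragments are well formed; the helper name is_well_formed and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The empty-element shorthand and the long form are both accepted.
print(is_well_formed("<video><rating/></video>"))           # True
print(is_well_formed("<video><rating></rating></video>"))   # True

# Overlapping elements: <name> starts inside <video> but ends outside it.
print(is_well_formed("<video><name>Independence Day</video></name>"))  # False

# A start-tag with no end-tag, which HTML tolerates for <br>.
print(is_well_formed("<p>line one<br>line two</p>"))        # False
```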
If all the above rules are followed, you end up with a
well-formed XML document. But it becomes a valid document only if it contains a Document
Type Declaration and conforms to the Document Type Definition (DTD) that the declaration
supplies. A DTD is a set of constraints on the grammar of the document. It lists the
valid elements and their logical structure. This makes sure that others who view, use
or edit the XML document maintain its original grammar. Interestingly, HTML also has DTDs,
except that hardly anyone bothers to use them, as the syntax of HTML is predefined and
well known. But in the case of XML, who is to know what the elements you create represent?
To add a DTD to the XML, add a <!DOCTYPE> declaration just after the
<?xml version="1.0" ?> line. We’ll look at creating DTDs next month.
How do you "use" XML documents?
There are many ways of accessing and using the data in XML
documents. XML is "parsed" by an XML processor and the results are sent to the
application that requested them. The XML processor uses the DTD to "recognize"
elements and browse through the XML document. To format the output from an XML processor,
the application can use the Extensible Stylesheet Language (XSL). This is a W3C
specification that plays a role similar to that of Cascading Style Sheets (CSS) for HTML.
Using different XSL sheets for the same XML lets the data be formatted in many different
ways. This is a very powerful way of sharing data.
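The idea of one document feeding several presentations can be sketched without XSL itself. In the example below, two plain Python functions stand in for two stylesheets; this is not XSL, only an illustration of the same separation between data and formatting. The element names and the function names (rows, as_text, as_html) are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Element names are illustrative assumptions.
XML = """<videos>
  <video><name>Independence Day</name><genre>Sci-fi</genre></video>
  <video><name>Tomorrow Never Dies</name><genre>Action</genre></video>
</videos>"""

def rows(root):
    """Pull (name, genre) pairs out of the parsed document."""
    return [(v.findtext("name"), v.findtext("genre"))
            for v in root.findall("video")]

def as_text(root):
    """One presentation: a plain-text listing."""
    return "\n".join(f"{name} ({genre})" for name, genre in rows(root))

def as_html(root):
    """Another presentation: an HTML list built from the same data."""
    items = "".join(f"<li>{escape(name)}: {escape(genre)}</li>"
                    for name, genre in rows(root))
    return f"<ul>{items}</ul>"

root = ET.fromstring(XML)  # the parsing step, done once
print(as_text(root))
print(as_html(root))
```

The XML is parsed once and never changes; only the formatting function differs, which is the role different XSL sheets would play.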
The future of XML
Creating a standard way to share data all over the world is
one of the driving forces of the Net. Already there are many different applications of the
XML meta-language. These include MathML (for math-related documents), CML (for
chemistry-related documents), VML (for vector drawings), SMIL (Synchronized Multimedia
Integration Language) and more.
To understand how XML works, we’ll do a bit of coding
next month. We’ll create a valid XML document, a DTD, an XSL sheet and a JavaScript
program to display the data. Till then browse over to the W3C page on XML and see what the
excitement is all about.