Beyond XML: Structured Data

PCQ Bureau

Last month we saw what eXtensible Markup Language (XML) is and where it can be used. Using a small example of an XML document that laid out the structure of a video library collection, I tried to explain how an XML document looks and works. This month, we’ll actually use this document in an application, which will let us reuse it in different ways.

To reiterate a few points, XML defines the structure of data in a document. To put it in perspective, think of a DOC file as the equivalent of an HTML file, in which you see both the formatting and the content of the document. The XML equivalent would be the RTF file of the same document, which defines the structure rather than the formatting. (This is NOT a true representation of the difference; it’s only meant to quickly put things in perspective.) XML has a well-defined syntax, but almost no keywords of its own. This is because it’s more of a "meta-language" that lets you define structures of your own.

Document type definitions

For an XML document to be valid, it must have a DTD (document type definition) associated with it and must conform to that DTD. The DTD is analogous to declaring variables in a programming language before using them. However, DTDs are optional: without one, you can still create a well-formed XML document, that is, one that follows all the rules of XML syntax. All valid documents are also well formed, but not every well-formed document is valid.
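
As a rough illustration, here is a minimal sketch of a document that is well formed but has no DTD attached. The element names follow the video library described in this article; the values are placeholders made up for the example. Because nothing declares which elements are allowed, a parser can check only the syntax, not the structure.

```xml
<?xml version="1.0"?>
<!-- Well formed: one root element, every tag closed, attribute values quoted.
     There is no DOCTYPE, so there is nothing to validate against. -->
<movies>
  <video id="v001">
    <title>Example Title</title>
  </video>
</movies>
```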

DTDs have been around for quite some time. HTML also has DTDs. In fact, some versions of HTML have more than one: HTML 4, for example, defines strict, transitional (loose), and frameset DTDs. Typically, browsers don’t need to use the DTD, as the syntax is hard-coded into the HTML engine. What the browsers don’t know, they ignore.

With XML, however, this is not a very clever thing to do. After all, one cannot know in advance which tags an XML document will use, or even the order in which they should appear. DTDs solve this problem. They define the elements that can appear in the document, their attributes, their child elements, and the kind of data each of them can carry.

Let’s quickly create a small DTD for a video library.
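
A minimal sketch of such a DTD follows, written to match the description in the next paragraph rather than copied from an original listing. The occurrence indicators ("+"), the content of "actor", and the CDATA type chosen for "category" are assumptions, since the description does not state them.

```xml
<!-- video.dtd: a sketch reconstructed from the description, not the original listing -->

<!-- Root element; "one or more videos" is an assumption -->
<!ELEMENT movies (video+)>

<!-- Each video holds these four children, in this order -->
<!ELEMENT video (title, date, type, actors)>
<!ATTLIST video id CDATA #REQUIRED>

<!-- Leaf elements carrying parsed character data -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ATTLIST type category CDATA #REQUIRED>

<!-- "One or more actors" and text-only actor content are assumptions -->
<!ELEMENT actors (actor+)>
<!ELEMENT actor (#PCDATA)>
```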

This DTD specifies that the root element, called "movies", has a child element called "video". It then specifies that "video" has further child elements called "title", "date", "type", and "actors", in that order. "video" also has a required attribute called "id", which holds character data. Following the same reasoning, you can make out that the "title", "date", and "type" elements all contain parsed character data, and that "type" also has a required attribute called "category". The "actors" element, in turn, holds the details of each individual "actor".

To attach a DTD to an XML document, use the <!DOCTYPE> tag just below the <?xml?> declaration, like this: <!DOCTYPE movies SYSTEM "video.dtd">.
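
To show where that line sits, here is a sketch of a complete document that points at an external DTD. It assumes the DTD is saved as "video.dtd" in the same folder as the document and that it declares the elements described above; the values are placeholders invented for the example.

```xml
<?xml version="1.0"?>
<!-- The DOCTYPE names the root element and the external DTD file -->
<!DOCTYPE movies SYSTEM "video.dtd">
<movies>
  <video id="v001">
    <title>Example Title</title>
    <date>1999</date>
    <type category="drama">Feature film</type>
    <actors>
      <actor>First Actor</actor>
      <actor>Second Actor</actor>
    </actors>
  </video>
</movies>
```

A validating parser that reads this document would be expected to reject it if, say, "date" appeared before "title" or the "id" attribute were missing.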

That’s all there is to it. Now, any XML-compliant application that reads the DTD will know what structure to expect in the document. Real-world DTDs are usually much more complex than this. Take a look at the HTML 4 DTD to see how comprehensive DTDs can be.

Formatting contents of an XML document

So you’ve got your document laid out in XML. But how do you display it? If you have IE 5 (the first browser to fully support XML, although IE 4 does have some support), you can simply open the XML file in it. IE 5 has an XML parser that checks that the XML is well formed, that is, that it doesn’t contain syntax errors such as missing end tags or incorrect nesting. If it does find an error, it displays the line that may have caused the error, along with the reason. If the document is well formed, it displays the entire document as a collapsible tree. You can expand or collapse items in the tree by simply clicking the text or the small + or - sign next to each.
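
For comparison, the sketch below is deliberately NOT well formed and shows the two kinds of mistakes mentioned above. The element names follow the video library example and the values are placeholders. A parser normally stops at the first error it meets, so it would report only the first problem until that one is fixed.

```xml
<?xml version="1.0"?>
<movies>
  <video id="v001">
    <!-- Error 1: <title> has no matching end tag -->
    <title>Example Title
    <date>1999</date>
  </video>
  <video id="v002">
    <title>Another Title</title>
    <!-- Error 2: <date> and <type> overlap instead of nesting -->
    <date>1999<type category="drama"></date>Feature film</type>
  </video>
</movies>
```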

But this isn’t what you really want to show to your visitors, is it? IE 5’s rendering is quite Spartan. With a little ingenuity, though, you can spruce up the document in any way you please. For this, you’ll need to use a technology called the Extensible Stylesheet Language (XSL).

XSL is an application of XML. This means that XSL follows all the rules that XML imposes, but also has its own set of keywords and language constructs. XSL is used with XML in much the same way as cascading style sheets (CSS) are used with HTML: both define the style or presentation attributes of a document (refer to our August 1998 issue for Cascading Style Sheets).

XSL is very flexible. By defining different XSL files for a single XML document, you can get a variety of outputs, such as different views of the data in HTML, or even a set of different XML documents.
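
To give a feel for what such a file looks like, here is a small sketch of a stylesheet that turns the video library into an HTML list. It assumes the syntax of the W3C XSLT 1.0 Recommendation, which may differ from the XSL dialect a particular browser version supports, and it assumes the "movies", "video", "title", and "type" structure described earlier.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Match the root <movies> element and emit an HTML page -->
  <xsl:template match="/movies">
    <html>
      <body>
        <h1>Video library</h1>
        <ul>
          <!-- One list item per <video>: its title, then its category -->
          <xsl:for-each select="video">
            <li>
              <xsl:value-of select="title"/>
              (<xsl:value-of select="type/@category"/>)
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

A second stylesheet with a different template body could present the same data as a table, or output another XML document, without touching the source file.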

Now that we’ve seen what XSL is, we’ll use it next month on an XML document.
