The Elegance of XML

As those who follow my Technology archive know, Extensible Markup Language (XML) fascinates me. Finally, there is a standard, non-proprietary way to transmit and receive data of all kinds. Even in a trivial example, XML data may appear bloated. After all, every single piece of data must be wrapped in a tag that describes its meaning. This drawback, though, is also its biggest virtue. It makes XML both unambiguous and easily decipherable. The human eye can infer the meaning of the data in an XML document as readily as a computer can. (The computer, of course, can process it much faster.)

Because XML is an open standard, there is no possibility of vendor lock-in. Using programs called parsers, which many vendors offer (often for free), computers can easily slice and dice XML files. This allows data to be transformed from one use to another with little effort.
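To make "sliced and diced" a little more concrete, here is a minimal sketch using the parser that ships with Python's standard library. The document, its tag names, and the function name readings_to_csv are all invented for illustration; the point is only that the same data can be re-emitted for a different use (here, as CSV).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are invented for illustration.
SAMPLE_XML = """\
<readings>
  <reading station="X">
    <river>Y</river>
    <stage units="feet">4.5</stage>
  </reading>
  <reading station="W">
    <river>V</river>
    <stage units="feet">2.1</stage>
  </reading>
</readings>
"""


def readings_to_csv(xml_text: str) -> str:
    """Parse the XML and re-emit the same data as CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["station", "river", "stage", "units"])
    for reading in root.findall("reading"):
        stage = reading.find("stage")
        writer.writerow([
            reading.get("station"),      # attribute on <reading>
            reading.findtext("river"),   # text of the <river> child
            stage.text,                  # the value itself
            stage.get("units"),          # metadata carried alongside the value
        ])
    return out.getvalue()


print(readings_to_csv(SAMPLE_XML))
```

Because every value arrives labeled, the conversion code never has to guess which column is which.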

If there is a true dark side to XML, it is that to share data widely, it really helps to get all the key players who understand a problem domain together to agree on how to describe their information in XML. For example, I work with hydrologists. However, my agency is not the only one out there that needs to describe the characteristics of water. So although we are engaging in our own efforts to describe surface and ground water in XML, we really need to have conversations with everyone in this arena. I hope that in time all parties will agree on a common set of schemas that will properly describe water’s characteristics. This will greatly enhance the ease with which data can be shared.

Since I am in the business of serving water information, I would very much like to liberate the data on our web site. It is currently largely embedded inside the HTML that makes up web pages. This is great if you are a human being looking at the information with your browser. However, it sucks if you need to grab the data and process it. I described some of my work in other entries. As we move away from looking at XML in the abstract to using it in real life, a few things strike me.

First is that XML can be collected in either complex or simple ways. Each has its virtues and both should be supported. The complex way of sending or receiving XML data is using a protocol called SOAP. (Ironically, SOAP stands for Simple Object Access Protocol. Prior to SOAP even more complex technologies like CORBA were needed.) SOAP has a few important virtues that can be summed up as follows: it supports transactions and error messages. If you moved $1000 from savings to checking, you would probably want confirmation that the $1000 did not disappear. The SOAP protocol can effectively say, “Yep, moved exactly $1000.” In many cases when doing commerce over the internet, this kind of information is crucial.
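Here is a rough sketch of what reading such a reply might look like. The envelope namespace and the Fault element with its faultstring child come from SOAP 1.1; the banking namespace, the TransferResponse and amountMoved elements, and the function name read_transfer_reply are assumptions I made up for the example, not any real service's interface.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
BANK_NS = "http://example.org/bank"  # hypothetical application namespace

# A made-up success reply: the service confirms what it did.
CONFIRMATION = f"""\
<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Body>
    <b:TransferResponse xmlns:b="{BANK_NS}">
      <b:amountMoved>1000.00</b:amountMoved>
    </b:TransferResponse>
  </soap:Body>
</soap:Envelope>
"""

# A made-up failure reply using the standard SOAP 1.1 Fault element.
FAULT = f"""\
<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Server</faultcode>
      <faultstring>Insufficient funds</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
"""


def read_transfer_reply(xml_text: str) -> str:
    """Return a one-line summary: either the confirmation or the error."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        raise ValueError("not a SOAP envelope")
    fault = body.find(f"{{{SOAP_NS}}}Fault")
    if fault is not None:
        # In SOAP 1.1 the children of Fault are not namespace-qualified.
        return "error: " + fault.findtext("faultstring", default="unknown")
    amount = body.findtext(
        f"{{{BANK_NS}}}TransferResponse/{{{BANK_NS}}}amountMoved"
    )
    if amount is None:
        raise ValueError("reply contains neither a fault nor a confirmation")
    return f"moved exactly ${amount}"


print(read_transfer_reply(CONFIRMATION))
print(read_transfer_reply(FAULT))
```

Either way the caller gets a definite answer, which is what the extra wrapping buys.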

However, it turns out that more often “quick and dirty” data dumps in XML are quite acceptable. For example, if you sent an XML request to return the current temperature for your town, you probably do not care too much if you do not get a timely response. If you do not get a response in, say, ten seconds, you figure, “Well, their server is down or slow.” If you are a programmer, you can write some simple logic to work around these situations. The difference is analogous to that between sending a casual letter to a friend and sending a certified letter. Not all data moving as XML needs to be wrapped numerous times to ensure safe shipment and delivery. You may not need to deal with all that overhead. As a response to these many simpler needs, web services for serving XML have actually devolved a bit. Approaches like REST make XML queries and responses quite simple.
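The "simple logic" for the temperature case might look something like the sketch below. The reply format (a weather element with a temperature child), the URL in the usage comment, and the function names are assumptions for illustration; only urllib.request.urlopen and its timeout parameter are real standard-library pieces.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def parse_temperature(xml_text: str) -> float:
    """Pull the value out of a hypothetical <weather><temperature> reply."""
    root = ET.fromstring(xml_text)
    return float(root.findtext("temperature"))


def fetch_temperature(
    url: str,
    timeout: float = 10.0,
    opener: Callable = urllib.request.urlopen,
) -> Optional[float]:
    """Ask once; if the server is down, slow, or sends junk, shrug and return None."""
    try:
        with opener(url, timeout=timeout) as response:
            return parse_temperature(response.read().decode("utf-8"))
    except (OSError, ET.ParseError, ValueError, TypeError):
        # OSError covers connection failures and timeouts; the others cover
        # a reply that is not the XML we expected.
        return None


# Usage (hypothetical URL, not a real service):
#   temp = fetch_temperature("http://example.org/weather?town=Reston")
#   print("unavailable" if temp is None else temp)
```

There is no envelope, no fault handling, and no retry here. For a casual question like the current temperature, that is often all the reliability anyone needs.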

Where I work we have been putting these XML transport protocols through their paces. Since we are a government agency, we look at other agencies as examples. “Lightweight” XML seems to be the direction most federal agencies are going. (NOAA has some excellent examples.) However, upon further investigation we discovered that some lightweight protocols could actually be heavyweight too.

Impossible, you say? Not at all. What is happening is that the complexity of dealing with and understanding XML does not have to rest with the data provider. The data consumer can also deal with it. The light bulb went on in my brain when we started experimenting with Keyhole Markup Language, or KML. KML is of course an instance of an XML schema definition. It is used by the popular mapping application Google Earth. It is trivial to annotate a point or an area on the earth in KML. Once you have done so, you can simply make the KML file available for download. Google Earth will read it and suddenly these new sites of interest will appear in Google Earth. If done right they will appear and disappear depending on the scale that you choose. What is handling this complexity? Google Earth is. Google Earth is really a very complex but excellent client computer program optimized for showing places on the earth that are described in simple KML.
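To show how little the provider has to do, here is a sketch that writes a single KML placemark. It assumes the OGC KML 2.2 namespace (older files used a Google-specific one), and the station name, description, and coordinates are made up. Note that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # OGC KML 2.2 namespace


def make_placemark_kml(name: str, description: str, lon: float, lat: float) -> str:
    """Return a tiny KML document containing one point of interest."""
    # Serialize KML elements with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Made-up station and location, for illustration only.
print(make_placemark_kml("Station X", "Gauge on River Y", -77.1, 38.9))
```

Everything else (rendering, zooming, deciding when to show the point) is left to the client.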

It is not just Google Earth. Complex, broad XML-consuming applications are popping up more frequently these days. Another example many of you may be familiar with is the newsreader program. Yahoo! News, for example, makes its news content available in the now ubiquitous RSS (Really Simple Syndication) XML format. Your newsreader may be another web site like Bloglines, a program you download, or even (my favorite) an extension to your browser. For Firefox bigots like myself, the free Sage newsreader is a great way to grab these RSS news sources, aggregate them and present what to me feels like a customized newspaper from sources that I care about. Again, though the RSS files provided by the host are very simple to create, it is up to a complex client computer program (the newsreader) to organize and present them. This pushes dealing with the complexity of the information to the client application.

Instances of XML schema definitions like KML and RSS demonstrate that lightweight protocols can do some very heavy lifting indeed. In my office, I have a programmer experimenting with making some of our water information available as a news feed. It could tell someone’s newsreader, for example, the current water height for their favorite local stream. The RSS news feed format, since it represents news, assumes that the key data in an RSS feed will be read by a human being. Therefore, an RSS news item might have a title like “Latest Station X Gauge Reading” and content something like “Station X on River Y reported a gauge height of Z at Time T”.
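A feed like that is not much work to produce. The sketch below builds a one-item feed in the RSS 2.0 layout (my choice for brevity, not necessarily what our office will use); the channel title, the example.org link, and the function name build_gauge_feed are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET


def build_gauge_feed(station: str, river: str, height_ft: float, time_text: str) -> str:
    """Return a one-item RSS 2.0 feed describing a single gauge reading."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = "Stream gauge readings"
    ET.SubElement(channel, "link").text = "http://example.org/water"  # placeholder
    ET.SubElement(channel, "description").text = "Illustrative gauge feed"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = f"Latest Station {station} Gauge Reading"
    ET.SubElement(item, "description").text = (
        f"Station {station} on River {river} reported a gauge height of "
        f"{height_ft} feet at {time_text}"
    )
    return ET.tostring(rss, encoding="unicode")


print(build_gauge_feed("X", "Y", 4.5, "10:15"))
```

Note that the reading itself ends up buried in a sentence, which is fine for a person but awkward for a program.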

Could this technology that is used to facilitate human access to news sources also be used to support machine-to-machine data communications? The answer, surprisingly, is yes. As I dug into the details of the RSS 1.0 specification, I realized that it also supports modules. Of course, text like “Station X on River Y reported a gauge height of Z at Time T” could be parsed by a client program to pull out the essential information (Z and T), then store it and serve it locally. With a module, this extra work is not necessary. Through the use of XML namespaces, the same RSS 1.0 news feed could incorporate a module that might include relevant tags like <hml:stage units="feet">4.5</hml:stage>. Here is a way to assert a gauge height reading while piggybacking on a generic RSS news feed. By doing so, the news is still served by the newsreader, but the feed contains more discrete information for those who need it.
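Here is a sketch of the consuming side. The RDF and RSS 1.0 namespaces are the real ones; the hml namespace URI (http://example.org/hml), the feed contents, and the function name read_stages are assumptions for illustration, since no such module has been published.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "hml": "http://example.org/hml",  # hypothetical module namespace
}

# A made-up RSS 1.0 feed whose item carries one extra, machine-readable element.
FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:hml="http://example.org/hml">
  <channel rdf:about="http://example.org/water/feed">
    <title>Stream gauge readings</title>
    <link>http://example.org/water</link>
    <description>Illustrative feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/water/station-x"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/water/station-x">
    <title>Latest Station X Gauge Reading</title>
    <link>http://example.org/water/station-x</link>
    <description>Station X on River Y reported a gauge height of 4.5 feet</description>
    <hml:stage units="feet">4.5</hml:stage>
  </item>
</rdf:RDF>
"""


def read_stages(feed_xml: str) -> list:
    """Return (title, stage, units) for every item that carries an hml:stage."""
    root = ET.fromstring(feed_xml)
    readings = []
    for item in root.findall("rss:item", NS):
        stage = item.find("hml:stage", NS)
        if stage is None:
            continue  # an ordinary news item with no module data
        title = item.findtext("rss:title", namespaces=NS)
        readings.append((title, float(stage.text), stage.get("units")))
    return readings


print(read_stages(FEED))
```

A program that wants the number never has to pick apart the English sentence, while a newsreader that does not know the module can still display the title and description as usual.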

This may seem like nothing. I think it is revolutionary stuff. A news feed is no longer just a news feed, but a news feed that could morph into multiple other uses. RSS is not just data on steroids. It is not just data married with its metadata. Using modules to piggyback discrete data onto vanilla RSS, XML becomes even more powerful. It is like transforming a voice line into a DSL line. It becomes far greater than the sum of its simpler parts.

Observation: XML is both data and software. Yet it is not software in the conventional sense. XML is really a marriage of data (the values), information (the tags) and logic (the associated schema definition). I do not believe this has been done before in one taxonomy. In some sense then XML itself is also truly revolutionary. It is a completely new software paradigm, the implications of which we are just beginning to understand.
