XML Isn’t Enough

A lot of this is in my XML Server presentation at the Innovative Users Group conference in a couple weeks…

Jenny Levine is an outspoken advocate for the use of RSS in libraries. One example she cites is posting lists of new acquisitions to library websites. She estimates that folks in the 77 libraries of her library system spend 924 hours per year on that one activity, time that could be used elsewhere if automated by RSS. So it’s easy to see why I wanted to mention her in my presentation.

The problem is that even though RSS is XML, XML isn’t always as useful as RSS. Quickly, what is RSS? RSS is an XML schema for content syndication that has broad client and server software support.
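Because every RSS feed uses the same element names, even a few lines of code can read one. Here is a minimal sketch using Python's standard library; the feed, library name, and URLs are invented for illustration, and `read_items` is just a name I made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical new-acquisitions feed; the library and URLs are invented.
FEED = """<rss version="2.0">
  <channel>
    <title>New Books at Example Library</title>
    <link>http://library.example.org/new</link>
    <description>Recently added titles</description>
    <item>
      <title>Moby Dick</title>
      <link>http://library.example.org/record/1</link>
      <description>By Herman Melville.</description>
    </item>
    <item>
      <title>Walden</title>
      <link>http://library.example.org/record/2</link>
      <description>By Henry David Thoreau.</description>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    # <item>, <title>, and <link> mean the same thing in every RSS feed,
    # which is why generic aggregators can exist at all.
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in read_items(FEED):
    print(title, link)
```

The same function should work on any other RSS 2.0 feed, which is really the whole point.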

XML, however, has no schema. It’s a standard, but so is ASCII. As a practical matter, ASCII data must be delimited in some way to be useful as a form for information exchange between computers. XML without a schema is like ASCII without those pesky delimiters. Here it is from Wikipedia:

XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.

Data exchange requires all the standards of the networks the data travels on, plus standard ways of reading and understanding the information once received. XML provides a standard and flexible way of delimiting the data, but relies on schemas or DTDs layered on top of it to make the data it contains meaningful.
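To make that concrete, here is a small sketch with two well-formed XML documents that describe the same book using different element names. Both documents and the `title_of` helper are hypothetical; the helper assumes a convention that titles live in a `<title>` element, which is exactly the kind of agreement a schema provides.

```python
import xml.etree.ElementTree as ET

# Two invented documents describing the same book.
# Both are well-formed XML, so any XML parser accepts both.
DOC_A = "<book><title>Walden</title><author>Thoreau</author></book>"
DOC_B = "<item><name>Walden</name><creator>Thoreau</creator></item>"


def title_of(xml_text):
    """Return the text of a direct <title> child, or None if there isn't one.

    The choice of <title> is an assumed convention, not something XML defines.
    """
    return ET.fromstring(xml_text).findtext("title")


print(title_of(DOC_A))  # the convention matches, so the title is found
print(title_of(DOC_B))  # parses fine, but the program can't find a title
```

The parser is equally happy with both documents. Only the shared agreement about element names lets the program find the title.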

When done right, however, XML-formatted data are the basic building blocks for the Semantic Web, “a project that intends to create a universal medium for information exchange by giving meaning (semantics) in a manner understandable by machines, to the content of documents on the Web.”

XML schema standards are still new, and Wikipedia admits that the specification is “difficult to understand and implement,” but a few standard schemas are emerging. One useful standard for libraries is MARC XML (example). XML representations of Dublin Core are also standardizing.

Amazon deployed their XML Web Services some time ago and in doing so created one of the first standards for exchanging catalog information. Fortunately, their documentation is good and the schema is well designed. They’ve made it easy to build storefronts that push their content, and more than a few people are doing just that.

Amazon is also pushing OpenSearch, an RSS-like XML schema that aims to create standards for metasearch. RSS has shown us how useful news aggregators can be, and Amazon is making their A9 search engine into a sort of “search aggregator.” (There are too many terms for this: metasearch, federated search, broadcast searching; but now I’d like to add “aggregated search.”)

Try it out now: you can search the Seattle Public Library, then click a button to bring up Wikipedia and the web right next to it.

So what do these other XML schemas offer that RSS doesn’t? RSS’s simplicity is also a barrier to more complex uses. It’s easy to embed book covers, titles, and descriptions of new books along with a link in an RSS feed, but on its own RSS is at a loss to express these bibliographic details in a way another computer can understand. (Yes, RDF solves some of these problems at the cost of the simplicity that made RSS so popular in the first place.) This same list of new books in MARC XML would have richer detail, perhaps allowing the user to re-sort the list by author, publisher, call number, or any other field in the MARC XML data. Computers need to be told “this is the publisher, this is the call number,” and MARC XML allows that.
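Here is a rough sketch of that re-sorting idea. The namespace and the tags used (100 for author, 245 for title, 260 subfield b for publisher) are standard MARC 21, but the three records are simplified samples I typed up for illustration (real records carry more fields and trailing punctuation), and `FIELDS`, `subfield`, and `sorted_titles` are names I made up.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Simplified, illustrative records; not taken from a real catalog.
RECORDS = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Melville, Herman</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Moby Dick</subfield></datafield>
    <datafield tag="260" ind1=" " ind2=" "><subfield code="b">Harper</subfield></datafield>
  </record>
  <record>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Austen, Jane</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Emma</subfield></datafield>
    <datafield tag="260" ind1=" " ind2=" "><subfield code="b">John Murray</subfield></datafield>
  </record>
  <record>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Thoreau, Henry David</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Walden</subfield></datafield>
    <datafield tag="260" ind1=" " ind2=" "><subfield code="b">Ticknor and Fields</subfield></datafield>
  </record>
</collection>"""

# Sort key name -> (MARC tag, subfield code)
FIELDS = {"author": ("100", "a"), "title": ("245", "a"), "publisher": ("260", "b")}


def subfield(record, tag, code):
    """Return the first matching subfield's text, or "" if it is absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return record.findtext(path, default="", namespaces=NS)


def sorted_titles(marcxml_text, by):
    """Return the titles in the collection, ordered by the chosen field."""
    tag, code = FIELDS[by]
    records = ET.fromstring(marcxml_text).findall("marc:record", NS)
    records.sort(key=lambda r: subfield(r, tag, code))
    return [subfield(r, "245", "a") for r in records]


print(sorted_titles(RECORDS, by="author"))
print(sorted_titles(RECORDS, by="publisher"))
```

Sorting by a different field is a one-word change because the data itself says which string is the author and which is the publisher. An RSS description field would need to be scraped and guessed at to do the same thing.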

All of these technologies depend on XML and a schema. And they’re all changing the way we consume and interact with information.

[update:] it looks like Richard Wallis over at Talis was thinking the same thing I was thinking in response to Jenny Levine’s post on OPACs and XML.