Mar 26 2003

What's the deal with XML and Content Management?

It seems that XML is in the process of taking over the technology world, and content management is no different. While almost everyone else is interested in exchanging data via XML, content management (CM) is focused on separating content from presentation and on gaining access to the structure of the content.

Let’s start with yet another description of XML. I consider XML to be the latest incarnation of self-describing data. In a conventional data file, the number 516.1 might appear by itself. In a self-describing data file, the same number might be accompanied by descriptors that identify it as a checking account balance in US dollars. Self-describing data has been around for a while (see More Programming Pearls by Jon Bentley, 1988), but XML provides a standard methodology for describing, writing, transforming, and reading the data.
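To make the 516.1 example concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element name `balance` and the attributes `account` and `currency` are hypothetical, chosen only to illustrate the idea; the point is that the descriptors travel with the value instead of living in the program that reads the file.

```python
import xml.etree.ElementTree as ET

# A conventional data file: the value appears alone. Its meaning lives only
# in whatever program (or person) already knows the file layout.
plain_record = "516.1"

# A self-describing record. The element and attribute names are hypothetical;
# they qualify the same value as a checking account balance in US dollars.
xml_record = '<balance account="checking" currency="USD">516.1</balance>'

def describe(xml_text):
    """Read the value together with the descriptors that qualify it."""
    elem = ET.fromstring(xml_text)
    return {
        "what": elem.tag,
        "account": elem.get("account"),
        "currency": elem.get("currency"),
        "value": float(elem.text),
    }

print(float(plain_record))  # 516.1 of what? The file does not say.
print(describe(xml_record))
```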

Separation of Data from Presentation: This is one of the mantras of XML. Separation allows the presentation to be customized for multiple purposes. While very effective for XML data, it tends to be less effective for XML prose. Because data is usually produced by a computer while prose is usually written by a person, XML data tends to have a finer granularity, with more precisely defined elements. Consequently, presentation rules generally have more flexibility in handling XML data than XML prose.

Let’s consider the presentation of a table. With XML data, the underlying values are likely contained in the data file, and the presentation rules are responsible for deciding which data to present and how to present it. A prose table, however, is probably already marked up as a table, so the presentation rules have no flexibility: they can only render it as authored.
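The contrast between a data table and a prose table can be sketched in a few lines of Python. This is an illustration under assumptions, not a real CM system: the `accounts`, `table`, `row`, and `cell` vocabularies are hypothetical, and the two "presentation rules" are plain functions standing in for something like a stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data: fine-grained, precisely named elements.
DATA = """<accounts>
  <account type="checking" currency="USD"><balance>516.1</balance></account>
  <account type="savings" currency="USD"><balance>1200.0</balance></account>
</accounts>"""

# Hypothetical XML prose: the author already decided this is a table,
# so the markup says "row" and "cell", not what each cell means.
PROSE = """<table>
  <row><cell>Checking</cell><cell>$516.10</cell></row>
  <row><cell>Savings</cell><cell>$1,200.00</cell></row>
</table>"""

def data_as_html_table(xml_text):
    """One presentation rule: pick the fields and lay them out as HTML."""
    root = ET.fromstring(xml_text)
    rows = []
    for a in root.findall("account"):
        amount = float(a.findtext("balance"))
        rows.append(
            f"<tr><td>{a.get('type')}</td>"
            f"<td>{amount:.2f} {a.get('currency')}</td></tr>"
        )
    return "<table>" + "".join(rows) + "</table>"

def data_as_total(xml_text):
    """A different rule over the same data: present only a computed total."""
    root = ET.fromstring(xml_text)
    total = sum(float(a.findtext("balance")) for a in root.findall("account"))
    return f"Total: {total:.2f}"

def prose_table_rows(xml_text):
    """The prose table can only be reproduced as the rows and cells it already is."""
    root = ET.fromstring(xml_text)
    return [[c.text for c in row.findall("cell")] for row in root.findall("row")]

print(data_as_html_table(DATA))
print(data_as_total(DATA))
print(prose_table_rows(PROSE))
```

The data rules can select fields or compute a total because each element says what its value is; the prose markup offers only rows and cells, so there is little for a rule to do except copy them.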

Exposing the Content Structure: Content Management Systems treat most content as an atomic type. No attempt is made to open the content up and treat the pieces independently, because there generally isn’t enough information to treat the pieces intelligently. XML changes this by exposing the content structure, which makes it possible to use the pieces intelligently.

But XML only makes this possible, because the content must also be structured in a meaningful way. Identifying sections and paragraphs is nice, but it doesn’t say why section two should be treated differently from section four. When the structure exposes the introduction, summary, and other meaningful components, it becomes possible to treat those components independently and intelligently.
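A small Python sketch shows why generic sections are not enough. Suppose, hypothetically, that a CM system wants to pull the summary out of each article to build a digest. The element names below are invented for illustration; with only `section` elements there is nothing to tell the system which part is the summary, while a `summary` element makes the extraction trivial.

```python
import xml.etree.ElementTree as ET

# Generic structure: sections are identified, but nothing says what they are for.
GENERIC = """<article>
  <section><para>XML is spreading into content management.</para></section>
  <section><para>Structure must be meaningful to be useful.</para></section>
</article>"""

# Meaningful structure: hypothetical element names expose the role of each part.
MEANINGFUL = """<article>
  <introduction><para>XML is spreading into content management.</para></introduction>
  <body><para>Details go here.</para></body>
  <summary><para>Structure must be meaningful to be useful.</para></summary>
</article>"""

def extract_summary(xml_text):
    """Return the summary text, or None if the structure does not expose one."""
    root = ET.fromstring(xml_text)
    summary = root.find("summary")
    if summary is None:
        return None
    return " ".join(p.text for p in summary.findall("para"))

print(extract_summary(GENERIC))     # None: no way to tell which section is the summary
print(extract_summary(MEANINGFUL))
```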


Dwight Shih's Soap Box on the Internet Commons
