There used to be a long, poorly formulated rant about XML here, which I wrote and forgot about until I recently received some emails about it. I have decided to remove it for the following reasons:
The XML stack moves so quickly that many of the criticisms had become dated, and new criticisms and improvements had emerged that the rant didn't address.
The rant was partly parody, but apparently not everybody appreciated the humour. Everyone knows apples are better than oranges!
Its tone was more negative than I intended, which detracted from its actually very positive message: XML's complexity is not fundamental to the problems it addresses. Rather, it is accidental to the solution and often easy to avoid.
In summary, I think XML is used for many tasks to which it is very poorly suited. Perhaps many of these protocols don't need heavyweight standards and could instead use an extensible format to achieve portability and flexibility. I think S-expressions are the best extensible material for serial data formats. (Oh, and XML, don't worry, you're still in the hall!)
These quotes from Glenn are so amusing and informative I couldn't bear to remove them:
S-expressions
An S-expression is a lightweight, highly extensible, non-redundant serial data format that has been more or less standardised (though in a different sense than XML is standardised) and stable for decades.
The best discussion of one of the most feature-rich S-expression formats I know of is in Guy Steele's Common Lisp the Language, 2nd Edition (CLtL2). S-expressions and the Common Lisp reader/printer are completely specified in chapter 22: Input/Output. While this might be a daunting 100 pages to somebody not familiar with lisp, to those so acquainted it stands out as an island of simplicity and elegance in an ocean of XML turmoil.
This quote from chapter 22, page 509 of CLtL2 mostly covers what S-expressions are:
You have a huge amount of control over the printer and the reader. You can change the base in which numbers are read or printed by changing the *read-base* and *print-base* variables. When you don't trust the source of an S-expression, you can read it in a "secure mode" by setting *read-eval* to nil, which disables read-time evaluation through #. syntax. *print-circle* controls how cyclic and shared data structures are serialised. Through the pretty printer you can customise all sorts of output details, including how quoted forms are printed.
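The following is a minimal sketch of the four variables mentioned above. The helper function names (integer-as-hex-string, read-hex-integer, safe-read-from-string, circular-list-string) are hypothetical. The special variables and the functions they call are standard Common Lisp. Note that binding *read-eval* to nil only makes #. signal a reader-error. The reader still interns symbols, so this is one part of handling untrusted input rather than the whole story.

```lisp
;;; Sketch: standard reader/printer control variables.
;;; The helper function names are hypothetical; the special
;;; variables and functions they use are standard Common Lisp.

(defun integer-as-hex-string (n)
  "Print the integer N in base 16."
  (let ((*print-base* 16)      ; radix used when printing rationals
        (*print-radix* nil))   ; no #x prefix in the output
    (prin1-to-string n)))

(defun read-hex-integer (string)
  "Read STRING with integer tokens interpreted in base 16."
  (let ((*read-base* 16))      ; radix used when reading integers
    (values (read-from-string string))))

(defun safe-read-from-string (string)
  "Read one S-expression from STRING with read-time evaluation disabled.
Symbols in STRING are still interned, so this is not a full sandbox."
  (with-standard-io-syntax     ; start from known, standard reader settings
    (let ((*read-eval* nil))   ; #.form now signals READER-ERROR
      (values (read-from-string string)))))

(defun circular-list-string ()
  "Build a circular list and print it using #n= / #n# labels."
  (let ((cells (list 1 2 3)))
    (setf (cdr (last cells)) cells)   ; last cons points back to the head
    (let ((*print-circle* t))         ; detect cycles and shared structure
      (prin1-to-string cells))))
```

For example, (read-hex-integer "ff") returns 255, and (safe-read-from-string "#.(+ 1 2)") signals a reader-error instead of evaluating the addition. The string produced by circular-list-string can be read back with read-from-string into an equivalent circular list. Only print that list with *print-circle* true, or the printer will loop.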
Controlling the printer/reader is very powerful, but doing so still doesn't really extend S-expressions; it's all right there in CLtL2. Unlike XML, however, S-expressions really can be extended. XML has the meaning of most characters permanently decided, whereas lisp lets you change the meaning of any character through a construct called a read table. With it you can define alternative data representations and do other fun things we will look at shortly. To extend the behaviour of the reader, you add functions called read macros to the read table.
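The following is a minimal read macro in the spirit of the paragraph above. It is a sketch that makes [ ... ] read as a vector. The names read-bracket-vector, make-bracket-readtable and read-with-brackets are hypothetical. The functions copy-readtable, set-macro-character, get-macro-character and read-delimited-list are standard. The sketch works on a copy of the standard read table, so the global reader is left untouched.

```lisp
;;; Sketch: extend the reader so that [a b c] reads as a vector.
;;; Function names are hypothetical; the readtable API is standard.

(defun read-bracket-vector (stream char)
  "Reader macro function for the [ character: read forms up to ]."
  (declare (ignore char))
  ;; RECURSIVE-P is T because we are called from inside READ.
  (coerce (read-delimited-list #\] stream t) 'vector))

(defun make-bracket-readtable ()
  "Return a copy of the standard readtable with [ and ] defined."
  (let ((rt (copy-readtable nil)))   ; NIL means the standard readtable
    (set-macro-character #\[ #'read-bracket-vector nil rt)
    ;; Make ] a terminating macro character, like ), so that "3]"
    ;; splits into the token 3 followed by the closing delimiter.
    (set-macro-character #\] (get-macro-character #\) nil) nil rt)
    rt))

(defun read-with-brackets (string)
  "Read one S-expression from STRING using the bracket readtable."
  (let ((*readtable* (make-bracket-readtable)))
    (values (read-from-string string))))
```

With this, (read-with-brackets "(a [1 [2 3]] b)") returns a three-element list whose middle element is a vector holding 1 and a nested vector. Because nested brackets are read through the same read macro, any depth works. The printer already writes vectors in the standard #(...) syntax, so no printer change is needed to serialise such data back out.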
Lisp coders stand by their data format: like everything else, these read macros are written in S-expressions.
Extending Your S-expressions
This page is not an introduction to S-expressions or read macros. For that, if you are an experienced lisp programmer, I recommend either the CLtL2 link above or Paul Graham's On Lisp. If you have little or no lisp experience, this article is a good, gentle introduction and a great read.
I plan to make this section a collection of S-expression and read macro gems that show how you can extend S-expressions into the kind of data representation format you need. The extensions are lightly polished bits of CL code I have written for my own use, and they should all conform to ANSI/CLtL2 Common Lisp. The main focus of this article is on techniques that are very difficult or impossible in XML.
WARNING: If you aren't one of those aforementioned "experienced lisp programmers" this is going to get really hard really quickly. :)
Questions? Comments? New examples of S-expression extensions you'd like added (with full credit, of course)? Send them to me.