Thoughts from the office by Ed Ball
Monday, February 23, 2004

I enjoyed reading Effective XML: 50 Specific Ways to Improve Your XML, by Elliotte Rusty Harold. Fortunately, I have not been engaging in particularly ineffective XML, but I haven't found practical advice like this anywhere else. Even the introduction was useful, as it clarified often-misused terms like “element” and “tag”.

Part 1, on Syntax, encourages us to include an XML declaration and use ASCII for tags, neither of which I can disagree with. I was surprised that he didn't insist that I include the “standalone” attribute in my XML declarations; I seldom create documents that aren't standalone, and feel guilty that I don't include that attribute... His advice to “Stay with XML 1.0” was good to hear, particularly since XML 1.1 has just been released; his arguments seem valid to me. I tend to stay away from DTDs – I suppose I just haven't felt that the added complexity would be worth the trouble – so the related items weren't as important to me. I like the concept of using standard entity references (e.g. &Ecaron; for Ě) instead of numeric character references (e.g. &#x11B; for ě), but the resulting requirement to deal with DTDs is too much for me. Also, though I've preferred hyphen-delimited element and attribute names in the past – the first XML-related specifications led me down that path – I'm coming around to the use of “camel case” for names, since it's probably more readable.
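The trade-off in the entity-reference item is easy to see with a parser. This sketch uses Python's standard `xml.etree.ElementTree` as a stand-in (an assumption; the post itself works with Microsoft's parsers). A numeric character reference parses anywhere, while a named reference such as `&Ecaron;` is a fatal error unless a DTD declares it. Real documents would normally pull such names from an external DTD, which `ElementTree` does not fetch, so the example declares the one entity in an internal DTD subset instead; the `parses` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Numeric character reference: understood by every XML parser, no DTD needed.
numeric = ET.fromstring("<name>&#x11B;</name>")

# Named entity reference: only legal if a DTD declares the name. Here the
# declaration sits in an internal DTD subset so the parser can see it.
with_dtd = ET.fromstring(
    '<!DOCTYPE name [<!ENTITY Ecaron "&#x11A;">]>'
    "<name>&Ecaron;</name>"
)


def parses(doc):
    """Hypothetical helper: True if the document is well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # e.g. "undefined entity &Ecaron;" when no DTD declares it
        return False


print(numeric.text, with_dtd.text)
print(parses("<name>&Ecaron;</name>"))
```

Without the DOCTYPE line, the same `&Ecaron;` reference makes the document ill-formed, which is exactly the DTD dependency the item carries with it.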

Part 2, on Structure, gives good advice on how to decide between attributes and elements, whether a date should be encoded as a single string (“2004-02-23”) or three elements (“<year>2004</year>...”), and even the proper use of processing instructions. He reminds us that mixed content is still important, and suggests using XHTML as the standard for rich text (or “narrative content”). He almost contradicts the part 1 advice about entity references that I balked at, for the sake of interoperability with parsers that don't read the DTD. I was educated on the finer points of URIs, URNs, and URLs as used by namespaces. He gives a little help on picking a schema language, and clearly isn't fond of the W3C XML Schema Language. There are also a few guidelines that seemed obvious to me, but presumably respond to common abuses of XML.
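To make the date question concrete, here is a small sketch of what each encoding asks of the code that reads it, again using Python's `ElementTree`. The element names (`published`, `year`, `month`, `day`) are hypothetical, and the review doesn't say which encoding the book favors; this only compares the two.

```python
import datetime
import xml.etree.ElementTree as ET

# Two encodings of the same date (element names are hypothetical).
AS_STRING = "<published>2004-02-23</published>"
AS_ELEMENTS = ("<published><year>2004</year><month>02</month>"
               "<day>23</day></published>")


def date_from_string(xml_text):
    # One ISO 8601 string: a single library call parses and checks it.
    return datetime.date.fromisoformat(ET.fromstring(xml_text).text)


def date_from_elements(xml_text):
    # Three child elements: each part is located and converted separately.
    el = ET.fromstring(xml_text)
    return datetime.date(int(el.findtext("year")),
                         int(el.findtext("month")),
                         int(el.findtext("day")))


print(date_from_string(AS_STRING), date_from_elements(AS_ELEMENTS))
```

The single string leans on a standard format that date libraries already parse; the three-element form exposes each part to XPath or XSLT without any string slicing.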

Part 3, on Semantics, starts with a great summary of the many XML-related technologies that have become available, in each case giving an unapologetic opinion on the general usefulness of that technology. Needless to say, I'm now less concerned that I haven't yet learned XLink, XPointer, etc. He touches on the joy of XPath; I certainly can't imagine using XML without it, and the productivity gains have been worth the “non-portable” nature of the SelectNodes methods of Microsoft's XML parsers. He also covers the difference between “push” and “pull” XML parsers; unless performance demands it, I'd certainly recommend sticking with “pull” – the “push” style of SAX simply takes too much development effort in many cases. The book encourages me to validate documents, a lesson I learned long ago with SGML but have yet to bring into the world of XML, probably because DTDs just don't seem like a good fit for XML with namespaces, and no clear winner has yet emerged from the battle of the schemas...
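The push/pull contrast shows up clearly in code. This sketch collects the same titles three ways with Python's standard library, standing in for the Microsoft parsers the post discusses (SelectNodes itself isn't shown): a SAX handler (push), `ElementTree.iterparse` used as a pull-style event loop, and `findall` with the limited XPath subset that `ElementTree` supports. The sample catalog and function names are made up for illustration.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

DOC = b"""<catalog>
  <book><title>Effective XML</title></book>
  <book><title>Another Book</title></book>
</catalog>"""


class TitleHandler(xml.sax.ContentHandler):
    """Push style: the parser calls these methods; we track state by hand."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # text chunks collected while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_push(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_pull(data):
    # Pull style: our loop asks for the next event when it is ready.
    titles = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "title":
            titles.append(elem.text)
    return titles


def titles_xpath(data):
    # ElementTree implements only a subset of XPath, but enough for this.
    return [t.text for t in ET.fromstring(data).findall(".//title")]


print(titles_push(DOC), titles_pull(DOC), titles_xpath(DOC))
```

The SAX version needs a class and hand-managed state to do what the other two do in a loop or a single expression, which is the development cost the review has in mind.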

Part 4, on Implementation, made me curious to learn more about “native XML databases,” and wisely suggests that XML has not replaced relational databases. Beyond that, part 4 didn't really hold my interest, probably because my focus is on Microsoft technologies, and I'm basically locked into their tools and techniques. For the sake of the first three parts, Effective XML is definitely worth reading, and I recommend it to anyone developing with XML.

2/23/2004 1:55:33 PM (Pacific Standard Time, UTC-08:00) | Comments [2] | Books
3/2/2004 9:51:00 AM (Pacific Standard Time, UTC-08:00)
Re: standalone XML documents
As noted in Item 34:
"Unfortunately, the standalone attribute can be set to yes only when the document contains no white space in element content. (See Item 10.) Since most documents do use such ignorable white space, the standalone option is not always available."

"Ignorable white space" is white space in element content, when the element declaration in the DTD specifies that that element can contain only child elements and not PCDATA. In that case, the white space is really just there for formatting and is not significant as part of mixed content. A standalone XML document would have to have all that non-essential whitespace removed.
Bradley Grainger
3/2/2004 10:29:01 AM (Pacific Standard Time, UTC-08:00)
I suppose I was thinking that a document would always be "standalone" if it had no DTD, but according to the XML spec: "If there are no external markup declarations, the standalone document declaration has no meaning." So if there's no DTD, "standalone" has no meaning, and it's appropriate for me to omit that "attribute" from the XML declaration. I feel better already.
Ed Ball
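The conclusion of this thread can be checked against a parser. This sketch assumes Python's expat binding, whose `XmlDeclHandler` reports the `standalone` value as 1 for `standalone="yes"`, 0 for `standalone="no"`, and -1 when it's omitted; the helper name `standalone_flag` is hypothetical. Expat is non-validating, so it only reports the value and doesn't enforce the white-space rule quoted above.

```python
import xml.parsers.expat


def standalone_flag(data):
    """Hypothetical helper: expat's view of the standalone pseudo-attribute.

    Returns 1 (yes), 0 (no), -1 (omitted), or None when the document
    has no XML declaration at all (the handler is never called).
    """
    seen = {}

    def on_xml_decl(version, encoding, standalone):
        seen["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True: this is the final chunk of input
    return seen.get("standalone")


print(standalone_flag(b'<?xml version="1.0"?><doc/>'))
```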