Why Not Use a Richer DTD?

My recent post on EPub Format For Papers in Repositories generated some interesting discussion. In particular I was interested in Peter Sefton’s response to Stian Haklev’s suggestion that:

… instead of specifying the exact margins and fonts to be used, why not give us a DTD? Or some other form of easy authoring a structured document? This would make it much more future-proof, and also enable creation of different versions ….

I’m still not sure about what format and production process that would be the best. The NIH DTDs for academic publishing seem very robust and future-proof, but there would have to be an easy way to generate the content, with stylesheets or macros for Word/OOffice etc.

The advantages of a more structured authoring environment seem to be self-evident. However, Pete Sefton is unconvinced: not of the benefits this approach could provide, but of whether such an approach is achievable. As Peter reminds us:

The ETD movement is littered with attempts to use DTDs and coerce people into using structured authoring tools like XML editors. As far as I know none of these have been successful, and what happens is they end up falling back on word processor input

Experiences at the University of Southern Queensland

In his comment Peter linked to a post he published recently entitled “ICE to DocBook? Yes, but I wouldn’t bother“. In the post Peter summarised the benefits of the DocBook standard, quoting the Wikipedia article, which describes how:

DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software but it can be used for any other sort of documentation.

As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages and HTML Help, without requiring users to make any changes to the source.

As Peter pointed out, this “sounds like a good idea for documents – getting all those formats for free“. In reality, however, he warned that “you have to take into account the cost of creating the documents, inducing the authors to capture the semantics, and providing tools for authors that they will actually use“. Peter described how this has failed to happen: “when USQ (University of Southern Queensland) tried to get academics to climb a steep hill with the GOOD system, they simply wouldn’t do it“.

I agree with Pete’s concerns – and even getting users to make use of MS Word in a more structured way can be difficult.

Users Can Be A Barrier

It strikes me that users can be a barrier to the effective deployment of richer and more interoperable services in general. This applies whether the service is, as in this case, a more structured content creation environment or, as I suggested in a recent post on “Why Skype has Conquered the World” and, some time ago, in a post on “Why Did SMIL and SVG Fail?“, the deployment of open standards.

I had previously suggested some reasons why such laudable approaches failed to take off, including (a) over-complex solutions and (b) a lack of engagement from vendors. However, it now seems to me that one barrier which may be overlooked is a lack of interest from the end-user community. I can recall discussions about the likely take-up of emerging open standards in which concerns that users might be happy with existing solutions were dismissed with the argument that ‘open standards provide interoperability and that’s what users want’.

There is a need to factor user inertia into development plans, even when those plans are based on what appear to be clear benefits.