Validators Don’t Always Work


A standard of much interest to us at UKOLN is RSS. We came across RSS in its very early days: I gave a workshop session on Automated News Feeds at the national Institutional Web Management Workshop back in June 2001 and Andy Powell, a former colleague, included RSS in the JISC Information Environment technical architecture.


I recently discovered that the UKOLN RSS feed did not validate, according to the Feed validation service hosted at the W3C. The error appeared to be with the <taxo> module, but a colleague was convinced that the feed was fine and the problem was with the RSS validator. I was sceptical (surely an open source validation service, hosted at W3C, can’t have a bug in such a fundamental area) and raised this issue on the web-support JISCMail list. Sebastian Rahtz pointed out errors in the examples given in the RSS specification, which made me wonder whether the specification itself was flawed. When I found out that our news feed was created by the RSS::XML module, I wondered if the error could possibly be in this module.


I raised this issue on the W3C’s QA list, asking whether the problem was with (a) our RSS feed; (b) the RSS specification; (c) the application used to generate the feed or (d) the RSS validator. I received a prompt response from Olivier Thereaux (first thing the following morning) which confirmed that our feed was fine; that there were errors in the RSS specification (in particular in an example included in the spec) but that the fundamental error was due to a bug in the validator. This was reported to Sam Ruby, the developer of the validator, who a few hours later implemented a patch and released it on the main Feed Validator site.


I was very impressed with the speed with which this problem was addressed and a solution deployed. Many thanks to Olivier and Sam for this.

I was, though, also very shocked that a validator for such a widely deployed standard (RSS 1.0) had such bugs (I bet a colleague a pint, later raised to a gallon, that the validator was fine – luckily he didn’t take me up on this!). I had assumed that:

  • The development process would have spotted this bug (through use of test cases, code walk-throughs, schema validation, etc.)
  • The development community would have spotted bugs in an open source application, through the ‘many eyes make all bugs shallow’ principle.
  • The W3C QA processes would have detected this problem prior to the installation of the service on the W3C Web site.

A colleague pointed out that software developers (which I am not) tend not to have so much faith in validators, and many important and widely deployed applications have bugs.

I am not the only person to have concerns over the lack of resources allocated to this important area: Bjoern Hoehrmann left the W3C QA group in July 2006, sending a message to the public-qa-dev list giving his reasons for leaving.

Where, then, does this leave me? How can I advise others of the importance of validation and of systematic QA processes if such processes don’t seem to be in place within the W3C? Should I stop writing and giving talks on this? (I suspect people’s eyes do glaze over when they hear me harping on about this issue.)

But on the other hand, if digital library development programmes are being funded on the assumption that the data and formats are ‘clean’, aren’t services going to break if this isn’t the case?

And perhaps I’m being over-dramatic over this one incident – the problem may have been an obscure one and at least the bug produced a false negative (it reported that a valid RSS file was invalid) rather than a false positive. And, as I said, the bug was fixed very speedily. So maybe I should continue to promote the importance of compliance with standards – but the wider development community should help to validate the validators. And for formats owned by (or, as in the case of RSS 1.0, closely affiliated with) W3C, the W3C QA Interest Group has demonstrated that concerns don’t disappear down a black hole.
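One way to picture what "validating the validators" might involve is a small corpus of feeds whose validity is already agreed, run through the validator to see where its verdicts disagree. The sketch below is purely illustrative: the `toy_validator` function, the made-up `<feed>` format and the corpus entries are all hypothetical, and the deliberate bug (rejecting any element it does not recognise) is only a stand-in, not a description of the actual Feed Validator bug.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus: (name, feed text, whether the feed SHOULD be valid).
# The <feed> format here is invented for the example; it is not RSS 1.0.
CORPUS = [
    ("minimal", "<feed><title>News</title></feed>", True),
    ("with-extension", "<feed><title>News</title><topic>rss</topic></feed>", True),
    ("unclosed", "<feed><title>News</feed>", False),
    ("no-title", "<feed></feed>", False),
]


def toy_validator(text):
    """Deliberately buggy stand-in for a real validator.

    It requires a <title> child, but wrongly rejects any feed that
    contains a child element it does not recognise.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed XML
    known = {"title"}
    children = [child.tag for child in root]
    return "title" in children and all(tag in known for tag in children)


def audit(validator, corpus):
    """Return (false_rejections, false_acceptances) as lists of feed names."""
    false_rejections = []   # valid feeds reported as invalid
    false_acceptances = []  # invalid feeds reported as valid
    for name, text, expected in corpus:
        verdict = validator(text)
        if expected and not verdict:
            false_rejections.append(name)
        elif verdict and not expected:
            false_acceptances.append(name)
    return false_rejections, false_acceptances


if __name__ == "__main__":
    rejected, accepted = audit(toy_validator, CORPUS)
    print("Valid feeds wrongly rejected:", rejected)
    print("Invalid feeds wrongly accepted:", accepted)
```

With this toy corpus the audit flags the "with-extension" feed as wrongly rejected, which is the same kind of error as the one described above: a valid feed reported as invalid. A real corpus would need feeds whose validity has been checked against the specification by hand, since the specification's own examples may contain errors.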



  1. “The development community would have spotted bugs in an open source application, through the ‘many eyes make all bugs shallow’ principle.”

    Haven’t you just done that?

  2. Widely deployed must be a relative term 😉

  3. Hi Phil – yes, you’re right, I have :-) I’m not a software developer, but I am part of the development community – and, incidentally, for several years I received a parcel containing updates of the SuSE Linux distribution, as a copy of my Running A WWW Service handbook was included in the SuSE distribution in the mid-1990s. So I guess I can consider myself to be part of the Linux development community too!

  4. Brian, I understand your surprise – the Feed validator is excellent software and its bugs are rare – but I quite disagree with your conclusions: I think finding, and fixing, bugs is a natural and desirable process. In the case you describe here, the QA process worked very nicely.

    I wrote some thoughts on this in Nothing is perfect (and that is why we have QA).

  5. Yes, indeed, Greg Tourte did pretty well in highlighting that bug, especially in the face of your scepticism. In fact I think you probably do owe him that barrel for standing his ground! (I doubt that he would mind you mentioning his name in this context, by the way – though it is very tactful of you to anonymise) :-)

    We had mentioned that the examples in the spec were problematic.

    A really useful lesson from this: as you demonstrated, it is worth carefully considering input from your developers. They really do sometimes read specs :-)

  6. Hi Em – yes, you’re right – I was convinced that an application error was far more likely than an error in a validator. Greg (Tourte) was right – the bug was in the validator. I’m pleased he didn’t take me up on the bet of a pint (which I later raised to a gallon)! BTW I’m still discussing the quality assurance issue (in the context of validators) on W3C’s QA list. One suggestion I might make is that they offer a prize of a gallon of beer for any legitimate validation errors in validators hosted on their Web site. That might incentivise the development community :-)


