The Importance of Open Data
The importance of open data has been highlighted over the past few days with the Government’s public release of the Coins database. For those of us working in the public sector I feel there will be a need to move beyond the provision of Web sites designed for humans to the provision of open data for consumption by software, unencumbered by licence conditions which place restrictions on how the data can be used (a point, incidentally, raised yesterday by Tony Hirst in a post on “Time for data.ac.uk? Or a local data.open.ac.uk?”).
This post describes how the data contained in the Web sites for UKOLN’s IWMW events has been made available in RSS format, thus allowing reuse by others. It describes the challenges in understanding the meaning of the data (which may have subtly changed over the years) and the quality of the data (which was not initially provided for use by others), and concludes with suggestions for best practices in this area.
Web Sites for IWMW Events
The Web sites for all of UKOLN’s IWMW events since the event was launched in 1997 are still available in their original format. In 1997, 1998 and 1999 the Web site consisted of a couple of pages giving details about the events and providing links to the workshop materials. In 2000 we launched a more comprehensive event Web site which provided full details of the timetable, speakers’ biographies, abstracts of the talks and workshop sessions, information about the social events, latest news, etc. We have continued with this approach ever since, although the look-and-feel of the Web site has changed a couple of times.
Web-Accessible Data for IWMW Events
This information was initially provided as HTML pages. However, a few years ago we decided to make the key informational resources – speakers’ biographies, session abstracts and the latest news – available in RSS format. As described in a post on “RSS Feeds For Structured Information About Events”, this enabled the data to be used by other applications, for example to produce location maps of the speakers or Wordle displays of the session abstracts.
We now provide access to comprehensive data (as opposed to Web sites), in RSS formats, for all 14 of the IWMW events. These files are listed below.
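As a rough illustration of the kind of reuse this enables, the sketch below counts word frequencies across a set of session abstracts, which is the sort of input a Wordle-style display works from. It is not the code behind any of the displays mentioned above: the function name, the stop-word list and the sample abstracts are all made up for the example.

```python
import re
from collections import Counter

# A small, illustrative stop-word list (an assumption); a real one would be longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "will", "this"}


def word_frequencies(abstracts):
    """Count how often each non-stop word appears across the given abstracts."""
    counts = Counter()
    for text in abstracts:
        # Lower-case the text and keep runs of letters (and apostrophes) as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Made-up abstracts standing in for the <description> text of feed items.
SAMPLE_ABSTRACTS = [
    "This talk will explore open data for the Web.",
    "The session covers open data and RSS feeds.",
]

if __name__ == "__main__":
    for word, count in word_frequencies(SAMPLE_ABSTRACTS).most_common(5):
        print(word, count)
```

The counts could then be handed to whatever visualisation tool is preferred; the point is simply that text in an RSS feed is far easier to process in this way than text embedded in HTML pages.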
|Data||Links to RSS Feeds|
|Plenary Speakers|| –  –  –  –  –  –  –  –  –  –  –  –  –  – [All years]|
|Facilitators|| –  –  –  –  –  –  –  –  –  –  –  –  –  – [All years]|
|Plenary Sessions|| –  –  –  –  –  –  –  –  –  –  –  –  –  – [All years]|
Note that the RSS feed for the abstracts of the plenary talks over all 14 years has been created using a Yahoo Pipe which aggregates the RSS feeds for the individual years. Yahoo Pipes are also available for aggregating the abstracts of the workshop sessions and the biographical details of the plenary speakers and workshop facilitators.
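For readers who would rather do the aggregation themselves, the following is a minimal sketch of the same idea in Python. It is not the Yahoo Pipe itself: it assumes the per-year feeds are RSS 2.0 documents that have already been fetched as strings, and the function name, sample feeds and example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def aggregate_feeds(feed_documents):
    """Merge the <item> entries of several RSS 2.0 documents into one list.

    Items are kept in the order they are encountered; an item whose link
    (or, failing that, title) has already been seen is skipped.
    """
    merged = []
    seen = set()
    for xml_text in feed_documents:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            entry = {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "description": (item.findtext("description") or "").strip(),
            }
            key = entry["link"] or entry["title"]
            if key in seen:
                continue
            seen.add(key)
            merged.append(entry)
    return merged


# Two made-up per-year feeds, standing in for the real files.
FEED_2009 = """<rss version="2.0"><channel><title>Plenary talks 2009</title>
<item><title>Talk A</title><link>http://example.org/2009/talk-a</link>
<description>Abstract for talk A.</description></item>
</channel></rss>"""

FEED_2010 = """<rss version="2.0"><channel><title>Plenary talks 2010</title>
<item><title>Talk B</title><link>http://example.org/2010/talk-b</link>
<description>Abstract for talk B.</description></item>
<item><title>Talk A</title><link>http://example.org/2009/talk-a</link>
<description>Duplicate entry.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in aggregate_feeds([FEED_2009, FEED_2010]):
        print(entry["title"], entry["link"])
```

Skipping repeated links is a design choice made for this sketch; an aggregation that needs to preserve every entry, duplicates included, would simply drop that check.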
What’s The Data About?
Releasing data is one thing; understanding the data is another. One response to the Guardian article on the Government’s release of the Coins database commented that “This data is pretty much unintelligible to anyone outside of Treasury”. There are dangers that even simple data, such as the data files described above, may not be what it appears: the data may be inaccurate or links from the data may be broken.
I feel there is a need to provide a statement about the data quality when making data available for reuse by others. To identify what might be included in such a statement, I summarise below my knowledge of the data described above.
Data files: This statement relates to the RSS files providing information about the speakers and workshop facilitators and the abstracts of plenary talks, workshop sessions and other events at UKOLN’s IWMW events from 1997-2010 together with location data for the events.
File formats: The data is provided in a mixture of RSS 1.0 and RSS 2.0 formats.
Workflow: The data has been manually migrated from the initial HTML formats.
Description of the files: The files for the plenary speakers and sessions should contain information about speakers who gave plenary talks to all participants at the event and abstracts of the plenary sessions. This may also include panelists for panel sessions which were held as plenary sessions. The files for the workshop facilitators and sessions should contain information about those involved in hosting parallel workshop sessions and the abstracts for the workshop sessions.
Data elements: The files will normally contain a title (the speaker’s or facilitator’s name, sometimes with the year, or the name of the talk or session, sometimes with a session code); a URL to further information about the speaker, facilitator or session, where this information is readily available; the biographical details of the speaker or facilitator, or the session abstract, based on information provided by the speaker or facilitator; and the date, and sometimes the time, the session was given.
Known limitations: Full information about the speakers and facilitators may not be available (e.g. where people had multiple roles at the event there may only be a single entry provided). There may have been errors in the original HTML resource or changes to the scheduled timetable (e.g. due to last minute cancellations or changes in the running order). There may also have been errors introduced in the migration of the data to RSS format. Links from the RSS files may also contain errors.
Risks: In light of the possible limitations of the data, care should be taken in exploiting this data.
Changes: The information provided in the RSS files may change as errors are fixed. When errors are fixed we will seek to regenerate a new RSS feed providing information which covers all years.
We hope this summary of the limitations of the data files proves useful to anyone who wishes to make use of the data.
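Two points in the statement above have practical consequences for anyone writing software against the files: the mixture of RSS 1.0 and RSS 2.0 formats, and the possibility of missing or incomplete entries. The sketch below shows one way a consumer might cope with both. It relies on the fact that RSS 1.0 elements sit in the http://purl.org/rss/1.0/ namespace under an rdf:RDF root, whereas RSS 2.0 elements are not namespaced. The function names and sample documents are made up for illustration and are not taken from the IWMW feeds.

```python
import xml.etree.ElementTree as ET

RSS1_NS = "{http://purl.org/rss/1.0/}"


def read_items(xml_text):
    """Return title, link and description for each item in an RSS 1.0 or 2.0 document."""
    root = ET.fromstring(xml_text)
    # An RSS 1.0 document has an rdf:RDF root and namespaced elements;
    # an RSS 2.0 document has an <rss> root and plain element names.
    prefix = RSS1_NS if root.tag.endswith("}RDF") else ""
    items = []
    for item in root.iter(prefix + "item"):
        items.append({
            "title": (item.findtext(prefix + "title") or "").strip(),
            "link": (item.findtext(prefix + "link") or "").strip(),
            "description": (item.findtext(prefix + "description") or "").strip(),
        })
    return items


def find_incomplete(items):
    """Return the titles of items that lack a link or a description."""
    return [i["title"] for i in items if not i["link"] or not i["description"]]


# Hypothetical sample documents, one in each format.
RSS1_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Example speakers feed</title>
    <link>http://example.org/</link>
    <description>Illustration only</description>
  </channel>
  <item rdf:about="http://example.org/speakers/a">
    <title>Speaker A</title>
    <link>http://example.org/speakers/a</link>
    <description>Biography of Speaker A.</description>
  </item>
</rdf:RDF>"""

RSS2_SAMPLE = """<rss version="2.0"><channel><title>Example speakers feed</title>
<item><title>Speaker B</title><description>Biography of Speaker B.</description></item>
<item><title>Speaker C</title><link>http://example.org/speakers/c</link>
<description>Biography of Speaker C.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for sample in (RSS1_SAMPLE, RSS2_SAMPLE):
        items = read_items(sample)
        print(len(items), "items; incomplete:", find_incomplete(items))
```

A check of this kind only detects missing fields. It cannot tell whether a link is broken or whether a biography or timetable entry is out of date, so the cautions given above still apply.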
What About Linked Data?
Although we are aware of limitations in the quality of the data, perhaps the biggest barrier to reuse of the data relates to the very limited links to other related information. Information about speakers and their host institutions, for example, is provided as text strings and not links, so we (currently) don’t provide links which allow the data to be easily integrated with other data stores. We are currently exploring ways in which we can migrate from open structured data to open linked data. If and when such data becomes available we will provide a summary of the approaches used in the data migration and explore ways in which such data can be used.
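To make the distinction between a text string and a link concrete, here is a very small sketch of what replacing an institution name with an identifier might involve. It is not a description of the approach we will take: the lookup table, the function name and the example.org identifiers are all hypothetical, and agreeing on a real source of identifiers is precisely the hard part.

```python
# Hypothetical lookup table: institution names, normalised to lower case,
# mapped to made-up identifiers. Real identifiers would need an agreed source.
INSTITUTION_URIS = {
    "university of bath": "http://example.org/id/institution/bath",
}


def link_institution(name):
    """Return an identifier for an institution name, or None if it is unknown."""
    # Normalise case and whitespace so minor variations still match.
    key = " ".join(name.lower().split())
    return INSTITUTION_URIS.get(key)


if __name__ == "__main__":
    for name in ("University of Bath", "  university of  bath ", "Unknown College"):
        print(repr(name), "->", link_institution(name))
```

Names which do not match, whether through spelling variations, abbreviations or institutional name changes, return nothing, which illustrates why free-text strings are difficult to integrate with other data stores.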