EPub as a Format for Use in Institutional Repositories?

In a post entitled “File Formats For Papers In Your Institutional Repository” I suggested that depositing an HTML version of a paper might have various advantages over PDF, the format which is currently the norm. But in light of the growing importance of mobile devices, wouldn’t it also seem appropriate to make such papers available in the EPub format?

EPub is described in Wikipedia as “a free and open e-book standard by the International Digital Publishing Forum (IDPF)”. The article goes on to add that “EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.”

In terms of the open standards used, EPub consists of three specifications:

  • Open Publication Structure (OPS) 2.0, which defines the formatting of the content.
  • Open Packaging Format (OPF) 2.0, which describes the structure of the .epub file in XML.
  • OEBPS Container Format (OCF) 1.0, which collects all the files into a ZIP archive.
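To make the packaging layer more concrete, here is a minimal Python sketch of the OCF step, assuming EPub 2.0 conventions and that the XHTML content and OPF file have already been produced. The function name `build_ocf_container` and the `OEBPS/content.opf` location are illustrative choices, not requirements of any particular tool. The important details are that the `mimetype` entry comes first and is stored uncompressed, and that `META-INF/container.xml` points the reader at the OPF package document.

```python
import io
import zipfile

# container.xml tells a reading system where the OPF package document is.
# The OEBPS/content.opf path is an assumption; any path inside the ZIP works
# as long as container.xml points to it.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_ocf_container(files):
    """Return the bytes of an OCF ZIP container holding `files`.

    `files` maps archive paths (e.g. "OEBPS/content.opf") to str or bytes.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # 'mimetype' must be the first entry and stored uncompressed, so a
        # reader can recognise an EPub from the first bytes of the file.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else (OPF, XHTML, CSS, images) may be compressed.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the returned bytes to a file such as `paper.epub` would give an archive that an EPub reader can open, provided the OPF file and the XHTML content it lists are valid.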

The article states that “EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a zip file as a packaging format.”
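The XML manifest and metadata mentioned above live in the OPF package document. The following sketch, which assumes EPub 2.0 and uses only Python's standard library, shows its three main parts: Dublin Core metadata, a manifest listing every file with its media type, and a spine giving the reading order. The function name, the `bookid` identifier and the example values are hypothetical; a complete EPub 2.0 package would also include an NCX table of contents, which this sketch only references through the optional `toc_id` argument.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialise OPF elements without a prefix and Dublin Core ones as dc:...
# (note: register_namespace changes ElementTree's global prefix table).
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def build_opf(title, identifier, language, items, spine_ids, toc_id=None):
    """Return an OPF 2.0 package document as UTF-8 bytes.

    items:     list of (id, href, media_type) tuples for the manifest.
    spine_ids: manifest ids in reading order.
    toc_id:    manifest id of an NCX table of contents, if there is one.
    """
    package = ET.Element(f"{{{OPF_NS}}}package",
                         {"version": "2.0", "unique-identifier": "bookid"})

    # Metadata: title, identifier and language are the required DC fields.
    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}identifier",
                  {"id": "bookid"}).text = identifier
    ET.SubElement(metadata, f"{{{DC_NS}}}language").text = language

    # Manifest: every file in the container, with its media type.
    manifest = ET.SubElement(package, f"{{{OPF_NS}}}manifest")
    for item_id, href, media_type in items:
        ET.SubElement(manifest, f"{{{OPF_NS}}}item",
                      {"id": item_id, "href": href, "media-type": media_type})

    # Spine: the linear reading order, referring to manifest ids.
    spine_attrs = {"toc": toc_id} if toc_id else {}
    spine = ET.SubElement(package, f"{{{OPF_NS}}}spine", spine_attrs)
    for item_id in spine_ids:
        ET.SubElement(spine, f"{{{OPF_NS}}}itemref", {"idref": item_id})

    return ET.tostring(package, encoding="utf-8", xml_declaration=True)
```

For a paper held as a single XHTML file the manifest might contain just one entry, for example `build_opf("Empowering users and their institutions", "urn:example:paper-1", "en", [("paper", "paper.xhtml", "application/xhtml+xml")], ["paper"])`, where the identifier is a made-up placeholder.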

Using the EPub Format

[Images: Paper in EPub format, showing image; Paper in EPub format, showing page-turning]

This sounded interesting, so I converted the HTML version of my recent paper on “Empowering users and their institutions: A risks and opportunities framework for exploiting the potential of the social web” into EPub format and added it to my library of ebooks on my iPod Touch using the Stanza application.

The accompanying images show how the paper is displayed. The first image illustrates the page-turning style of navigation provided when reading EPub, and the second illustrates an embedded image.

The paper is also available from Opus, the University of Bath’s institutional repository service, where the URL for the EPub file is http://opus.bath.ac.uk/17484/5/i4.epub. I discovered that entering this URL into the browser on my iPod Touch opened the document in the Stanza application. On a typical desktop PC, however, users will probably not have a viewer set up to render this format, which may cause some confusion.
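One way a repository might reduce that confusion is to serve EPub files with their registered media type, `application/epub+zip`, so that a device with a reader installed knows what to do with them. The sketch below is a hedged illustration in Python, not a description of how Opus actually works. It registers the `.epub` extension with the standard `mimetypes` module (older MIME tables may lack it) and applies a heuristic check for the `mimetype` entry which, in a well-formed EPub, sits uncompressed at the very start of the ZIP archive. The function names are hypothetical.

```python
import mimetypes

EPUB_MIME = "application/epub+zip"

# Some platforms' MIME tables predate EPub, so register the mapping explicitly.
mimetypes.add_type(EPUB_MIME, ".epub")


def looks_like_epub(data):
    """Heuristic check on the first bytes of an uploaded file.

    Assumes the usual EPub layout: a ZIP local-file header (30 fixed bytes,
    no extra field) whose first entry is named 'mimetype' and holds the
    uncompressed string 'application/epub+zip'.
    """
    return (data[:4] == b"PK\x03\x04"
            and data[30:38] == b"mimetype"
            and data[38:58] == EPUB_MIME.encode("ascii"))


def content_type_for(path_or_url):
    """Choose the Content-Type a repository might send for a file."""
    guessed, _encoding = mimetypes.guess_type(path_or_url)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"
```

With the mapping registered, `content_type_for("http://opus.bath.ac.uk/17484/5/i4.epub")` yields `application/epub+zip`; whether a given desktop browser then offers a suitable viewer still depends on the software the user has installed.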

As might be expected for a format which uses XHTML, converting the XHTML original was a simple operation. I also experimented with converting a PDF version of the paper to EPub, but this resulted in various problems due, I think, to the way in which the paper’s two-column layout was linearised.

Revisiting the Issue of Formats for Use in Repositories

This initial experiment suggests that an EPub version of a paper in a repository can be created quite easily. However, this may have been easy only because an HTML version of the paper was available; doing this on a large scale may be time-consuming if HTML versions of papers are not available.

Let’s revisit the question: what formats for papers should we be seeking to deposit in institutional repositories?

From a preservation perspective, the advice from archivists tends to be that you should preserve the original master copy. In many cases this is likely to be MS Word, although other popular formats will probably include Open Office and LaTeX.

From an interoperability perspective, an open standard is preferable. I would suggest that, rather than making use of a specific DTD designed for scholarly publishing, we should use a well-established and popular existing open format: HTML (in whatever version).

If we wish to maximise the take-up of our repositories whilst minimising the effort in processing the files, it seems to me that we should explore ways of creating derivative versions from the master source. So rather than uploading a PDF, shouldn’t we be uploading the master file and creating a PDF automatically from this resource? And rather than creating an EPub file, as I have done, shouldn’t the repository software create the EPub file from an HTML version of the paper? I acknowledge that authors may not wish to make their original document (in, say, MS Word or Open Office format) available to others, and may regard the limited interoperability of PDF as a feature rather than a flaw. But there should be nothing to stop the master file being stored in the repository without being openly accessible.
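The kind of workflow suggested above could be modelled along these lines. This is a minimal Python sketch of the idea, not a feature of any existing repository platform: `deposit_master`, `StoredFile` and the pipeline format are hypothetical, and the converters stand in for whatever real tools (for example a Word-to-PDF or HTML-to-EPub converter) a repository might wrap. The master is kept but marked non-public by default, and each derivative can be generated either from the master or from an earlier derivative, so the EPub can be built from the HTML version.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A converter turns the bytes of one format into the bytes of another.
# Real converters would wrap external tools; here they are assumptions.
Converter = Callable[[bytes], bytes]


@dataclass
class StoredFile:
    name: str
    data: bytes
    public: bool


def deposit_master(
    master_name: str,
    master_data: bytes,
    pipeline: List[Tuple[str, str, Converter]],
    master_public: bool = False,
) -> Dict[str, StoredFile]:
    """Store a master file and derive other formats from it.

    Each pipeline step is (target_ext, source, convert), where source is
    "master" or the target_ext of an earlier step. Steps run in order.
    """
    stem = master_name.rsplit(".", 1)[0]
    # Keep the master for preservation; it stays hidden unless the author opts in.
    stored = {"master": StoredFile(master_name, master_data, master_public)}
    for target_ext, source, convert in pipeline:
        src = stored[source]  # KeyError if a step refers to a later step
        stored[target_ext] = StoredFile(
            f"{stem}.{target_ext}", convert(src.data), public=True
        )
    return stored
```

A pipeline such as `[("html", "master", to_html), ("pdf", "master", to_pdf), ("epub", "html", html_to_epub)]`, where the three converter names are hypothetical, would keep `paper.docx` private while publishing `paper.html`, `paper.pdf` and `paper.epub`.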

Is anyone thinking along these lines?
