A PDF Repository for my Research Publications

In a recent post which explained Why I’m Now Embedding ORCID Metadata in PDFs I described my intention to ensure that my research papers contain rich embedded metadata. This should help to enhance the discoverability of the publications, assert authorship (by embedding the ORCID IDs of the papers’ authors) and ensure that embedded images contain descriptions so that the content can be understood by visually impaired readers. In addition I wish to store the PDFs in PDF/A format, which is more suitable for long-term preservation.

In light of discussions on the blog and by email I have decided to embed the ORCID IDs of the co-authors of my peer-reviewed papers. However, as suggested by Geoffrey Bilder, I will be embedding the HTTP URI version of the ORCID IDs (e.g. http://orcid.org/0000-0001-5875-8744) rather than the bare ORCID ID (0000-0001-5875-8744). I will also be embedding the DOI for papers which have been assigned one.
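Turning a bare ORCID ID into its HTTP URI form is simple, but it is worth validating the ID first, since a mistyped ID embedded in a PDF would silently assert the wrong authorship. The following is an illustrative Python sketch rather than part of the workflow described here. It assumes ORCID’s published check-digit scheme (ISO 7064 MOD 11-2), and the function names orcid_check_digit and orcid_to_uri are hypothetical.

```python
import re

# Bare ORCID ID: four groups of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID ID."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_to_uri(orcid_id: str) -> str:
    """Validate a bare ORCID ID and return the HTTP URI form used in this post."""
    orcid_id = orcid_id.strip().upper()
    if not ORCID_PATTERN.match(orcid_id):
        raise ValueError(f"malformed ORCID ID: {orcid_id!r}")
    digits = orcid_id.replace("-", "")
    if orcid_check_digit(digits[:15]) != digits[15]:
        raise ValueError(f"check digit mismatch: {orcid_id!r}")
    return f"http://orcid.org/{orcid_id}"


print(orcid_to_uri("0000-0001-5875-8744"))  # http://orcid.org/0000-0001-5875-8744
```

All three author IDs quoted in this post pass the check; changing any single digit makes orcid_to_uri raise a ValueError instead of producing a plausible-looking but wrong URI.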

But I am now faced with the problem of where the paper should be hosted. This post summarises the processes I am using in the selection of an appropriate repository service to complement my institutional repository.

Selection Processes

As described previously, the workflow used to create cover sheets for items hosted in our repository means that metadata embedded in the PDFs is lost. Although we’re discussing this with repository staff, it occurred to me that I now have an ideal opportunity to make use of a third-party repository service.

In the past I have normally deposited papers in my institutional repository and used third-party services (such as ResearchGate and academia.edu) to host the metadata, with links to the full text of the papers hosted in the institutional repository. The main reason for doing this was to ensure that usage statistics for accesses to the full text were available in a single location rather than being fragmented across a range of services. This minimised the effort needed to collate such statistics for the evidence reports on our work which our funders have required in the past. However, in light of the recent announcement of the cessation of core funding for UKOLN, this is no longer a priority! Indeed it is now important to ensure that the ideas described in peer-reviewed papers are widely disseminated.

Using ResearchGate

Having recognised the value of hosting PDF copies of my papers on a third-party repository service, the question was which one to select. The key selection criteria were:

  • Easy to upload files.
  • Popular with readers.
  • Resource is easily found using Google.
  • PDF files preserved intact.
  • Service appears to be viable.

On 25 December 2012 I received an automated email from ResearchGate which informed me that “28 of your colleagues from University of Bath have joined ResearchGate in the last month”. On 24 January 2013 an automated message announced “44 of your colleagues recently joined ResearchGate”. The University of Bath’s entry on ResearchGate shows that there are currently researchers from 26 departments who have uploaded a total of 7,263 publications. ResearchGate seems to be growing in popularity, at least at the University of Bath.

On 20 December 2012 I was notified of the number of views of my papers (or, more accurately, the number of views of the metadata for my papers): “Your published research was viewed 1,678 times in 2012”, so perhaps ResearchGate is popular beyond the University of Bath too!

In light of the apparent popularity of the service I decided to upload one of my papers to it: the PDF copy of the paper on “Developing A Holistic Approach For E-Learning Accessibility”.

It was trivial to upload the paper, especially as the associated metadata had been created previously. I then downloaded the PDF and was able to confirm that the metadata was still embedded in the PDF resource.
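A check of this kind can be approximated with a few lines of Python. The sketch below is illustrative and rests on an assumption: that the XMP metadata packet is stored uncompressed in the PDF, so it can be found by scanning the raw bytes for the <?xpacket begin ...?> and <?xpacket end ...?> markers. A compressed packet would need a proper PDF library instead. The function names xmp_packets and missing_metadata are hypothetical.

```python
from __future__ import annotations

import re

# An XMP packet starts with <?xpacket begin ...?> and ends with <?xpacket end ...?>.
# Assumption: the packet is stored uncompressed, so it is visible in the raw bytes.
XMP_PACKET = re.compile(rb"<\?xpacket begin.*?<\?xpacket end[^>]*\?>", re.DOTALL)


def xmp_packets(pdf_bytes: bytes) -> list[bytes]:
    """Return every uncompressed XMP packet found in the raw PDF bytes."""
    return XMP_PACKET.findall(pdf_bytes)


def missing_metadata(pdf_bytes: bytes, expected: list[str]) -> list[str]:
    """Return the expected strings (e.g. ORCID URIs, a DOI) absent from every XMP packet."""
    packets = xmp_packets(pdf_bytes)
    return [s for s in expected if not any(s.encode("utf-8") in p for p in packets)]
```

For example, reading the downloaded copy with Path("paper.pdf").read_bytes() (a hypothetical filename) and passing it to missing_metadata together with the three authors’ ORCID URIs should return an empty list if the embedded metadata survived the upload and download.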

The paper can be accessed from ResearchGate and the user interface is shown below. I’ll leave others to judge the usability of the service.


Page on ResearchGate for one of my papers

But in addition to users who follow links directly to the paper, or who find resources using the ResearchGate Web site’s browse and search functionality, what of discoverability using Google?

ResearchGate, Google and Embedded Metadata

The PDF version of the paper now contains content which is unlikely to appear widely elsewhere: a combination of the authors’ names and their ORCID IDs. A Google search for “Brian Kelly ORCID: 0000-0001-5875-8744”, “Lawrie Phipps ORCID: 0000-0002-0834-273X” or “Elaine Swift ORCID: 0000-0002-6101-6861” should initially find information about the paper hosted on the UKOLN Web site, the UK Web Focus blog and other services which may be used by the co-authors, but not the institutional repository, as this does not currently provide ORCID information (understandably, as ORCID is so new).

I have therefore provided links to the following Google searches which I will monitor to see when Google has indexed the PDFs hosted on ResearchGate:

Search Term | Findings | Date
Brian Kelly ORCID: 0000-0001-5875-8744 | Large number of hits from UK Web Focus blog together with ORCID, UKOLN and Slideshare Web sites | 27 Jan 2013
Lawrie Phipps ORCID: 0000-0002-0834-273X | 5 hits (ORCID and UKOLN Web sites and UK Web Focus blog) | 6 Feb 2013
Lawrie Phipps ORCID: 0000-0002-0834-273X | 4 hits (ORCID Web site and UK Web Focus blog) | 27 Jan 2013
Elaine Swift ORCID: 0000-0002-6101-6861 | 3 hits (ORCID and UKOLN Web sites and UK Web Focus blog) | 6 Feb 2013
Elaine Swift ORCID: 0000-0002-6101-6861 | 2 hits (ORCID Web site and UK Web Focus blog) | 27 Jan 2013
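Search links of this kind can be generated rather than typed by hand, which avoids typos in the ORCID IDs. A minimal sketch, assuming the standard https://www.google.com/search?q= URL form and that exact-phrase matching (wrapping the query in double quotes) is wanted; google_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def google_search_url(query: str, exact_phrase: bool = True) -> str:
    """Build a Google search link, optionally quoting the query as an exact phrase."""
    q = f'"{query}"' if exact_phrase else query
    # urlencode percent-encodes quotes and colons and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": q})


queries = [
    "Brian Kelly ORCID: 0000-0001-5875-8744",
    "Lawrie Phipps ORCID: 0000-0002-0834-273X",
    "Elaine Swift ORCID: 0000-0002-6101-6861",
]
for query in queries:
    print(google_search_url(query))
```

Called with exact_phrase=False, the same helper also produces operator-based queries such as site:researchgate.net filetype:pdf, where quoting the whole string would defeat the operators.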

It appears that, over a period of a week, Google is finding the ORCID metadata in citation records hosted on the UKOLN Web site, as well as in the citation records already indexed on the ORCID Web site and this blog, but not yet in the PDF files hosted on ResearchGate. Might this be because Google does not index the researchgate.net site? To answer this question I used Google to estimate the total number of resources on the service and the total number of PDF files. The results are given below.

Purpose | Search Term | Number of results | Date
Total number of resources on researchgate.net | site:researchgate.net | 24,100,000 – 55,300,000 * | 6 Feb 2013
Total number of PDF files on researchgate.net | site:researchgate.net filetype:pdf | 2,980,000 | 6 Feb 2013

* The number of search results has fluctuated between 24,100,000 and 55,300,000 during the last few days.

It seems that a large number of PDF files hosted on ResearchGate (nearly three million) have been indexed by Google, but that it takes longer than a week for new resources to be indexed and found using a Google search.

Sustainability of the Service

What Does The Evidence Say?

The home page for the service displays a graphic (to users who are not logged in) showing the number of users of the service: it seems that 2.4 million users have registered. Since these are likely to be researchers, this does appear to be a significant number.

But what else do we know about the service and the company which provides the service? TechCrunch provides a handful of posts about the company together with the following summary:

ResearchGate is the leading social network for scientists. It offers tools and applications for researchers to interact and collaborate. ResearchGate offers a social, crowdsourced platform designed for researchers. The platform provides a global scientific web-based environment in which scientists can interact, exchange knowledge and collaborate with researchers of different fields.

The results of ResearchGate’s new search engine, called ReFind, are not merely based on keywords, but selected in an intelligent way based on semantic, contextual correlations.

In addition, the article provides a graph showing the number of users over the past year, based on figures provided by Compete.

According to the graph, the number of unique visitors seems to be growing significantly, from 61,640K in December 2011 to 236,170K in December 2012.
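As a rough check on “growing significantly”, simple arithmetic on the two Compete figures quoted above (which are themselves estimates) gives a growth factor of about 3.8 over the year:

```latex
\frac{236{,}170\text{K}}{61{,}640\text{K}} \approx 3.83,
\qquad
\frac{236{,}170\text{K} - 61{,}640\text{K}}{61{,}640\text{K}}
  = \frac{174{,}530\text{K}}{61{,}640\text{K}} \approx 2.83
```

That is, an increase of roughly 283% between December 2011 and December 2012.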

I also used MajesticSEO to report on the SEO characteristics of the service (note that a free subscription is required in order to view the findings). There are 7,459 domains which link to researchgate.net, with a total of 177,945 backlinks. Although such figures need to be regarded with caution (for example, they can be skewed significantly by link spam), the number of links from educational domains (3,241) and the number of educational domains (551) may be more appropriate measures, since it is difficult to create educational domains to host link farms. This snapshot may therefore provide a useful baseline for measuring changes in the link popularity of the service.
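If the snapshot is to serve as a baseline, it helps to record it in a form that makes later comparisons mechanical. The sketch below is a minimal illustration, not a MajesticSEO API: the LinkSnapshot class and the changes helper are hypothetical, and the figures are the ones quoted above, typed in by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LinkSnapshot:
    """Hand-recorded link figures for one site on one date (hypothetical structure)."""
    referring_domains: int  # domains with at least one link to the site
    backlinks: int          # total links to the site
    edu_domains: int        # referring domains that are educational
    edu_backlinks: int      # links from those educational domains

    @property
    def edu_domain_share(self) -> float:
        """Fraction of referring domains that are educational."""
        return self.edu_domains / self.referring_domains

    @property
    def edu_backlinks_per_domain(self) -> float:
        """Average number of links per educational referring domain."""
        return self.edu_backlinks / self.edu_domains


def percent_change(old: float, new: float) -> float:
    return 100.0 * (new - old) / old


def changes(baseline: LinkSnapshot, later: LinkSnapshot) -> dict[str, float]:
    """Percentage change in each recorded figure between two snapshots."""
    return {
        f.name: percent_change(getattr(baseline, f.name), getattr(later, f.name))
        for f in fields(LinkSnapshot)
    }


# Figures quoted in this post for researchgate.net.
baseline = LinkSnapshot(referring_domains=7459, backlinks=177945,
                        edu_domains=551, edu_backlinks=3241)
```

On these figures educational domains make up roughly 7.4% of referring domains, with about 5.9 links each; a later snapshot passed to changes would show how each figure has moved relative to this baseline.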

Terms and Conditions

Looking at the ResearchGate terms and conditions, I found no suggestion that the company claims the right to sell my data or my attention data to others (although I haven’t studied the terms and conditions in great detail). Although some may welcome this, others may wonder what the company’s business model is. An article entitled ResearchGate Wants To Be Facebook For Scientists, published by Forbes in March 2012, described how:

ResearchGate will also be looking into ways to monetize its platform. The “no-brainer” way to do that, in Madisch’s words, is to provide job boards for scientists looking for jobs. Universities and companies would pay the site to place listings. The company is also looking for ways to partner with other companies that manufacture and sell biotech lab equipment, as well as several other different programs.

Perhaps this is an appropriate business model which will be accepted by researchers who normally shy away from free services on the grounds that “If You’re Not Paying for It; You’re the Product”.

Interest in UK HE Sector

Although ResearchGate seems to be growing in popularity globally (and at the University of Bath), is there any evidence of interest within the UK’s higher education community? For me this is not necessarily a significant issue (it can be fine to be an early adopter), but it would be interesting to see what others in my community are saying about the service.

Using a Google search for “researchgate terms and conditions” I found that the DCC has provided a summary of ResearchGate in its list of resources for digital curators, with a similar resource being provided by the University of Edinburgh’s College of Humanities and Social Science. A Google search for “researchgate UK” finds a number of additional resources from the sector, including pages provided by the University of Leeds (PDF format), the University of Leicester, the University of Liverpool (PDF format) and the University of Gloucestershire, together with blog posts at Loughborough University and the University of Warwick.

My Decision

In light of these figures and my experience of using the service, I am happy to use ResearchGate to provide additional exposure for my research papers, complementing the master copies hosted in my institutional repository. Are other researchers making similar decisions, or are alternative services felt to provide better options?
