The JISC-funded Linking You Project

The Linking You project was carried out by the University of Lincoln and funded by JISC under the Infrastructure for Education and Research Programme. Its aim was to examine, and make recommendations for improving, the way that identifiers for domains are planned and managed in higher education institutions. The background to this work was described on the project web site:

The web is now fundamental to the activity and idea of the university. This Toolkit provides a standard way of thinking about your institutional URI structure, making it easier for people (and their browsers) to both remember your web addresses and locate where they are in your web site. It also helps prepare your institution for the world of linked data by proposing a clear and concise model for your data, making smooth integration with other systems easier and faster. A good URI structure can be easily understood by both humans and machines.

Although one of the benefits which implementation of the report’s recommendations would provide was to “Improve discoverability of resources (and SEO)”, the Linking You project focussed primarily on identifiers for resources hosted within an institutional domain. This post aims to complement the Linking You work by gathering evidence on additional aspects: gaining an understanding of the size of institutional web sites, measuring the numbers of links to the institutional home page and to other resources hosted within the domain, and identifying variants of the URI for the most important page on a web site – the institutional home page.

About Blekko

A few weeks ago James Burke (@deburca) introduced me to Blekko: a “search engine that slashes out spam, content farms, and malware. We do this by having a smaller crawl of 3 billion pages that focuses on quality websites. We also have a tool called a slashtag that organizes websites around specific topics and improves search results for those topics.” In response to the question “What information is available on the SEO pages?” the site describes how:

blekko doesn’t believe in keeping secrets. As part of our effort to make search more transparent, we provide a view of the data that our crawler gathers as it crawls the web.

Every blekko search result has data associated with it that you can see. You can access it by either clicking on the SEO link tool in the second line of each result or else searching with the /seo slashtag. For example, /seo

Further information about Blekko (although it spells its name as ‘blekko’ on its web site I’ll use ‘Blekko’ in this post) is available on Wikipedia.

Using Blekko to Analyse Russell Group University Web Sites

What might Blekko tell us about UK university web sites? Blekko’s SEO pages provide the following information: geographic link distribution by state and country; inbound links; duplicated content; page source; sections and site pages. Blekko was used to survey the 20 Russell Group university home pages. The survey was carried out on 2 January 2012. However, on 24 January it was noticed that, for the University of Birmingham, the host rank, number of site pages and number of inbound links had changed significantly: from 702.3 to 205.4, from 945 to 8,406 and from 24,442 to 627 respectively. The findings were rechecked but no other significant changes were noted.

The results are given in the following table. Note that you need to be logged in to the service in order to view the results.

| Ref. No. | Institution | Blekko Analysis | Host Rank | Site Pages | Inbound Links (and source domains) | Inbound Links | Outbound Links | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | University of Birmingham | [Analysis] | 205.4 | 8,406 | 36,082 from 3,608 domains | 627 | {0 links} | There are 13 outbound links (11 unique) from |
| 2 | University of Bristol | [Analysis] | 812.1 | 21,018 | 73,016 from 5,705 domains | 40,101 | 5 links | |
| 3 | University of Cambridge | [Analysis] | 1,042.7 | 16,041 | 309,734 from 10,145 domains | 337,091 | 8 links (7 unique) | |
| 4 | Cardiff University | [Analysis] | 816.5 | 17,213 | 75,635 from 5,638 domains | 59,590 | 5 links | There are 29 links (26 unique) from |
| 5 | University of Edinburgh | [Analysis] | 991.5 | 11,544 | 160,422 from 6,885 domains | 168,545 | {0 links} | There is 1 outbound link from |
| 6 | University of Glasgow | [Analysis] | 1,090.5 | 12,243 | 100,271 from 9,454 domains | 40,101 | 5 links | |
| 7 | Imperial College | [Analysis] | 476.8 | 12,984 | 87,086 from 2,920 domains | 34,566 | {0 links} | There are 3 outbound links from |
| 8 | King’s College London | [Analysis] | 1,105.4 | 14,263 | 97,943 from 9,566 domains | 26,986 | {0 links} | There are 11 outbound links (9 unique) from |
| 9 | University of Leeds | [Analysis] | 1,141.8 | 16,617 | 134,501 from 10,886 domains | 88,520 | 7 links (5 unique) | |
| 10 | University of Liverpool | [Analysis] | 1,260.3 | 4,727 | 59,797 from 9,794 domains | 19,082 | 0 links | |
| 11 | London School of Economics & Political Science | [Analysis] | 1,201.1 | 12,243 | 122,437 from 10,886 domains | 29,795 | {0 links} | There are 3 outbound links from |
| 12 | University of Manchester | [Analysis] | 694 | 13,292 | 186,893 from 5,193 domains | 215,887 | 8 links (7 unique) | |
| 13 | Newcastle University | [Analysis] | 1,125 | 16,041 | 75,635 from 9,127 domains | 40,101 | 4 links (3 unique) | |
| 14 | University of Nottingham | [Analysis] | 1,380.8 | 16,041 | 94,551 from 10,759 domains | 34,566 | 16 links (12 unique) | |
| 15 | University of Oxford | [Analysis] | 1,092.4 | 11,959 | 309,734 from 12,388 domains | 290,563 | 1 link | |
| 16 | Queen’s University Belfast | [Analysis] | 928.4 | 12,534 | 59,099 from 6,492 domains | 21,068 | 4 links | |
| 17 | University of Sheffield | [Analysis] | 529.7 | 13,449 | 44,578 from 3,524 domains | 20,050 | 13 links | 8 outbound links from are to |
| 18 | University of Southampton | [Analysis] | 1,018.1 | 12,338 | 129,845 from 9,127 domains | 69,132 | 9 links | 5 outbound links from are to |
| 19 | UCL | [Analysis] | 1,607.6 | 507,319 | 783,542 from 23,638 domains | 476,718 | 9 links | |
| 20 | University of Warwick | [Analysis] | 820 | 9,679 | 45,638 from 4,106 domains | 16,448 | {0 links} | 14 links (12 unique) for |

Note that in the above table explanatory notes are given for figures displayed in braces e.g. {0}. Also note that the Universities of Newcastle and Nottingham both seem to have 16,041 pages.
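
The repeated page counts noted above can be checked mechanically. The following is a minimal Python sketch, not part of the original survey: it takes the site pages figures as transcribed from the table and lists any value reported for more than one institution. The names SITE_PAGES and repeated_values are illustrative only.

```python
from collections import defaultdict

# Site pages figures transcribed from the table above (Blekko, January 2012).
SITE_PAGES = {
    "University of Birmingham": 8406,
    "University of Bristol": 21018,
    "University of Cambridge": 16041,
    "Cardiff University": 17213,
    "University of Edinburgh": 11544,
    "University of Glasgow": 12243,
    "Imperial College": 12984,
    "King's College London": 14263,
    "University of Leeds": 16617,
    "University of Liverpool": 4727,
    "London School of Economics & Political Science": 12243,
    "University of Manchester": 13292,
    "Newcastle University": 16041,
    "University of Nottingham": 16041,
    "University of Oxford": 11959,
    "Queen's University Belfast": 12534,
    "University of Sheffield": 13449,
    "University of Southampton": 12338,
    "UCL": 507319,
    "University of Warwick": 9679,
}


def repeated_values(counts):
    """Return {value: [institutions]} for each value shared by two or more institutions."""
    groups = defaultdict(list)
    for institution, value in counts.items():
        groups[value].append(institution)
    return {value: names for value, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Print each repeated figure with the institutions that share it.
    for value, names in sorted(repeated_values(SITE_PAGES).items()):
        print(f"{value:,}: {', '.join(names)}")
```

Identical figures for different institutions may suggest that the counts are estimates rather than exact totals, although Blekko does not document how they are derived.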

A Tale of Two Domains

Whilst carrying out this survey it was noticed when checking inconsistencies that different results were obtained when using variants of the domain name and the institutional entry point. The following table lists known domain name variants. Note that the main domain was taken from the address given on the Russell Group web site.

Institution Main Domain Variant
University of Birmingham (Automatic redirect)
University of Bristol
University of Cambridge Page at provides notice giving official domain name
University of Edinburgh
University of Glasgow,uk (Automatic redirect)
Imperial College
University of Liverpool
University of Manchester (Automatic redirect)
Newcastle University
University of Oxford (Automatic redirect)
University of Southampton

It should be noted that although the table describes the institutional part of the domain which is taken from the Russell Group web site, the analysis is carried out using  In two cases the well-established www. prefix was not used. These were and However for the analyses the www.  prefix was used as it was felt that this would be the form used by the majority of users.
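
The decision to apply the www. prefix consistently can be expressed as a small normalisation step. The following Python sketch is illustrative only: the host names are hypothetical placeholders (example.ac.uk), and the helper names with_www and group_variants are not taken from any tool used in the survey. It handles only the www. prefix; it does not detect that two different domain names redirect to the same page, which would require requesting each address.

```python
def with_www(host: str) -> str:
    """Return the host in lower case with a leading 'www.' label, adding it if absent."""
    host = host.strip().lower().rstrip(".")
    return host if host.startswith("www.") else "www." + host


def group_variants(hosts):
    """Group host names that normalise to the same www-prefixed form."""
    groups = {}
    for host in hosts:
        groups.setdefault(with_www(host), []).append(host)
    return groups


if __name__ == "__main__":
    # Hypothetical host names, used purely for illustration.
    sample = ["example.ac.uk", "www.example.ac.uk", "WWW.Example.ac.uk", "other.ac.uk"]
    for canonical, variants in group_variants(sample).items():
        print(canonical, "<-", variants)
```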


The Blekko web site contains a page which summarises its core principles, which include:

Quality vs Quantity: blekko biases towards quality sites. We do not attempt to gather all of the world’s information. We purposefully bias our index away from sites with low quality content.

Source based, not link based: blekko does NOT rely solely on link based authority. Too many people engage in efforts to game search by linking for purposes other than navigation. blekko relies on human beings and their judgement of the authority of sources to dictate search results.

Open and Transparent: blekko makes freely available to its users all of the data that provide the underpinning of our search results. This includes web data, ranking information and the curation efforts of our users.

Blekko would appear to have a role to play in providing universities (which are unlikely to use ‘black hat’ SEO techniques such as use of link farms) with a better understanding of their visibility to search engines. However, despite the commitment to openness and transparency, the Blekko web site does not appear to provide details of their ranking algorithms.

Despite the current difficulties in interpreting the host rank in the above table, the information is provided as a snapshot, which may prove useful if Blekko subsequently do publish details. Of perhaps greater interest is the site pages column which, it would seem, contains the number of pages which have been indexed.

There does appear to be a significant diversity in the size of the Web sites, ranging from 4,727 (for the University of Liverpool) to 507,319 (for UCL), although apart from these two outliers the size of other Russell Group university web sites ranges from 9,679 to 20,053. These figures seem to suggest that there may be differing patterns of use for institutional Web sites, ranging from the small and managed provision of focussed resources through to a more devolved approach. However, although the managed approach would appear to have benefits, it does lead to the question of where resources and services which are felt to be useful to the individual researcher or academic or their department should be hosted, and whether policies which act as barriers to the creation of resources on an institutional service will result in content being hosted on cloud services.

Further interpretation of these findings will probably require an understanding of the institutional web environment.  However one aspect of the survey which does not require an understanding of the local context is the numbers of links from external services to the institutional web site. Links from authoritative web sites to a web site can influence the discoverability of the resources. A more detailed survey of such links will be published shortly.

Paradata: As described in a post on Paradata for Online Surveys, it is important to document the tools and methodologies used when gathering evidence based on use of online tools in order that findings are reproducible. In addition, possible limitations in the tools or the way in which the tools are used should be documented.

This survey was carried out using the Blekko web-based service over the first two weeks in January 2012 and the findings rechecked on 25 January 2012 and changes recorded. Links are provided to the  results provided by the service. However in order to view the findings you will need to be signed in to the service (the service is free to join).

The findings for the University of Birmingham had changed significantly over a period of three weeks. It is not clear whether the variation was due to changes in the University of Birmingham web site, an artefact of the multiple domain names and entry point URLs for the University of Birmingham home page ( and all resolve to the same page) or changes at Blekko (e.g. reindexing the web site).

Note that the form of the domain name given on the Russell Group University Web site has been used. This is normally based on the full name, with the exceptions of Cambridge (which uses “”), Edinburgh (“”) and Glasgow (“”).

The results for the host rank are based on an undocumented algorithm. The information on the size of the site is dependent on the number of pages which are harvested.
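
The paradata above could also be kept in a machine-readable form alongside the results. The following Python sketch shows one possible way of doing this. The Paradata class and its field names are assumptions made for illustration, not an established schema; the values are taken from the notes above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Paradata:
    """Illustrative record of how a survey was carried out (field names are hypothetical)."""
    tool: str
    survey_period: str
    rechecked_on: str
    access_note: str
    limitations: list = field(default_factory=list)


def to_json(record: Paradata) -> str:
    """Serialise a paradata record as indented JSON."""
    return json.dumps(asdict(record), indent=2)


# Values taken from the paradata notes in this post.
record = Paradata(
    tool="Blekko web-based service",
    survey_period="first two weeks of January 2012",
    rechecked_on="2012-01-25",
    access_note="Sign-in required to view results (the service is free to join)",
    limitations=[
        "Host rank is based on an undocumented algorithm",
        "Site size depends on the number of pages which are harvested",
    ],
)

if __name__ == "__main__":
    print(to_json(record))
```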