Saturday, 23 August 2014

Open access and librarian detectives: the challenges of data management

View David Walters's profile on LinkedIn
I authored and published this editorial in the UKSG eNews on 21st March 2014.
http://www.jisc-collections.ac.uk/UKSG/317/Open-access-and-librarian-detectives/?n=121b8a3f-e721-4bc0-8354-7ad4d47f99de

The ideas expressed about data management have been implemented in an internal database project called KOA (King's Open Access), which harvests data from King's Research Portal, the Directory of Open Access Journals, Cottage Labs' HowOpenIsIt? and internal finance and funding request systems for a holistic approach to open access data management.

I will be presenting these ideas and this project at the UKSG 2015 conference in Glasgow and more updates will follow on this blog.

Image licensed for reuse on google images 23/8/14
http://s0.geograph.org.uk/geophotos/02/73/86/2738640_492a909c.jpg

Current challenges

My team in the library work hard to support researchers in the open access publication of their research. We assist with full text deposit in the institutional repository and administer the various Gold funding streams received by the college. All our research publication data is held in Pure, our current research information system (CRIS). Some fantastic external data resources have been steadily emerging across the open access publishing landscape - the Directory of Open Access Journals (DOAJ) and Sherpa services to name a couple.

In developing our service, it quickly became clear how important effective open access data management is to supporting our requirements. The data challenges are to provide high-level analysis for strategy makers and to extract low-level detail for service provision and reporting, in order to meet the needs of stakeholders across the university and various external bodies.

"It is quite a three pipe problem, and I beg that you won't speak to me for fifty minutes."
      (Arthur Conan Doyle, The Red-Headed League)

Researchers are undergoing a transformation in research dissemination and, with a national push for Gold, need expert, unbiased advice and support from their institution. Detailed analysis is needed by service providers in order to tailor appropriate training to researchers in very different disciplines. It also goes without saying that, across different universities, every research community – indeed every researcher – will have different levels of knowledge, support and engagement with this topic.

To guide the development of our services, universities must undertake a degree of detective work to monitor the relationship between researcher outputs and researcher practices of open access. This should be an ongoing investigation providing real-time evidence of changing in-house habits in a fast-changing area. For this to be timely and accurate – or even a realistic possibility – we need to maximise the use of existing data sources. Effective data management has the potential to provide many other benefits to the service and, in turn, to its patrons.

"Data! data! data!" he cried impatiently. "I can't make bricks without clay."
     (Arthur Conan Doyle, The Adventure of the Copper Beeches)

Our open access services advise authors on a multitude of different open access policies (from different funding bodies) as well as Green open access permissions for numerous repositories and the various Gold publishing options for authors. Data is less accessible to us as a service because the attributes we need are spread across different resources. This leaves us with a somewhat inefficient and fragmented approach to our workflow and little hope of ongoing data scrutiny or automation of repetitive tasks. This is particularly true when it comes to a high-level analysis of our research output. How do we measure dissemination at a department level in order to identify those who might need support, information or encouragement?

"It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."
     (Arthur Conan Doyle, A Scandal in Bohemia)

With funder demands, the serials crisis and HEFCE’s upcoming announcement, it’s vital for institutions to determine their open access output - measuring success and identifying areas for improvement. Until we ascertain our authors’ open access publishing trends through unified data resources, it will remain difficult to inform our strategy and measure our successes. With enough 'bricks', we can build a supportive, accurate, efficient and consistent service appropriately advising all our subject disciplines on their open access publishing options.

Addressing the challenge

This should be addressed through the existing research management information in the university’s CRIS, which should be enhanced with major open access data sources through data feeds. This ought to include updates from prominent subject repositories and also provide authors with the means to link bibliographic records to permissible documents hosted on personal websites. This would prevent costly timewasters like duplicating data entry, ensure a high quality of data from trustworthy sources and maximise the reporting capabilities of universities.

A completed dataset enables high-level open access analysis across thousands of outputs, and comparison with non-open access dissemination. Relationships with funding and bibliometrics data are then also available. Green self-archiving allowances from Sherpa and Gold publication information from DOAJ will feed the CRIS journal tables. Article metadata for research outputs will include funding/payment information, collected upon submission, identifying hybrid publications in the CRIS – this should be imported directly from publishers’ resources such as Scopus. Related articles and authors are then also identifiable for further guidance on OA dissemination.
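As a rough illustration of feeding Sherpa and DOAJ attributes into a journal table, the sketch below matches journals on ISSN. It is not the KOA implementation: the `JournalRecord` fields, the `enrich_journals` function and the sample ISSNs are all hypothetical, and the two feeds are assumed to have already been downloaded and reduced to a set of DOAJ ISSNs and a mapping of ISSN to post-print permission.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class JournalRecord:
    """One row of a hypothetical CRIS journal table."""
    issn: str
    title: str
    in_doaj: bool = False                     # listed in DOAJ (fully Gold journal)
    postprint_allowed: Optional[bool] = None  # Green permission; None = unknown


def enrich_journals(
    journals: Iterable[JournalRecord],
    doaj_issns: Set[str],
    sherpa_postprint: Dict[str, bool],
) -> List[JournalRecord]:
    """Return copies of the journal rows with DOAJ and Sherpa attributes filled in."""
    enriched = []
    for j in journals:
        enriched.append(
            JournalRecord(
                issn=j.issn,
                title=j.title,
                in_doaj=j.issn in doaj_issns,
                # .get() leaves the permission as None when the feed has no entry
                postprint_allowed=sherpa_postprint.get(j.issn),
            )
        )
    return enriched


# Made-up sample data standing in for the CRIS table and the two feeds.
journals = [
    JournalRecord("1111-1111", "Journal A"),
    JournalRecord("2222-2222", "Journal B"),
    JournalRecord("3333-3333", "Journal C"),
]
doaj_issns = {"1111-1111"}
sherpa_postprint = {"1111-1111": True, "2222-2222": False}

enriched = enrich_journals(journals, doaj_issns, sherpa_postprint)
for j in enriched:
    print(j.issn, j.in_doaj, j.postprint_allowed)
```

Keeping "unknown" (`None`) distinct from "not permitted" (`False`) matters here, because a journal missing from a feed should prompt a manual check rather than a refusal.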

Breakdowns of funding, departments and authors can then be extracted through the system’s regular reporting functionality.

Analysing the subsets of a full dataset would enable us to answer some key questions:

  • How are authors engaging with open access across schools and departments?
  • How many authors published in fully Gold journals?
  • How many paid a fee?
  • What Creative Commons licences were used?
  • How many authors could archive a document in the institutional repository but haven’t yet done so?
  • What are the publications funded by RCUK or the Wellcome Trust, and how have they been made open access?
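Once the data sits in one place, questions like these reduce to simple counts. The sketch below assumes each research output has been flattened into a dictionary; the field names (`gold_journal`, `apc_paid`, `can_archive` and so on), the `summarise` function and the four sample records are invented for illustration and do not reflect the real CRIS schema.

```python
from collections import Counter

# Made-up records; in practice these would come from the enhanced CRIS dataset.
outputs = [
    {"department": "Physics", "gold_journal": True, "apc_paid": True,
     "licence": "CC BY", "funder": "RCUK", "can_archive": True, "deposited": True},
    {"department": "Physics", "gold_journal": False, "apc_paid": False,
     "licence": None, "funder": None, "can_archive": True, "deposited": False},
    {"department": "History", "gold_journal": False, "apc_paid": True,
     "licence": "CC BY-NC", "funder": "Wellcome Trust", "can_archive": False, "deposited": False},
    {"department": "History", "gold_journal": True, "apc_paid": False,
     "licence": "CC BY", "funder": None, "can_archive": True, "deposited": False},
]


def summarise(outputs):
    """Answer the key questions as counts over a list of output records."""
    return {
        # Outputs in fully Gold journals, broken down by department
        "gold_by_department": Counter(
            o["department"] for o in outputs if o["gold_journal"]
        ),
        # Outputs for which a fee was paid (fully Gold or hybrid)
        "fees_paid": sum(1 for o in outputs if o["apc_paid"]),
        # Licences in use, ignoring outputs with no recorded licence
        "licences": Counter(o["licence"] for o in outputs if o["licence"]),
        # Outputs that could be archived but have not been deposited
        "archivable_not_deposited": sum(
            1 for o in outputs if o["can_archive"] and not o["deposited"]
        ),
        # Outputs per funder, ignoring unfunded outputs
        "funded_by": Counter(o["funder"] for o in outputs if o["funder"]),
    }


summary = summarise(outputs)
for question, answer in summary.items():
    print(question, answer)
```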


Elementary

Here at King’s, we have achieved some success by combining these datasets into a small database. For the first time we have visibility of open access dissemination across all our schools and departments. By enhancing our dataset with DOAJ data and Sherpa Romeo data we can see exactly how our authors are engaging with Green and Gold. For 2013 alone, we identified a number of primary research articles whose publishers permit a post-print to be archived in the institutional repository, but which had not yet been deposited. Having data in this format gives us visibility of the situation and the potential to communicate automatically with our authors to provide specific, tailored advice for the individual open access publishing options available for their articles.

We can relate this data to our other services such as funding administration. We could also provide regular, clear and consistent messages around these concepts. Communications can be automated based on information in our enhanced datasets. Because all our researchers are required to use the CRIS, we potentially have the ability to reach everyone across the institution - no matter how dark the corner.
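One way such automated communication might look is sketched below: outputs that may be archived but have not been deposited are grouped by author, and one message is drafted per author. The record fields, the `draft_deposit_reminders` function, the message wording and the example.org addresses are all assumptions made for illustration. The sketch only drafts text and does not send anything.

```python
from collections import defaultdict


def draft_deposit_reminders(outputs):
    """Draft one reminder per author for archivable outputs not yet deposited."""
    pending = defaultdict(list)
    for o in outputs:
        if o["postprint_allowed"] and not o["deposited"]:
            pending[o["author_email"]].append(o["title"])

    messages = {}
    for email, titles in pending.items():
        listing = "\n".join(f"  - {title}" for title in titles)
        messages[email] = (
            "The publisher's policy appears to allow a post-print of the "
            "following article(s) to be archived in the institutional "
            "repository:\n"
            f"{listing}\n"
            "Please contact the library open access team if you would like "
            "help with the deposit."
        )
    return messages


# Made-up records standing in for the enhanced dataset.
sample_outputs = [
    {"title": "Paper One", "author_email": "a@example.org",
     "postprint_allowed": True, "deposited": False},
    {"title": "Paper Two", "author_email": "a@example.org",
     "postprint_allowed": True, "deposited": True},
    {"title": "Paper Three", "author_email": "b@example.org",
     "postprint_allowed": False, "deposited": False},
]

reminders = draft_deposit_reminders(sample_outputs)
for email, body in reminders.items():
    print(email)
    print(body)
```

Grouping by author keeps each researcher to a single message however many of their articles qualify.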

I believe that CRIS-led, effective data management will inform the development of our open access services and help tailor our strategies to ensure a smooth transformation of the dissemination practices of our researchers for their respective fields.

"Well, Watson, we can but possess our souls in patience and see what the hour may bring."
     (Arthur Conan Doyle, The Adventure of the Three Garridebs)