Open futures - A blog for information technology and library related things: June 2014

Wednesday, 18 June 2014

NASIG conference 2014, Fort Worth Texas: Time to take the bull by the horns

David Walters (Merriman Award Winner 2014, King’s College London)

UKSG, NASIG and the John Merriman Award

As the lucky winner of the joint UKSG/NASIG 2014 John Merriman award I was invited to attend the annual NASIG conference, held this year in Fort Worth, Texas. It goes without saying (but I will anyway) that I was extremely thankful to the UKSG panel and the Merriman Award sponsor, Taylor & Francis, for this exciting and rewarding opportunity. My submission was published as an editorial in an earlier edition of eNews; you can read it here.

http://www.jisc-collections.ac.uk/UKSG/317/Open-access-and-librarian-detectives/?n=121b8a3f-e721-4bc0-8354-7ad4d47f99de

The reason that NASIG and UKSG are closely affiliated is that John Merriman had a hand in setting them both up, well over 30 years ago. He was clearly a very well respected chap, and the Merriman Award honours his contribution and aims to promote this strong partnership.

The UKSG article (below) gave me a new appreciation of this award in this context. At the NASIG conference, I came across many people who remembered John. It really was quite fascinating to hear how these organisations have grown and matured since their inception in order to meet a rapid pace of change and changing direction in our profession. The idea of a coming together of key stakeholders to steer developments across the scholarly communications landscape continues to endure through significant times of change.

http://uksg.metapress.com/content/k302r0k32513m657/

Crossing the pond really helped highlight to me the global reach of our profession and the significance our skills bring to researchers around the world. The event really underlined the importance of librarianship as an international collaboration – sharing the same conversation - in the same way the other stakeholders like publishers and researchers do. Research itself is often a global enterprise.

Same song, different steps?

There were too many highlights to do justice to in a short post! I’ll be adding more to my blog in due course. Additionally, reports for each of the sessions of the conference will be published in the journal ‘The Serials Librarian’ (0361-526X) early next year. This year I recorded a talk by Richard Wallis (OCLC) on ‘The Power of Sharing Linked Data’.

http://openfutures.blogspot.co.uk/2014/06/nasig-talk-by-richards-wallis-oclc.html

The information sector was described as a ‘frontier’ - where anything is possible. It’s uncharted, empty space. Whilst in the arena of chance, choice and change, librarianship is poised to change our society for the better. Knowledge has never been easier to submit, transmit or access. There is public good in the mission of our services compared with the sometimes nefarious ideologies of private interests. We are radicals with transformative power and we are ambitious that our changes to the world leave a positive effect for the next generation.

Many are keen to see development surrounding the formalisation and growth of libraries as publishers of research output. There are many on-going initiatives that have a mission to create sustainable and innovative publishing models. University leaders need to facilitate the required investment to redevelop the library’s supporting roles. This means updating staff skills and infrastructure to serve the needs of their community, specialising around their top subject areas, just as the university presses did almost a century ago.

Some speakers argued passionately for internet archiving and preservation as the only way, going forward, to provide context to a piece of research. This is a key area where libraries are ideally suited to take the lead. This kind of service will enable internet time travel for users, allowing them to accurately recreate the circumstances surrounding a piece of research in the same way neighbouring journals on the shelf in the library do. I likened this often overlooked service to my inbound, abstract flight path - having effectively travelled backward and then forward in time – only much faster and at the twist of a dial.

It is vital that key stakeholders share these points at events like NASIG, because the importance of a system like this is sometimes overlooked. Whilst I am keenly interested in the transformation of research outputs in an unrestricted, scholarly dissemination system, the surrounding web ‘garden’ in which it grows is also in a constant state of change, flux and adaptation. The number of institutions that even archive their own institutional presence is quite low.

The changing structure of data on the internet and the dramatic implications this has for discovery of libraries on the web was also discussed. We are all familiar with the ‘internet of documents’ – the web of links. The transition to what’s known as the ‘internet of things’, or a ‘web of data’ is emerging. If we expose our resources to the web in the way that it wants it will cement the role of libraries today and transform our mission in providing public access to our collections. Our data must be consumable to the same services used by our students and researchers, and usable to the web at large. It will enable our resources to be signposted on the same virtual roads of discovery our users are walking.

Anyone can get up and join the line dance in-step, with the right knowledge and practice. Knowing the steps is important when no-one is exactly sure how the song will sound. Everyone from Texas seemed to know what they were doing anyway!

Time to take the bull by the horns

Another great aspect of an event like NASIG is the opportunity to network with a wide range of fellow professionals from a variety of different backgrounds. The global impact of these changes presents us all with the same issues at the same, seemingly enormous, scale. However, everyone I spoke to sees this change as a groundbreaking opportunity to develop ourselves and our profession by enhancing the roles we can play. And in this interconnected world we have never had better means to work together to meet these challenges. It’s an overwhelmingly positive message that will ultimately be of huge benefit to our researchers and students as our services shift to meet their requirements.

Friday, 13 June 2014

For libraries - Digitising a print item with Adobe Acrobat Pro - how it's done!

Developing an effective digitisation process

This is a fun video which illustrates the process developed by me during the course of my role as a Digital Assets Assistant. The institution I work for has a digitisation service that makes readings available to students under the terms of the Copyright Licensing Agency agreement.

This was used as part of a presentation I delivered to the directorate of information resources to demonstrate the Digital Course Pack digitisation service to colleagues.

This also references a magic macro-enabled spreadsheet I developed, which manipulates data from our database in order to generate the coversheets required by the license (a laborious and time-consuming task).

The aim of working to this level of quality is to satisfy the license and hopefully add more value to the student experience. The process was also developed with efficiency in mind.

Enjoy.

Thursday, 5 June 2014

NASIG 2014 - The Power of Sharing Linked Data: Giving the Web What It Wants

Introduction

The role of libraries and librarians continues to evolve. The days of polished catalogue drawers are behind us. The information sector is transitioning through a period of change as the role of the library adapts to meet new service requirements. Linked data has huge, fundamental implications for the future role of libraries in connecting users to the resources that match their study and research needs.

Richard Wallis from OCLC discussed topics surrounding the power of linked data and what the web wants at the NASIG 2014 conference:

Image licensed for reuse on google images

http://upload.wikimedia.org/wikipedia/commons/8/89/Linking-Open-Data-diagram_2007-09.png

The Power of shared data:

Changes and challenges

Richard began by outlining some of the current challenges facing librarians today.

The changing format of our resources and the evolving needs of our users are gradually shifting our priorities towards ubiquitous online access to material, rather than just managing access to a physical collection. Our users now exist in both the physical and online space of the Library - socially and virtually. Users interact through technology away from the physical location of the library on devices personalised to their requirements. Library budgets often target electronic resources for customers who demand instant access to material. The idea of collection management is changing as access moves to online portals.

The rapid pace of technological change continues to be a big challenge facing our sector. Our users, perceptions of collections, research outputs and other factors continue to adapt to the current infrastructure and the new, emerging scholarly landscape. Universities are becoming more involved in producing and disseminating materials. This is prompting changes in the behaviour of our users.

It’s widely accepted that the user is now everywhere, using their devices as a window to the world. It’s clear that libraries must inhabit this space in such a way that our collections and services are visible to the virtual community, thereby facilitating the needs of the user.

In this new landscape our users search for knowledge in places that have served them well in the past – places that are readily accessible and available. Whilst this may drive a librarian or professor to the point of despair, the fact is that services like Facebook, Google and Wikipedia remain central starting points for many of our users.

It’s a fact that people don’t start in the library catalogue at the beginning of their research process. The risk in this new environment is that the connection between the library and the user doesn’t happen. The library may never connect its users with the resources vital to study and research. The channels of communication between the library and the user may be completely circumvented because the library is not signposted on the virtual roads of discovery our users are walking.

Another aspect of the mission of libraries is to select, describe and preserve our material for public access. In support of this, many collections and archives are now accessible through the network. To some extent, our ability to present this information to users and the wider world has been taken out of the hands of librarians because our traditional methods of exposure are incompatible with the requirements of a web of linked data.

This global issue of discovery crosses many industries and organisations. Information professionals need to ensure their collections are properly exposed on the web, in the way the web wants. There needs to be a move away from record management to the management of entities that can be recognised and consumed by services on the web.

Libraries and Discovery

From the ancient cataloguing systems (providing access to scrolls) to ‘cutting edge’ systems of card cataloguing using indexes (to discover authors, titles, subjects etc.), the library has been a central means of providing access to information. The pre-printed catalogue card really was groundbreaking technology in its day. It made records sortable, expandable and manageable. The functions and processes of library catalogues and metadata administration were developed in the context of traditional systems of managing physically crafted documents.

MARC was born as the ‘machine readable catalogue card’ as a means of sharing records. In 1994 the first web based OPAC was born, combining the formats and features of the printed card, bringing these benefits to users as well as staff. Librarians were among the first to make their records available behind servers to a wider community.

The format of the OPAC quickly began to show its age. This is unsurprising when you consider the pace of change of the web. Whilst the Library website is usable, patrons often require practice, guidance and the help of librarians to perform the most basic of searches. Machine readable card catalogues were built out of a specific time and technology. Going forward, this is not a good way to present data to the web. As we approach the web of data we must transform legacy formats to be compatible with the common identifiers and web schemas required for correct exposure. As the world begins to embrace linked data, libraries have an opportunity for our resources to exist in a common space inhabited by our users. It is an excellent symbiotic relationship: libraries can deliver information to the web in the way that it wants, to be consumed by high level services for the mutual benefit of our information seekers.

What does the web want? What is required in order to join the web of data?

Richard talked about what’s needed to improve the structure of our data and described the WorldCat linked data project, which has transformed their static bibliographic records into a format compatible with the web of data. He discussed how this new phase of the web adds real-world value to our virtual environments and value to the web community. He described how libraries can use this space to assert their own value in an environment shared with our users.

The web respects and likes size

Big sites and resources tend to attract more web traffic. If it’s a popularly used resource, the data it holds could be an authoritative hub of information. If this hub is constantly updated and maintained it becomes a valuable resource for the web community as a whole. There are numerous examples of this, e.g. Wikipedia. In terms of preservation, large websites have the potential for greater longevity, encouraging other services to seek them out.

WorldCat has been involved in this for some time. They have been an aggregator of library records for many years. Currently they hold approximately 311M records and are the biggest collection of linked bibliographic data on the web.

Benefits from using big, authoritative hubs, structured as linked data, are things like cascading updates. For WorldCat, any updates they make to their records at the work level will be cascaded down to the manifestation level. Similarly any external resource that is consuming information from WorldCat will always be brought current, up-to-date information that is constantly being maintained.
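
To make the idea of cascading updates a little more concrete, here is a minimal Python sketch. It is not OCLC's data model; the class names, the example.org URIs and the sample titles are all made up for illustration. The point it tries to show is that a manifestation which links to its work, rather than copying the work's details, picks up a work-level correction automatically.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical work-level entity: the one place shared facts are stored."""
    uri: str
    title: str
    subjects: list = field(default_factory=list)


@dataclass
class Manifestation:
    """Hypothetical edition/format that links to its work instead of copying it."""
    uri: str
    media_format: str
    work: Work

    def describe(self) -> dict:
        # Shared details are read from the linked work at display time.
        return {
            "uri": self.uri,
            "format": self.media_format,
            "title": self.work.title,
            "subjects": list(self.work.subjects),
        }


# Made-up identifiers on example.org, for illustration only.
work = Work("http://example.org/work/1", "Moby Dick")
print_edition = Manifestation("http://example.org/manifestation/1a", "print", work)
ebook_edition = Manifestation("http://example.org/manifestation/1b", "ebook", work)

# One correction at the work level...
work.title = "Moby-Dick; or, The Whale"
work.subjects.append("Whaling")

# ...is visible in every manifestation that links to it.
descriptions = [m.describe() for m in (print_edition, ebook_edition)]
for description in descriptions:
    print(description["format"], "->", description["title"])
```

An external service that follows the link to the work, rather than keeping its own copy, would benefit in the same way.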

Standing together, libraries can have a bigger impact on the web. This is part of what WorldCat is aiming to achieve. Libraries can contribute to this today by ensuring their holdings are registered at WorldCat, and current.

There are advantages for companies like Google in allowing their services to consume large, popular resources. As well as making browsing a more useful experience for the user, it means people will continue to journey through the services of their provider of choice.

The web wants structure and standards

The construction of the current web is based on fundamental standards, e.g. HTML. Most people are now familiar with the structure of web pages, accessible through a network of links. However, in order to join the web of data, the web wants us to change our techniques in order to join entities together.

Schema.org is a standard adopted by OCLC (and many other organisations), which allows us to define entities and attributes in a way that is consumable by the web. By adopting a shared vocabulary we are able to connect with other services that speak the same language. A central aim when adding linked data in WorldCat was to ensure they behave like the rest of the web. Schema.org is widely understood and shared across the web - approximately 15% already use it. Big competing companies (Google, Bing, Yahoo and Yandex) all collaborated on its inception. This is because there is a growing global clamour for the benefits provided by structured data across the web.
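
For anyone who hasn't seen Schema.org in practice, the following Python sketch describes a book using that shared vocabulary and wraps it in JSON-LD, one of the formats web services can read from a page. The function name, the example.org URIs and the placeholder ISBN are my own assumptions for illustration; this is not how WorldCat generates its data.

```python
import json


def book_as_jsonld(uri, title, isbn, author_uri, author_name):
    """Describe a book with Schema.org terms (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": uri,                      # identifier for the book itself
        "name": title,
        "isbn": isbn,
        "author": {
            "@type": "Person",
            "@id": author_uri,           # the author is an entity too
            "name": author_name,
        },
    }


# Made-up identifiers and a placeholder ISBN, for illustration only.
record = book_as_jsonld(
    uri="http://example.org/book/1",
    title="Moby-Dick",
    isbn="9780000000002",
    author_uri="http://example.org/person/1",
    author_name="Herman Melville",
)

# Embedded in a page like this, the description travels with the HTML.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Because the property names (`name`, `isbn`, `author`) come from the shared vocabulary, any service that understands Schema.org can interpret the record without knowing anything about library formats.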

There are other standards that OCLC could have chosen for WorldCat. For example, Bibframe (developed by the Library of Congress), which recognises entity based data as the best structure to serve the needs of our users. Utilising multiple standards is complementary to the aims of linked data. Library rich vocabularies are too complex for the rest of the web. We should expose our data through standards in a rich form like Bibframe, but also in a high level form that the rest of the web can devour.

RDF is the format in which linked data travels across the web. RDFa is a way of embedding it within the HTML so that services like Google can harvest information from the pages.
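
As a rough sketch of what this looks like, the Python below writes the same few facts twice: once as plain RDF triples (in the N-Triples syntax) and once as RDFa attributes inside HTML. The example.org URIs are made up, and the helper functions are my own, written by hand rather than taken from any RDF library.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an identifier in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string value, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Made-up identifiers, for illustration only.
book = "http://example.org/book/1"
author = "http://example.org/person/1"

# Each statement is a subject, a predicate and an object.
triples = [
    (iri(book), iri(RDF_TYPE), iri(SCHEMA + "Book")),
    (iri(book), iri(SCHEMA + "name"), literal("Moby-Dick")),
    (iri(book), iri(SCHEMA + "author"), iri(author)),
    (iri(author), iri(SCHEMA + "name"), literal("Herman Melville")),
]
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)

# The same facts as RDFa: ordinary HTML with a few extra attributes.
rdfa_html = (
    f'<div vocab="{SCHEMA}" typeof="Book" resource="{book}">\n'
    f'  <span property="name">Moby-Dick</span> by\n'
    f'  <span property="author" typeof="Person" resource="{author}">\n'
    f'    <span property="name">Herman Melville</span>\n'
    f'  </span>\n'
    f'</div>'
)
print(rdfa_html)
```

The reader of the page sees a normal line of text, while a harvesting service can recover the four statements from the attributes.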

WorldCat data links to Dewey, DOID, VIAF and many others. WorldCat explicitly licenses its data as open (under ODC-BY). This means any person or web service can use it, not just Google, highlighting its value as a community resource to different scales of the web.

Network of links

The bibliographic web of data is starting to form. It’s not just libraries; there are plenty of organisations across the world forming their own webs. Google is the obvious big player here, as a service which harvests data from all these sources, but all these services also link with each other. WorldCat is integrated with Wikipedia, VIAF, LCSH, Dewey. These are all well respected linked data hubs that contain authoritative sources of data.

For example, in WorldCat records there is a linked data identifier representing the book the page is about. Clicking on the attributes will eventually take you through to its description, e.g. item type, which is ultimately held by the authoritative hub which looks after this description.

In a web environment the user is clicking on links, which feels natural. But WorldCat is actually going out to the web on behalf of the user, to bring back their information from an authority for display.

Entity identifiers - “Things not strings”

The web wants entity identifiers – a unique identifier for a thing. These are known as URIs (Uniform Resource Identifiers). Why does the web want this? The web is a representation of our world and increasingly we spend more and more time in this space. By gathering, identifying and describing entities in this way we add value to the virtual world. As information professionals, these are skills we are very familiar with. Additionally, when things are identifiable they are consumable by numerous different services.

WorldCat makes use of persistent identifiers for its entities. It is a new concept that allows everyone to know that the same ‘thing’ is being referred to, which allows it to be linked.
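
A small Python sketch of “things not strings” follows. The three records and the example.org identifier are invented for illustration; the idea is simply that differently written name strings split one author into several, while a shared identifier brings the records back together.

```python
from collections import defaultdict

# Invented records: one author, written three different ways.
records = [
    {
        "title": "Adventures of Huckleberry Finn",
        "author_string": "Twain, Mark, 1835-1910",
        "author_uri": "http://example.org/person/42",
    },
    {
        "title": "The Adventures of Tom Sawyer",
        "author_string": "Mark Twain",
        "author_uri": "http://example.org/person/42",
    },
    {
        "title": "Life on the Mississippi",
        "author_string": "Clemens, Samuel Langhorne",
        "author_uri": "http://example.org/person/42",
    },
]


def group_titles(items, key):
    """Group record titles by the value found under the given key."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item["title"])
    return dict(groups)


by_string = group_titles(records, "author_string")
by_uri = group_titles(records, "author_uri")

print(len(by_string), "authors when matching on strings")
print(len(by_uri), "author when matching on the identifier")
```

Matching on strings finds three separate “authors”; matching on the identifier finds one person with three books.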

There are well recognised relationships between the attributes of our catalogue records and the wider world in concepts like people and places. The world thinks in entities, not just subject records. As part of their linked data project, OCLC have been harvesting this information out of catalogue records, including persons, author, producer, creator, etc. With library data stored as entities, connections between things, persons and works, item availability, subjects and concepts are possible. These connections can be used in a new way to promote discovery of these resources.

The FRBR data model (which RDA is based on) is already being used in the commercial world, e.g. by Amazon. This is because it is a comfortable way to organise data. OCLC are operating on similar models when extracting works data from their records.

Richard introduced a library knowledge graph of relationships. This demonstrates how different resources relate and link with each other. Knowledge cards are an example of this in practice. Google currently uses knowledge cards in search results, which provides access to related, linked data surrounding search topics. Libraries could use similar tools to connect people with related research outputs. This is known as serendipitous discovery, with the user following paths to subjects they didn’t know were available.

What other relationships are valuable to our users? Relationships of availability? Authors? Publishers? These are entities that can be uniquely identified by URIs on the web. Because relationships work both ways, this has the potential to be a very powerful tool in bringing explorers back to our collections and services. Once they have found the library, they have the opportunity to explore the detailed descriptions of our collections to further their knowledge or research.
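
Here is a minimal Python sketch of why relationships that work both ways matter. The graph is a hand-made list of triples with invented `ex:` identifiers and relationship names, not real WorldCat data. Starting from an author, we walk the links backwards to the books, and then backwards again to a library that holds them.

```python
# Invented graph: (subject, relationship, object).
triples = [
    ("ex:book/1", "author", "ex:person/1"),
    ("ex:book/2", "author", "ex:person/1"),
    ("ex:book/1", "publisher", "ex:org/1"),
    ("ex:library/1", "holds", "ex:book/1"),
    ("ex:library/2", "holds", "ex:book/3"),
]


def outgoing(node):
    """Relationships that start at this node."""
    return [(p, o) for s, p, o in triples if s == node]


def incoming(node):
    """Relationships that point at this node (the link followed backwards)."""
    return [(s, p) for s, p, o in triples if o == node]


def libraries_for_author(person):
    """Follow author links back to books, then holdings back to libraries."""
    books = [s for s, p in incoming(person) if p == "author"]
    return sorted({s for book in books for s, p in incoming(book) if p == "holds"})


print(outgoing("ex:book/1"))
print(libraries_for_author("ex:person/1"))
```

Someone who arrives at the author from anywhere on the web can be led back to a library that holds the work, which is the route back to our collections described above.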

The Power of Sharing Linked Data for libraries

We are all familiar with the ‘internet of documents’ – the web of links. The transition to what’s known as the ‘internet of things’, or a ‘web of data’ is emerging. An internet of entities, which may have relationships with other entities. It has huge implications for collaboration, shared data and impact.

Commercial enterprises are using linked data internally to further their business aims and improve their services. Facebook describes the data it collects across huge numbers of entities. By ensuring this information is properly related, they are able to target advertising by focusing on the patterns that emerge. Libraries have the opportunity to use data about entities and their relationships in the same way, albeit with more altruistic behaviours. This could offer huge insights into trends of usage and research, helping to inform the services we provide and demonstrating the impact of our collections across the research community. This also helps minimise the risks in connecting users to content.

Our catalogue records are often buried in the OPAC and not properly exposed to the wider world. MARC records often contain a wealth of wonderful, descriptive information about a piece of work - vital for those exploring our collections. However, when that information is not exposed to, or not understandable by, the wider web, its value is lost.

We already store this kind of information across our resources in the library and are very familiar with the concepts surrounding why and how we use it. If we expose it to the web in the way that it wants it will cement the role of libraries today and transform our mission in providing public access to our collections. Our data must be consumable to the same services used by our students and researchers, and usable to the web at large.

Beginners steps

Who am I?

My name is David Walters. I work in information and library services. In my current role I have been supporting and developing services for open access at the Russell Group university where I work.

I'm passionate about how technology is driving change in my profession, allowing for both innovation and partnership with researchers, institutions, authoritative data services across the web and exciting new start up companies.

I'm studying for a BSc in Computing and IT with the Open University. I'm an aspiring data buff and developer. I want to dedicate my career to the implementation of and adaptation to new technologies in a changing scholarly landscape. I hope one day to be part of the development of vital new services that support research dissemination, impact and discovery. Let's change the world!

What is the subject and aim of this blog?

This is a blog based around my professional experience and the projects I am working on. I want to connect with new people to share ideas and hopefully learn a few new things too!

Contact me

Below is my dawky photo from LinkedIn. If you like what you read here and would like to talk about it further, please look me up.

https://www.linkedin.com/profile/view?id=283434780

I've also recently started twitting (ok tweeting). Find me at @dav_eye
