Today I attended a workshop organised by people who work on the current iteration of the Listening Experience Database project (LED), which involved members of the project team as well as a few more of us from the Open University and the University of Glasgow. This was a really good chance to hear about one of the AHRC-funded Linked Data projects I’ve been studying this year, as well as to talk about my research to a new audience (as part of a session containing short ‘work in progress’ presentations) and participate in discussions about directions for the LED and Digital Humanities projects more generally.
The LED is a project to bring together people’s experiences of listening to music from any period in history through to the present day. Users are invited to contribute their own experiences (or transcriptions of other people’s) to the database; these are then checked and approved before being linked to other entities in the database (such as performer, time period, location) and added to the searchable resource. The event was led by Lorna Hughes from the University of Glasgow and Alessandro Adamou from the Open University (one of my supervisors), who explained a bit more about current issues they are dealing with in the LED project, such as handling incomplete dates or people who are described but not named.
After Alessandro’s presentation, there followed several short ‘work in progress’ presentations, including my own, before we discussed some of the issues that had been highlighted during the session. One of the major topics that came up was the difference between projects that aim to answer a specific research question and projects that are more about setting up a research infrastructure. Pelagios, which Elton Barker (another of my supervisors) had talked about during the presentation session, is definitely an example of the latter category, and possibly went in this direction from the outset due to being initially funded by Jisc, who are more keen to support infrastructures. The fact that Pelagios developed in this way could partly account for its relative success when compared with projects that were set up to answer a specific research question, published online, then had no resources to maintain them once the funding ran out. If a resource is made relevant to different audiences outside of the scope of a particular research question, it is more likely to be used, for this usage to increase, and for a community to develop around it, as has happened with Pelagios. It is hoped that the LED might develop in a similar way, and linking through to other resources via the Music Scholarship Online (MuSO) project, presented by Tim Duguid, may be one way of achieving this. However, the main difference from Pelagios is that the LED is collecting and curating its own data, rather than linking entirely to external resources, so there may be different pressures that would act as a barrier to the project developing into a similar infrastructure.
Another issue we discussed was that of the balance between effective data curation and preservation of detailed metadata. One example where this could initially have been handled better is with Europeana, whose reductive data model stripped away much of the rich metadata associated with objects in the collections it links to, making it in some cases prohibitively difficult to search – an issue which they are now working to address. A factor affecting data curation is where manual approval is required before a new entry can be added to the resource, such as with the Reading Experience Database (RED). Attracting a large number of new submissions to a resource should be really positive, but a manual approval process can cause a huge bottleneck, which is often difficult for under-resourced projects to manage. However, the process of curation cannot be done away with entirely, because it is important for a resource to be seen as trustworthy and authoritative with regard to the quality of content it contains. The importance of authority appeared at several points throughout the day – one example occurred in Marilena Daquino’s talk, where she spoke about how multiple photo archives often contain the same photo, but there is often no central authority that contains official information about that photo. During the discussion, crowdsourcing also featured in relation to this issue – this method has been deployed successfully for other projects, so perhaps it would be a good approach for projects like LED and RED to take in the future, if they could attract enough volunteers to participate.
Both main parts of the discussion tied into my work on the AHRC-funded Linked Data projects, and more generally into my research on how Linked Data might best be integrated with the Humanities. Perhaps the infrastructure route promises more longevity, but potentially takes more work to set up initially. Maybe crowdsourcing is the best way to strike a compromise between rich data and effective curation. These issues have given me a lot to think about, and I was really glad to have had the opportunity to take part in this workshop. I would like to thank Lorna, Alessandro and everyone else on the LED project for organising the event and giving me the chance to present my research. Two more LED seminars are scheduled for later this year, which I very much hope to attend.