Today I attended a workshop on ‘big cultural heritage data’ at UCL’s Institute of Archaeology, which introduced me to several interesting projects and initiatives that are relevant to my research, and allowed me to renew some connections with people I’d met at other events.
Various speakers tackled the issue of what ‘big data’ is, and whether cultural heritage data could actually be described as ‘big’, as the amount of data involved in such projects is minuscule compared with that in high energy physics, for example. It seems that ‘big data’ can be a bit of a buzzword, to make data sound more exciting. It is the content and structure of the data, and what is done with it, that is important and that will determine the tools used for data processing and analysis.
Melissa Terras started the talks by discussing work at UCL’s high performance computing centre, and the lack of involvement thus far from Arts and Humanities projects. This might be due to a lack of knowledge among researchers in these disciplines about the potential of using this facility, or a preconception that the high performance computing people might not want Arts and Humanities researchers to use their service. Melissa explained that, in fact, they would be very keen to work with more Arts and Humanities data, and made the excellent point that such a conversation would be a two-way street – this wouldn’t just be a case of HPC performing a service or training Humanities researchers; it would be an important opportunity to open up a dialogue and introduce the complexities of heterogeneous Humanities data.
This is the paper I mentioned in #gefbigdata heritage data talk: how to get A&H researchers access to high performance computing. https://t.co/tlc4N0v1VA
— melissa terras (@melissaterras) May 22, 2017
I particularly enjoyed Ethan Watrall’s talk on data usability, which discussed the idea that making data available does not necessarily mean that the data is usable. Ethan advised that cultural heritage institutions should clean their datasets and make them available as downloads on their websites, so that it is as easy as possible for users to access the data. He also spoke about applying Creative Commons licences to data, and made the important point that you still retain ownership of your data even if you give it a licence that allows other people to reuse it. It was really great to hear this talk, and for it to be given at a Humanities-related event, as openness and usability are such important issues with regard to data, and true openness cannot exist without usability.
For those who are interested, here is my #gefbigdata presentation slide deck https://t.co/G4yA1WBRPy
— Ethan Watrall (@captain_primate) May 22, 2017
Shawn Graham raised an interesting point in his talk about the re-humanisation of data. By definition, data in the Humanities tends to relate to humans, but it can be very easy, when looking at large volumes of numbers and snippets of information, to start seeing ‘things’ instead of people – and when you start seeing people as things, that is a huge problem for using this data to do Humanities research. Part of Humanities study is to look at the geographical, temporal and political context surrounding data, and at the implications for the group of people being studied. As well as being disrespectful, seeing people as things loses this important dynamic.
Another important, and related, point Shawn raised was thinking about the implications of the work you are doing as a researcher with this data – the initial question should always be along the lines of “does what I’m doing have the potential to hurt someone?”. Researchers should always be aware that just because something is published online (e.g. on social media), this doesn’t mean that the author necessarily wants their personal details to be included in a publication. Additionally, certain community groups have very strong and differing feelings regarding having data about them published online, and these feelings need to be taken into account in any subsequent work.
If you missed the wonderful #GEFbigdata workshop in May, here is, for you..the brilliant @electricarchaeo performing https://t.co/27JyJtpoUZ
— Chiara Bonacchi (@ChBonacchi) July 6, 2017
During the afternoon, Harrison Pim spoke about the ‘Data Champions’ initiative at the British Museum, where people from different departments who all use digital data in some way come together on a regular basis and give lightning talks to their colleagues about the types of data and issues they are working with. Harrison highlighted that bringing colleagues together in this way breaks down barriers between departments and facilitates the connection of data that previously existed in separate silos. This sounds like a brilliant initiative, and I think it would be great if something like this were started at the OU, as I know very few people who are doing this sort of research and I’m sure there are many more I could learn from.
The great @hmpim from @britishmuseum #bigdata team talking at the #GEFbigdata workshop at @UCLarchaeology https://t.co/RFFyC382f7
— Chiara Bonacchi (@ChBonacchi) July 6, 2017
This was an interesting and varied day, and the talks described above give only a small taster of the topics discussed. Luckily, however, there was an active Twitter hashtag, #GEFbigdata, which kept going throughout the day. I would like to thank Chiara Bonacchi and Daniel Pett for organising, introducing and facilitating the event, and I hope to attend more such events in future.