The second day began with talks from Matthew Sillence (UEA) and Stefan Rüger (Open University) on the subject of Digital Images. Matthew started by looking in more detail at the metadata that is captured for digital images, and at using the IconClass system of classification. I had not used IconClass before, but very much enjoyed the exercise of classifying a particular image, as classification was one of the tasks I enjoyed most when working in libraries. It certainly highlighted the subjective nature of classification: different people will focus on different aspects of an image, or may interpret it differently, and this in turn affects their classification choices.
We also looked at the British Library’s Flickr account and the way in which its images have been classified using tags. Some of these tags were assigned by a person at the British Library when the image was catalogued, whereas others have been added by SherlockNet, an automated tagging system that assigns keywords based on particular features within the image. SherlockNet was developed as part of the British Library Labs competition, and is potentially a very exciting way of making sense of large numbers of images for which we have no metadata. However, it does look as though it could do with some tweaking, as some of the tags it assigns are incorrect or questionable; a human editor would still be required to ensure the quality and accuracy of the information. It also made me wonder whether crowdsourcing a ‘folksonomy’ might instead be a better way of adding keyword tags to large numbers of digital images.
Stefan then went on to talk about the physics of colour vision, the way in which colour information is understood by a computer, and the process by which image files are compressed into JPEG format. This was almost entirely new information for me, and I was really interested to learn about a topic that is related to tasks I perform fairly regularly (taking digital photographs and converting images to JPEG), despite being outside my immediate research interests. Looking at the CIE chromaticity graph really clarified why some colours are not rendered well, or in some cases at all, by print or digital images. Our brains can perceive many more colours than a computer monitor or a printer can reproduce, and this diagram shows very clearly which areas are likely to suffer in print or digital reproduction. I also found it fascinating to go through the various steps a computer takes to convert an image from a larger format to JPEG; this is something I do fairly often, but I had never previously considered what was actually happening ‘behind the scenes’.
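The steps Stefan described can be sketched in a few lines of Python. This is a toy illustration of the ideas, not a real JPEG encoder: the conversion constants are the standard ITU-R BT.601 ones used by baseline JPEG, but the sample row and quantisation divisor are invented for demonstration.

```python
import math

# JPEG compression, roughly: convert RGB to YCbCr (our eyes are more
# sensitive to luminance than to colour), downsample the chroma channels,
# split the image into 8x8 blocks, apply the Discrete Cosine Transform
# (DCT), then quantize the coefficients -- the step where detail is lost.

def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 conversion used by baseline JPEG."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def dct_1d(signal):
    """Naive 1-D DCT-II, as applied along the rows and columns of a block."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def quantize(coeffs, q):
    """Dividing by q and rounding throws away fine detail (lossy)."""
    return [round(c / q) for c in coeffs]

# A flat row of pixels compresses to a single non-zero coefficient:
row = [128] * 8
coeffs = quantize(dct_1d(row), q=16)
```

Running this on the uniform row leaves only the first (average-brightness) coefficient non-zero, which is why smooth areas of an image compress so well while sharp edges cost more.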
In the afternoon session, we first heard from Francesca Benatti (Open University) about working with Digital Texts. She demonstrated the use of Voyant Tools and AntConc for pulling useful information out of long texts. For example, we looked at the words that appear most frequently in various works of Jane Austen, and in particular compared the number of times the word ‘sister’ appears across the corpus. This seemed as though it would be really useful for people studying literature to gain an idea of the main themes of a particular text, as well as potentially to gauge the mood or relationships of characters as the text progresses. I was familiar with the concept of these tools, but had no practical experience of using them, so this was a good opportunity to gain some experience with a different aspect of Digital Humanities research.
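At their core, tools like Voyant Tools and AntConc are performing word-frequency counts of this kind, which can be sketched in a few lines of Python. The sample sentence below is made up for illustration; it is not a quotation from Austen.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Invented sample text, standing in for a full novel:
sample = ("Her sister smiled. The elder sister wrote to her sister "
          "every week, and her mother approved.")
freq = word_frequencies(sample)

print(freq["sister"])        # occurrences of 'sister' in the sample
print(freq.most_common(3))   # the most frequent words overall
```

Real concordancers add a lot on top of this (stop-word lists, lemmatisation, keyword-in-context views), but the underlying operation is the same count applied across a whole corpus.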
Francesca also talked about the potential application of these techniques on a larger scale – so-called ‘distant reading’, using tools such as Google NGrams – while being mindful that there is still inherent bias in the selection of texts for even very large datasets. Additionally, she talked about the Reading Experience Database, an Open University Digital Humanities project, which aims to capture every instance of someone reading something between 1450 and 1945.
Mia Ridge (British Library) closed the day by talking about Information Visualisation. This was particularly interesting to me, as I have recently been looking at various tools I could use to visualise my research data on AHRC projects. A theme that ran throughout Mia’s talk was the importance of adequate data preparation and cleaning – ensuring that the visualisation is based on accurate, good-quality data, particularly if it is going to be shared. She gave some examples from her own research into female scientists throughout history, and showed the process from gathering data in a spreadsheet through to the different pieces of information she focused on when choosing how to visualise the data. In particular, she demonstrated that a knowledge of Excel alone can be sufficient in many cases, and can produce useful and visually pleasing results.
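The cleaning step Mia emphasised can be sketched in Python just as easily as in Excel. The names and values below are invented examples of the kind of messiness a hand-entered spreadsheet accumulates (stray whitespace, inconsistent casing, duplicate rows), not her actual dataset.

```python
import csv
import io

# A messy spreadsheet export, with stray spaces, inconsistent casing,
# and a duplicate row (all values invented for illustration):
raw = io.StringIO(
    "name,birth_year\n"
    " Ada Lovelace ,1815\n"
    "Mary Anning,1799 \n"
    "ADA LOVELACE,1815\n"
)

rows = []
seen = set()
for row in csv.DictReader(raw):
    name = " ".join(row["name"].split()).title()  # trim and normalise case
    year = int(row["birth_year"].strip())         # strip before converting
    if (name, year) not in seen:                  # drop exact duplicates
        seen.add((name, year))
        rows.append({"name": name, "birth_year": year})
```

Doing this before charting anything means the visualisation reflects the data rather than the typos; the same normalise-then-deduplicate pass is what Excel formulas like TRIM and remove-duplicates give you.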
We went on to look at visualisations of the recent US election, showing aspects such as the predicted and actual vote share for specific states, how these changed over time, and the way in which different demographics voted. We were invited to critique these visualisations in terms of their reliability and trustworthiness – how up to date the data was, and whether they were created to make a particular political point. This kind of critique will hopefully make me more critical of my own visualisations, and help me address any shortcomings in the accompanying text and analysis.