Transcribing for the future: CLARIN transcription chain tools for turning recorded speech into textual representation

Maureen Haaker of the UK Data Service, and a student at the University of Essex, shares her experience of the May CLARIN workshop on transcription tools for turning recorded speech into a textual representation that is as close as possible to what has been uttered.

Last month I was invited as a junior researcher to a workshop in Arezzo, Italy, hosted by CLARIN to explore some of the tools for transcription offered through CLARIN’s network. CLARIN ERIC aims to create and maintain research infrastructure, supporting the sharing, use and sustainability of digital tools for research in humanities and social sciences. There is a growing interest in digital humanities, an area of study where digital technologies meet humanities and social science research. Digital humanities goes beyond just researching how culture interacts with technology – it actively uses technology to create and disseminate cultural knowledge. This discipline challenges what we know about our cultural heritage and leans on technologies to help re-analyse – or even re-present – our history.

The focus of this workshop was to link together the plethora of tools currently available to transcribe and process oral history interviews, with the hope of opening up the way we engage with oral history transcripts. This is the third of three workshops supported under a CLARIN grant known as Transcription Chain.

The Workshop

The workshop brought together software developers, oral historians, and data infrastructure experts to re-examine the process of transcription. With so much expertise in one room, people could learn about what transcription tools were already available, how these tools could be further developed to meet the needs of researchers and archives, and what outputs would be most useful for a range of potential users. Stef Scagiola, one of the Co-Investigators, has put together a blog post that sets out and explains this transcription “chain” of the tools that were presented and tested. The tools highlighted how researchers and archives can enrich interview transcriptions, including audio alignment (with the written transcript), logging turn-taking and speaker tags, and adding metadata. Louise Corti, Functional Director of Collections Development and Producer Support at the UK Data Service and Co-Investigator on the project, notes that “It’s great for a social science perspective to be represented in CLARIN business, especially as oral history data provides a bridging between the sometimes stark divide between social sciences and humanities research interests. It is great that CLARIN is engaging social scientists, and we are delighted to have the opportunity to work in such a constructive manner and in such beautiful surroundings! As you read this we are preparing a bid for the next workshop!”

Some take-away points

From the perspective of someone who works with archived qualitative data, the possibilities seem really encouraging. With the addition of text that is aligned with high quality audio files, social scientists can begin to ask new questions of oral history interviews, possibly exploring aspects of emotion and interaction that are easily captured in audio but sometimes lost in the transcript. As Louise explains in this short video (prepared using T-Chain tools in 3 languages!), “I think we can go further and offer much more than flat text.” As part of our work with this stream of workshops, we aim to organise a transcription workshop this autumn to show researchers from disciplines like sociology, who may be unfamiliar with these technologies in action, how they can utilise these valuable tools within their work. Since these tools are open source, we hope to offer researchers a glimpse into how speech and language technology can transform their datasets and analysis, opening up the possibilities of writing new understandings of history and social life. We’re also teaming up with Thomas Hain, Professor of Speech and Audio Technology at the University of Sheffield, to submit some of our large-scale audio collections to his project, contributing to the development of speech and audio technology and helping to improve tools that can continue expanding the potential to analyse the social world.

From the perspective of a budding researcher, I’ve also taken away a few practical tips about better practices for recording interviews. If I want to get the most out of these tools, I now know that I should:

record audio files in WAV format
try to use a neutral setting with minimal background noise to record the interview
use a microphone that records in stereo

The setup can be as simple as something you record on your phone with a plugged-in microphone – no need for expensive equipment! Additionally, recording at a sample rate of 16 kHz – which is all that is needed for ASR programmes to work – is an option if digital storage space is an issue. By ensuring that I am making a quality audio recording of the interview, I’ll be able to utilise the full potential of these tools, which will help me to get the most out of my data.
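
For anyone who wants to double-check a recording before sending it through a transcription tool, here is a minimal sketch in Python using only the standard library's wave module. It reads a WAV file's header and flags anything that falls short of the tips above. The helper names (inspect_wav, check_recording) and the 16,000 Hz threshold are my own illustrative assumptions, not part of any CLARIN tool, and the wave module only reads uncompressed PCM WAV files.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int          # 1 = mono, 2 = stereo
    sample_rate_hz: int    # samples per second, e.g. 16000 or 44100
    bits_per_sample: int   # e.g. 16
    duration_s: float      # length of the recording in seconds


def inspect_wav(path: str) -> WavInfo:
    """Read basic properties from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return WavInfo(
            channels=wf.getnchannels(),
            sample_rate_hz=rate,
            bits_per_sample=wf.getsampwidth() * 8,
            duration_s=frames / rate if rate else 0.0,
        )


def check_recording(info: WavInfo, min_rate_hz: int = 16000) -> list[str]:
    """Return warnings for a recording (hypothetical helper).

    min_rate_hz is an assumed threshold based on the 16 kHz figure
    mentioned in the text; adjust it for the tool you actually use.
    """
    warnings = []
    if info.sample_rate_hz < min_rate_hz:
        warnings.append(
            f"sample rate {info.sample_rate_hz} Hz is below {min_rate_hz} Hz"
        )
    if info.channels < 2:
        warnings.append("recording is mono, not stereo")
    return warnings


# Example usage (the file name is a placeholder):
# info = inspect_wav("interview.wav")
# for message in check_recording(info):
#     print("Warning:", message)
```

An empty list from check_recording means the file meets these basic checks; it says nothing about background noise, which still has to be judged by ear.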

Want to know more about the technologies?

For more information about the open source tools that are feeding into CLARIN’s oral history and technology work, see Arjan van Hessen’s great Technology page on the Oral History and Technology website. You can also read more about our work on transcription on the subpage on Transcription Chain, where Arjan’s concise diagram neatly sums it up.
