Facing the future of the Research Library: managing and sharing data in China

Louise Corti reports on her recent time spent at Fudan University, Shanghai as part of their centennial celebrations, which also looked forward to the future of the research library and research data management.

This autumn I was honoured to be invited to give a keynote talk at two events held by Fudan University Library in Shanghai, renowned for being a forward-thinking research library with a great reputation throughout the world.

Fudan and Shanghai

Shanghai is a massive buzzing city, with a population of around twenty-four million, a thriving and business district, and seven ranked universities, with Fudan appearing in the top three.

In Shanghai I was very fortunate to experience the friendliness of all those I encountered. Trying to walk off my seven hour jetlag, I strolled for miles on the first day I arrived, navigating the streets and subway partly by map and partly with the help of young people, who were always more than happy to help in doing pretty basic things such as buying a metro ticket, locating a subway entrance or exit; showing my current location on a map. While road and metro stop names are in English, nothing else is (and why would they be), which makes getting around without a Chinese-speaking guide a real challenge.

I strolled along the two-mile pedestrian walkway along the Huángpǔ River with a fabulous view of the central business district, and its looming towers. A trip up the 636m tower with the fastest elevator in the world let me appreciate the geographical enormity of the city and its densely populated residential tower blocks.

Shanghai

All photos in this post: Louise Corti

 

Fudan University Library Centennial Celebration: Facing the Future of the Research Library

To mark one hundred years of its prestigious Library, Fudan University invited more than three hundred delegates from libraries across China and the world who host Chinese archival collections, or who have donated collections to Fudan.

Introduction slide

Jiao Yang, Party Secretary of Fudan University, welcomed delegates and spoke of the success of the past hundred years which paves the way for restoring historical knowledge and cultural heritage for the next centenary.

She reflected that the library has become one of the top priorities for the University, aiming to build on their already-rich heritage to cultivate a professional team of librarians in preparation for the future. The Ministry of Education’s agenda drives innovation for the library sector.

Chen Jianlong, Director of Peking Universities Library, spoke of the established library’s role in forging a scientific base based on books. He alluded to past challenges, such as the introduction of foreign language books and the start of the ‘digital library’ in 2012.

A formal awards ceremony followed where thanks and gifts were presented to various donors of books and financial contributions to Fudan Library.

Gift being presented to a donor

 

Symposium on Scientific Data Management and Services in Universities

In the afternoon, a special symposium was held on the challenge of research data management and data sharing in China, and specifically, how libraries could rise to meet some of the new challenges.

A panel of library leaders discussed various challenges and directions.  Live questions were taken using WeChat. Questions raised were included collection usability, size as a challenge, and the role of data editing; all topics familiar to the UK Data Service.

Picture: WeChat questions posed by the audience

 

My talk was on managing and sharing data in social science, focusing on ‘papery data’, i.e. qualitative data.

‘Think before Digitising’ was my key message, which was welcomed. This involves cataloguing and box-listing first, then digitising on demand, preferably with the scholar paying for this undertaking or seeking a joint grant to undertake the work.

As with the Digital Futures project, linking a scholar to a collection prior to making it digital is vital

My intro slide

 

Following my talk, we heard from Mr Chang on the situation with humanities and social science data at Fudan University.

He recounted that this year China’s Ministry on Science and Technology has published positive mandates on FAIR, GDPR and Scientific Data Management Regulations as national policies.

Prestige universities, such as Fudan, have grabbed this mandate and started to run with it, to show they can be quickly responsive to higher-level demands. In the room were invited experts on emerging technologies for open access, such as blockchain, as well as the leading librarians from across China.

Mr. Chang talked about the new Scientific Data Research Institute, hosted by the Fudan Development Institute (FDDI) that includes the Contemporary China Social Life Data and Research Center.  The FDDI brings together a multidisciplinary mix of big data experts, prominent policy analysts and library scientists.

I also learned that in partnership with the Library, they have set up an Archive of Contemporary China Life, a Big Data Centre, where researchers – largely economists – can access real-time population and economic data via their secure-access Hadoop system, and smaller research datasets deposited in their Dataverse.  The data centre makes available around five hundred datasets on energy flow and carbon use on a big data platform that supports AI operations. This project has thought carefully about the pipelines needed for different disciplines and how evaluation of data differs across the disciplines. Working with government and companies, they have built a ‘NextGeneration’ Research Platform that supports large and federated data storage, sharing and analysis with role-based authorisation. Data tools include existing commonly used tools by social scientists (SPSS, Stata SAS, WinMax etc.) as well as 3D visualisation tools and an article-publishing platform. The platform is underpinned by new big data platform tools like Spark, Kafka, VPS Docker, similar to the approach being used by the UK Data Service’s emerging Data Services as Platform (DSaaP).

Picture: UK Data Service Data Services as a Platform (DSaaP)

 

Following the Party Secretary General’s request for librarians to quickly skill up on data, many university leaders have mandated that their libraries should take the lead on establishing data management and repository services.  It was noted that while the Ministry has great expectations, it is unaware of the scope and scale of the undertaking and current lack of expertise. University-level data policies were felt to be a good start, similar to what happened in the UK when in 2014 the EPSCR set the bar by mandating University-level data sharing responsibilities.

Further discussion pointed to the new roles and skills required for librarians’ jobs and the need for information, data and statistical literacy to be fully recognised and included in librarian training. Evolution of the profession to support new policies is welcomed by librarians across China, who argue that they need to be better integrated with IT specialists in various disciplines.

Before platforms are built, training and advocacy in data management and sharing need to come; adding these into core training is one way to facilitate this.  Topics of openness, privacy, incentives to share, and data re-use are relatively new to researchers in China. Our Sage Handbook, Managing and Sharing Research Data, translated into Chinese, can support this, and is already being used in in library science training.

Sage Publications, Managing and Sharing Research Data:  A guide to good practice, Corti, van den Eynden, Bishop and Woollard

 

Celebrating the Future of the Library and Indexing in China

Fudan also hosted the annual international conference of Indexing Societies.

Many of the participants I met were more experienced independent contractors who indexed a massive range of text-based publications, from PhDs and biographies to indigenous materials and cookbooks to scientific high school textbooks. Many used ‘time-tested’ manual efforts to index, using dedicated commercial software. I also learned that one could do a 6-month accredited course to be an indexer and wondered if this might suit me for a post-retirement activity!

I came across an interesting take on China’s past position on the role of indexing and use of scholarly material. It was explained to me that in the past China has not recognised the need to ‘index’ their scholarly materials, including books.  Further, culturally, scholars have not needed to cite scholarly information, as published knowledge is deemed to be free and available to all. This is a different situation to those of us used to dealing with citation, accreditation, intellectual property and publishers’ copyright.

As I asked more about various indexing softwares on the market, including CINDEX, MACREX and SKY Index, I learned that the main use of these is for the indexer to create index entries directly in the software, with the software organising them into alphabetical or page number order.  These can then be imported as an RTF into a publishers’ own system, which is very different to my own experience in this area.

There was a question by a Chinese participant on the use of XML for more effective and efficient indexing, as well as import/export but sadly this didn’t lead to any further discussion around reusable ontologies and schemas.

I presented on the case of social science data as a new kind of FAIR (Findable Accessible Interoperable Reusable) output/publication that needed some degree of indexing to improve its FAIRness. For example, key words can reflect method and study coverage, questions and code books.

I raised the question of how best to index qualitative data – method or data content? I hope to follow this up with an article in the Journal of Indexing on this conundrum, with support of my expert colleagues at the UK Data Service.

 

Autoindexing: a solution or a sin?

Questions about using automated or semi-automated methods employing Machine Learning and AI for indexing books, especially of scientific nature were not well received, with the chair saying that ‘If a machine can replace me, I won’t have job’.  She argued that the subtleties of language in a text such as a biography, where co-referencing is commonly used (e.g. Louise Corti, the Associate Director, her etc.) could not be easily indexed, despite the existence of sophisticated algorithms that can successfully identify co-references, using Natural Language Processing.

Another speaker closed down the idea as being impossible to automate, as indexing was too varied and too sophisticated for a machine. A tale was recounted of their community being promised an autoindexing system by IBM some 20 years ago, which never materialised. But they did agree that it might be possible in their lifetimes for more straightforward topics to be auto-indexed.

I was disappointed but also renewed in my aim to get the UK Data Service question and variable autoindexing project started soon!

Having just worked with my colleague, Jeannine Beeken on a recent CLARIN workshop where we introduced linguistic approaches and tools for oral history data, I advised them to try out the free Stanford CoreNLP to support named entity recognition and coreferencing.

Curious? Try this for yourself. Copy the italic text and paste into the search box in the demo version: Louise is a Service Director at the UK Data Service. She has been working there for 30 years. She often enjoys her role, especially when she gets invited to China

 

Postscript

Many thanks to my host for the week, data librarian, Yin Shenquin who, with her colleagues have undertaken progressive work in China to set up a small network of data libraries in Chinese academic libraries (populated Dataverses). She also translated our Sage Handbook, Managing and Sharing Research Data into Chinese. This is being well-received by library science professionals and master’s students.

I really enjoyed my invitation to Shanghai and was honoured to be part of Fudan Library’s hundred-year celebrations.  The need for academic libraries to rapidly transform to support a connected digital era is recognised by the leading libraries across China. They have reached out to colleagues in the data and technology worlds to partner in developing new information and data platforms.


Louise Corti is Service Director, Collections Development and Producer Relations for the UK Data Service.

Louise leads two teams dedicated to enriching the breadth and quality of the UK Data Service data collection: Collections Development and Producer Support. She is an associate director of the UK Data Archive with special expertise in research data management and the archiving and reuse of qualitative data.

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *