Experimenting with AI: My Experience Creating a Closed Domain QA Chatbot

In this post, James and Louise introduce us to the world of developing an innovative chatbot for answering research data management queries.

A chatbot is a computer program which responds to user input, either textual (typing) or audio (speaking). The UK Data Service wanted to use this emerging technology to benefit its users by reducing the response time for its research data management (RDM) queries. Louise explains: “The idea was inspired by a presentation on ‘Cognitive systems redefining the library user interface’ presented by Kshama Parikh at the International Conference on Changing Landscapes of Science and Technology Libraries in Gandhinagar, in the province of Gujarat, north-west India, late last year. I envisaged a new online service: Meet Redama, our new chatbot for research data management. You can ask Redama anything about research data management, and she will be available 24/7 to provide an instant response as best she can. And, most conveniently, the night before your research application is due in to the ESRC! Go on, ask her about consent forms, data formats, and encrypting data – her breadth of expertise is excellent. However, with only three months to try it out, that was always a big dream!”

We hired James, a recent graduate, for a summer project to develop a chatbot to try and solve a closed domain question answering (QA) problem, using the domain of ‘research data management’. James highlights the main steps and some of the challenges that arose.

“The basic premise of a QA chatbot is to map a question to a relevant answer. This can be done in different ways, from machine learning to more statistical approaches such as cosine similarity (which still involve an element of machine learning to learn the initial word vectors). In other words, question answering has a lot in common with information retrieval. What follows is my experience of chatbot development.”

Step 1: Sourcing and cleaning the data

As with many AI problems, one first needs valid “real life” data which can form a useful baseline. Here at the UK Data Service we used two sources:

  • queries related to research data management, submitted from users to the Service via our web-based Helpdesk system
  • text from our fairly substantive RDM help pages

Once the first set of data was assembled, the next step was to clean and pre-process it so that it could be fed into a mapping function which tries to predict an answer for a given question. We did this by taking existing emails and putting each individual one in its own text file. This resulted in a collection of 274 plain text files.

The same was done with all the data management web pages – each page was placed in its own individual text file. This proved to be a time-consuming task: due to the initial file format of the data, both knowledge bases had to be generated by hand. Textual data often contains special escape characters such as newline (“\n”) or tab (“\t”) which need to be stripped from the text, and this was done when separating the emails and web pages into separate files.
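The cleaning step can be sketched roughly as follows. This is an illustrative sketch only – the function and file names are assumptions, not taken from the project's code:

```python
import re
from pathlib import Path

def clean_text(raw: str) -> str:
    """Replace newline/tab escape characters and collapse runs of whitespace."""
    text = raw.replace("\t", " ").replace("\n", " ")
    return re.sub(r"\s+", " ", text).strip()

def write_knowledge_base(items, out_dir):
    """Write each cleaned item (email or web page) to its own plain text file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, raw in enumerate(items):
        (out / f"item_{i:03d}.txt").write_text(clean_text(raw), encoding="utf-8")
```

Keeping one item per file makes the knowledge base trivial to inspect and to re-generate when the source data changes.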

Once we had set up the two simple knowledge bases, we then created a data management object. This object loads all the necessary scripts and acts as a simple interface between a chatbot and the data itself. The simple script I wrote is located in Datamanager.py. This way, one can change the existing knowledge base structure underneath (to a database instead of a collection of files, for example) and, because the DataManager object acts as an application programming interface (API), the chatbot itself will not require any code changes.
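In outline, such an interface object might look like the sketch below. The method names are assumptions for illustration; the real implementation lives in Datamanager.py in the project repository:

```python
from pathlib import Path

class DataManager:
    """Interface between the chatbot and the knowledge bases.

    The chatbot only calls questions() and answers(); swapping the
    file-based storage for a database would not change the chatbot code.
    """

    def __init__(self, questions_dir: str, answers_dir: str):
        self._questions = self._load(questions_dir)
        self._answers = self._load(answers_dir)

    @staticmethod
    def _load(directory: str) -> list:
        # One plain text file per item, loaded in a stable (sorted) order.
        return [p.read_text(encoding="utf-8")
                for p in sorted(Path(directory).glob("*.txt"))]

    def questions(self) -> list:
        return self._questions

    def answers(self) -> list:
        return self._answers
```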

Step 2: Calculating a Baseline

A baseline performance has to be established to see what can be achieved using someone else’s framework, or by adopting a well-used approach. For this project, we decided to use chatterbot, which we adapted to feed in questions and answers from our dataset using the DataManager object. To see the exact script refer to https://github.com/jamesb1082/ukda_bot/tree/master/bot. However, this particular chatbot did not work well with our dataset due to its relatively small size. The examples given in the chatterbot project use rather large conversational corpora, such as the Ubuntu Dialogue corpus. Having been trained on our corpus, it achieved a precision of 0 almost every time, so any model we built going forward would be at least as good on our dataset as this particular framework. This chatbot was a good learning experience to start off the project, as it highlighted some key issues around building a chatbot, such as storing and inputting data.
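The baseline metric itself is simple to compute. A minimal sketch of precision-at-1 evaluation, assuming a `predict` function that returns the model's single best answer for a question (both names are illustrative):

```python
def precision_at_1(predict, qa_pairs):
    """Fraction of questions whose top predicted answer is the correct one.

    predict  -- callable mapping a question string to one answer string
    qa_pairs -- list of (question, correct_answer) tuples
    """
    hits = sum(1 for question, answer in qa_pairs if predict(question) == answer)
    return hits / len(qa_pairs)
```

A baseline that scores near 0 on this metric, as chatterbot did for us, sets a very low bar for the models that follow.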

Step 3: Using Machine Learning Techniques/Statistical Techniques

Having now established our baseline, we then approached the problem of designing a mapping function from a different angle: using machine learning and information retrieval techniques to generate relevant answers. First of all, it is important to establish how similarity and relationships between words can be modelled in a computer program. The modern approach is to use vector space models, which map each individual word to a unique point in vector space; in other words, each word is represented as a series of numbers (100 of them, in our case). The position of each word in vector space is relative to all other words: words that have similar meanings are close to each other, and the vector produced by subtracting one word’s vector from another defines the relationship between the two words. A common example is King − Man + Woman ≈ Queen, where each word stands for the vector of that word. The detail of how these vectors are generated is beyond the scope of this blog; the point is that they are critically important. Enabling words to be treated as numbers means that mathematical calculations can be performed on lexical items.
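The two key operations – measuring similarity and doing vector arithmetic on words – can be sketched with toy vectors. The three-dimensional vectors below are made up purely for illustration; real models learn vectors of around 100 dimensions from large text corpora:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy word vectors (invented for the example).
king  = [0.9, 0.8, 0.1]
queen = [0.9, 0.1, 0.8]
man   = [0.5, 0.9, 0.0]
woman = [0.5, 0.2, 0.7]

# King - Man + Woman should land close to Queen.
analogy = [k - m + w for k, m, w in zip(king, man, woman)]
```

With these toy values, the analogy vector is more similar to `queen` than to `king`, which is exactly the relationship the King/Queen example describes.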

Step 4: Siamese Neural Network

Since the previous two methods performed unsatisfactorily, we adopted a different approach, which centres on using “neural networks” to learn and generate a mapping function instead. As the dataset we are working with is rather small (only 171 correct QA pairs), we opted to use a Siamese neural network (SNN): a special type of neural network consisting of two identical subnetworks which share a set of weights. The question vector is fed into one subnetwork and the answer vector into the other (see diagram below).

The distance between the outputs of the two subnetworks is then calculated, the idea being that the distance is 0 when the answer is correct and 1 when it is not. The weights are updated depending on whether the answer was right or wrong, and by how much. Essentially, by training the network in this manner, we learn to calculate the distance between a question and an answer, which in turn acts as a distance function. This was the hardest theoretical part of the project; however, the actual coding was relatively straightforward, due to the very simple, modular API provided by Keras.
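The core idea can be sketched without any deep learning framework (the project itself used Keras; everything below – the tiny one-layer “subnetwork”, the weights, the margin – is a simplified assumption for illustration):

```python
import math

def shared_network(vec, weights):
    """Stand-in for the shared subnetwork: one linear layer plus tanh.

    Both the question and the answer pass through this SAME function with
    the SAME weights - that sharing is what makes the network 'Siamese'.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(distance, label, margin=1.0):
    """label 0 = correct QA pair (pull outputs together, target distance 0);
    label 1 = incorrect pair (push outputs apart, up to the margin)."""
    if label == 0:
        return distance ** 2
    return max(margin - distance, 0.0) ** 2

weights = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]   # toy shared weights
q_out = shared_network([1.0, 0.0, 1.0], weights)  # question embedding
a_out = shared_network([1.0, 0.1, 0.9], weights)  # answer embedding
d = euclidean_distance(q_out, a_out)
```

Training then consists of nudging the shared weights to reduce this loss over many correct and incorrect pairs, so that the distance itself becomes a learned measure of how well an answer fits a question.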

Step 5: Results

After implementing and training the model on our dataset, we performed some testing on it, to see how well it actually performed in different scenarios. The first test used the complete training set, to see how well it “remembered” questions, with the model correctly identifying 79% of them. It is important to note that one does not want 100% at this stage, as that is a common sign that the model has simply memorised the initial dataset rather than generalised the relationships between questions and answers. When we tested it on unseen questions, our model did not perform particularly well; however, we suspect that this is because some answers have only one relevant question, meaning that the model cannot generalise well from them.

Step 6: Further Improvements

Neural networks learn through being shown examples, and as a result, the performance of a neural network is reliant upon the quality of the dataset it is trained upon. Although we upscaled our very small dataset to contain both correct and incorrect QA pairs, it often featured only one or two correct QA pairs for certain topics. To combat this issue, one could improve the dataset by not only asking more questions but also seeking a more uniform distribution of questions. For example, our distribution (see below) is not even, with some very dominant peaks and a lot of answers which have very few questions pointing at them. We also used Stanford’s SQuAD to directly compare the model produced in this project against other chatbot models. To do this, we modified our data loading functions to read in the SQuAD training data, and modified our script to output a prediction file.
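Spotting those dominant peaks is straightforward: count how many questions point at each answer. A minimal sketch (the example pairs are invented):

```python
from collections import Counter

def answer_distribution(qa_pairs):
    """Count how many questions map to each answer, to spot uneven coverage."""
    return Counter(answer for question, answer in qa_pairs)

# Toy data: "a1" is a dominant peak, "a2" has only one question pointing at it.
pairs = [("q1", "a1"), ("q2", "a1"), ("q3", "a1"), ("q4", "a2")]
```

Answers that appear only once in this count are exactly the ones the model struggles to generalise to, so they are the natural targets for collecting more questions.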

James told us: “Overall, I think the project went well, with a basic interface chatbot having been created. With the possible expansion to other datasets and some tuning of parameters, I believe the main method implemented in this project will be reasonably successful.”

James was embedded into the Big Data Team at the UK Data Service for the duration of the chatbot work. Nathan Cunningham, Service Director for Big Data, said: “It’s really important that we start to investigate new methodologies for understanding and structuring our data challenges. Over the next couple of years we aim to try and organise our collection and resources with enough context and information to make sense to anyone reading, querying or using the Service. The chatbot work was an initial pilot study in using machine and deep learning algorithms to construct a knowledge base and answer questions. We now have a better understanding of what we need to do to build a UK Data Service knowledge base and how this will improve our data services.”

James’ supervisor said: “I was delighted to work with the UK Data Service in bringing this exciting project to life – a good question-answering bot can greatly decrease first response times for users’ queries. A number of exciting developments in neural networks were incorporated into the bot – it is not just trying to assess the presence or absence of certain words, but aims at inferring the exact semantic content of the questions.”
