Anne Solon, data manager for Young Lives, discusses adding value to survey data to support analysis. Her role involves working with Young Lives research partners to coordinate the complete survey cycle: survey design, piloting, training of field staff, data collection, data entry and data management.
More is more
People often say that ‘less is more’, but at Young Lives we feel ‘more is more’. This is especially important in ensuring our data is not only clean but also provides added value. Over the years we have collected a wealth of data on a variety of aspects of children’s lives. However, we wanted to go one step further than just providing the results from our collected data: we also share key outcome variables and strive always to take our data to the next level. After the submission of our Round 4 data to the UK Data Service, I’ve had time to reflect on the unique features of our dataset.
The majority of our data is of the ‘question and answer’ variety. But this doesn’t mean that a lot of work hasn’t gone into ensuring this data tells a story. With any longitudinal project, the more rounds you have, the cleaner the data should get. Young Lives has a thorough and continuous cleaning process that is never classified as done. With over 79,000 people in our Round 4 household roster data alone, across the two cohorts and four study countries, you can imagine there is a lot of work just in ensuring that genders are consistent across rounds, ages progress logically and no deceased individuals come back to life. Other cleaning checks are that skip and logic patterns are followed and that the data is consistent across sections and questionnaires for each child and household.
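The cross-round checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Young Lives cleaning code: the field names (person_id, round, sex, age, alive) and every record are hypothetical.

```python
from collections import defaultdict

# Hypothetical roster rows, one per person per round. All field names and
# values are invented for illustration.
roster = [
    {"person_id": "A1", "round": 1, "sex": "F", "age": 8, "alive": True},
    {"person_id": "A1", "round": 2, "sex": "F", "age": 12, "alive": True},
    {"person_id": "B2", "round": 1, "sex": "M", "age": 30, "alive": True},
    {"person_id": "B2", "round": 2, "sex": "F", "age": 34, "alive": True},
    {"person_id": "C3", "round": 1, "sex": "F", "age": 40, "alive": True},
    {"person_id": "C3", "round": 2, "sex": "F", "age": 38, "alive": True},
    {"person_id": "D4", "round": 1, "sex": "M", "age": 66, "alive": True},
    {"person_id": "D4", "round": 2, "sex": "M", "age": 70, "alive": False},
    {"person_id": "D4", "round": 3, "sex": "M", "age": 74, "alive": True},
]


def check_roster(rows):
    """Return (person_id, problem) pairs for cross-round inconsistencies."""
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["person_id"]].append(row)

    problems = []
    for pid, records in by_person.items():
        records.sort(key=lambda r: r["round"])
        # Gender should be the same in every round.
        if len({r["sex"] for r in records}) > 1:
            problems.append((pid, "sex differs across rounds"))
        # Compare each round with the one before it.
        for prev, curr in zip(records, records[1:]):
            if curr["age"] < prev["age"]:
                problems.append((pid, f"age decreases in round {curr['round']}"))
            if not prev["alive"] and curr["alive"]:
                problems.append(
                    (pid, f"recorded alive in round {curr['round']} after death")
                )
    return problems


problems = check_roster(roster)
for pid, problem in problems:
    print(pid, problem)
```

A real check would also need to decide how much ages may plausibly increase between rounds and how to treat missing values. Both are left out here.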
Coding String Variables
One of the valuable lessons we’ve learned with a longitudinal dataset is that you always need to be thinking of the next round, even if it is 3-4 years away, and of how the data needs to be accessible and usable. String variables are never archived, as it’s nearly impossible to ensure that no confidential or sensitive information is included in the text, especially if it’s in the local language. However, this doesn’t mean that string variables can’t be coded to become useful. Over the last two years we’ve put a lot of effort into coding some key string variables, such as the locations of our children and the names of the schools they attend or have attended. To say this was tedious would be an understatement. For the location variables, many of our country data managers had to go so far as to review the individual addresses of our children. With 12,000 children, some of whom have moved every round, this task was time-intensive, but we hope that being able to deduce from the codes who has moved in each round will strengthen analysis. The same applies to the coding of our schools. Whereas before this data was simply omitted, it is now available in coded form: if the code stays the same, the child has stayed in the same school, and if it changes, you can see at what point in their trajectory they moved school.
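As a rough sketch of how coded school variables can be used to spot moves, the Python below compares each child’s school code with the last code recorded for them. The child IDs, school codes, round numbers and the use of None for a missing code are all assumptions made for illustration. They do not reflect the actual variable names or coding scheme in the archived data.

```python
# Hypothetical school codes per child, keyed by round number.
# None stands for a round with no school code recorded.
school_codes = {
    "CH001": {2: "S014", 3: "S014", 4: "S014"},
    "CH002": {2: "S014", 3: "S027", 4: "S027"},
    "CH003": {2: "S003", 3: None, 4: "S009"},
}


def rounds_with_school_change(codes_by_round):
    """Return the rounds in which the code differs from the last known code."""
    changed = []
    last_known = None
    for rnd in sorted(codes_by_round):
        code = codes_by_round[rnd]
        if code is None:
            continue  # nothing recorded this round, keep the last known code
        if last_known is not None and code != last_known:
            changed.append(rnd)
        last_known = code
    return changed


moves = {
    child: rounds_with_school_change(codes)
    for child, codes in school_codes.items()
}
print(moves)
```

Comparing against the last known code, and not only the previous round, means a gap in one round does not hide a change of school. The same approach would work for location codes.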
Young Lives also generates calculated variables, such as health indicators, wealth and consumption indices, and test scores, for each round of data. These variables are updated after each round, as are the methods for calculating them, and are submitted alongside our collected data to the public archive. These indicators are widely used and can be quite difficult to create. By including them in our public datasets we hope to save users time in their own analysis.
Panel Dataset
After the submission of the Round 3 data to the UK Data Service, we began to build a panel dataset for each country. This set contained the core data that has been collected across Rounds 1 – 3 and was submitted to the public archive. Again, the aim of this was to aid user analysis by providing a dataset that summarises variables that have remained constant across the rounds for each country and cohort. After our recent submission of the Round 4 data, we have updated the panel dataset and plan to submit this in the coming weeks.
Alongside our Household and Child questionnaires, we have also conducted other streams of research such as the School Survey. In order to maximise its use, we code the data in such a way that it is easy to link to our main child and household data, including the main cohorts, their households and even individuals within the household. This means we have no ‘stand-alone’ datasets.
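To give a sense of what this linking looks like in practice, here is a minimal Python sketch that joins school survey records to child records through a shared child identifier. The identifiers, field names and values are hypothetical, and the real datasets may use different keys.

```python
# Hypothetical child-level records keyed by child ID.
children = {
    "CH001": {"household_id": "HH10", "cohort": "younger"},
    "CH002": {"household_id": "HH11", "cohort": "older"},
}

# Hypothetical school survey rows carrying the same child ID.
school_survey = [
    {"child_id": "CH001", "school_code": "S014"},
    {"child_id": "CH002", "school_code": "S027"},
    {"child_id": "CH999", "school_code": "S027"},  # no matching child record
]


def link_to_children(survey_rows, child_index):
    """Attach child-level fields to each survey row and list unmatched IDs."""
    linked, unmatched = [], []
    for row in survey_rows:
        child = child_index.get(row["child_id"])
        if child is None:
            unmatched.append(row["child_id"])
        else:
            linked.append({**row, **child})
    return linked, unmatched


linked, unmatched = link_to_children(school_survey, children)
print(linked)
print(unmatched)
```

Listing unmatched IDs separately, instead of dropping them silently, makes it easier to see where a link has failed.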
As mentioned above, we will submit an updated panel dataset which will include Rounds 1 – 4 to the UK Data Service. Additionally, we are finalising our Round 5 questionnaires, building our computer-assisted personal interviewing (CAPI) programs with preloaded data from Rounds 1 – 4 and plan to begin fieldwork this summer. Young Lives is also planning to conduct another round of School Surveys in India, Vietnam and Ethiopia alongside the Round 5 work.