Stephanie Blanchard continues our census series by introducing the Office for National Statistics’ approaches to ensuring individuals and households aren’t identifiable in the 2021 Census outputs.
Where’s the elephant?
A question we never get asked in the ONS Statistical Disclosure Control (SDC) team is
How do you hide an elephant?
Image: elephants at the waterhole. Photo by Richard Jacobs on Unsplash
Let’s say the census collected information on pets (it doesn’t). In this hypothetical world, a household could have a pet elephant. If it were the only household in the local area with a pet elephant, it would be identifiable in the neighbourhood.
ONS is legally and ethically required to protect unusual records such as these in the census outputs, which we do by applying SDC methodology.
In previous censuses, ONS published lots of static tables, carefully designed to ensure the disclosure risk was very low. For example, elephants may have been grouped into a category along with other large mammals such as hippopotamuses and rhinoceroses.
The manual process for producing and disclosure-checking these tables was time-consuming and sometimes necessitated changes to the tables, and it didn’t always give users the detailed outputs they expected or required.
New approaches to disseminating Census 2021
Among the many firsts for the Census 2021 is how the published data will be disseminated, with products including a Flexible Table Builder (FTB).
The FTB makes it more challenging to protect unusual records in the published outputs because a data intruder could tabulate type of pet in turn against each of the other variables in the census. This could lead to the construction of an individual record revealing information from the census about a person or household.
In fact, a household doesn’t have to be that unusual to be at risk because any low count in a single variable table, say the only pet elephant in an area, could be used to identify that record in every table that uses type of pet.
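To make the risk concrete, here is a minimal Python sketch of how an intruder could build up a record from unprotected tables. The microdata, the variable names and the helper functions `tabulate` and `reconstruct` are all invented for illustration; they are not census data or part of the FTB.

```python
from collections import Counter

# Hypothetical microdata for one small area: (pet, household_size, tenure).
households = [
    ("dog", 2, "owned"),
    ("cat", 4, "rented"),
    ("dog", 4, "owned"),
    ("elephant", 3, "rented"),
    ("cat", 2, "owned"),
]
VARIABLES = {"pet": 0, "household_size": 1, "tenure": 2}


def tabulate(records, *variables):
    """Return the counts for a cross-tabulation of the named variables."""
    idx = [VARIABLES[v] for v in variables]
    return Counter(tuple(r[i] for i in idx) for r in records)


def reconstruct(records, key_var, key_value):
    """Read off a unique record's other attributes from two-way tables."""
    if tabulate(records, key_var)[(key_value,)] != 1:
        return None  # not unique, so the tables do not pin down one record
    found = {key_var: key_value}
    for other in VARIABLES:
        if other == key_var:
            continue
        table = tabulate(records, key_var, other)
        # In the row for key_value, exactly one cell holds the single record.
        for (k, v), count in table.items():
            if k == key_value and count == 1:
                found[other] = v
    return found


profile = reconstruct(households, "pet", "elephant")
print(profile)
```

Each two-way table on its own looks harmless, but tabulating the unique value against every other variable in turn yields the full record. The same call for `"dog"` returns `None`, because two households share that value.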
Our shield against data intruders is our SDC methodology, which has been specifically developed to protect Census 2021 data. We will use a combination of Targeted Record Swapping, Cell Key Perturbation and Automated Disclosure Checks to ensure that no respondents, even those with elephants, can be identified in the published outputs with certainty.
ONS approaches to Statistical Disclosure Control
Targeted Record Swapping
Targeted Record Swapping is our main source of protection against unique records. We identify the records most at risk by targeting a wide range of variables and protect these households by matching them to another household with similar characteristics in a nearby geographic area.
In the case of the elephant, this could be a household of the same size with a pet hippopotamus in the next town, and the two households would be swapped between the two areas. The aggregated characteristics of both areas will remain broadly unchanged, with the same number of people and the same number of large mammals. However, where the neighbours expect to see an elephant in the outputs, they will find a hippopotamus.
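The swap can be sketched in a few lines of Python. This is a toy illustration only: the `Household` fields, the area names, the neighbour map and the matching rule (same household size and same large-mammal flag) are assumptions made for the example. The real method targets risk across a wide range of variables, which this sketch does not attempt.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Household:
    hh_id: int
    area: str
    size: int
    pet: str
    large_mammal: bool


def swap_unique_households(households, neighbours):
    """Swap each household whose pet is unique in its area with a similar
    household in a neighbouring area. Returns the swapped id pairs."""
    pet_counts = Counter((h.area, h.pet) for h in households)
    swapped, pairs = set(), []
    for h in households:
        if h.hh_id in swapped or pet_counts[(h.area, h.pet)] != 1:
            continue  # already moved, or not unique in its area
        for other in households:
            if (other.hh_id not in swapped
                    and other.area in neighbours.get(h.area, ())
                    and other.size == h.size
                    and other.large_mammal == h.large_mammal):
                h.area, other.area = other.area, h.area  # exchange locations
                swapped.update({h.hh_id, other.hh_id})
                pairs.append((h.hh_id, other.hh_id))
                break
    return pairs


def area_totals(households):
    """People and large mammals per area, as (people, mammals) tuples."""
    totals = {}
    for h in households:
        people, mammals = totals.get(h.area, (0, 0))
        totals[h.area] = (people + h.size, mammals + int(h.large_mammal))
    return totals


households = [
    Household(1, "Area A", 3, "elephant", True),
    Household(2, "Area A", 2, "dog", False),
    Household(3, "Area A", 4, "dog", False),
    Household(4, "Area B", 3, "hippopotamus", True),
    Household(5, "Area B", 2, "cat", False),
    Household(6, "Area B", 2, "cat", False),
]
neighbours = {"Area A": ["Area B"], "Area B": ["Area A"]}

before = area_totals(households)
pairs = swap_unique_households(households, neighbours)
after = area_totals(households)
print(pairs, before == after)
```

In this example the elephant household and the hippopotamus household trade places, while the number of people and large mammals in each area stays the same.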
Cell Key Perturbation
The next layer of protection is the Cell Key Perturbation method, which protects against disclosure that could arise from differencing similar tables. The FTB will allow users to choose the geography, population, variables and level of detail in their tables.
Suppose one selection requested all pets while another requested only furry pets. If the elephant were the only non-furry pet in an area, the two outputs could be differenced to reveal information on the household with the elephant.
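A small Python sketch shows how little effort differencing takes when counts are unprotected. The records and the `furry` flag are invented for this example.

```python
from collections import Counter

# Hypothetical pets recorded in one small area.
pets = [
    {"pet": "dog", "furry": True},
    {"pet": "dog", "furry": True},
    {"pet": "cat", "furry": True},
    {"pet": "rabbit", "furry": True},
    {"pet": "elephant", "furry": False},
]


def count_by_pet(records):
    """One-way table of pet type."""
    return Counter(r["pet"] for r in records)


all_pets = count_by_pet(pets)
furry_pets = count_by_pet(r for r in pets if r["furry"])

# Subtracting one table from the other leaves only the non-furry pets.
difference = all_pets - furry_pets
print(dict(difference))
```

Neither table singles out the elephant on its own terms, but the difference between them contains a single record.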
Originally developed by the Australian Bureau of Statistics to protect their census outputs, the Cell Key method makes small changes to table cells in a consistent way, so that repeating a table query will always return the same values and the same cell in a different table will always display the same value. We have adapted the method to include changes to zero-value cells. Together with Record Swapping, this creates enough uncertainty for small cells to be published in census tables, without significantly changing the characteristics of the outputs.
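The consistency property can be illustrated with a toy Python sketch of the general cell key idea: each record carries a fixed key, a cell's key is derived from the keys of the records it contains, and the cell key decides the change applied to the count. Everything specific here is an assumption for illustration, including the `MODULUS`, the hard-coded record keys and the `perturbation` rule, which stands in for a real perturbation table. The treatment of zero-value cells is not covered.

```python
MODULUS = 100  # hypothetical range for record keys and cell keys

# Record keys would normally be drawn at random once and stored with each
# record; they are hard-coded here so the example is reproducible.
records = [
    {"pet": "dog", "record_key": 37},
    {"pet": "dog", "record_key": 58},
    {"pet": "dog", "record_key": 12},
    {"pet": "cat", "record_key": 91},
    {"pet": "cat", "record_key": 4},
    {"pet": "elephant", "record_key": 95},
]


def cell_key(record_keys):
    """Combine the keys of the records in a cell into a single cell key."""
    return sum(record_keys) % MODULUS


def perturbation(count, key):
    """Invented rule standing in for a lookup in a perturbation table."""
    if count == 0:
        return 0  # zero cells contain no record keys; not handled here
    if key < 10:
        return -1
    if key >= 90:
        return 1
    return 0


def perturbed_count(records, predicate):
    """Published count for the cell defined by `predicate`."""
    keys = [r["record_key"] for r in records if predicate(r)]
    count = len(keys)
    return max(0, count + perturbation(count, cell_key(keys)))


def is_dog(record):
    return record["pet"] == "dog"


# The same cell reached through two differently ordered queries.
dogs_first_query = perturbed_count(records, is_dog)
dogs_second_query = perturbed_count(list(reversed(records)), is_dog)
elephants = perturbed_count(records, lambda r: r["pet"] == "elephant")
print(dogs_first_query, dogs_second_query, elephants)
```

Because the change depends only on which records fall in the cell, the dog count is the same however the query is built. With these invented keys the single elephant is published as 2, so a reader cannot be sure a small count is exact.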
Automated Disclosure Checks
The final line of defence against intruders is our Automated Disclosure Checks. Even after applying Record Swapping and the Cell Key methods, some tables may be sparse for some geographic areas or categories if too much detail is requested. We can’t manually assess every possible combination of variables within a reasonable timeframe so we have developed a set of rules to look for possible disclosures.
For any query in the FTB, where an area fails the rules, the data will be kept safe and not published. However, where the same query is run for another area in which the other methods have applied sufficient protection and the rules are met, the data will be released. This will allow more data to be published where it is safe to do so.
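The area-by-area behaviour can be sketched in Python as follows. The two rules used here (a minimum table total and a cap on the share of non-zero cells that equal 1) and their thresholds are purely hypothetical stand-ins, since this post does not set out the actual rules; the area names and counts are also invented.

```python
def passes_checks(cell_counts, max_share_of_ones=0.2, min_total=10):
    """Apply two hypothetical sparsity rules to one area's table."""
    total = sum(cell_counts)
    nonzero = [c for c in cell_counts if c > 0]
    if total < min_total or not nonzero:
        return False  # too few records to publish safely
    ones = sum(1 for c in nonzero if c == 1)
    return ones / len(nonzero) <= max_share_of_ones


def release(query_results):
    """Return each area's table if it passes, or None if it is withheld."""
    return {
        area: (counts if passes_checks(counts) else None)
        for area, counts in query_results.items()
    }


# The same query evaluated for two areas (invented, already protected counts).
query_results = {
    "Area A": [1, 0, 1, 2, 0, 1],
    "Area B": [12, 7, 0, 15, 9, 4],
}

released = release(query_results)
print(released)
```

The sparse table for Area A is withheld, while the identical query for Area B is released because its counts satisfy both rules.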
A shield of protection
A phrase we do hear a lot in SDC is that ‘you can’t have it because of SDC’.
Without the SDC shield of protection, there wouldn’t be detailed published outputs or data accessible to all. Without the innovations in SDC methodologies behind the ONS dissemination systems, there wouldn’t be flexible, timely outputs that better meet user needs. Because of SDC and the hard work of many other teams within ONS, the Census 2021 outputs will be some of the most exciting yet!
More information on the SDC methods that will be used for the dissemination of the Census 2021 can be found in the Methodology Assurance Research Panel paper EAP125 – Statistical Disclosure Control (SDC) for 2021 UK Census (direct download link for the Microsoft Word document).
No elephants or other pets were harmed or disclosed in the drafting of this blog post.
Data from the UK censuses from 1971 to 2011 are available from the UK Data Service, including harmonised UK data for 2001 and 2011.
About the author
Stephanie Blanchard is an ONS statistician in the Census and Population Statistics team.
Her role within the Statistical Disclosure Control project team is to research and develop the methods that will protect the 2021 Census outputs. Stephanie has been a member of the Methodology Division since joining the ONS over 11 years ago, working in a number of roles on the 2011 Census.