Introduction
The Samples of Anonymised Records (SARs) are extracts from Census records, designed to enable researchers to carry out detailed analyses of 2001 Census data for individuals or households.
The SARs produced after the 1991 Census provided a valuable research dataset. The Census Offices have produced an individual file and a household file from the 2001 Census. There is a legal obligation to protect the confidentiality of the individual information that is released in the SARs and to ensure that the data that are released are safe from disclosure risk.
A large amount of analysis has been undertaken to assess the risk of disclosure. To meet their legal requirements the Census Offices have judged that some reductions in detail for certain highly visible or disclosive variables were required in comparison to the 1991 SARs. Consultations were held with the research community in 2002 to assess which bandings of variables were most acceptable to users - see HERE for details.
Contents
The SARs extracted from the 2001 Census consist of three products derived from three separate extracts (no case in one extract appears in either of the others):
The Individual Licensed SAR, consisting of around 3 per cent of person records and relating to some 1.84 million people in all. For each person it contains the main demographic, health and socio-economic variables and derived variables such as social class; household information; data on the sex, economic position and social class of the individual's family head; limited information about other members of the individual's household (e.g. number of pensioners); and area identification at Government Office Region (GOR) level in England, and at country level for Wales, Scotland and Northern Ireland. The list of variables is shown below.
A Special Licence Household SAR (SL-HSAR) consisting of a 1 per cent hierarchical sample of households and individuals in those households. It contains information for some 245,000 households and covers England and Wales only. The information is given for each individual in households of size up to and including 11 persons.
A 5 per cent sample of Small Area Microdata (SAM), a new product for 2001, containing 2.9 million individual records with Local Authority level identified. The variables included are similar to those in the Individual SAR, though broader banding has been used to preserve individuals' confidentiality.
Full details of the content of the 2001 SARs and means of access can be found HERE.
Availability
The Individual Licensed SAR is now available from CCSR (a charge may apply). To register for access and for further information on guidance and training, please go to the CCSR website www.ccsr.ac.uk/sars/ and follow the link for access and registration.
The SL-HSAR is now available from the UK Data Archive. Researchers wanting access to the data should follow the link for requesting a download of the data on the Special Licence and complete the Special Licence application form, which will then be assessed by both the UKDA and ONS. You will be notified of the outcome as soon as possible. Users should also note the 'Guide to Good Practice: microdata handling and security' contained in the link above and agree to abide by its requirements.
When applying for access to the SL-HSAR users must already be registered with the Economic and Social Data Service (ESDS) or the Census Registration System (CRS) and have an Athens ID number. Users can apply for ESDS registration via the UK Data Archive website.
The SAM is now available from CCSR (a charge may apply). To register for access and for further information on guidance and training, please go to the CCSR website www.ccsr.ac.uk/sars/ and follow the link for access and registration.
Protecting Confidentiality
The Census Offices have a clear, well-published protocol for protecting the confidentiality of individual information:
...In releasing statistics from the Census, all possible steps will be taken to prevent the inadvertent disclosure of information about identifiable individuals and households.
The Registrars General also have a legal obligation not to reveal information collected in confidence in the Census about individual people and households, and have given public assurances about what this means in practice. In presenting very detailed results from the Census, protecting individual information is of key importance. Traditionally the confidentiality of Census output is protected by a combination of disclosure control methods.
Beyond the legal aspect of disclosure control, ONS has also stated in the 2001 Census Disclosure Control advisory group paper AG0106 that:
"Maintaining the confidentiality of individual data underpins the trust that exists between data suppliers and any agency that acts as custodian of information about them. At ONS we are fortunate that businesses and the public have confidence that their information is securely held and that we do not release any data that could identify an individual. It is essential that this trust be maintained......".
Protecting the confidentiality of details about individual people becomes more difficult with each Census, as the amount of accessible and publicly available information about individuals increases. More information can now be matched statistically with the Census. Alongside this, for the 2001 Census a larger range of small area statistics has been released, notably because some key measures which were previously obtained from 10 per cent samples were available in 2001 for the whole population. A much wider range of small area information is being published through Neighbourhood Statistics, from public records as well as the Census.
Since 1991 the internet has transformed the potential for making census results widely accessible to citizens. Changing attitudes to the trust in which public agencies are held and concerns about the importance of privacy of personal information also place new and more onerous demands on bodies responsible for protecting such information supplied in confidence.
The general strategy for ensuring the statistical confidentiality of 2001 Census output was stated in the Government's March 1999 White Paper The 2001 Census of Population:
"Precautions will be taken so that published tabulations and abstracts of statistical data do not reveal any information about identifiable individuals or households. Special precautions may apply particularly to statistical output for small areas. Measures to ensure disclosure control will include some, or all, of the following procedures:
restricting the number of output categories into which a variable may be classified, such as aggregated age groups;
where the number of people or households in an area falls below a minimum threshold, the statistical output - except for basic headcounts - will be amalgamated with that for a sufficiently large enough neighbouring area; and/or
modifying the data before the statistics are released."
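The first two procedures can be illustrated with a minimal Python sketch. It is not the Census Offices' implementation: the band width, the top-coded age, the threshold value, the area names and the rule that a small area is merged with the next area in the list are all assumptions chosen for illustration.

```python
def band_age(age, width=10, top=80):
    """Restrict output categories: map a single year of age to a band.

    The width and the top-coded age are illustrative assumptions.
    """
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def amalgamate(areas, threshold):
    """Merge areas whose population falls below a threshold.

    areas is a list of (name, population) pairs. Adjacent entries are
    treated as neighbours, which is a simplification made for this sketch.
    """
    merged = []
    for name, pop in areas:
        if merged and merged[-1][1] < threshold:
            # The previous (possibly already merged) area is still too
            # small, so fold it into the current one.
            prev_name, prev_pop = merged.pop()
            merged.append((f"{prev_name}+{name}", prev_pop + pop))
        else:
            merged.append((name, pop))
    # A trailing small area has no following neighbour, so fold it back.
    if len(merged) > 1 and merged[-1][1] < threshold:
        last_name, last_pop = merged.pop()
        prev_name, prev_pop = merged.pop()
        merged.append((f"{prev_name}+{last_name}", prev_pop + last_pop))
    return merged


areas = [("A", 50), ("B", 200), ("C", 30), ("D", 20), ("E", 300)]
print(band_age(34))            # 30-39
print(amalgamate(areas, 100))  # [('A+B', 250), ('C+D+E', 350)]
```

In practice the choice of neighbour would depend on geography rather than list order, and basic headcounts are exempt from amalgamation as the White Paper states.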
These considerations have led ONS to reassess how much detail could be released from the 2001 Census. Additional measures have been introduced for tabular output and some restrictions in detail have been applied to the SARs.
Disclosure Risk Assessment
The Economic and Social Research Council, through the Cathie Marsh Centre for Census and Survey Research (CCSR), made a request for 2001 SARs. They also asked ONS to consider the following enhancements to the 1991 SARs specification:
reduce the threshold for the Individual SAR from 120,000 to 90,000 population
increase the sample size for the Individual SAR from 2% to 3%
changes in detail given to some of the variables for example ethnic group, family type and professional qualifications to reflect changes in the information collected in 2001
add extra variables (to reflect the new questions asked in the 2001 Census)
These proposals are based on the paper by Dale & Elliot, 'Proposals for the 2001 SARs: an assessment of disclosure risk'. This paper assessed the risk of disclosure from the SARs and concluded that the risk was very low. It suggested that the 1991 assessment of risk was pessimistic and there was scope for a decrease in the threshold and an increase in the sample size of the individual SAR.
ONS carried out further analysis to assess the risk. ONS recognised that a risk assessment for the country as a whole would not necessarily allow it to meet the commitments it has made to every individual who completed a Census form, because some individuals are more easily recognisable in the population than others. The Census Offices have a responsibility to protect everyone's information, not just the majority's.
ONS also considered how an attempt could be made to identify an individual. It considered what additional information and data would be available to users of the SARs (regardless of whether it was in the public domain) and whether this information could be used to identify an individual in the SARs.
The main elements of the analysis were:
an analysis to determine whether or not a variable should be collapsed, similar to the analysis carried out in 1991. See The 1991 Census User's Guide, Chapter 5.4.4
an analysis of the number and proportion of unique individuals in the sample who are also unique in the population. This looked at the total population as well as groups within it.
an assessment of the risk that an individual within the SARs can be identified by matching the SARs against an external dataset.
This analysis showed that grouping of age, ethnic group and occupation substantially reduced the risk of identifying an individual from the sample. It also showed that the sample size could be increased from 2% to 3%.
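The idea of counting sample uniques that are also population uniques, and the effect of grouping a variable, can be sketched as follows. This is an illustration only: the records, the variable names (age, occ) and the ten-year banding are invented, and the actual ONS analysis is not described in enough detail here to reproduce.

```python
from collections import Counter


def key_of(record, key_vars):
    """Combination of key-variable values that could single out a record."""
    return tuple(record[v] for v in key_vars)


def uniqueness(population, sample, key_vars):
    """Return (sample uniques, sample uniques that are also population uniques)."""
    pop_counts = Counter(key_of(r, key_vars) for r in population)
    samp_counts = Counter(key_of(r, key_vars) for r in sample)
    sample_uniques = [k for k, n in samp_counts.items() if n == 1]
    also_pop_unique = [k for k in sample_uniques if pop_counts[k] == 1]
    return len(sample_uniques), len(also_pop_unique)


def band(record):
    """Group single year of age into ten-year bands (illustrative choice)."""
    return {**record, "age": record["age"] // 10 * 10}


# Hypothetical population and a sample drawn from it.
population = [
    {"age": 23, "occ": "nurse"},
    {"age": 24, "occ": "nurse"},
    {"age": 23, "occ": "nurse"},
    {"age": 67, "occ": "pilot"},
    {"age": 41, "occ": "clerk"},
    {"age": 45, "occ": "clerk"},
]
sample = [population[0], population[1], population[3], population[4]]

print(uniqueness(population, sample, ["age", "occ"]))  # (4, 3)
print(uniqueness([band(r) for r in population],
                 [band(r) for r in sample], ["age", "occ"]))  # (2, 1)
```

With single year of age, three of the four sampled records are unique in the population; after banding only one is, which is the kind of reduction the analysis above describes.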
ONS also looked at the risk of identifying individuals by matching databases against other sources and whether or not some of the variables may be able to help in confirming the identity of individuals. Variables such as the area classification, communal establishment type and family type were all found to increase the risk significantly by substantially narrowing down the location of an individual or groups of individuals in the population. These variables would either need to be excluded from the SARs or grouped into fewer bands.
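A matching attempt of this kind can be sketched as an exact match on variables shared between the SARs and an external file. The data, the variable names and the exact-match rule are assumptions made for illustration; they do not represent the datasets or methods ONS actually considered.

```python
from collections import defaultdict


def unique_matches(sar_records, external_records, key_vars, id_var="name"):
    """Pair each SAR record with an external record when exactly one
    external record shares its key-variable values."""
    by_key = defaultdict(list)
    for rec in external_records:
        by_key[tuple(rec[v] for v in key_vars)].append(rec[id_var])
    matches = []
    for i, rec in enumerate(sar_records):
        candidates = by_key.get(tuple(rec[v] for v in key_vars), [])
        if len(candidates) == 1:
            matches.append((i, candidates[0]))
    return matches


# Hypothetical anonymised records and a hypothetical named external file.
sar = [
    {"age_band": "30-39", "sex": "F", "area": "rural"},
    {"age_band": "30-39", "sex": "F", "area": "urban"},
]
external = [
    {"name": "P1", "age_band": "30-39", "sex": "F", "area": "rural"},
    {"name": "P2", "age_band": "30-39", "sex": "F", "area": "urban"},
    {"name": "P3", "age_band": "30-39", "sex": "F", "area": "urban"},
]

print(unique_matches(sar, external, ["age_band", "sex"]))          # []
print(unique_matches(sar, external, ["age_band", "sex", "area"]))  # [(0, 'P1')]
```

Adding an area variable turns an ambiguous match into a unique one, which shows how a variable that narrows down location can raise the risk. A unique match is not a confirmed identification, since the SARs are a sample and the matched person may not be the person sampled.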
A small number of uniques remained in the SARs sample once these checks were completed. To further reduce the risk of identifying an individual, ONS perturbed the risky records using the post-randomisation method (PRAM). This consisted of changes to certain values in these records, applied by means of record swapping or imputation.
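As a rough illustration of post-randomisation, the sketch below uses the general textbook formulation, in which each flagged value is redrawn from a table of transition probabilities. The categories, the probabilities and the choice of which records count as risky are all invented; the source does not state which variables were perturbed or with what probabilities, and this is not the procedure ONS applied.

```python
import random


def pram(values, risky, transition, rng):
    """Post-randomise a categorical variable for the flagged records only.

    values     : list of category labels, one per record
    risky      : set of record positions to perturb
    transition : transition[c] is a dict {new_category: probability}
                 whose probabilities sum to 1 (assumed, not checked)
    rng        : a random.Random instance, so runs can be reproduced
    """
    out = []
    for i, v in enumerate(values):
        if i in risky:
            cats = list(transition[v])
            weights = [transition[v][c] for c in cats]
            out.append(rng.choices(cats, weights=weights, k=1)[0])
        else:
            out.append(v)  # records not flagged as risky are left unchanged
    return out


# Hypothetical example: each flagged value is kept with probability 0.8.
transition = {
    "a": {"a": 0.8, "b": 0.1, "c": 0.1},
    "b": {"a": 0.1, "b": 0.8, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "c": 0.8},
}
values = ["a", "b", "c", "a"]
print(pram(values, {0, 3}, transition, random.Random(2001)))
```

Because the original value is usually retained, an intruder cannot be sure that a matching record is genuine, while aggregate distributions change only slightly.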
Microdata Laboratory
ONS recognises that recoding of variables will have an impact on the extent of analysis that can be carried out using the 2001 SARs, and has made both the individual and household SAR files available in much greater detail. These are known as the Controlled Access Microdata Samples (CAMS) and are accessible in safe settings at all ONS sites for approved research projects. Applications for access to these files are assessed by the Census Research Access Board (CRAB). It is hoped that this access will be extended to sites in Edinburgh and Belfast. Once CRAB has approved an application, any outputs from analyses carried out on the CAMS will be checked for disclosiveness before they can be removed from the safe setting. More information on applying to use the CAMS datasets is available HERE.
References
The 1991 Census User's Guide, Edited by Dale & Marsh, HMSO
Dale, A. and Elliot, M. J. (2001) Proposals for the 2001 SARs: an assessment of disclosure risk. Journal of the Royal Statistical Society, Series A, 164(3), pp 1-21
The 2001 Census of Population (Cm 4253)
Contents of 2001 Individual SAR
Person variables
REGION: Region of Usual Residence (Country for W, NI and S)
AGE: Age
SEX: Sex
MSTATUS: Marital Status
STUDENT: Schoolchild or Student in Full-Time Education
TERMTIME: Term time Address of Students or Schoolchildren
COBIRTH: Country of Birth
ETHEW: Ethnic Group (E, W)
ETHNI: Ethnic Group (NI)
ETHS: Ethnic Group (S)
WLSHREAD: Whether reads Welsh (W)
WLSHSPK: Whether speaks Welsh (W)
WLSHSTND: Whether understands Welsh (W)
WLSHWRIT: Whether writes Welsh (W)
IRISREAD: Whether reads Irish (NI)
IRISSPK: Whether speaks Irish (NI)
IRISSTND: Whether understands Irish (NI)
IRISWRIT: Whether writes Irish (NI)
GAELREAD: Whether reads Gaelic (S)
GAELSPK: Whether speaks Gaelic (S)
GAELSTND: Whether understands Gaelic (S)
GAELWRIT: Whether writes Gaelic (S)
RELIGN: Religion (NI)
HEALTH: General Health Over the Last Twelve Months
PROVCARE: Number of Hours Care Provided per Week
LTILL: Limiting Long Term Illness
MIGIND: Migration Indicator
MIGORGN: Migrants: Area of Former Usual Residence
DISTMOVE: Distance of Move for Migrants
QUALEVEL: Level of Highest Qualifications (E, W, NI)
QUALEVELS: Level of Highest Qualifications (S)
PROFQUAL: Professional Qualification (E, W)
ECONPRIM: Economic Activity (last week)
EVERWORK: Ever Worked
LASTWORK: Year Last Worked
WORKFORC: Size of Work Force
OCCUPATN: Occupation
INDUSTRY: Industry
SUPERVSR: Supervisor/Foreman
WORKPLCE: Workplace
DISTWORK: Distance to Place of Work (Including Place of Study in Scotland)
TRANWORK: Transport to Work (Including Place of Study in Scotland)
HOURS: Hours Worked per week
RELAT: Relationship to HRP
RELGEW: Religion (E, W)
RELGS1: Religion (S)
RELIGN: Religion (NI)
Household variables
ACCTYPE: Accommodation Type
SELFCONT: Whether Accommodation Self-Contained
ROOMSNUM: Number of Rooms
BATH: Use of Bath/Shower/Toilet
LOWFLOOR: Lowest Floor Level of Living Accommodation
ROOMSFLR: Number of Floors (NI)
CENHEAT: Central Heating
CARS: Cars/Vans Owned or Available for Use
TENURE: Tenure (E, W)
TENURESNI: Tenure (NI, S)
FURNISH: Whether Accommodation Furnished (S)
CESTTYPE: Type of Communal Establishment
CESTSTAT: Status in Communal Establishment
Derived person variables
DEPEDHUK: Education Deprivation (E, W, NI)
DEPEMHUK: Employment Deprivation (E, W, NI)
DEPHDHUK: Health and Disability Deprivation (E, W, NI)
DEPHSHUK: Housing Deprivation (E, W, NI)
GENIND: Generation Indicator
Derived household variables
RESIDENTS: Number of Usual Residents in household
NELDERLY: Number of persons in household aged 65 or over
NCARERS: Number of Carers in household
NUMHLTHG: Number of household members with poor health
LTILLHH: Number in Household with Limiting Long-term Illness
EARNERS: Number of employed adults in household
STAHUK: Household with Students Away During Term Time
MULTETH: Multiple Ethnicity Household Indicator
SOCGRADE: Social Grade of HRP
NFAMHH: Number of Families in household
FAMTYPE: Family Type
FAMDEPCH: Dependent Children in Family
SEXFAMHD: Sex of FRP
ECPOSFHP: Economic Position of FRP
SCLASSFH: NS-SEC of FRP
DENSITY: Persons per Room
OCCUPNCY: Occupancy Rating of Household
ONCPERIMP: ONC imputed person/household
EDISDONORS: Number of EDIS donors
EDISIMPUTED: Indicator marking records that have been imputed
AREAP: Synthetic indicator of LA
Abbreviations
E: England
W: Wales
NI: Northern Ireland
S: Scotland
EDIS: Edit and Donor Imputation System
FRP: Family Reference Person
HRP: Household Reference Person
LA: Local Authority
NS-SEC: National Statistics Socio-economic Classification
ONC: One Number Census
Further detail of the variables can be found HERE.