Variability in the Census Origin-Destination counts
Census counts of the flows of people migrating, or travelling to work, from area to area were released in the Origin-Destination Statistics for Output Areas in May 2004, for wards in July 2004 and for local authorities in October 2004. As with all Census output, small counts in the tables were adjusted before release to protect confidentiality. This adjustment affects the variability of the Origin-Destination statistics more than that of other output, because these tables contain a larger number of small counts.
This note provides guidance on the level of variability which can be expected in the Origin-Destination statistics, and on how this variability can be minimised. The guidance is based on an analysis of tables MG201 and W206 for several local authorities. A full description of this analysis is available in a linked PDF document.
How the counts are adjusted
Small counts are adjusted, or 'perturbed', to protect confidentiality. Cells with small values were adjusted independently, upwards or downwards, according to prescribed probabilities. The scheme was designed so that the perturbations have an expected mean of zero (that is, the adjustment does not introduce systematic bias into the counts) and a variance (a measure of how much the perturbed counts vary from the true values) that is proportional to the number of cells adjusted. The more cells adjusted, the larger the possible variation from the true values.
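The relationship between the number of adjusted cells and the variance of a total can be illustrated with a small calculation. The sketch below is not the Census scheme: it assumes a hypothetical rule in which each adjusted cell moves by -1, 0 or +1 with probabilities 1/4, 1/2 and 1/4, and it works out the exact mean and variance of the summed adjustment for a given number of independent cells.

```python
from fractions import Fraction

# Hypothetical per-cell adjustment (an assumption for illustration only,
# NOT the probabilities used for the Census tables).
CELL_ADJUSTMENT = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def total_adjustment_distribution(n_cells):
    """Exact distribution of the summed adjustment over n independent cells."""
    dist = {0: Fraction(1)}
    for _ in range(n_cells):
        nxt = {}
        for total, p in dist.items():
            for step, q in CELL_ADJUSTMENT.items():
                nxt[total + step] = nxt.get(total + step, Fraction(0)) + p * q
        dist = nxt
    return dist


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: prob}."""
    mean = sum(value * p for value, p in dist.items())
    variance = sum((value - mean) ** 2 * p for value, p in dist.items())
    return mean, variance


for n in (1, 10, 40):
    mean, variance = mean_and_variance(total_adjustment_distribution(n))
    print(n, mean, variance)
```

Under this assumed rule the mean stays at zero however many cells are summed, while the variance is half the number of cells, so a total built from 40 adjusted cells has four times the variance of one built from 10.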
The probability mechanism that was devised for carrying out the perturbations is described in full in the linked note. The method ensures the same adjustment outcome in any future reproduction of the table. One result of this method is that a flow which appears, for example, in one table as 3 people moving from one area to another may be adjusted to zero in another table, and thus 'disappears' from the second table.
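One way to obtain a repeatable adjustment is to derive it from a fixed key instead of a fresh random draw. The sketch below is only an illustration of that idea and makes no claim about the mechanism in the linked note: the function `keyed_adjustment`, the use of a SHA-256 hash, the area names and the probabilities are all assumptions.

```python
import hashlib


def keyed_adjustment(table_id, origin, destination):
    """Return -1, 0 or +1, always the same for the same table and cell."""
    key = f"{table_id}|{origin}|{destination}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    # Illustrative probabilities only: 1/4 down, 1/2 unchanged, 1/4 up.
    if u < 0.25:
        return -1
    if u < 0.75:
        return 0
    return 1


# Reproducing the same table gives the same adjustment for the same cell.
first = keyed_adjustment("MG201", "ward A", "ward B")
again = keyed_adjustment("MG201", "ward A", "ward B")
print(first == again)

# The same flow in a different table has a different key, so it is adjusted
# separately and the two published values need not agree.
other = keyed_adjustment("MG203", "ward A", "ward B")
print(first, other)
```

Because the table identifier is part of the key in this sketch, each table is reproducible on its own, but nothing forces the same flow to receive the same adjustment in two different tables.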
Possible bias in the adjusted data
Several matrices relating to migration flows were checked for evidence of bias, that is, for any tendency for the adjustments to go more one way than the other. Statistical tests suggest that, in some cases, there is a very small amount of bias because of the nature of the random processes used in the adjustment. The reasons relate to the adjustments in the Origin-Destination matrices, and do not apply to other sets of output such as Standard Tables or Census Area Statistics.
Variability in the adjusted data
The adjustments made to the data to protect confidentiality mean that many counts in the Origin-Destination Matrices will differ from the underlying 'true' counts. This explains why counts produced by aggregating several values in an Origin-Destination Matrix (say, all values relating to migration into a particular Output Area) can be different from corresponding counts in the Census Area Statistics tables.
The statistical technique of regression modelling can be used to indicate the extent to which such derived totals can be expected to differ from the underlying counts. Such a regression model was produced by comparing the underlying counts with those produced by aggregating entries in the migration table MG201 for wards within two local authorities and the workplace table W206 for wards within seven local authorities. More detail on how the model was produced is available in the full linked note.
The model allowed the calculation of a predicted 'perturbation interval'. The perturbation interval is the range within which it is possible to be 90 per cent confident that the underlying value lies, for any total produced by adding together entries in table MG201 or W206.
Where adjusted values are added together to produce a perturbed total T, the perturbation interval can be expressed algebraically as:
For the migration table:
Upper bound of prediction interval = T + 5.519 × T^0.302
Lower bound of prediction interval = T - 1.787 × T^0.530
For the workplace table:
Upper bound of prediction interval = T + 1.717 × T^0.462
Lower bound of prediction interval = T - 3.196 × T^0.315
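The bounds can also be computed directly. The sketch below reads the second number in each formula as an exponent of T, a reading that reproduces the published table values to within one unit. The function name `prediction_interval` is illustrative, and flooring the lower bound at zero is an assumption made here because a count cannot be negative.

```python
# Coefficients (a, b) for bound = T +/- a * T**b, taken from the formulas above.
COEFFICIENTS = {
    "MG201": {"upper": (5.519, 0.302), "lower": (1.787, 0.530)},  # migration
    "W206": {"upper": (1.717, 0.462), "lower": (3.196, 0.315)},   # workplace
}


def prediction_interval(total, table):
    """Return (lower, upper) 90 per cent bounds for a perturbed total."""
    if total <= 0:
        raise ValueError("total must be positive")
    up_a, up_b = COEFFICIENTS[table]["upper"]
    lo_a, lo_b = COEFFICIENTS[table]["lower"]
    upper = total + up_a * total ** up_b
    # Assumption: a negative lower bound is reported as zero.
    lower = max(0.0, total - lo_a * total ** lo_b)
    return lower, upper


for table, total in (("MG201", 750), ("W206", 500)):
    lower, upper = prediction_interval(total, table)
    print(f"{table}: T = {total}, interval about {lower:.1f} to {upper:.1f}")
```

The function returns unrounded values; how the published bounds were rounded to whole numbers is not stated, so small differences of up to one unit from the published figures are to be expected.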
Alternatively, the perturbation interval can be approximated using the tables below. Table 1 indicates, for example, that if a total of 750 is obtained by adding together entries in table MG201, it is possible to be 90 per cent confident that the underlying count is between 691 and 791.
Table 1: 90 per cent Perturbation Interval for Specified Perturbed Totals from MG201
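| Perturbed Total | Lower Bound | Upper Bound | Perturbed Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|
| 5 | 1 | 14 | 380 | 339 | 413 |
| 10 | 4 | 21 | 400 | 357 | 434 |
| 15 | 8 | 28 | 420 | 376 | 454 |
| 20 | 11 | 34 | 440 | 395 | 475 |
| 30 | 19 | 45 | 460 | 414 | 495 |
| 40 | 27 | 57 | 480 | 433 | 516 |
| 50 | 36 | 68 | 500 | 452 | 536 |
| 60 | 44 | 79 | 550 | 500 | 587 |
| 70 | 53 | 90 | 600 | 547 | 638 |
| 80 | 62 | 101 | 650 | 595 | 689 |
| 90 | 71 | 111 | 700 | 643 | 740 |
| 100 | 80 | 122 | 750 | 691 | 791 |
| 120 | 97 | 143 | 800 | 738 | 842 |
| 140 | 116 | 165 | 850 | 786 | 892 |
| 160 | 134 | 186 | 900 | 834 | 943 |
| 180 | 152 | 207 | 950 | 883 | 994 |
| 200 | 170 | 227 | 1,000 | 931 | 1,045 |
| 220 | 189 | 248 | 1,500 | 1,414 | 1,550 |
| 240 | 207 | 269 | 2,000 | 1,900 | 2,055 |
| 260 | 226 | 290 | 2,500 | 2,387 | 2,559 |
| 280 | 245 | 310 | 3,000 | 2,876 | 3,062 |
| 300 | 263 | 331 | 3,500 | 3,366 | 3,565 |
| 320 | 282 | 352 | 4,000 | 3,856 | 4,068 |
| 340 | 301 | 372 | 4,500 | 4,346 | 4,570 |
| 360 | 320 | 393 | 5,000 | 4,838 | 5,072 |

Lower Bound and Upper Bound are the bounds of the prediction interval.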
Table 2: 90 per cent Perturbation Interval for Specified Perturbed Totals from W206
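| Perturbed Total | Lower Bound | Upper Bound | Perturbed Total | Lower Bound | Upper Bound |
|---|---|---|---|---|---|
| 5 | 0 | 9 | 500 | 477 | 530 |
| 10 | 3 | 15 | 550 | 527 | 582 |
| 20 | 12 | 27 | 600 | 576 | 633 |
| 30 | 21 | 38 | 650 | 625 | 684 |
| 40 | 30 | 49 | 700 | 675 | 735 |
| 50 | 39 | 60 | 750 | 724 | 786 |
| 60 | 48 | 71 | 800 | 774 | 838 |
| 70 | 58 | 82 | 850 | 823 | 889 |
| 80 | 67 | 93 | 900 | 873 | 940 |
| 90 | 77 | 104 | 950 | 922 | 991 |
| 100 | 86 | 114 | 1,000 | 972 | 1,042 |
| 120 | 106 | 136 | 1,250 | 1,220 | 1,296 |
| 140 | 125 | 157 | 1,500 | 1,468 | 1,550 |
| 160 | 144 | 178 | 1,750 | 1,716 | 1,804 |
| 180 | 164 | 199 | 2,000 | 1,965 | 2,057 |
| 200 | 183 | 220 | 2,250 | 2,214 | 2,311 |
| 220 | 202 | 241 | 2,500 | 2,462 | 2,564 |
| 240 | 222 | 262 | 2,750 | 2,711 | 2,816 |
| 260 | 242 | 282 | 3,000 | 2,960 | 3,069 |
| 280 | 261 | 303 | 3,500 | 3,458 | 3,574 |
| 300 | 281 | 324 | 4,000 | 3,956 | 4,079 |
| 320 | 300 | 345 | 4,500 | 4,455 | 4,583 |
| 340 | 320 | 365 | 5,000 | 4,953 | 5,087 |
| 360 | 340 | 386 | 5,500 | 5,452 | 5,591 |
| 380 | 359 | 407 | 6,000 | 5,950 | 6,095 |
| 400 | 379 | 427 | 6,500 | 6,449 | 6,599 |
| 420 | 399 | 448 | 7,000 | 6,948 | 7,102 |
| 440 | 418 | 468 | 7,500 | 7,447 | 7,606 |
| 460 | 438 | 489 | 8,000 | 7,946 | 8,109 |
| 480 | 458 | 510 | 9,000 | 8,944 | 9,115 |

Lower Bound and Upper Bound are the bounds of the prediction interval.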
Advice for users
Users of the Origin-Destination Statistics may find the following advice helpful:
Variation from an underlying true value exists in all Census results for a variety of reasons. Whilst the Origin-Destination Statistics are more affected than other tables by adjustments to the data to protect confidentiality, they are more reliable than similar results published in 1991, which were based on only 10 per cent of the total population.
The variability in the aggregated totals can be minimised by using the highest-level geography possible: for example, deriving results for a Government Office Region by aggregating counts for local authorities rather than Output Areas.
In addition, the most accurate count of the overall flow between two areas is likely to be contained in the table with the fewest cells. So, for local authorities, the most reliable tables to use to estimate the flow between two areas are MG103 (migration) and W107 (workplace). For wards, the corresponding tables are MG203 (migration) and W206 (workplace). For sub-groups of the population, choose the table that has the fewest possible cells that need to be aggregated to obtain the overall flow.
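The interval formulas for the migration table also show how uncertainty shrinks in relative terms as totals grow. The sketch below applies the MG201 formulas, reading the second number in each as an exponent of T, and expresses the width of the 90 per cent interval as a share of the perturbed total. The helper names are illustrative, and flooring the lower bound at zero is an assumption.

```python
def mg201_interval(total):
    """90 per cent bounds for a perturbed MG201 total (formulas from this note)."""
    upper = total + 5.519 * total ** 0.302
    # Assumption: a negative lower bound is reported as zero.
    lower = max(0.0, total - 1.787 * total ** 0.530)
    return lower, upper


def relative_width(total):
    """Width of the interval as a fraction of the perturbed total."""
    lower, upper = mg201_interval(total)
    return (upper - lower) / total


for total in (50, 500, 5000):
    print(total, f"{relative_width(total):.1%}")
```

The interval is roughly two-thirds as wide as the total when the total is 50, but only about 5 per cent as wide when the total is 5,000, in line with the Table 1 entries of 36 to 68 and 4,838 to 5,072.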