Methods for the area classification of statistical wards
Introduction
The aim of this paper is to describe the methodology used to classify statistical wards. The classification places each ward in a group with the other wards to which it is most similar in terms of census variables, so that areas are grouped according to their particular combination of characteristics.
Choice of variables
The analysis was carried out using the Key Statistics Tables produced from the census data. The variables are socio-economic and demographic, and cover six dimensions: demographic structure, household composition, housing, socio-economic, employment and industry sector. Strongly correlated variables were removed so that no underlying factor was counted more than once. This kept the number of variables to a minimum while still representing all six census dimensions with the available data. For full details see the variable selection paper.
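The paper does not say how "strongly correlated" was defined. The following is a minimal sketch of the idea. It assumes a hypothetical threshold of |r| > 0.8 and a greedy rule that keeps the first variable of each correlated pair; neither comes from the variable selection paper. The variable names and values are also hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_correlated(variables, threshold=0.8):
    """Keep variables in the given order, skipping any whose absolute
    correlation with an already-kept variable exceeds the threshold.
    The 0.8 default is a hypothetical choice."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical ward-level percentages for five wards.
wards = {
    "pct_age_65_plus": [12.0, 18.5, 25.1, 9.8, 15.2],
    "pct_pensioner_households": [13.1, 19.0, 26.3, 10.2, 16.0],
    "pct_social_rented": [30.2, 12.4, 28.9, 11.0, 20.5],
}
print(prune_correlated(wards))  # the pensioner variable is dropped
```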
Naming procedure
The names for the supergroups and groups were chosen using two sources: the characteristics shown in the radar charts, and a detailed examination of the area classification maps superimposed onto background mapping, which allowed specific areas to be identified. Three brainstorming meetings were held to examine the maps and radar charts and to discuss ideas. These meetings included experts from ONS Geography, who contributed a good knowledge of geography and of the background of specific areas. Preliminary names were also sent to the Advisory Board for comment, and the Board's suggestions were discussed at the final brainstorming meeting.
Statistical methodology
The ONS classification is a hierarchical classification into supergroups, groups and subgroups using clustering techniques.
Standardising data
All clustering techniques are based on the similarity or dissimilarity of the cases to be clustered. This is measured by constructing a distance matrix that gives a distance for each pair of cases, calculated over all the variables in the data set. Problems arise if the variables are measured on different scales or have different magnitudes. In general, variables with larger dispersion (i.e. larger standard deviations) have more impact on the final similarity measure. It was therefore necessary to standardise the data so that each variable is equally represented in the distance measure.
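The following small illustration shows why standardisation matters. It uses hypothetical wards described by population density (persons per square kilometre) and the percentage unemployed. On the raw data, differences in density swamp the squared Euclidean distance. After range standardisation, one of the methods considered below, the nearest neighbour of ward A changes. All names and values are hypothetical.

```python
# Hypothetical wards: (population density per sq km, % unemployed).
wards = {
    "A": (5000.0, 2.0),
    "B": (5100.0, 12.0),
    "C": (5600.0, 2.5),
    "D": (100.0, 1.0),
    "E": (10100.0, 21.0),
}


def sq_euclidean(x, y):
    """Squared Euclidean distance between two cases."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def range_standardise(rows):
    """Rescale each variable to (x - min) / (max - min) across all rows."""
    columns = list(zip(*rows.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for name, row in rows.items()
    }


def nearest(rows, target):
    """Name of the case closest to `target` in squared Euclidean distance."""
    others = [name for name in rows if name != target]
    return min(others, key=lambda name: sq_euclidean(rows[target], rows[name]))


print(nearest(wards, "A"))                     # raw data: density dominates
print(nearest(range_standardise(wards), "A"))  # both variables on a 0-1 scale
```

On the raw values, ward B is nearest to A: a density gap of a few hundred outweighs a 10 percentage point gap in unemployment. Once both variables are on a 0 to 1 scale, ward C becomes the nearest, because it is close to A on both variables.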
Three methods of standardisation were considered:
Z-score standardisation
This is the most common form of standardisation. Z-score standardisation subtracts the mean of the variable, $\bar{X}$, from each value $X_i$, and divides the result by the standard deviation of the variable, $\sigma$. It works well when the data are normally distributed; however, the data may not always be normally distributed.
$$\frac{X_i - \bar{X}}{\sigma}$$
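The following is a minimal sketch of z-score standardisation on hypothetical data. It uses the population standard deviation. The paper does not say which form was intended, and the sample standard deviation (`statistics.stdev`) is an equally common choice.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score standardisation: (x_i - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population SD; statistics.stdev would give the sample SD
    return [(x - m) / s for x in values]


pct_students = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical wards
z = z_scores(pct_students)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```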
Range standardisation
This method of standardisation was implemented in the 1991 classification, see Wallace and Denham (1996). Range standardisation compares each value of a variable, $X_i$, to the minimum, $X_{min}$. The result is then divided by the distance between the minimum, $X_{min}$, and the maximum, $X_{max}$, of the variable. This method does not work well if the data contain outliers.
$$\frac{X_i - X_{min}}{X_{max} - X_{min}}$$
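The following sketch of range standardisation uses hypothetical values to show the weakness noted above. A single extreme value stretches the range and squeezes every other ward into a narrow band near 0.

```python
def range_standardise(values):
    """Range standardisation: (x_i - min) / (max - min), so values lie in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


typical = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical wards
with_outlier = typical + [90.0]           # one extreme ward added

print(range_standardise(typical))       # spread evenly over 0 to 1
print(range_standardise(with_outlier))  # the five typical wards now lie in 0 to 0.1
```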
Inter-decile range standardisation
This method is a slight variation of the range standardisation method that overcomes the problems associated with outliers. It compares each value of a variable, $X_i$, to the median, $X_{med}$, and divides the result by the distance between the 90th percentile, $X_{90th}$, and the 10th percentile, $X_{10th}$.
$$\frac{X_i - X_{med}}{X_{90th} - X_{10th}}$$
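The following sketch of inter-decile range standardisation uses Python's `statistics.quantiles` with its default exclusive method. That choice is an assumption: other percentile definitions give slightly different cut points. The example uses 20 hypothetical wards plus one outlier. Adding the outlier barely changes the spread of the typical wards' scores, whereas under range standardisation the same spread would shrink tenfold.

```python
from statistics import median, quantiles


def interdecile_standardise(values):
    """(x_i - median) / (90th percentile - 10th percentile)."""
    deciles = quantiles(values, n=10)  # nine cut points: 10th, 20th, ..., 90th
    p10, p90 = deciles[0], deciles[-1]
    med = median(values)
    return [(x - med) / (p90 - p10) for x in values]


def spread(scores):
    return max(scores) - min(scores)


typical = [float(x) for x in range(10, 30)]  # 20 hypothetical wards
with_outlier = typical + [200.0]             # plus one extreme ward

before = spread(interdecile_standardise(typical))
after = spread(interdecile_standardise(with_outlier)[:20])  # typical wards only
print(round(before, 2), round(after, 2))
```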
Initial experiments using inter-decile range standardised data revealed that variables with highly skewed distributions were driving the classification; that is, the inter-decile range standardisation gave the skewed variables too much weight. For a highly skewed variable most values lie in a narrow band, so the inter-decile range is small and the few values in the long tail receive very large standardised scores. Range standardisation, by contrast, keeps every value between 0 and 1. The problem was resolved when the data were standardised using the range standardisation method, which was therefore used to standardise the ward level data. Note that this differs from the Local Authority and Health Area level data, for which inter-decile range standardisation worked best. The z-score method had been shown not to work well when the 1991 classification was produced, because many variables were not normally distributed, see Wallace and Denham (1996).
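The skewness problem can be reproduced with two hypothetical variables measured on the same 20 wards. One variable is evenly spread. The other is concentrated between 1 and 2.5, with a single extreme ward. Range standardisation bounds both variables at 1. Inter-decile standardisation gives the extreme ward a score of about 29, which would dominate any squared Euclidean distance involving that ward. The same assumption about the percentile definition applies as in the inter-decile sketch.

```python
from statistics import median, quantiles


def range_standardise(values):
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def interdecile_standardise(values):
    deciles = quantiles(values, n=10)  # exclusive method (the default)
    med = median(values)
    return [(x - med) / (deciles[-1] - deciles[0]) for x in values]


# Two hypothetical variables measured on the same 20 wards.
symmetric = [float(x) for x in range(10, 30)]
skewed = [1.0] * 5 + [1.5] * 5 + [2.0] * 5 + [2.5] * 4 + [45.0]

for name, var in [("symmetric", symmetric), ("skewed", skewed)]:
    r = max(abs(v) for v in range_standardise(var))
    d = max(abs(v) for v in interdecile_standardise(var))
    print(f"{name}: largest |score| range={r:.2f} inter-decile={d:.2f}")
```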
Defining similarity
Once each of the variables has been appropriately standardised, it is necessary to determine how ‘close’ cases are to each other, or how far apart they are. Most methods of cluster analysis begin with a matrix giving a quantitative measure of similarity (or dissimilarity) between each pair of objects to be classified. Two cases are said to be ‘close’ when their distance is small or their similarity is large. Many different measures can be used to quantify distance; the Euclidean distance and the squared Euclidean distance are two of the most common for continuous data. For two cases $x$ and $y$ measured on $p$ variables, the squared Euclidean distance is $d^2(x, y) = \sum_{k=1}^{p} (x_k - y_k)^2$. We use the squared Euclidean distance as the measure of dissimilarity because it is the recommended distance measure for Ward's method, see Kaufman and Rousseeuw (1990) and Everitt, Landau and Leese (2001).
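The following sketch builds the squared Euclidean distance matrix for three hypothetical wards that have already been standardised. The matrix is symmetric, with zeros on the diagonal.

```python
def sq_euclidean(x, y):
    """d^2(x, y) = sum over variables k of (x_k - y_k)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def distance_matrix(cases):
    """Symmetric matrix of pairwise squared Euclidean distances."""
    return [[sq_euclidean(a, b) for b in cases] for a in cases]


# Three hypothetical wards, already range-standardised on three variables.
cases = [(0.1, 0.2, 0.0), (0.2, 0.2, 0.1), (0.9, 0.7, 1.0)]
for row in distance_matrix(cases):
    print([round(d, 2) for d in row])
```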
Defining the clustering technique
Although there are various methods of hierarchical cluster analysis, they fall into two main types. Agglomerative methods proceed by a series of fusions; divisive methods separate groups into successively finer groupings. Agglomerative procedures are probably the most widely used of the hierarchical methods and were used for the ONS classifications.
Agglomerative clustering
Agglomerative techniques are the most common method for forming clusters and, of these, Ward's method is the most commonly used (see Everitt, 1993). It tends to produce spherical clusters of roughly equal size. The aim is to join objects into ever larger clusters using a measure of similarity or distance. This is a bottom-up approach. We start with n groups, each containing one object, and the two closest cases are combined into a single cluster. At the next stage, either a third case is added to the cluster containing two cases or two other cases are merged into a new cluster. This process continues until all cases belong to one cluster. Once a cluster is formed it cannot be split; it can only be combined with other clusters. As well as choosing the similarity or dissimilarity measure used to compare two observations, we must also choose how to compare groups that contain more than one observation. This is referred to as the linkage method. At each stage, Ward's method joins the two groups whose merger gives the smallest increase in the error sum of squares (i.e. the within-cluster sum of squares). Because Ward's method is agglomerative, the cluster means change as new cases are added. By the end of the process, some cases may therefore be closer to the mean of another cluster than to the mean of their own. The solution given by Ward's method can be refined using k-means.
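The following naive sketch of Ward's agglomerative method is for illustration only. It relies on a standard result. Merging clusters a and b, with sizes n_a and n_b and centroids c_a and c_b, increases the error sum of squares by n_a n_b / (n_a + n_b) times the squared Euclidean distance between the centroids. The sketch checks every pair at every stage, so it is far too slow for thousands of wards. Practical software uses more efficient updates, for example the Lance-Williams recurrence. The data are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def ward(points):
    """Naive Ward's method. Returns the merges in order as
    (members_a, members_b, increase_in_error_sum_of_squares)."""
    # Each cluster is (member indices, centroid, size).
    clusters = [((i,), tuple(p), 1) for i, p in enumerate(points)]
    merges = []
    while len(clusters) > 1:
        # Find the pair whose merger increases the ESS the least.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                _, ci, ni = clusters[i]
                _, cj, nj = clusters[j]
                cost = ni * nj / (ni + nj) * sq_dist(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (mi, ci, ni), (mj, cj, nj) = clusters[i], clusters[j]
        n = ni + nj
        centroid = tuple((ni * a + nj * b) / n for a, b in zip(ci, cj))
        merges.append((mi, mj, cost))
        del clusters[j]  # j > i, so position i is unaffected
        clusters[i] = (mi + mj, centroid, n)
    return merges


points = [(0, 0), (0, 1), (10, 0), (10, 1)]  # hypothetical standardised wards
for a, b, cost in ward(points):
    print(a, b, cost)
```

For these four points, which form two tight pairs, each pair is merged first at a cost of 0.5, and the final merge costs 100. The three costs add up to 101, the total sum of squares about the overall mean.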
K-means refinement
This is a simple non-parametric clustering method that minimises the within-cluster variability and thereby maximises the between-cluster variability. The k-means method requires the number of clusters to be specified beforehand. It is an iterative relocation algorithm based on an error sum of squares. The algorithm repeatedly considers moving a case from its current cluster to another, to see whether the move reduces the total within-cluster sum of squares. The case is reallocated to the cluster that gives the greatest improvement. When all cases have been processed, the algorithm moves on to the next iteration. A stable classification is reached when no case is moved during a complete iteration.
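The paper does not name the k-means variant used. The following sketch follows the relocation description above: a case is moved only if the move lowers the total within-cluster sum of squares. It calculates the exact change in that sum, which is the transfer rule used in Hartigan-style k-means. The starting labels and the toy data are hypothetical.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans_relocate(points, labels, k):
    """Move single cases between the k clusters while a move lowers the total
    within-cluster sum of squares (ESS); stop after a full pass with no moves."""
    labels = list(labels)
    while True:
        moved = False
        for i, x in enumerate(points):
            members = {c: [p for p, lab in zip(points, labels) if lab == c]
                       for c in range(k)}
            a = labels[i]
            na = len(members[a])
            if na == 1:
                continue  # never empty a cluster
            # ESS falls by this much when x leaves its current cluster a ...
            gain = na / (na - 1) * sq_dist(x, centroid(members[a]))
            best, best_change = a, 0.0
            for b in range(k):
                nb = len(members[b])
                if b == a or nb == 0:
                    continue
                # ... and rises by this much when x joins cluster b.
                change = nb / (nb + 1) * sq_dist(x, centroid(members[b])) - gain
                if change < best_change:
                    best, best_change = b, change
            if best != a:
                labels[i] = best
                moved = True
        if not moved:
            return labels


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]  # hypothetical
start = [0, 0, 1, 1, 1, 1]  # the case at 2.0 starts in the wrong cluster
print(kmeans_relocate(points, start, k=2))
```

The case at 2.0 starts in the wrong cluster, alongside 10, 11 and 12. The first pass moves it to the other cluster. The second pass makes no moves, so the classification is stable.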
Classification of statistical wards
The 1991 classification of wards was based on a sample of wards. The sampled wards were classified into clusters, and the remaining wards were then allocated to whichever cluster they were most similar to. The disadvantage of this sampling-based method is that the sampling properties of the resulting classification are unknown. There is a risk of bias, or of some types of area being missed.
For the 2001 classification we adopted a different approach, recommended by a member of the Advisory Board; see Charlton, Openshaw and Wymer (1985) for further details. The procedure consists of the following steps:
We started by generating a random classification of all wards into 1000 clusters.
We then applied the k-means method, taking the centres of the random classification as the initial cluster centres, to reach a stable 1000-cluster solution.
Ward's method was then applied to the 1000 clusters produced by k-means.
We determined the number of supergroups, groups and subgroups by examining the agglomeration schedule (see Figure 1 below), which was used to choose the cut-off points of 9, 17 and 26.
The subgroups obtained from Ward's method were refined using k-means to ensure that each ward was assigned to its nearest subgroup.
The group and supergroup levels were then derived from the hierarchy produced by Ward's method.
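The six steps can be sketched end to end on toy data. This illustration rests on several assumptions that the paper does not state:

- Both k-means stages use nearest-centre (Lloyd) iterations.
- Ward's method treats each k-means cluster as a point weighted by its size.
- Clusters that empty during k-means are dropped.

The function names and the small cluster counts are hypothetical stand-ins for 1000 initial clusters and the cut-offs of 9, 17 and 26.

```python
import random


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_point(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def lloyd(points, centres, max_iter=100):
    """Nearest-centre k-means from the given starting centres."""
    centres = list(centres)
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        new = [mean_point([p for p, lab in zip(points, labels) if lab == c])
               if c in labels else centres[c]  # an empty cluster keeps its centre
               for c in range(len(centres))]
        if new == centres:
            break
        centres = new
    return labels, centres


def ward_levels(centres, sizes, n_min):
    """Ward's method on size-weighted clusters. Returns {number of clusters:
    list of member sets}, where members are indices into `centres`."""
    clusters = [({i}, c, n) for i, (c, n) in enumerate(zip(centres, sizes))]
    levels = {len(clusters): [m for m, _, _ in clusters]}
    while len(clusters) > n_min:
        _, i, j = min((ni * nj / (ni + nj) * sq_dist(ci, cj), i, j)
                      for i, (_, ci, ni) in enumerate(clusters)
                      for j, (_, cj, nj) in enumerate(clusters) if i < j)
        (mi, ci, ni), (mj, cj, nj) = clusters[i], clusters[j]
        n = ni + nj
        merged = (mi | mj, tuple((ni * a + nj * b) / n for a, b in zip(ci, cj)), n)
        del clusters[j]  # j > i, so position i is unaffected
        clusters[i] = merged
        levels[len(clusters)] = [m for m, _, _ in clusters]
    return levels


def classify(points, n_initial, n_sup, n_grp, n_sub, seed=0):
    """Return (supergroup, group, subgroup) for every case."""
    rng = random.Random(seed)
    # Step 1: random classification of all cases into n_initial clusters.
    rand = [rng.randrange(n_initial) for _ in points]
    starts = [mean_point([p for p, r in zip(points, rand) if r == c])
              for c in sorted(set(rand))]
    # Step 2: k-means from the centres of the random classification.
    km, centres = lloyd(points, starts)
    used = sorted(set(km))  # k-means clusters that still have members
    if len(used) < n_sub:
        raise ValueError("fewer k-means clusters than subgroups")
    pos = {c: k for k, c in enumerate(used)}
    # Step 3: Ward's method on the k-means clusters, weighted by size.
    tree = ward_levels([centres[c] for c in used], [km.count(c) for c in used], n_sup)

    def lookup(n):  # k-means cluster position -> cluster index at level n
        return {i: g for g, members in enumerate(tree[n]) for i in members}

    # Step 4: n_sup, n_grp and n_sub are the cut-offs read from the schedule.
    sub_of, grp_of, sup_of = lookup(n_sub), lookup(n_grp), lookup(n_sup)
    # Step 5: refine subgroups with k-means from Ward's subgroup centroids.
    sub = [sub_of[pos[c]] for c in km]
    sub_centres = [mean_point([p for p, s in zip(points, sub) if s == g])
                   for g in range(n_sub)]
    sub, _ = lloyd(points, sub_centres)
    # Step 6: groups and supergroups follow from the Ward hierarchy.
    rep = {g: next(iter(members)) for g, members in enumerate(tree[n_sub])}
    return [(sup_of[rep[s]], grp_of[rep[s]], s) for s in sub]


rng = random.Random(1)
blobs = [(0.0, 0.0), (0.0, 4.0), (20.0, 0.0), (20.0, 4.0)]
points = [(x + rng.gauss(0, 0.3), y + rng.gauss(0, 0.3))
          for x, y in blobs for _ in range(20)]
result = classify(points, n_initial=20, n_sup=2, n_grp=3, n_sub=4)
print(sorted(set(result)))
```

Groups and supergroups are read from the refined subgroup through the Ward tree. A case that k-means moves to a subgroup in another branch therefore changes group and supergroup too, as described in the next section.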
Ideally, the initial number of clusters at step 1 should be as large as possible. Initial experiments were carried out using 500 and 1000 clusters, and the 1000-cluster solution produced more meaningful classifications. The choice of the initial number of clusters is important, but the results are usually not very sensitive to it, see Charlton, Openshaw and Wymer (1985).
Further explanation of the k-means refinement of the results produced by Ward's method
Because Ward's method is agglomerative, the subgroup centroids change as new wards are added, but the process does not allow individual wards to be reallocated to their nearest subgroup. By the end of the process, therefore, some wards may be more similar to wards in other subgroups than to wards in their own subgroup. The k-means procedure reallocates these wards to their correct subgroup. It starts from the subgroup centroids obtained using Ward's method and is repeated until a stable result is achieved. The procedure is not carried out at the group or supergroup level, because the hierarchical structure must be retained. However, a ward may still move to a new group or supergroup if it is reassigned to a subgroup that belongs to a different group or supergroup from the one it was in under Ward's method.
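The following sketch shows the refinement and how the hierarchy is kept, using hypothetical one-variable wards and subgroup labels. Starting from the Ward's-method subgroups, wards are reassigned to the nearest subgroup centroid until nothing changes. Each ward's group is then read from a fixed subgroup-to-group lookup, so a ward that changes subgroup can also change group. The function and label names are hypothetical.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_point(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def refine(points, subgroup, group_of, max_iter=100):
    """Reassign each ward to its nearest subgroup centroid, starting from the
    Ward's-method subgroups, until no ward moves. Groups are then read from
    the fixed subgroup -> group lookup, so the hierarchy is preserved."""
    subgroup = list(subgroup)
    for _ in range(max_iter):
        ids = sorted(set(subgroup))
        centres = {g: mean_point([p for p, s in zip(points, subgroup) if s == g])
                   for g in ids}
        new = [min(ids, key=lambda g: sq_dist(p, centres[g])) for p in points]
        if new == subgroup:
            break
        subgroup = new
    return subgroup, [group_of[s] for s in subgroup]


# Hypothetical one-variable wards and their Ward's-method subgroups.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (9.5,), (10.0,), (11.0,)]
start = ["1a", "1a", "1b", "1b", "1b", "2a", "2a"]
group_of = {"1a": "1", "1b": "1", "2a": "2"}
subgroups, groups = refine(points, start, group_of)
print(subgroups)
print(groups)
```

The ward at 9.5 starts in subgroup 1b but is closer to the centroid of 2a. It therefore moves to 2a, and so from group 1 to group 2.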
Agglomeration schedule for the classification of statistical wards
An agglomeration schedule for the ward clusters is shown in the chart below (Figure 1). It shows the difference in squared Euclidean distance between the clusters merged at each successive stage of Ward's method. Suitable numbers of clusters can be identified where there is a natural levelling off in the slope of the line.
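The levelling off is judged by eye from the chart. As a rough numerical aid only, and not the paper's method, the following sketch flags each number of clusters k at which the next merge costs at least a chosen multiple of the previous merge. The merge costs and the ratio of 3 are hypothetical.

```python
def candidate_cuts(merge_costs, n_start, ratio=3.0):
    """merge_costs[s] is the increase in ESS at merge s (0-based) of Ward's
    method, which reduces n_start - s clusters to n_start - s - 1. Return the
    cluster counts k at which the next merge costs at least `ratio` times the
    previous one, i.e. where the schedule steepens sharply."""
    cuts = []
    for s in range(1, len(merge_costs)):
        if merge_costs[s] >= ratio * merge_costs[s - 1]:
            cuts.append(n_start - s)
    return cuts


# Hypothetical merge costs for 10 starting clusters (9 merges).
costs = [0.1, 0.12, 0.15, 0.2, 0.9, 1.0, 1.2, 5.0, 9.0]
print(candidate_cuts(costs, n_start=10))  # [6, 3]
```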
References
M. Charlton, S. Openshaw and C. Wymer (1985) Some new classifications of census enumeration districts in Britain: a poor man's ACORN. Journal of Economic and Social Measurement, 13, 69-96.
B. S. Everitt (1993) Cluster Analysis. London: Edward Arnold.
B. S. Everitt, S. Landau and M. Leese (2001) Cluster Analysis. London: Edward Arnold.
L. Kaufman and P. J. Rousseeuw (1990) Finding Groups in Data. New York: John Wiley & Sons.
M. Wallace and C. Denham (1996) The ONS classification of local and health authorities of Great Britain. Studies on Medical and Population Subjects, Number 59. ONS.