Methods for National Statistics 2001 area classification for health areas
Introduction
This paper outlines the methodology used in the 2001 area classification for health areas. This includes the process undertaken to select a variable set and the clustering techniques used to create the classification. The health areas used in the classification are Primary Care Organisations in England, Local Health Boards in Wales, Health Board areas in Scotland and Health and Social Services Boards in Northern Ireland. The same method was used for the updated health area classification in England as for the previous one and the classification remains unchanged in Wales, Scotland and Northern Ireland.
Methods
Variable Selection
The variables chosen for the health area classification were necessarily the same as those used in the local authority classification, because health areas are assigned to clusters based on the twenty-four subgroup centroids of the local authority classification.
Standardising data
All clustering techniques are based on the similarity or dissimilarity of the cases to be clustered. This is measured by constructing a distance matrix reflecting all the variables in the data set for each case. Problems will occur if the variables differ in scale or magnitude: in general, variables with larger values and greater variation will have more impact on the final similarity measure. It is therefore necessary to standardise the data so that each variable is equally represented in the distance measure.
The standardisation method used in the local authority classification was an inter-decile range standardisation, and this must be taken into account when producing the health area classification. Because health areas are assigned to the local authority clusters, the denominator in the standardisation formula must be the same as that used in the local authority classification, so that the health area variables are on the same scale as the classification to which they are being assigned. For each variable, the method subtracts the UK median for local authorities, XLA-med, from the health area's value, Xi, and divides the result by the distance between the 90th percentile, XLA-90th, and the 10th percentile, XLA-10th. This method was also used in the 1991 area classification.
The formula is therefore:
$$\frac{X_i - X_{\text{LA-med}}}{X_{\text{LA-90th}} - X_{\text{LA-10th}}}$$
This method measures the deviation from the local authority median and this makes the health area data more consistent with the local authority classification.
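The inter-decile range standardisation described above can be sketched in Python as follows. This is a minimal illustration, not the code used to produce the classification: the function names, the sample values and the linear-interpolation rule for percentiles are all assumptions, since the paper does not state how the percentiles were computed.

```python
from statistics import median


def percentile(values, q):
    """Percentile by linear interpolation (q between 0 and 100).

    The interpolation rule is an assumption; the paper does not specify one.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def standardise(health_area_value, la_values):
    """Standardise one health area value against the local authority distribution.

    The median and inter-decile range are taken from the local authority
    values, so the result is on the same scale as the local authority data.
    """
    la_median = median(la_values)
    inter_decile_range = percentile(la_values, 90) - percentile(la_values, 10)
    return (health_area_value - la_median) / inter_decile_range


# Hypothetical local authority values for a single variable.
la_values = [float(v) for v in range(11)]  # 0.0, 1.0, ..., 10.0

# Median is 5, 90th percentile is 9, 10th percentile is 1.
print(standardise(9.0, la_values))  # (9 - 5) / (9 - 1) = 0.5
```

A health area whose value equals the local authority median is standardised to zero, and a value lying one inter-decile range above the median is standardised to one.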
Distance measure
Once each of the variables has been appropriately standardised, it is necessary to determine how 'close' cases are to each other, or how far apart they are. Most methods of cluster analysis begin with a matrix reflecting a quantitative measure of similarity for each pair of cases. This is more commonly referred to as a similarity, distance, dissimilarity or proximity matrix. Two cases are said to be 'close' when their dissimilarity or distance is small or their similarity is large. Many different measures can be used to quantify proximity, with the Euclidean distance and the squared Euclidean distance (SED) being two of the most common. We have used the SED as the measure of proximity.
Two health areas, X and Y, are said to be similar if the 'distance' between them, based on census characteristics, is small. The distance is calculated using the following formula:
$$\sum_{i=1}^{42} (X_i - Y_i)^2$$
where Xi = value of variable i for health area X and Yi = value of variable i for health area Y
so that the distance between the two health areas is the sum of the squared differences between their values for each and every variable.
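As a small worked illustration, the squared Euclidean distance can be computed as below. The function name and the sample vectors are hypothetical, and three variables are used here for brevity rather than the 42 used in the classification.

```python
def squared_euclidean_distance(x, y):
    """Sum of squared differences between two equal-length vectors of
    standardised variable values."""
    if len(x) != len(y):
        raise ValueError("both areas must have the same number of variables")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Hypothetical standardised values for two health areas.
area_x = [1.0, 2.0, 3.0]
area_y = [1.0, 0.0, 5.0]

print(squared_euclidean_distance(area_x, area_y))  # 0 + 4 + 4 = 8.0
```

Because the differences are squared, a large difference on one variable contributes more to the distance than several small differences of the same total size.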
Assignment to clusters
The health area classification was produced by assigning each health area to the 'closest' of the twenty-four local authority subgroups. This was defined as the subgroup whose centroid had the smallest squared Euclidean distance from the health area. The higher levels of the classification were then created using the hierarchy from the local authority classification to allocate cluster members at the group and supergroup level. For example, if a health area belongs to subgroup "Regional Cities - A" then it will also belong to the group "Regional Cities" and the supergroup "Cities and Services".
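The assignment step can be sketched as a nearest-centroid lookup followed by a hierarchy lookup. This is an illustrative sketch only: the centroid values are invented, the "Example ..." names are placeholders rather than real clusters, and only two subgroups and three variables are shown instead of the twenty-four subgroups and 42 variables of the actual classification.

```python
def squared_euclidean_distance(x, y):
    """Sum of squared differences between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def classify(area_values, centroids, hierarchy):
    """Return (subgroup, group, supergroup) for one health area.

    The subgroup is the one whose centroid is nearest in squared Euclidean
    distance; the group and supergroup are then read from the hierarchy.
    """
    subgroup = min(
        centroids,
        key=lambda name: squared_euclidean_distance(area_values, centroids[name]),
    )
    group, supergroup = hierarchy[subgroup]
    return subgroup, group, supergroup


# Hypothetical subgroup centroids (standardised values, invented for illustration).
centroids = {
    "Regional Cities - A": [0.5, -0.2, 0.1],
    "Example Subgroup - B": [-0.4, 0.3, 0.0],  # placeholder name
}

# Subgroup -> (group, supergroup); the second entry is a placeholder.
hierarchy = {
    "Regional Cities - A": ("Regional Cities", "Cities and Services"),
    "Example Subgroup - B": ("Example Group", "Example Supergroup"),
}

health_area = [0.4, -0.1, 0.2]  # hypothetical standardised values
print(classify(health_area, centroids, hierarchy))
```

No clustering is run on the health areas themselves in this approach; each area simply inherits the cluster of the nearest existing local authority centroid.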