Beginners Guide to Geographic Referencing

Preface
Introduction
Postcode Referencing
Geographic Referencing
Conclusions

1. Preface

Geographic referencing (or 'georeferencing') is an increasingly important process in the production of National Statistics, allowing greater data accuracy and facilitating the sharing and aggregation of data. This 'Beginners Guide to Geographic Referencing' describes the process and explains why geographic referencing is an improvement on the existing process of postcode referencing.

2. Introduction

The production of National Statistics involves the collection, processing and output of statistical data. Most data events can be referenced to a known location and this means that most statistics can be output using a geographic classification. For example, we might produce statistics of unemployment rate by electoral ward, or birth rate by local authority district.

Since the late 1970s the approach to data referencing has been to use the event postcode, as described in section 3. Although this has been a valuable method, it is not without its limitations, and we are therefore moving towards a new approach, geographic referencing. This involves referencing events to a specific and fixed point, usually a grid reference; the many advantages of this are described in section 4.

3. Postcode Referencing

The traditional method of referencing data to the event postcode has a number of advantages:

  • Most people know their postcode so can readily supply it when responding to a survey.
  • Postcode directories (such as the National Statistics Postcode Directory) can be used as a ready means of matching each postcode to a range of geographic areas. An example of this is shown in Figure 1. In this case the postcode has been matched to a county, but the directories go down to electoral ward level and also include various non-administrative geographies.

    [Figure 1: postcode referencing example]
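The directory lookup described above can be sketched in a few lines. This is a minimal illustration only: the postcodes, area names and the `DIRECTORY`, `normalise` and `count_events_by` names are all made up for the example and do not reflect the actual layout of the National Statistics Postcode Directory.

```python
from collections import Counter

# Made-up directory rows for illustration only: postcode -> containing areas.
DIRECTORY = {
    "AB1 2CD": {"ward": "Ward A", "district": "District X", "county": "County P"},
    "AB1 2CE": {"ward": "Ward B", "district": "District X", "county": "County P"},
    "CD3 4EF": {"ward": "Ward C", "district": "District Y", "county": "County Q"},
}


def normalise(postcode):
    """Upper-case the postcode and put one space before its last three characters."""
    compact = "".join(postcode.split()).upper()
    return compact[:-3] + " " + compact[-3:]


def count_events_by(postcodes, level):
    """Count events per area at the given level; also return the number unmatched."""
    counts = Counter()
    unmatched = 0
    for postcode in postcodes:
        row = DIRECTORY.get(normalise(postcode))
        if row is None:
            unmatched += 1  # postcode not present in the directory
        else:
            counts[row[level]] += 1
    return counts, unmatched


# Event postcodes as respondents might supply them, with inconsistent spacing and case.
events = ["ab12cd", "AB1 2CE", "ab1  2cd", "cd3 4ef", "XX1 1XX"]
ward_counts, missing = count_events_by(events, "ward")
county_counts, _ = count_events_by(events, "county")
```

The same event list can be output at ward or county level simply by changing the `level` argument, which is the convenience that makes postcode referencing attractive.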

    3.1. Problems with postcode referencing

    Although postcode referencing is very straightforward, it has a number of key weaknesses:

    3.1.1. Postcodes do not map directly to other geographic areas

    Postcode boundaries do not take account of administrative boundaries (or of any other geography). This 'straddling' of boundaries means that many postcodes can only be assigned to administrative areas on a 'best fit' basis. The result is that addresses lying close to administrative boundaries are sometimes assigned to the wrong area. For small areas such as electoral wards the resulting statistical errors can sometimes be considerable. Fortunately the errors are less significant for larger areas as:

    • There are proportionally fewer postcodes straddling the boundaries.
    • The errors are more likely to be cancelled out as data which are wrongly allocated to one area may be balanced by an opposite misallocation elsewhere on the area boundary. This cancellation effect is even stronger in datasets with a large number of observations.

    3.1.2. Postcodes can move around

    Royal Mail assigns postcodes to address locations for the sole purpose of providing an efficient mail delivery service. Postcodes may be discarded, reassigned and reused as a result of demolitions and new building activity.

    Although ONS Geography maintains a database of discarded postcodes, this cannot by itself be relied upon to provide an accurate locational reference. Royal Mail may occasionally reuse a discarded postcode in another part of the same postcode sector, so the physical location of a postcode may shift (see Figure 2). This could cause data to be assigned to the wrong area unless care is taken to use the directory for the correct year. Note, though, that Royal Mail will not reuse a postcode for at least 2 years after it has been discarded.

    [Figure 2: example of reused postcode]
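The need to use the correct year's directory can be illustrated with a small sketch. The yearly extracts, postcodes and the `ward_for_event` helper below are hypothetical; the point is only that the lookup is keyed on the event year as well as the postcode.

```python
# Made-up yearly directory extracts: year -> {postcode: ward}.
# "AB1 2CD" is discarded after 2001 and reused elsewhere in its sector from 2005.
DIRECTORIES = {
    2001: {"AB1 2CD": "Ward A", "AB1 2CE": "Ward A"},
    2003: {"AB1 2CE": "Ward A"},
    2005: {"AB1 2CD": "Ward B", "AB1 2CE": "Ward A"},
}


def ward_for_event(postcode, event_year):
    """Look up the ward using the directory for the year the event occurred."""
    if event_year not in DIRECTORIES:
        raise KeyError(f"no directory loaded for {event_year}")
    # Returns None if the postcode was not live in that year.
    return DIRECTORIES[event_year].get(postcode)


events = [("AB1 2CD", 2001), ("AB1 2CD", 2005), ("AB1 2CD", 2003)]

# Correct approach: match each event against the directory for its own year.
with_matching_year = [ward_for_event(pc, year) for pc, year in events]

# Naive approach: match everything against the latest directory only.
with_latest_only = [DIRECTORIES[2005].get(pc) for pc, _ in events]
```

With the latest directory alone, the 2001 event is silently moved to Ward B and the 2003 event, whose postcode was not live that year, is given a location it never had.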

    3.1.3. Area boundaries keep changing

    The UK has a very high level of electoral and administrative boundary change - for example, between 1991 and 2000 there were over 3000 electoral ward/division boundary changes in England and Wales alone. This further complicates postcode to area referencing.

    Take Figure 3, for example. Before the change, the configuration of the postcodes is such that all properties are allocated to the correct ward. Once the ward boundary has changed, however, the allocation of some properties becomes incorrect. In addition, when the next version of the postcode directory is released, it will once again be affected by straddling: all properties in the split postcode will be referenced to either Ward A or Ward B, so a proportion of them are bound to be wrong.

    [Figure 3: postcodes and boundary change]

    3.2. Postcode referencing: Conclusion

    Postcode referencing is a straightforward approach but has a number of weaknesses, relating both to the unstable nature of UK geography and to the fact that postcode boundaries do not match those of other geographic areas. In general these problems are relatively insignificant when dealing with large areas, but can be substantial for small areas. With the advent of Neighbourhood Statistics and the associated demand for small area statistics, a better method of referencing is required. We are therefore moving towards geographic referencing.

    4. Geographic Referencing

    As indicated, referencing to postcodes has a number of limitations. If, however, we can reference to something which is fixed, such as a grid reference, the problems are reduced. There is also better potential for data visualisation, as grid-referenced events can be located on a map and viewed in relation to other geographic features including administrative areas and boundaries, as well as physical features such as roads, coastline and buildings. As well as simply viewing the data, we also have the potential to use Geographic Information Systems (GIS) to carry out detailed analysis and modelling. We can also readily link between different datasets, as we simply need to identify events with a common grid reference.
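Linking datasets on a common grid reference amounts to a join on the coordinate pair. The sketch below assumes each record carries an (easting, northing) tuple; the two datasets, their field names and the `link_on_grid_ref` helper are invented for illustration.

```python
from collections import defaultdict

# Made-up events from two hypothetical datasets, each with an
# (easting, northing) grid reference.
survey_events = [
    {"id": "S1", "grid_ref": (445120, 112340)},
    {"id": "S2", "grid_ref": (445300, 112900)},
    {"id": "S3", "grid_ref": (446010, 113250)},
]
register_events = [
    {"id": "R1", "grid_ref": (446010, 113250)},
    {"id": "R2", "grid_ref": (445120, 112340)},
    {"id": "R3", "grid_ref": (447000, 114000)},
]


def link_on_grid_ref(left, right):
    """Pair every left record with each right record at the same grid reference."""
    index = defaultdict(list)
    for record in right:
        index[record["grid_ref"]].append(record)
    return [
        (l["id"], r["id"])
        for l in left
        for r in index.get(l["grid_ref"], [])
    ]


linked = link_on_grid_ref(survey_events, register_events)
```

An exact match on coordinates is the simplest case; in practice a tolerance or an address-level identifier might be needed, which this sketch does not attempt.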

    There are a number of possibilities for geographic referencing:

    4.1.1. Geographic referencing using the postcode centroid

    Under the Gridlink® initiative, ONS Geography's postcode directories now provide the grid reference of the property closest to the postcode centroid (the geographic centre of the postcode). This is a good start, and may be the most accurate reference possible as we may not have any more detailed locational information for the data event.

    However, although we can relate the grid reference of the postcode centroid to a map and perform detailed analysis on the associated data, this method does not solve the problems of straddling and boundary change. In Figure 4, for example, the postcode centroid is in Ward A but 3 properties in the postcode actually fall in Ward B. Any data collected from these properties will be wrongly allocated to Ward A.

    [Figure 4: delivery point allocation to wards using postcode grid references]
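The situation in Figure 4 can be reproduced with made-up coordinates. In this sketch the ward boundary is simply a vertical line, and the postcode centre is assumed to be the mean of the property coordinates; both are simplifications chosen for the example, as are the names `ward_of` and `reference_property`.

```python
# Made-up geometry: the ward boundary is the vertical line at easting 100.
BOUNDARY_EASTING = 100


def ward_of(point):
    """Return the ward containing an (easting, northing) point."""
    easting, _ = point
    return "Ward A" if easting < BOUNDARY_EASTING else "Ward B"


# Eight properties sharing one postcode; the last three lie across the boundary.
properties = [(60, 50), (70, 50), (80, 50), (90, 50), (95, 50),
              (105, 50), (110, 50), (115, 50)]

# Assumed definition of the centre: the mean of the property coordinates.
centre = (
    sum(e for e, _ in properties) / len(properties),
    sum(n for _, n in properties) / len(properties),
)

# The directory holds the grid reference of the property closest to the centre.
reference_property = min(
    properties,
    key=lambda p: (p[0] - centre[0]) ** 2 + (p[1] - centre[1]) ** 2,
)

# Every event in the postcode inherits the ward of that single reference point.
centroid_ward = ward_of(reference_property)
misallocated = [p for p in properties if ward_of(p) != centroid_ward]
```

Here the reference property falls in Ward A, so the three properties east of the boundary are counted in the wrong ward, exactly as in the figure.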

    4.1.2. Geographic referencing using address-level grid references

    Address-level grid referencing, which we are working towards, is even more powerful. Whereas the postcode centroid gives an approximate location of a data event, the address-level grid reference describes precisely where it occurs. This has several advantages:

  • Straddling is no longer an issue as postcodes are no longer considered. As can be seen in Figure 5, event addresses are guaranteed to be allocated to the correct geographic areas.
  • Dealing with administrative boundary change is even easier. We simply load the new boundary set into a GIS and, knowing the events are precisely located, can very quickly produce accurate statistics for the new boundaries.
  • Outputs and analysis can be even more flexible. For example, if we wanted to consider whether there is a relationship between how close people live to a motorway and the incidence of a particular disease, our data is now referenced with the accuracy required to do this.

    [Figure 5: delivery point allocation to wards using postcode grid references]
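The boundary-change advantage listed above can be sketched as follows. A GIS would normally do this work; here a standard ray-casting point-in-polygon test stands in for it, and the ward polygons, event coordinates and the `allocate` helper are all invented for the example. Points lying exactly on a boundary are not handled specially.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex list."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that cross the horizontal line through y can be hit.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def allocate(events, boundary_set):
    """Count events per area for a given set of named boundary polygons."""
    counts = Counter()
    for x, y in events:
        for name, polygon in boundary_set.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return dict(counts)


# Made-up ward boundaries before and after a change that moves the
# shared boundary from easting 100 to easting 80.
OLD_WARDS = {
    "Ward A": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "Ward B": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
NEW_WARDS = {
    "Ward A": [(0, 0), (80, 0), (80, 100), (0, 100)],
    "Ward B": [(80, 0), (200, 0), (200, 100), (80, 100)],
}

# Address-level grid references of six events (made up).
events = [(60, 50), (70, 50), (90, 50), (95, 50), (105, 50), (110, 50)]

old_counts = allocate(events, OLD_WARDS)
new_counts = allocate(events, NEW_WARDS)
```

Because the events themselves never move, producing statistics for the new boundaries only requires swapping in the new boundary set; no postcode directory needs to be rebuilt.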

    Note however that although address-level grid-referencing is powerful, it does have limitations:

    • Not all data can be assigned to an address - see below.
    • Automated assignation of grid references to addresses is more difficult than it is for postcodes. This is because, unlike postcodes, addresses can be lengthy, complicated and inconsistent. For example, the first line of an address may be a building number and street name, the number of a flat within a building, or the name of a property.
    • Because the data relates to individual addresses, greater security precautions may be required to protect the confidentiality of individuals.

    4.2. Other forms of locational referencing

    Address-level grid referencing is appropriate for data events that relate to residential and business properties, but some events relate to other types of location. For example, if the data event is the occurrence of a specific type of cereal crop, the location will be a field. Such events can be assigned to land parcels via identifiers such as Ordnance Survey's topographic identifiers (TOIDs) or Land Registry parcel boundaries. Other events (e.g. the location of a street crime) may simply need a grid reference. An alternative might be to reference to the nearest address. The key point, though, is that all data needs suitable, consistent and unambiguous geographic identifiers.

    5. Conclusions

    The approach of using postcodes to reference geographic data has been a valuable tool but is subject to a number of serious limitations, especially when trying to produce statistics for small areas. The move towards geographic referencing based on the postcode centroid offers many advantages in terms of facilitating event linkage, data visualisation and data analysis, but doesn't eliminate the problems caused by straddling and boundary change. If, however, a reference can be given at address level, which is something we are working towards, the potential is even greater, allowing for detailed and accurate small-area statistics.

    Different types of data will of course require different types of referencing, and issues such as ensuring confidentiality are crucial. The Office for National Statistics is therefore giving a great deal of attention to ensuring that we utilise geographic referencing in the best possible way. The result should be a major contribution to the quality of UK statistics.

    This page last revised: Tuesday, 25 April 2006