SOC 2000 Backcasting
Introduction
The Standard Occupational Classification (SOC) 2000, which replaces SOC 90, was introduced to the Labour Force Survey (LFS) from spring 2001. General information about SOC 2000 is available separately.
When new questions and classifications are introduced to the LFS, it is normal practice not to release the results for public use until they have been quality assured over a number of quarters. For SOC 2000, however, the questions and the methods used to code the classification are well established; only the categories of the classification are new.
Although the new classification still has nine major groups, its structure and composition have changed considerably. Results based on one classification therefore cannot be meaningfully compared with results based on the other, which is a problem for anyone wishing to compare data over time.
There are two ways to overcome this problem of comparability. The first is to recode the historical microdata to the new classification, but this is a very time-consuming and costly operation. The second is to code certain data sources to both classifications. These dual-coded datasets can be used to estimate the correspondence between the two classifications, and the correspondences can then be used to backcast historical data at an aggregate level. This solution is quicker and easier but has its own problems, which are outlined below under the quality issues.
The ONS decided to dual code the LFS summer 2000 quarter to both SOC 90 and SOC 2000. Further details of this dual-coding exercise can be found in an article in the July 2001 edition of Labour Market Trends (LMT).
In addition to this quarter, other dual-coded LFS data were available. Analysis of these dual-coded datasets showed that the LFS winter 2000/2001 quarter provided the best basis for the backcasting probabilities.
Methodology
Matrices showing the correspondence between SOC 90 and SOC 2000, derived from the LFS winter 2000/2001 dual-coded quarter, have been used to backcast the historical time series.
For each individual in the LFS winter 2000/2001 dual-coded quarter who had been assigned both a SOC 90 and a SOC 2000 code, the observed pair of codes was added to a matrix. The cell counts in each row were then converted to percentages of the row total, giving the proportional distribution across SOC 2000 of each SOC 90 minor group. Each cell in the resulting matrix therefore gives the probability that an observation in a given category of the old classification would be classified to a specific category of the new classification.
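The matrix construction described above can be sketched in Python as follows. This is a minimal illustration, assuming the dual-coded records are available as (SOC 90 code, SOC 2000 code) pairs; the function name and the toy codes are hypothetical, and each row is expressed as proportions (the percentages divided by 100).

```python
from collections import Counter, defaultdict


def build_correspondence_matrix(pairs):
    """Build a row-normalised SOC 90 -> SOC 2000 matrix from dual-coded pairs.

    pairs: iterable of (soc90_code, soc2000_code) tuples, one per person
    coded to both classifications (hypothetical input format).
    Returns {soc90: {soc2000: proportion}}, where each row sums to 1.
    """
    # Count how often each SOC 90 code was paired with each SOC 2000 code.
    counts = defaultdict(Counter)
    for soc90, soc2000 in pairs:
        counts[soc90][soc2000] += 1

    # Divide each cell by its row total so every SOC 90 row becomes a
    # probability distribution over SOC 2000 categories.
    matrix = {}
    for soc90, row in counts.items():
        total = sum(row.values())
        matrix[soc90] = {soc2000: n / total for soc2000, n in row.items()}
    return matrix


# Toy dual-coded sample (codes are made up for illustration).
pairs = [("A", "X"), ("A", "X"), ("A", "X"), ("A", "Y"), ("B", "Y")]
print(build_correspondence_matrix(pairs))
# {'A': {'X': 0.75, 'Y': 0.25}, 'B': {'Y': 1.0}}
```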
Separate matrices were calculated for each economic activity group at the lowest level, split by full-time/part-time status and by gender. This preserved the distinct occupational characteristics of each group. For example, a smaller percentage of part-time workers than of full-time workers are in managerial occupations.
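A sketch of the stratified approach, assuming each dual-coded record carries a stratum key such as (economic activity group, full-time/part-time, gender). The record layout, stratum labels and codes are hypothetical. Building one matrix per stratum keeps, for example, the lower share of managers among part-time workers from being averaged away.

```python
from collections import Counter, defaultdict


def build_stratified_matrices(records):
    """Build one row-normalised SOC 90 -> SOC 2000 matrix per stratum.

    records: iterable of (stratum, soc90_code, soc2000_code) tuples, where
    stratum is any hashable key, e.g. (activity_group, ft_pt, sex).
    Returns {stratum: {soc90: {soc2000: proportion}}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for stratum, soc90, soc2000 in records:
        counts[stratum][soc90][soc2000] += 1

    matrices = {}
    for stratum, rows in counts.items():
        matrices[stratum] = {}
        for soc90, row in rows.items():
            total = sum(row.values())
            matrices[stratum][soc90] = {
                soc2000: n / total for soc2000, n in row.items()
            }
    return matrices


# Hypothetical strata: the same SOC 90 group "A" splits differently by
# working pattern, so a single pooled matrix would blur the two.
full_time = ("employee", "FT", "F")
part_time = ("employee", "PT", "F")
records = (
    [(full_time, "A", "MGR")] * 3 + [(full_time, "A", "ADM")] * 1
    + [(part_time, "A", "MGR")] * 1 + [(part_time, "A", "ADM")] * 3
)
matrices = build_stratified_matrices(records)
print(matrices[full_time]["A"])  # {'MGR': 0.75, 'ADM': 0.25}
print(matrices[part_time]["A"])  # {'MGR': 0.25, 'ADM': 0.75}
```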
The SOC 2000 probability distributions for each SOC 90 category were then applied to other datasets as a proxy for the SOC 2000 codes that respondents would have been assigned.
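Applying the probabilities to aggregate historical counts can be sketched as below, for a single stratum. This assumes the historical data are available as totals by SOC 90 group; the function name and figures are hypothetical. Each SOC 90 total is shared out across SOC 2000 groups in proportion to its row of the matrix, so the overall total is preserved.

```python
from collections import defaultdict


def backcast_totals(soc90_totals, matrix):
    """Estimate SOC 2000 totals from SOC 90 totals using a fixed matrix.

    soc90_totals: {soc90_code: estimated number of people}
    matrix: {soc90_code: {soc2000_code: proportion}}, rows summing to 1
    Returns {soc2000_code: estimated number of people}.
    """
    estimates = defaultdict(float)
    for soc90, total in soc90_totals.items():
        # A SOC 90 code missing from the matrix raises KeyError rather
        # than silently dropping those people from the estimates.
        for soc2000, proportion in matrix[soc90].items():
            estimates[soc2000] += total * proportion
    return dict(estimates)


matrix = {"A": {"X": 0.75, "Y": 0.25}, "B": {"Y": 1.0}}
historic = {"A": 1000, "B": 200}
print(backcast_totals(historic, matrix))  # {'X': 750.0, 'Y': 450.0}
```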
The estimates produced using the matrices from the LFS winter 2000/2001 quarter are considered the best available. However, any methodology that uses a single time period as a proxy for the relationship in other periods will be subject to a number of quality issues, which users should consider before using the data.
Transformation matrices for SOC 2000 - quality issues
Caution should be exercised when analysing or interpreting the backcast data series. This section presents a number of issues to consider with respect to data quality.
Modal differences
The dual-coded LFS winter 2000/2001 quarter, which produced the correspondence matrix, cannot exactly replicate the method used to code SOC 2000 in the LFS spring 2001 quarter. Different coding systems were used in the two quarters, so there were minor differences in the on-screen information available to coders, mainly relating to supervisory and managerial duties. These differences may have caused discontinuities, particularly in the areas where the classification has changed most, e.g. major groups 1, 4 and 7.
Sampling error
The LFS is a sample survey, so its data are subject to sampling error. Estimates based on smaller subgroups tend to have larger relative sampling errors, although sampling errors also depend on how the sample and the population are distributed. Both the data from earlier periods that are being transformed into SOC 2000 and the probabilities derived from the dual-coded dataset are therefore subject to sampling error.
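To give a rough sense of why matrix cells based on few dual-coded observations are less reliable, the standard error of an estimated proportion can be approximated by sqrt(p(1 - p)/n). This is a simplification that assumes simple random sampling; the LFS has a more complex design, so its actual sampling errors will differ. The figures are illustrative only.

```python
import math


def proportion_standard_error(p, n):
    """Approximate standard error of a proportion p estimated from n cases.

    Assumes simple random sampling, which the LFS does not strictly use,
    so treat the result as indicative only.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


# The same cell probability is far less certain in a small SOC 90 row.
for n in (400, 100, 25):
    print(n, round(proportion_standard_error(0.5, n), 3))
# 400 0.025
# 100 0.05
# 25 0.1
```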
Coder error
In addition to sampling error in the dual-coded dataset, the observed relationship in the LFS winter 2000/2001 quarter will be affected by coder variance. Occupational information on the LFS is coded to SOC by interviewers, so there will be some variation in the way interviewers assign SOC codes. This variation affects both the distributions in the probability matrix and the historical time-series data.
Seasonality
It is also difficult to assess whether seasonal differences affect the use of a probability matrix based on a single quarter. SOC 2000 major group 5 (skilled trades occupations), which includes occupations such as skilled farm and construction workers, does show a seasonal pattern in the data produced from the transformation matrix.
The strength of the seasonal pattern depends as much on how clear-cut the relationship between the categories in the two classifications is as on the seasonal changes in the numbers in that group. If a SOC 90 category corresponds to only one SOC 2000 category, its seasonal pattern will be replicated in full, even though the relationship was based on data from a single time point (the LFS winter 2000/2001 dual-coded quarter). If the SOC 90 group is spread over several SOC 2000 groups, however, the seasonal pattern will be diffused across them. Basing the relationship on a single time point may therefore affect the results.
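The effect can be illustrated with made-up quarterly figures. The sketch below, which assumes a fixed matrix applied quarter by quarter, compares a seasonal SOC 90 group that maps wholly to one SOC 2000 group with the same group split evenly between two. The group codes, matrices and series are hypothetical, and the peak-to-trough range is used as a crude measure of seasonality. Because the matrix is fixed, the backcast series can only reproduce seasonality already present in the SOC 90 totals; a seasonal change in the relationship itself would not be captured.

```python
from collections import defaultdict


def backcast_series(series_by_soc90, matrix):
    """Apply one fixed SOC 90 -> SOC 2000 matrix to every quarter of a series."""
    n_quarters = len(next(iter(series_by_soc90.values())))
    out = defaultdict(lambda: [0.0] * n_quarters)
    for soc90, series in series_by_soc90.items():
        for soc2000, proportion in matrix[soc90].items():
            for q, value in enumerate(series):
                out[soc2000][q] += value * proportion
    return dict(out)


def seasonal_range(series):
    """Peak-to-trough range, a crude measure of seasonal swing."""
    return max(series) - min(series)


series = {"S": [100, 140, 100, 60],   # seasonal SOC 90 group
          "T": [100, 100, 100, 100]}  # stable SOC 90 group

one_to_one = {"S": {"P": 1.0}, "T": {"Q": 1.0}}
split = {"S": {"P": 0.5, "Q": 0.5}, "T": {"Q": 1.0}}

for name, matrix in (("one-to-one", one_to_one), ("split", split)):
    result = backcast_series(series, matrix)
    print(name, {k: seasonal_range(v) for k, v in result.items()})
# one-to-one {'P': 80.0, 'Q': 0.0}
# split {'P': 40.0, 'Q': 40.0}
```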
Changing Occupational structure
Over time the structure of industry changes, and people's occupations change with it. It is therefore not meaningful to apply a classification containing new occupations to data for a period in which those occupations did not exist, and this problem grows the further back the data are backcast. Balancing this risk against users' interest in a time series, the ONS has estimated occupations under the new classification for the period from the LFS spring 1995 quarter to the LFS winter 2000/2001 quarter.
The LFS winter 1996/1997 quarter was also recoded to SOC 2000 (further details of this dual-coding exercise can be found in an article in the July 2001 edition of LMT). It shows that the distribution of occupational groups has not changed significantly over the intervening period. The matrix based on the LFS winter 2000/2001 quarter should therefore, in most cases, reasonably reflect the likely relationship between SOC 90 and SOC 2000 in these earlier periods.
Other issues
The probabilities between SOC 90 and SOC 2000 for the LFS winter 2000/2001 quarter were computed from unweighted data, because the aim was to capture the internal correspondence between the two classifications. However, the backcast data could be affected if any given relationship in the correspondence tables was over- or under-represented.
This could occur because unweighted data do not correct for differences in response rates across the UK, and such differences may distort the more subtle relationships. For example, suppose an area rich in energy-intensive industries, such as the North East, had a high response rate, while inner London, which has fewer of these industries, had a lower response rate. Occupations typical of the energy-intensive industries in the North East could then suppress more subtle relationships for similar occupations in the same SOC 90 group in inner London, simply because people from the North East were disproportionately represented in the sample.
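The response-rate effect can be sketched with hypothetical numbers. Below, one SOC 90 group is pooled from two regions with different occupational mixes; category "E" stands for an occupation typical of energy-intensive industries and "O" for everything else. The counts and weights (population represented per respondent) are invented for illustration and are not LFS figures. Pooling the raw counts lets the over-represented region dominate, whereas weighting restores each region's share of the population.

```python
from collections import defaultdict


def pooled_row(region_counts, weights=None):
    """Pool one SOC 90 row across regions and normalise to proportions.

    region_counts: {region: {soc2000_code: respondent count}}
    weights: optional {region: population represented per respondent};
             None gives the unweighted pooling.
    """
    pooled = defaultdict(float)
    for region, row in region_counts.items():
        w = 1.0 if weights is None else weights[region]
        for soc2000, n in row.items():
            pooled[soc2000] += w * n
    total = sum(pooled.values())
    return {soc2000: value / total for soc2000, value in pooled.items()}


counts = {
    "North East": {"E": 80, "O": 20},    # high response rate
    "Inner London": {"E": 5, "O": 15},   # low response rate
}
weights = {"North East": 10.0, "Inner London": 100.0}

print(round(pooled_row(counts)["E"], 3))           # 0.708
print(round(pooled_row(counts, weights)["E"], 3))  # 0.433
```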
Observations
Comparing the spring 2001 data with the historical datasets, the estimates show some discontinuities in the distribution. These occur in major groups 4 (administrative and secretarial) and 7 (sales and customer services), where the historical estimates are lower.
Most of these unexplained changes in level between the historical time series and the LFS spring 2001 quarter could be attributable to one or more of the quality issues described above. They could also reflect an unusual movement in, or sampling error affecting, the spring data. However, the differences are small and the time series is broadly consistent over the period.
SOC 2000 backcasting tables
The full range of backcasting tables covers people in employment, employees, self-employed people, full-time and part-time workers, temporary workers and second jobs. Tables of long-term unemployed and ILO unemployed people by previous occupation are also available. All tables are split by gender, are given at the 1-digit (major group) and 2-digit (sub-major group) levels, and cover the period from the LFS spring 1995 quarter to the LFS autumn 2001 quarter. No other backcast data are available.
The backcasting tables are available as Excel workbooks:
1-digit (major group) level (Excel, 169 KB)
2-digit (sub-major group) level (Excel, 353 KB)
Further information
For general information on methodology and background to the LFS, please see User Guide Volume 1.
Information on the structure of the SOC 2000 classification is available separately.
If you have any specific comments or questions on the SOC 2000 backcasting please contact Kim Johnson at kimberley.johnson@ons.gov.uk
This page last revised: Friday, 8 March 2002