| Abstract |
In this paper, we advance four propositions.
First, the availability of a scanner dataset opens opportunities for an official statistical agency, including:
- improving current CPI methods and practices — continuing to collect price data directly, but using scanner-based research to tune the design of the collection, index construction methods and data treatments (such as editing and imputation)
- data substitution — ceasing direct price collection for some segments of the CPI, and using scanner data instead
- data augmentation — using both directly collected and scanner data to compile segments of the CPI.
Another possibility being investigated by the Australian Bureau of Statistics (ABS) is using other by-product data (such as price quotes for Internet shopping or supermarket chains' price schedules) to substitute for or to augment the directly collected data. Even if the scanner data are not themselves used in the CPI, scanner-based research may guide the design of such mixed-data CPI compilation.
Second, using scanner data imposes costs on an official statistical agency, such as the costs of:
- acquiring the data — either drawing data directly from individual stores or chains or purchasing data from a commercial clearing house
- redesigning compilation practices — including practices for vetting the data, dealing with missing observations, dealing with quality change and other changes to item specifications, and so on
- retraining price statisticians in the new compilation practices
- reworking the mathematics of CPI construction — such as the microindex formulae and the aggregation tree and aggregation formulae
- redeveloping computer systems
- understanding the effects of all these changes on the published CPI and explaining them to users.
Third, an official statistical agency must undertake (or commission from others) extensive research to understand the best ways of using scanner data and the consequences of doing so. In this paper, we outline the kinds of research projects that might be needed, and summarise some of the understanding achieved so far.
Fourth, research based on scanner data is important, exciting, difficult and expensive. Many official statistical agencies and some academic institutions are undertaking research of this kind, and the findings are being shared through exchange of papers and through discussion at conferences. But, in our view, the time is ripe for a more systematic exchange of research agendas and, perhaps, for more cross-agency collaboration in pursuing those agendas. This would maximise the benefits to all. |