Dataset / Tabular

OHS-LFS Consistent Series Weights 1994-2007 (South Africa)

Abstract

One focus of post-apartheid research in South Africa is change. Questions include the progress South Africa has made in the economic, social and political arenas. National datasets such as the October Household Surveys (OHS) and Labour Force Surveys (LFS) provide a rich source of information on both economic and social variables in a cross-sectional framework. These datasets are repeated annually or biannually and therefore have the potential to highlight changes over time. Yet to treat the cross-sectional national data as a time series requires that, when stacked side by side, the data produce realistic trends. Since these data were not designed to be used as a time series, changes in sample design and the interview process, and shifts in the sampling frame, can cause unrealistic changes in aggregates over a short period of time. This raises concerns about the validity of using these datasets as a time series to examine change.

The aggregate trends calculated from the OHS and LFS show the data to be both temporally and internally inconsistent. Examination of the weights given in the datasets, together with the public documentation, makes it clear that the Statistics South Africa (StatsSA) household and person weights are not simple design weights, i.e. inverse inclusion probability weights. StatsSA post-stratifies the person design weight to external population totals. Since the data are cross-sectional, the intention of the post-stratification adjustment is to produce best estimates of the population given the information available at the time; temporal consistency is not considered. This creates problems when the data are used as a time series.
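
To make the distinction between a design weight and a post-stratified weight concrete, the following is a minimal sketch in Python. It is not StatsSA's procedure: the strata labels, inclusion probabilities and population totals are hypothetical values chosen for illustration, and the function names are assumptions introduced here.

```python
from collections import defaultdict


def design_weight(inclusion_probability):
    """Design weight: the inverse of a unit's probability of inclusion in the sample."""
    return 1.0 / inclusion_probability


def post_stratify(records, population_totals):
    """Rescale design weights so that each stratum sums to an external total.

    records: list of (stratum, design_weight) pairs
    population_totals: dict mapping stratum -> external population total
    """
    weighted_sum = defaultdict(float)
    for stratum, weight in records:
        weighted_sum[stratum] += weight
    return [
        weight * population_totals[stratum] / weighted_sum[stratum]
        for stratum, weight in records
    ]


# Hypothetical sample: two strata, three respondents.
sample = [
    ("A", design_weight(0.01)),
    ("A", design_weight(0.02)),
    ("B", design_weight(0.05)),
]
# Hypothetical external population totals for one survey year.
totals = {"A": 300.0, "B": 40.0}

adjusted = post_stratify(sample, totals)
```

Because the external totals are chosen separately for each cross section, nothing in this adjustment ties one year's weights to the next, which is the source of the temporal inconsistency described above.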

A project was thus undertaken by Nicola Branson at the University of Cape Town, with a scholarship from DataFirst as part of DataFirst's Data Quality Project, funded by the Mellon Foundation, to design a new set of person and household weights for the OHS 1994-1999 and the LFS 2000-2007. These weights are generated using an entropy estimation technique. The new weights result in consistent demographic and geographic trends and greater consistency between person- and household-level analysis.
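
The abstract does not describe the estimation procedure in detail, and the project's own syntax files remain the authoritative source. As a rough illustration of the general idea only, the sketch below uses raking (iterative proportional fitting), a standard technique that finds the weights closest to the starting weights in the cross-entropy sense while matching a set of marginal totals. The variable names, categories and target totals are all hypothetical.

```python
def rake(weights, categories, targets, iterations=50):
    """Adjust weights by iterative proportional fitting (raking).

    weights: starting weights, one per record
    categories: one dict per record, mapping dimension -> level
    targets: dict mapping dimension -> {level: target total}
    Among all weights that match the target margins, the raking limit is
    the set with minimum cross-entropy relative to the starting weights.
    """
    w = list(weights)
    for _ in range(iterations):
        for dim, level_totals in targets.items():
            # Current weighted total for each level of this dimension.
            sums = {level: 0.0 for level in level_totals}
            for wi, cat in zip(w, categories):
                sums[cat[dim]] += wi
            # Scale every record so this dimension's margins hit the targets.
            w = [
                wi * level_totals[cat[dim]] / sums[cat[dim]]
                for wi, cat in zip(w, categories)
            ]
    return w


# Hypothetical records: one per (group, region) cell.
cats = [
    {"group": "g1", "region": "r1"},
    {"group": "g1", "region": "r2"},
    {"group": "g2", "region": "r1"},
    {"group": "g2", "region": "r2"},
]
start = [100.0, 120.0, 90.0, 110.0]
# Hypothetical target margins; both dimensions sum to the same total (440).
margins = {
    "group": {"g1": 210.0, "g2": 230.0},
    "region": {"r1": 180.0, "r2": 260.0},
}

new_weights = rake(start, cats, margins)
```

Supplying target margins that evolve smoothly from one survey year to the next is one way such a procedure could yield temporally consistent aggregates; whether the project proceeds in exactly this manner is an assumption of this sketch.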

This dataset consists of the cross-entropy weights and the research resources used to construct them, including the syntax files, as well as background documentation on the project and other research output. The weights should be used with the OHS and LFS data available from the data portal.