The Use of Discrete Data in PCA: Theory, Simulations, and Applications to Socioeconomic Indices


wp-04-85.pdf (PDF document, 1,896 kB; 1,942,401 bytes)

Author(s): Kolenikov S, Angeles G

Year: 2004

Abstract:
The last several years have seen growth in the number of publications in economics that use principal component analysis (PCA), especially in the area of welfare studies. This paper gives an introduction to principal component analysis and describes how discrete data can be incorporated into it. The effects of discreteness of the observed variables on PCA are reviewed. The concepts of polychoric and polyserial correlations are introduced, with references to the existing literature demonstrating their statistical properties. A large simulation study is carried out to shed light on some of the issues raised in the theoretical part of the paper. The simulation results show that the currently used method of running PCA on a set of dummy variables, as proposed by Filmer & Pritchett (2001), is inferior to other methods for analyzing discrete data, both simple ones, such as using ordinal variables, and more sophisticated ones, such as using polychoric correlations.
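As a rough illustration of the comparison described in the abstract, the sketch below simulates ordinal indicators driven by one latent factor and extracts the first principal component under two codings: the ordinal codes themselves, and one dummy variable per category. It is not the paper's simulation design. The sample size, the loading of 0.7, the cut points, and the function names `simulate`, `dummy_code`, and `first_pc_scores` are all assumptions made for this example, and polychoric correlations are not implemented here.

```python
import numpy as np


def first_pc_scores(X):
    """Scores on the first principal component of the correlation matrix of X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each column
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigenvalues in ascending order
    weights = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    return Z @ weights


def simulate(n=2000, p=5, loading=0.7, seed=0):
    """Hypothetical setup: p three-category indicators driven by one latent factor."""
    rng = np.random.default_rng(seed)
    factor = rng.standard_normal(n)
    noise = rng.standard_normal((n, p))
    latent = loading * factor[:, None] + np.sqrt(1.0 - loading**2) * noise
    cuts = np.array([-0.5, 0.5])                # assumed thresholds
    ordinal = np.digitize(latent, cuts)         # codes 0, 1, 2
    return factor, ordinal


def dummy_code(ordinal):
    """One 0/1 indicator per observed category of each variable."""
    columns = []
    for j in range(ordinal.shape[1]):
        for k in np.unique(ordinal[:, j]):
            columns.append((ordinal[:, j] == k).astype(float))
    return np.column_stack(columns)


factor, ordinal = simulate()
scores_ordinal = first_pc_scores(ordinal)
scores_dummy = first_pc_scores(dummy_code(ordinal))

# The sign of a principal component is arbitrary, so compare absolute correlations.
r_ordinal = abs(np.corrcoef(scores_ordinal, factor)[0, 1])
r_dummy = abs(np.corrcoef(scores_dummy, factor)[0, 1])
print(f"|corr| with latent factor, ordinal coding: {r_ordinal:.3f}")
print(f"|corr| with latent factor, dummy coding:   {r_dummy:.3f}")
```

How closely each index tracks the latent factor depends on the assumed loadings, thresholds, and number of categories, so a single run like this should not be read as reproducing the paper's findings.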