Working Paper

Can machine-learning models predict gendered labor statistics using mobile phone and geospatial data?: CGIAR GENDER Impact Platform Working Paper #23

Abstract

High-quality data on rural women’s and men’s labor is imperative for tracking progress on gender equality and women’s empowerment, and for evaluating development interventions aimed at these outcomes. Yet sex-disaggregated data on unpaid care and domestic work, earnings, employment and entrepreneurship remain scarce. Researchers are increasingly looking to digital technologies, such as mobile phones, as an emerging data source with significant potential for closing gender data gaps. In this paper, we use mobile phone data and machine-learning models to try to predict gendered labor-market indicators for a large sample of mobile phone users in Ghana. Although our models predict mobile phone subscribers’ sex with reasonable accuracy, they predict women’s and men’s labor-market outcomes only slightly better than random guessing. These mixed results may be partly attributable to noise in the data arising from COVID-19-related disruptions to mobile phone use and employment-related behavior. Our results also point to potential methodological limitations in using machine-learning methods and mobile phone data to estimate gendered labor-market indicators and, more generally, suggest caution in leveraging digital technologies and machine learning to close data gaps. We conclude with several recommendations for refining the methodology in future work.