A Bayesian methodology for building consistent datasets for structural modeling
Abstract
Simulation models are powerful tools for understanding, analyzing, and explaining dynamic, complex systems. They provide empirical methodologies for exploring how systems and agents behave and how they may change in response to shocks and stresses. The power of these tools, however, depends on the quality of the data on which they are built. Many complex systems studied in the social sciences, including economic systems, are characterized by sparse data on behavioral characteristics and system outcomes. Generally, no single data source provides all the information and detail needed to build a complex, structural simulation model. Even where good data are available, few datasets are “model ready” without substantial processing and cleaning. Populating a model therefore requires significant effort to stitch together a complete, coherent, and model-consistent dataset from a multitude of sources that vary in scope, time scale, completeness, and quality. Because information is scarce and of variable quality, this challenge is well suited to a Bayesian approach that makes efficient use of all available data. To this end, we present a data management system that applies information-theoretic, cross-entropy estimation methods to various FAO agricultural datasets to generate a complete global database of agricultural production, demand, and trade for use in IFPRI’s IMPACT model, a global, multi-market, partial equilibrium model of agriculture. We describe the information theory that underlies this methodology, as well as its practical implementation in IMPACT. Although this data estimation methodology was developed for a partial equilibrium modeling framework, the principles presented are applicable to other data processing problems involving sparse and poor-quality data (e.g., data for computable general equilibrium models).
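The core idea of cross-entropy estimation can be illustrated on a single commodity balance: start from prior (possibly inconsistent) estimates and find the values closest to them, in an information-theoretic sense, that satisfy an accounting identity. The Python below is a minimal sketch under stated assumptions, not the implementation used for IMPACT. It assumes one linear constraint and the generalized cross-entropy objective for positive quantities, sum(x·log(x/q) − x + q). The function name `cross_entropy_balance` and all commodity figures are hypothetical.

```python
import math


def cross_entropy_balance(prior, coef, target=0.0, tol=1e-12):
    """Hypothetical helper: adjust positive prior values q_i to x_i so that
    sum(coef_i * x_i) == target while minimizing the generalized
    cross-entropy sum(x*log(x/q) - x + q).

    First-order conditions give x_i = q_i * exp(lam * coef_i), so only the
    scalar Lagrange multiplier lam must be found. g(lam) below is increasing
    in lam (its derivative is sum(coef^2 * x) > 0), so bisection is safe.
    """
    def g(lam):
        return sum(a * q * math.exp(lam * a) for q, a in zip(prior, coef)) - target

    # Widen the bracket until g changes sign (assumes the constraint is feasible).
    lo, hi = -1.0, 1.0
    while g(lo) > 0:
        lo *= 2
    while g(hi) < 0:
        hi *= 2

    # Bisection on the monotone function g.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid

    lam = 0.5 * (lo + hi)
    return [q * math.exp(lam * a) for q, a in zip(prior, coef)]


# Hypothetical, inconsistent supply-utilization account for one commodity:
# supply items enter with +1, uses with -1, and balance requires supply == use.
items = ["production", "imports", "food", "feed", "exports", "other"]
prior = [100.0, 20.0, 70.0, 30.0, 10.0, 5.0]  # supply 120 vs. use 115
coef = [1, 1, -1, -1, -1, -1]

balanced = cross_entropy_balance(prior, coef)
for name, q, x in zip(items, prior, balanced):
    print(f"{name:10s} {q:7.2f} -> {x:7.2f}")
```

In this sketch every supply item is scaled down by a common factor and every use is scaled up by its reciprocal, so relative proportions within each side of the prior are preserved. A full system with many commodities, countries, and identities, plus explicit error terms for data of differing reliability, would typically be solved with a general nonlinear optimizer rather than this one-multiplier shortcut.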