Scientific Publication

Ethiopian Crop Type 2020 (EthCT2020) dataset: Crop type data for environmental and agricultural remote sensing applications in complex Ethiopian smallholder wheat-based farming systems (Meher season 2020/21)

Abstract

Crop type observation is crucial for various environmental and agricultural remote sensing applications, including land use and land cover mapping, crop growth monitoring, crop modelling, yield forecasting, disease surveillance, and climate modelling. Quality-controlled, georeferenced crop type information is essential for calibrating and validating machine learning algorithms. However, publicly available field data is scarce, particularly in the highly dynamic smallholder farming systems of sub-Saharan Africa. For the 2020/21 main cropping season (Meher), the Ethiopian Crop Type 2020 (EthCT2020) dataset, compiled from multiple sources, provides 2,793 harmonized, quality-controlled, and georeferenced in-situ samples of annual crop types (7 crop groups; 22 crop classes) at smallholder field level across the complex and highly fragmented agricultural landscape of Ethiopia. The focus was on rainfed, wheat-based farming systems. A nationwide ground data collection campaign (GDCC; Source 1) was designed using a stratification approach based on wheat crop calendar information, and 1,263 in-situ data samples were collected in selected sampling regions. This in-situ data pool was enriched with 1,530 wheat samples extracted from a) the Wheat Rust Toolbox (WRTB; Source 2; 734 samples), a database for wheat disease surveillance data [1], and b) an in-house farm household survey database (FHSD; Source 3; 796 samples). The field data was labelled according to the Joint Experiment for Crop Assessment and Monitoring (JECAM) guidelines for cropland and crop type definition and field data collection [2] and the FAO Indicative Crop Classification [3]. The EthCT2020 dataset underwent extensive processing, including data harmonization, mixed pixel assessment through visual interpretation of 5 m Planet satellite image composites, and quality control using Sentinel-2 NDVI homogeneity analysis.
The EthCT2020 dataset is unique in terms of crop diversity, pixel purity, and spatial accuracy while targeting a countrywide distribution. It is representative of Ethiopia's complex and highly fragmented agricultural landscape and can be useful for developing new machine learning algorithms for land use and land cover mapping, crop type mapping, agricultural monitoring, and yield forecasting in smallholder cropping systems. The dataset can also serve as a baseline input for crop models, climate models, and crop disease and pest forecasting systems.
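
As an illustration of the kind of check an NDVI homogeneity analysis involves, the following minimal Python sketch computes NDVI from near-infrared and red reflectance (Sentinel-2 bands B8 and B4) in a small pixel window around a sample and flags the sample as homogeneous when the NDVI spread is small. The abstract does not state which homogeneity metric, window size, or threshold was used for EthCT2020, so the standard-deviation criterion, the 3 x 3 window, the `max_std` value, and all function names here are assumptions made for illustration only.

```python
from statistics import pstdev


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 if both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def window_ndvi(nir_window, red_window):
    """Flatten two equally shaped 2-D reflectance windows into NDVI values."""
    return [
        ndvi(n, r)
        for nir_row, red_row in zip(nir_window, red_window)
        for n, r in zip(nir_row, red_row)
    ]


def is_homogeneous(nir_window, red_window, max_std=0.05):
    """Hypothetical criterion: NDVI standard deviation within the window
    must not exceed max_std (an assumed threshold, not from the dataset)."""
    return pstdev(window_ndvi(nir_window, red_window)) <= max_std


# Assumed 3 x 3 windows of surface reflectance around one sample point.
uniform_nir = [[0.40] * 3 for _ in range(3)]
uniform_red = [[0.10] * 3 for _ in range(3)]

# Mixed window: the last row resembles bare soil rather than a green crop.
mixed_nir = [[0.40] * 3, [0.40] * 3, [0.20] * 3]
mixed_red = [[0.10] * 3, [0.10] * 3, [0.15] * 3]

print(is_homogeneous(uniform_nir, uniform_red))
print(is_homogeneous(mixed_nir, mixed_red))
```

A sample whose window straddles a field boundary, as in the mixed case, would show a larger NDVI spread and could be flagged for review. A real workflow would read the band values from Sentinel-2 imagery rather than hard-coded lists.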