Scientific Publication

Mapping cropland extent of Southeast and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google Earth Engine Cloud

Abstract

Cropland extent maps are useful components for assessing food security. Ideally, such products complement countrywide agricultural statistics because they are not politically biased and can be used to calculate cropland area for any spatial unit, from an individual farm to various administrative units (e.g., state, county, district) within and across nations. These areas can in turn be used to estimate agricultural productivity as well as the degree of disturbance to food security from natural disasters and political conflict. However, existing cropland extent maps over large areas (e.g., country, region, continent, world) are derived from coarse-resolution imagery (250 m to 1 km pixels) and have many limitations, such as missing fragmented and/or small farms whose mixed signatures from different crop types and/or farming practices can be confused with other land cover. As a result, the coarse-resolution maps have limited usefulness in areas where fields are small (<1 ha), such as in Southeast Asia. Furthermore, coarse-resolution cropland maps have known uncertainties in both the geo-precision of cropland location and the accuracy of the product. To overcome these limitations, this research was conducted using multi-date, multi-year 30-m Landsat time-series data for 3 years chosen from 2013 to 2016 for all Southeast and Northeast Asian Countries (SNACs), which included 7 refined agro-ecological zones (RAEZs) and 12 countries (Indonesia, Thailand, Myanmar, Vietnam, Malaysia, Philippines, Cambodia, Japan, North Korea, Laos, South Korea, and Brunei). The 30-m (1 pixel = 0.09 ha) data from Landsat 8 Operational Land Imager (OLI) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) were used in the study. Ten Landsat bands were used in the analysis (blue, green, red, NIR, SWIR1, SWIR2, Thermal, NDVI, NDWI, LSWI), along with additional layers of the standard deviation of these 10 bands across 1 year, and global digital elevation model (GDEM)-derived slope and elevation bands.
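As an illustration of the index layers and the pixel-area figure mentioned above, the following minimal Python sketch computes normalized-difference indices from surface reflectance values and converts a 30-m pixel to hectares. The abstract does not give the index formulas, so the band pairings used here for NDWI (green and NIR) and LSWI (NIR and SWIR1) are assumptions based on common definitions, and the function names and sample reflectances are hypothetical.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b); 0.0 when the denominator is zero."""
    total = a + b
    if total == 0:
        return 0.0  # assumption: treat an all-zero signal as index 0
    return (a - b) / total


def spectral_indices(green: float, red: float, nir: float, swir1: float) -> dict:
    """Compute the three index layers from reflectance values (0 to 1)."""
    return {
        "NDVI": normalized_difference(nir, red),
        # Assumed green/NIR form of NDWI; the study may use another variant.
        "NDWI": normalized_difference(green, nir),
        # Assumed NIR/SWIR1 form of LSWI.
        "LSWI": normalized_difference(nir, swir1),
    }


def pixel_area_ha(pixel_size_m: float = 30.0) -> float:
    """Area of a square pixel in hectares (1 ha = 10,000 m^2)."""
    return pixel_size_m ** 2 / 10_000.0


if __name__ == "__main__":
    # Hypothetical reflectances for a vegetated pixel, for illustration only.
    print(spectral_indices(green=0.08, red=0.05, nir=0.40, swir1=0.20))
    print(pixel_area_ha(30.0))  # a 30-m pixel covers 900 m^2
```

At this pixel size, a field smaller than 1 ha still spans roughly 11 Landsat pixels, whereas it would occupy only a fraction of a single 250-m pixel.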
To reduce the impact of clouds, the Landsat imagery was time-composited over four time-periods: three seasonal periods (Period 1: January-April, Period 2: May-August, and Period 3: September-December) composited over the 3 years, and Period 4, the standard deviation of all 10 bands taken over all images acquired during the 2015 calendar year. These four period composites, totaling a 42-band data-cube (10 bands × 4 periods, plus slope and elevation), were generated for each of the 7 RAEZs. The reference training data (N = 7849), generated for the 7 RAEZs using sub-meter to 5-m very high spatial resolution imagery (VHRI), helped generate the knowledge-base to separate croplands from non-croplands. This knowledge-base was used to code and run a pixel-based random forest (RF) supervised machine learning algorithm on the Google Earth Engine (GEE) cloud computing environment to separate croplands from non-croplands. The resulting cropland extent products were evaluated using an independent reference validation dataset (N = 1750) in each of the 7 RAEZs as well as for the entire SNAC area. For the entire SNAC area, the overall accuracy was 88.1%, with a producer’s accuracy of 81.6% (errors of omission = 18.4%) and a user’s accuracy of 76.7% (errors of commission = 23.3%). For each of the 7 RAEZs, overall accuracies varied from 83.2 to 96.4%. Cropland areas calculated for the 12 countries were compared with country areas reported by the United Nations Food and Agriculture Organization and other national cropland statistics, resulting in an R² value of 0.93. Province-level cropland areas were compared with provincial statistics, yielding R² = 0.95 for South Korea and R² = 0.94 for Thailand. The cropland products are made available on an interactive viewer at www.croplands.org and for download at National Aeronautics and Space Administration’s (NASA) Land Processes Distributed Active Archive Center (LP DAAC): https://lpdaac.usgs.gov/node/1281
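To make the accuracy terms above concrete, the following minimal Python sketch derives overall, producer's, and user's accuracy for the cropland class from a two-class error matrix. The function name and the counts in the example are hypothetical and are not the study's validation data; they only show how the measures relate. Errors of omission and commission are the complements of producer's and user's accuracy, which matches the reported pairs (81.6% with 18.4%, and 76.7% with 23.3%).

```python
def cropland_accuracy(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy measures (in percent) for the cropland class.

    tp: reference cropland mapped as cropland
    fn: reference cropland mapped as non-cropland (omission)
    fp: reference non-cropland mapped as cropland (commission)
    tn: reference non-cropland mapped as non-cropland
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one reference and one mapped cropland sample")
    total = tp + fn + fp + tn
    producers = 100.0 * tp / (tp + fn)  # share of reference cropland that was mapped
    users = 100.0 * tp / (tp + fp)      # share of mapped cropland that is correct
    return {
        "overall": 100.0 * (tp + tn) / total,
        "producers": producers,
        "omission": 100.0 - producers,
        "users": users,
        "commission": 100.0 - users,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in cropland_accuracy(tp=80, fn=20, fp=10, tn=90).items():
        print(f"{name}: {value:.1f}%")
```

Because cropland is usually the minority class, overall accuracy can stay high even when producer's or user's accuracy for cropland is noticeably lower, which is why all three measures are reported.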