Interpretation of commercial production information: a case study of lulo (Solanum quitoense), an under-researched Andean fruit
Abstract
Each time a farmer plants and harvests a crop, the result is a unique event or experiment. Our premise is that if the production system could be characterized in terms of management and environmental conditions, and if information on the harvested product were collected from a large number of harvesting events under varied conditions, it should be possible to develop data-driven models that describe the production system. These models can then be used to identify appropriate growing conditions and improved management practices for crops that have received little attention from researchers. The analysis and interpretation of commercial production data in the context of naturally occurring variation in environment and management, as opposed to controlled experimental data, requires novel approaches. Information was available on both the variation in commercial production of the tropical fruit lulo (Solanum quitoense) and the associated environmental conditions in Colombia. This information was used to develop and evaluate procedures for interpreting the variation in commercial production of lulo. The most effective procedures depended on expert guidance: a simple, effective one-step procedure could not be developed, and an iterative approach was required instead. The most effective procedure comprised the following steps. First, highly correlated independent variables were evaluated and those that were effectively duplicates were eliminated. Second, regression models identified the environmental factors most closely associated with the dependent variable, fruit yield. The environmental factors associated with variation in fruit yield were retained for more in-depth analysis, and those not associated with yield were excluded from further analysis. Linear regression and multilayer perceptron regression models explained 65–70% of the total variation in yield.
Both models identified three of the same factors, but the multilayer perceptron (a neural network model) identified one location as an additional factor. Third, the three environmental factors common to both regression models were used to define three Homogeneous Environmental Conditions (HECs) using Self-Organizing Maps (SOM). Fourth, yield was analyzed with a mixed model with three categorical variables: HEC; location, as a proxy for cultural factors associated with a geographic region; and farm, as a proxy for management skills. The mixed model explained more than 80% of the total variation in yield, with 61% associated with the HECs and 19% with farm. Location had minimal effects. The results of this model can be used to determine the appropriate environmental conditions for obtaining high yields for crops where only commercial data are available, and also to identify farms that have superior management practices for given environmental conditions.
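
The first step of the procedure, eliminating independent variables that are effectively duplicates, can be illustrated with a minimal sketch. It is not the authors' implementation: the variable names, the values, and the 0.95 correlation threshold are all assumptions made for illustration, and the abstract does not state which correlation measure or cutoff was used. The sketch keeps a variable only if its absolute Pearson correlation with every variable already kept does not exceed the threshold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_near_duplicates(columns, threshold=0.95):
    """Return the names of variables to keep, in their original order.

    A variable is kept only if |r| with every already-kept variable is
    at or below `threshold` (an assumed cutoff, not from the study).
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up illustrative values and hypothetical variable names,
# not data from the lulo study.
env = {
    "temp_mean": [18.0, 19.5, 21.0, 17.2, 20.1, 22.3],
    "temp_max": [24.1, 25.4, 27.0, 23.2, 26.2, 28.2],  # tracks temp_mean closely
    "rainfall": [120.0, 80.0, 150.0, 95.0, 60.0, 110.0],
}

kept = drop_near_duplicates(env)
print(kept)
```

In this toy data set `temp_max` is dropped because it is almost perfectly correlated with `temp_mean`, while `rainfall` is retained. Which member of a correlated pair to keep is a judgment call; the abstract notes that the most effective procedures depended on expert guidance, whereas this sketch simply keeps whichever variable appears first.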