Relative and Unified Skill of Environmental, Edaphic, and Management Factors to Explain Crop Yield Variance Using Machine Learning  Open Access

Published by the American Society of Agricultural and Biological Engineers, St. Joseph, Michigan www.asabe.org

Citation:  Journal of the ASABE. 67(4): 995-1011. (doi: 10.13031/ja.15755) @2024
Authors:   Meetpal Singh Kukal
Keywords:   Climate variability, Drainage, Irrigation, Maize, Nitrogen, Random forest, Soils, Soybean.

Highlights

Random forests equipped with recursive feature elimination were used to predict crop yield variation using diverse predictors.

The predictors belonged to either environmental, soils, or management domains.

The best performing models combined predictors from all three domains and explained 75%-80% spatiotemporal yield variance.

Extreme heat was most important among soils, management, and seasonal environment predictors in maize and soybean.

Irrigation intensity, tile drainage, % silt, and July rain were most important at the sub-seasonal aggregation scale.

Abstract. Crop yields are dictated by a complex interplay among environmental, edaphic (soil), and management factors that change across space and time. However, the extent to which each of these influences and their interactions explains yield variance is poorly understood. The convoluted nature of this question motivates the application of modern machine learning approaches to decipher these influences and elucidate crop yield relations with their critical drivers. Here, we used random forest modeling with recursive feature elimination (RFE) to discern the diverse drivers of historical (1981-2019) county-level maize and soybean yields in the U.S. within the realms of environment (growing and extreme degree days, precipitation, vapor pressure deficit, evaporative demand, crop water use, soil moisture), soils (sand, silt, and clay contents, bulk density, soil organic carbon, available water capacity), and management (irrigation intensity, tile drainage, nitrogen input, depth to groundwater). We found that the most effective models selected predictors from all three realms and accounted for 75%-80% of the combined spatial and temporal yield variance. Environmental predictors were indispensable to model performance, while their combination with either soil or management predictors approached the efficacy of the best-performing model. Specific variables within each individual predictor set and their combinations were analyzed for their relative importance to model skill. The best-performing model that used soils, management, and seasonal environmental predictors ranked extreme heat as the most influential predictor for both crops.
On further inclusion of sub-seasonal environmental variables with soil and management predictors, the relative importance shifted, with irrigation intensity assuming prominence as the most influential predictor, accompanied by tile drainage, silt content, and July precipitation for both crops. RFE revealed that only eight of the most relevant predictors were sufficient to explain >70% of the yield variance, even when the total predictors included were as high as 52. Visual representations were developed to offer insights into the functional response of crop yield to changes in critical predictors, thereby facilitating predictions across diverse agricultural production systems and over time. This research underscores the value of including soil and management indicators alongside environmental predictors to improve understanding and predictability of the intricate dynamics governing crop yield variability.
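
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of random forest regression wrapped in recursive feature elimination, written with scikit-learn. It is not the author's code or data: the synthetic "yield" values, the feature names (x0 to x11), the forest size, and the choice to retain three features are all assumptions made for illustration. The study itself reports retaining eight predictors out of as many as 52.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a county-year predictor table (hypothetical data).
rng = np.random.default_rng(0)
n_samples, n_features = 400, 12
X = rng.normal(size=(n_samples, n_features))

# Assumption: the synthetic yield depends only on the first three columns,
# so a sensible feature-elimination procedure should retain them.
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2]
     + rng.normal(scale=0.1, size=n_samples))

feature_names = [f"x{i}" for i in range(n_features)]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# RFE refits the forest repeatedly, dropping the feature with the lowest
# impurity-based importance at each step until three features remain.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
selector = RFE(forest, n_features_to_select=3, step=1)
selector.fit(X_train, y_train)

# Names of the retained features and held-out R^2 of the reduced model.
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
r2 = selector.score(X_test, y_test)

print("Selected features:", selected)
print("Held-out R^2:", round(r2, 3))
```

The held-out R^2 printed at the end is the share of variance in the synthetic target explained by the reduced model, which is analogous in spirit, though not in scale or data, to the explained yield variance reported in the abstract.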
