
An Improved Crop Yield Prediction Using CNN-BiLSTM Model with Attention Mechanism

Zhaohong Jia1, Kunming Wu1, Haitao Wang1,*, Weihui Zeng1, YangYang Guo1, Dong Liang1


Published in Journal of the ASABE 67(6): 1459-1467 (doi: 10.13031/ja.15629). Copyright 2024 American Society of Agricultural and Biological Engineers.


1 School of Internet, Anhui University, Hefei, Anhui, China

* Correspondence: htwang@ahu.edu.cn

Submitted for review on 13 April 2023 as manuscript number ITSC 15629; approved for publication as a Research Article and as part of the Artificial Intelligence Applied to Agricultural and Food Systems Collection by Associate Editor Dr. Sami Khanal and Community Editor Dr. Yiannis Ampatzidis of the Information Technology, Sensors, & Control Systems Community of ASABE on 8 August 2024.

Citation: Jia, Z., Wu, K., Wang, H., Zeng, W., Guo, Y., & Liang, D. (2024). An improved crop yield prediction using CNN-BiLSTM model with attention mechanism. J. ASABE, 67(6), 1459-1467. https://doi.org/10.13031/ja.15629

Highlights

Abstract. Food is an important source of nutrients for people, and food production is an important basis for food reserves. In this study, a convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) model combined with an attention mechanism is proposed to predict crop yield. First, the maximum temperature, minimum temperature, and rainfall data of 8 states in the rainfed corn belt of the United States are adopted as the original data, which are converted into cumulative climate parameters to reflect the growth process of corn. Then, the cumulative climate parameters are used as the input of the proposed model. The proposed model mainly consists of 4 parts: a CNN module, a BiLSTM module, an attention module, and a fully connected layer, which makes full use of the advantages of CNN in feature extraction and BiLSTM in processing time series. In addition, to ensure the relevance of time series data, the attention mechanism is used to allocate weights and calculate the correlation degrees between data. The experimental results show that BiLSTM exhibits better convergence ability than the standard one-way LSTM, and the crop yield prediction results obtained by the proposed model are better than those of the exponentially weighted average method, DeepCropNet, and CNN-RNN comparison algorithms.

Keywords. Attention mechanism, Bidirectional long short-term memory network, Convolutional neural network, Crop yield prediction.

Accurate crop yield forecasting (CYF) based on fundamental yield and climate data prior to corn harvest plays a pivotal role in informed decision-making, particularly in corn production and disaster prevention, thereby helping to ensure regional and national food security (Wang, 1999; Olasumbo et al., 2019; Luo et al., 2001; Xu et al., 2020). Current corn yield prediction methods are mainly based on traditional yield estimation models, such as empirical statistical models (Peng et al., 2018) and crop growth models (Wang et al., 2010), which can achieve high prediction accuracy for a specific crop in a specific region but do not scale effectively from simple scenes to complex scenes (Filippi et al., 2019; Lobell, 2013; Aghighi et al., 2018).

Compared with traditional statistical methods, machine learning (ML)-based CYF methods, such as support vector machine (Liu et al., 2019) and random forest (Liu et al., 2019; Breiman, 2001; Wang et al., 2019a; Wang et al., 2020), have become one of the research hotspots. In recent years, ML methods have been successfully applied to machine vision, natural language processing, and agriculture owing to their ability to learn and extract features automatically (Wang et al., 2019b; Zhou et al., 2017; Zhao et al., 2020). Jiang et al. (2020) proposed a long short-term memory (LSTM) model to integrate crop phenology and meteorological remote sensing data for county-level corn yield prediction, which explained 76% of the corn yield variability. Khaki et al. (2020) proposed a deep learning framework that combines the convolutional neural network (CNN) and recurrent neural network (RNN) for CYF. The model revealed that weather conditions, the accuracy of weather forecasts, soil conditions, and management practices can largely explain crop yield variability. Shahhosseini et al. (2020) developed an exponentially weighted average (EWA) integration method to predict corn yields in three states of the US corn belt. The importance of input features was computationally ranked in the process of predicting yields. Experimental results showed that the method achieves desirable prediction accuracy compared with traditional ML models. Lin et al. (2020) proposed a deep spatiotemporal learning framework called DeepCropNet to capture features for county-level corn yield estimation hierarchically, in which the temporal and spatial features were learned by a module based on an LSTM network combined with an attention mechanism and a multi-task learning output layer to improve the prediction accuracy. Meanwhile, Dai et al. (2021) constructed a hybrid model based on CNN and a bidirectional LSTM network (BiLSTM), in which UAV visible light-sensing images were employed in predicting cotton yield.

However, the above studies ignored the long-term correlation within the time series data. In addition, the weight allocation of features is rarely analyzed in depth. In this study, we propose a corn yield prediction method based on CNN-BiLSTM with an attention mechanism. First, the temperature and precipitation data during the corn growth cycle are converted into the cumulative climate parameters, i.e., growing degree days (GDD), killing degree days (KDD), and precipitation (PRCP). Second, the advantages of CNN in feature extraction are combined with the advantages of BiLSTM in processing time series. Finally, to ensure the backward and forward correlation of time series data, weights are assigned via the attention mechanism to calculate the degree of correlation between data and obtain data structure information. This effectively mitigates the semantic dilution at the front end of long series that results from the fixed semantic vectors generated by the BiLSTM network.

Materials and Methods

Overview of the Study Area

This study focuses on county-level rainfed corn production from 1981 to 2020 in eight states in the US Corn Belt region. In particular, the region's corn production in 2020 accounted for 54.2% of the total US corn production in that year. Fine-grained climate and yield open-source data are available for this region, which are used for yield projection. Figure 1 shows the distribution of county-level corn production in 2020 in the 8 states: Minnesota (MN), Wisconsin (WI), Michigan (MI), Iowa (IA), Illinois (IL), Indiana (IN), Ohio (OH), and Missouri (MO). The color shade represents the magnitude of the corn yield value. The darker the color, the higher the corn yield.

Figure 1. Yield distribution of counties in the rain-fed corn belt in 2020.

Data and Preprocessing

Corn yield data, obtained from the USDA National Agricultural Statistics Service (USDA, 2024), are given in metric tons per hectare (Mg/ha). There are a total of 750 counties and 27,248 samples in the raw yield data. Daily county-level meteorological data, obtained from the Applied Climate Information System web service (High Plains Regional Climate Center, 2020), include daily maximum temperature, daily minimum temperature, and daily precipitation.

Generally, temperature variables and yield are used as the regression variables and the dependent variable, respectively, in predictive models. Cumulative temperature is one of the common temperature variables. Growing degree days (GDD), also called growing cumulative temperature, is a temperature function that represents the accumulated heat required by a crop to complete a certain reproductive period. GDD is directly related to the growth rate and reproductive stage of the crop. High temperatures reduce yield by accelerating the growth and respiration rates of the crop, and this kind of high-temperature accumulation is measured by killing degree days (KDD), also called heat damage cumulative temperature.

Based on the temperature data, daily climatic accumulation parameters GDDd and KDDd are calculated as follows (Butler and Huybers, 2015):

(1)

(2)

(3)

where

Thigh = the maximum temperature appropriate for corn growth

Tlow = the minimum temperature appropriate for corn growth

Tmax,d = the daily maximum temperature

Tmin,d = the daily minimum temperature

T*max,d = the maximum effective temperature of the day

T*min,d is defined with analogous bounds.
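As an illustration of these definitions, the sketch below computes daily GDD and KDD in one commonly used form associated with Butler and Huybers (2015): the effective temperatures are the daily extremes bounded to [Tlow, Thigh], GDD is their mean minus Tlow, and KDD is the excess of the daily maximum over Thigh. This exact form, the function names, and the default thresholds of 9 and 29 degrees Celsius are assumptions made for illustration, not a transcription of the paper's equations.

```python
def clamp(value, low, high):
    """Bound a temperature to the interval [low, high]."""
    return max(low, min(value, high))


def daily_gdd(t_max, t_min, t_low=9.0, t_high=29.0):
    """Growing degree days for one day (deg C), assumed bounded-mean form."""
    t_max_eff = clamp(t_max, t_low, t_high)  # effective daily maximum
    t_min_eff = clamp(t_min, t_low, t_high)  # effective daily minimum
    return (t_max_eff + t_min_eff) / 2.0 - t_low


def daily_kdd(t_max, t_high=29.0):
    """Killing degree days for one day: heat above the upper threshold."""
    return max(t_max - t_high, 0.0)
```

For example, a day with a maximum of 35 and a minimum of 20 degrees Celsius contributes to GDD using the bounded maximum of 29, while the 6 degrees above the threshold count toward KDD.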

According to the local corn planting records (USDA, 1984), the earliest sowing date was 5 April (Missouri) and the latest harvest date was 10 December (Indiana) in the 8 states in the studied area. Based on these dates, the data collection period was set at April to November as a monitoring period for corn yield prediction.

To ensure data reliability, the following data cleaning and processing steps are implemented.

  1. The input data is normalized.
  2. Stations for which more than one-third of the values are null are discarded during climate data collection. For the remaining stations, missing values within a month are filled with the monthly average; when most values of a month are missing, the annual average is used instead.
  3. According to agriculture-related knowledge, the suitable temperature range for corn growth is set to 9°C–29°C (Butler and Huybers, 2015).

For the data within this range, the monthly growing cumulative temperature (GDDm), heat damage cumulative temperature (KDDm), and cumulative precipitation (PRCPm) for month m are computed from GDDd, KDDd, and daily precipitation. These three monthly parameters are used as the input of the yield prediction model. The preprocessed data are divided into training and test sets at a ratio of 8:2.
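The monthly aggregation can be sketched as follows. The record layout `(month, gdd_d, kdd_d, prcp_d)`, the function name, and the ordering of the output vector (all GDD months, then KDD, then PRCP) are hypothetical choices for illustration; the paper does not specify how the features are ordered.

```python
from collections import defaultdict

MONTHS = range(4, 12)  # April through November


def monthly_features(daily_records):
    """Sum daily (month, gdd_d, kdd_d, prcp_d) records into monthly totals.

    Returns a 24-element list laid out as
    [GDD_Apr..GDD_Nov, KDD_Apr..KDD_Nov, PRCP_Apr..PRCP_Nov].
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for month, gdd, kdd, prcp in daily_records:
        if month in MONTHS:  # ignore days outside the monitoring period
            totals[month][0] += gdd
            totals[month][1] += kdd
            totals[month][2] += prcp
    features = []
    for k in range(3):  # 0 = GDD, 1 = KDD, 2 = PRCP
        features.extend(totals[m][k] for m in MONTHS)
    return features
```

Each county-year then yields one 24-dimensional vector, matching the three parameters over eight months described in the text.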

Proposed CNN-BiLSTM Model

The proposed corn yield prediction model uses meteorological data as the input. The model is cascaded through multiple modules and learns temporal features from the input data step by step. The structure of the model is shown in figure 2, which consists of four main components: a CNN module (LeCun et al., 2015), a BiLSTM network module (Siami-Namini et al., 2019), an attention module, and a fully connected layer.

Figure 2. Model structure diagram of CNN-BiLSTM-Attention.

CNN Module

A CNN is incorporated into the proposed model to learn latent features and improve the predictive ability of the model. One-dimensional CNNs are typically used to process sequential data. The initial structure of the convolutional module is a single-layer, one-dimensional CNN. CNNs with 2, 4, and 8 layers were also tested; however, according to the experimental results, multilayer CNNs did not improve performance significantly over the single-layer CNN. Therefore, to balance performance and computation cost, a single-layer, one-dimensional convolution is used to extract features.

BiLSTM Module

LSTM (Gers et al., 2000), a derivative network of RNN, can effectively handle vanishing and exploding gradients. The standard LSTM structure is shown in figure 3.

The standard LSTM cell is calculated as follows:

(4)

(5)

(6)

(7)

(8)

(9)

where

ot = output gates

ct = the internal memory units

ft = the forgetting gates

it = the input gates

ht = the output vector at moment t

ht–1 = the output vector at moment t-1

W, U = the weight matrices

b = the bias

xt = the input of the current state

σ = the activation function (sigmoid).
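For reference, the widely used LSTM formulation with a forget gate can be written in the notation of the list above as follows. This is the textbook form and may differ in ordering or detail from the paper's equations (4) to (9); the candidate state $\tilde{c}_t$, the gate subscripts on W, U, and b, and the element-wise product $\odot$ are notation introduced here.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```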

The traditional LSTM model is a one-way propagation neural network: it processes the sequence in a single direction and therefore cannot learn reverse features or exploit both preceding and following context. The BiLSTM network overcomes this shortcoming. The main hidden layer structure of BiLSTM is based on LSTM and consists of a stack of LSTM networks with forward and reverse input operations. With the characteristics of the LSTM module, the BiLSTM network pays more attention to the backward and forward correlation of temporal data to ensure the extraction of temporal features. A dropout layer is added to avoid overfitting, and the dropout ratio is kept consistent with that of the convolutional module.

Attention Mechanism

The attention mechanism, essentially a weight assignment method, obtains structural information about the data by calculating the degree of association between data. In BiLSTM, the fixed feature vectors that are generated can cause semantic dilution at the front end of long sequences. Introducing the attention mechanism effectively suppresses this semantic dilution when dealing with long time series data. The structure of the attention mechanism used in this study is shown in figure 4.

Figure 3. Structural diagram of long short-term memory networks (LSTMs).

With ht as the hidden state of the cell, the attention weights of the target are captured as follows:

(10)

where w and bv are learnable parameters. The attention weights are normalized through the softmax function to calculate the probability vector Pt as follows:

(11)

Figure 4. Structure of the attention mechanism used in this study.

The attention weights are assigned to the corresponding hidden states, and the attention values are obtained through weighted summation, which is formulated as follows:

(12)
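The three steps described above (scoring each hidden state, softmax normalization, and weighted summation) can be sketched in plain Python. The scoring form tanh(w · ht + bv) is a common choice and is an assumption here, as are the function name and the treatment of w as a vector and bv as a scalar.

```python
import math


def attention_pool(hidden_states, w, b_v):
    """Attention-weighted sum of hidden states.

    hidden_states: list of T vectors h_t (each a list of floats)
    w: weight vector with the same length as each h_t (assumed form)
    b_v: scalar bias (assumed form)
    Returns (weights, context): the softmax weights P_t and the weighted sum.
    """
    # Step 1: one scalar score per time step.
    scores = [
        math.tanh(sum(w_j * h_j for w_j, h_j in zip(w, h)) + b_v)
        for h in hidden_states
    ]
    # Step 2: softmax normalization (shifted by the max for stability).
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step 3: weighted summation of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(p * h[j] for p, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return weights, context
```

Because the weights sum to one, the context vector is a convex combination of the hidden states, so time steps with higher scores contribute more to the final representation.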

Fully Connected Layers

Fully connected layers (FCs) play the role of a regressor in the whole network. An FC layer is inserted after the attention module, and the activation function is a sigmoid function. The output of the FC layer is the result of prediction.

Experimental Settings

The hardware environment for the experiments is an Intel Core i7-11800H CPU, 16 GB of RAM, and an NVIDIA GeForce RTX 3060 graphics card with 6 GB of video memory. The software environment for the experiments is Windows 10, TensorFlow 1.15, Keras 2.3, and Python 3.6.

In accordance with the period adopted for the dataset, the three cumulative climate parameters are generated for each of the eight months from April to November, giving 3 × 8 = 24 dimensions per year. During the training process, the meteorological data of 40 years are used as a training sample to predict the yield of the next year. Therefore, the dimension of the input data is 40×24. The processed data are input to the convolution module for feature extraction.

In the convolution module, the size of convolution kernel is set to 1×24, the number of the convolution kernels is 64, the step size is 1, and the activation function is ReLU. To prevent overfitting, a dropout layer is connected after the convolution layer, and the random discard ratio is set to 0.3.
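To make the shapes concrete, the sketch below applies a width-1 convolution with ReLU to a 40×24 input. It assumes that the 1×24 kernel spans one time step and all 24 features, so each of the 64 kernels produces one value per time step and the output is 40×64. That reading of the kernel size, the function name, and the list-based data layout are assumptions for illustration; dropout is omitted.

```python
def conv1d_width1(x, kernels, biases):
    """Width-1, stride-1 convolution over time followed by ReLU.

    x: T x F input (list of T rows, each with F features)
    kernels: K x F kernel weights, one row per kernel
    biases: K bias values
    Returns a T x K list of activations.
    """
    out = []
    for row in x:  # one time step at a time (stride 1)
        out.append([
            max(0.0, sum(k_j * x_j for k_j, x_j in zip(kernel, row)) + b)
            for kernel, b in zip(kernels, biases)
        ])
    return out
```

With 40 rows of 24 features and 64 kernels, the result has 40 rows of 64 activations, which would then be passed to the recurrent module.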

In the BiLSTM module, the number of hidden layers is set to 4, the number of LSTM units is 128, and the discard ratio of the dropout layer is set to 0.3 (Srivastava et al., 2014).

Model optimization adopts the Adam adaptive gradient descent algorithm with mean square error (MSE) as the model loss function (Kingma and Ba, 2014). Root mean square error (RMSE) and coefficient of determination (R2) are employed as the evaluation metrics for the quantitative evaluation of model accuracy (Zhu et al., 2006).
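The two evaluation metrics follow their standard definitions: RMSE is the square root of the mean squared residual, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, with hypothetical function names:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted yields."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE carries the unit of the yield itself, whereas R2 is dimensionless, which is why the two are reported together.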

Results and Discussion

Validation and Analysis

To verify the performance of the BiLSTM model in corn yield prediction, comparative experiments are conducted. To compare BiLSTM and LSTM, the network depth is changed by adjusting the number of hidden layers while ensuring that the input data are identical. Four network structures with different numbers of hidden layers are chosen for each of the LSTM and BiLSTM models. The LSTM and BiLSTM models with 1, 2, 4, and 8 hidden layers are labeled LSTM1, LSTM2, LSTM4, and LSTM8 and BiLSTM1, BiLSTM2, BiLSTM4, and BiLSTM8, respectively. The comparative results of the LSTM and BiLSTM models in terms of convergence are shown in figure 5.

Figure 5 shows that the loss values of all models decrease as the number of iterations increases, but the BiLSTM models converge faster than the LSTM models overall. After about 80 iterations, the training loss of the LSTM models almost stops decreasing, whereas the loss of the BiLSTM models continues to decrease, which indicates that the BiLSTM models converge better than the LSTM models.

The RMSE and R2 values of the BiLSTM and LSTM models are given in table 1. For the LSTM model, only the results of LSTM1 are listed because it performs best among the four LSTM models. All BiLSTM models except BiLSTM1 outperform LSTM1. BiLSTM4 demonstrates the best performance, with R2 and RMSE equal to 0.817 and 1.057 Mg/ha, respectively. Therefore, the number of hidden layers in the BiLSTM module of the CNN-BiLSTM-Attention model is set to four in the subsequent experiments. The 2020 rain-fed corn yield in the U.S. is 50.47 Mg/ha. According to the experimental results, the BiLSTM4 model is able to limit the corn yield prediction error to 1.083 Mg/ha in the optimal case, a 4% reduction in prediction error compared with the LSTM1 model (1.129 Mg/ha).

Figure 5. Comparison of training errors between LSTM and BiLSTM models with different layers.

Table 1. Comparison of error and determination coefficient between LSTM and BiLSTM models.

| Model | RMSE (Mg/ha) | R2 |
|---|---|---|
| LSTM1 | 1.129 | 0.789 |
| BiLSTM1 | 1.178 | 0.773 |
| BiLSTM2 | 1.100 | 0.802 |
| BiLSTM4 | 1.057 | 0.817 |
| BiLSTM8 | 1.083 | 0.808 |
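The relative change in RMSE of each BiLSTM variant against LSTM1 can be computed directly from the values in table 1. The sketch below uses the formula (baseline - value) / baseline × 100; the dictionary and function names are illustrative only.

```python
# RMSE values (Mg/ha) copied from table 1.
TABLE1_RMSE = {
    "LSTM1": 1.129,
    "BiLSTM1": 1.178,
    "BiLSTM2": 1.100,
    "BiLSTM4": 1.057,
    "BiLSTM8": 1.083,
}


def relative_reduction(baseline, value):
    """Percentage reduction of value relative to baseline (negative = worse)."""
    return (baseline - value) / baseline * 100.0


reductions = {
    name: round(relative_reduction(TABLE1_RMSE["LSTM1"], value), 1)
    for name, value in TABLE1_RMSE.items()
    if name != "LSTM1"
}
```

With these inputs, the reductions are about -4.3% for BiLSTM1, 2.6% for BiLSTM2, 6.4% for BiLSTM4, and 4.1% for BiLSTM8.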

An analysis of the structures of the LSTM and BiLSTM networks indicates that a BiLSTM model with one hidden layer matches an LSTM model with two hidden layers in terms of network depth, because a BiLSTM network is composed of an LSTM network with forward transmission and an LSTM network with reverse transmission stacked together. In this study, the BiLSTM network extracts temporal features that contain both the preceding and the following information of the temporal data, thereby outperforming the one-way feature extraction of the LSTM network. The growth information of each reproductive period is interrelated because the growth of corn is a process of organic matter accumulation. From the seedling stage to the cob stage and from the flowering stage to the spatulation stage, there exists a connection between the timing characteristics of corn growth. Therefore, the BiLSTM network with bidirectional operations can extract richer and more complete temporal features and thus has better performance.

Ablation and Comparison Experiments

There are three core components in the CNN-BiLSTM model fused with the attention mechanism: convolutional layer, BiLSTM network layer, and attention layer. Ablation experiments are performed to verify the contribution of each component to the improvement of the yield prediction performance. The specific settings and experimental results of the ablation experiments are shown in table 2.

Table 2. Comparison of root mean square error and determination coefficient in the ablation experiments.

| Model | RMSE (Mg/ha) | R2 |
|---|---|---|
| CNN | 1.115 | 0.797 |
| BiLSTM | 1.069 | 0.813 |
| BiLSTM-Attention | 1.066 | 0.814 |
| CNN-BiLSTM | 1.011 | 0.833 |
| CNN-BiLSTM-Attention | 0.957 | 0.850 |

The following can be observed from table 2:

  1. The performance of using the BiLSTM network alone is better than that of using the CNN network alone. Furthermore, the performance is significantly improved when the two networks are used together. The goal of yield prediction is to learn potential correlations among time series data, that is, the backward and forward correlations of time series data, which is the main goal of the BiLSTM network. However, BiLSTM is incapable of extracting features sufficiently. Since CNN can learn potentially useful features, CNN is combined with BiLSTM to improve the overall performance of the model.
  2. Since the attention mechanism assigns weights to the feature vectors, CNN-BiLSTM combined with the attention mechanism performs better than CNN-BiLSTM alone. That is, the model uses the attention mechanism to effectively solve the problem of semantic dilution at the front end of long sequences caused by the generation of fixed feature vectors in the CNN-BiLSTM network.
  3. When the three components are used simultaneously, the RMSE value of the model decreases to 0.957 Mg/ha, and the R2 value improves to 0.85; that is, the model exhibits the best performance. The experimental results demonstrate the effectiveness of the proposed strategy.

The loss curves during the training in the ablation experiment are shown in figure 6. It can be found from figure 6 that the loss values of the model decrease with the increase of the number of iterations. The training error of the CNN-BiLSTM-attention model at the early stage of training is larger than that of the other models. In the process of training, the model converges rapidly and gradually surpasses the other models. In the 100th batch of training, the loss functions of the other models almost converge, whereas the loss values of the CNN-BiLSTM-attention model still show a decreasing trend. Hence, the proposed model exhibits better convergence performance than the other models.

To further validate the performance of the CNN-BiLSTM-Attention model, CNN-RNN (Khaki et al., 2020), EWA (Shahhosseini et al., 2020), and DeepCropNet (Lin et al., 2020) are selected as the comparative algorithms. R2 and RMSE are used as the evaluation criteria. The comparative results are shown in table 3. The values of R2 and RMSE obtained by the CNN-BiLSTM-Attention model are 0.85 and 0.957 Mg/ha, respectively, which are better than those of the three compared algorithms.

Figure 6. Contrast diagram of training errors in ablation experiments.

Table 3. Comparison of the root mean square error and determination coefficient of four algorithms.

| Model | RMSE (Mg/ha) | R2 |
|---|---|---|
| EWA | 1.148 | / [a] |
| DeepCropNet | 1.000 | 0.780 |
| CNN-RNN | 0.975 | 0.750 |
| CNN-BiLSTM-Attention | 0.957 | 0.850 |

[a] "/" indicates that the paper describing the EWA method did not report this evaluation index.

Compared with EWA, the deep learning model proposed in this study achieves automatic feature extraction through CNN and BiLSTM. Compared with the DeepCropNet yield estimation model, the CNN-BiLSTM-attention model considers weight assignment among different features during yield prediction. Compared with the CNN-RNN yield estimation model, the proposed model obtains higher accuracy when the same environmental data are used as the experimental data. When a portion of soil data is added as training data again in the comparison study, the prediction results are still improved. Therefore, multi-source data fusion is a promising research direction for yield prediction.

To verify the validity of the experimental data and of the model for time series forecasting, multiple sets of experiments are conducted by dividing the data into different datasets at five-year spans to explore the potential time series connections in the data. The results are shown in table 4; increasing the amount of training data clearly improves the performance of the CNN-BiLSTM-Attention model. The diagonal of the table gives the performance of the model in the year immediately following the training period. Along the diagonal, the RMSE value decreases from 1.56 Mg/ha in 1982 to 0.96 Mg/ha in 2021, while the R2 value increases from 0.33 to 0.85. The results show that the CNN-BiLSTM-Attention model constructed in this paper can compensate for the uncertainty in maize yield prediction through the accumulation of time series data, thus improving the yield prediction accuracy.

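Table 4. Experimental results of maize yield prediction in different year datasets. Entries are RMSE in Mg/ha, with R2 in parentheses, for each test year.

| Training data | 1982 | 1986 | 1991 | 1996 | 2001 | 2006 | 2011 | 2016 | 2021 |
|---|---|---|---|---|---|---|---|---|---|
| 1981 | 1.56 (0.33) | 1.38 (0.39) | 1.40 (0.53) | 2.13 (0.39) | 1.35 (0.57) | 1.37 (0.55) | 1.45 (0.49) | 1.38 (0.57) | 1.42 (0.56) |
| 1981–1985 | | 1.32 (0.42) | 1.29 (0.59) | 1.58 (0.41) | 1.25 (0.63) | 1.29 (0.51) | 1.37 (0.57) | 1.26 (0.64) | 1.35 (0.58) |
| 1981–1990 | | | 1.26 (0.66) | 1.36 (0.59) | 1.29 (0.61) | 1.32 (0.61) | 1.34 (0.63) | 1.29 (0.69) | 1.31 (0.66) |
| 1981–1995 | | | | 1.29 (0.71) | 1.11 (0.76) | 1.26 (0.69) | 1.20 (0.70) | 1.20 (0.70) | 1.18 (0.69) |
| 1981–2000 | | | | | 1.13 (0.69) | 1.22 (0.73) | 1.18 (0.75) | 1.19 (0.73) | 1.15 (0.74) |
| 1981–2005 | | | | | | 1.15 (0.78) | 1.25 (0.69) | 1.08 (0.81) | 0.99 (0.83) |
| 1981–2010 | | | | | | | 1.08 (0.81) | 1.15 (0.79) | 1.03 (0.79) |
| 1981–2015 | | | | | | | | 1.03 (0.81) | 0.98 (0.85) |
| 1981–2020 | | | | | | | | | 0.96 (0.85) |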

As can be seen from table 4, the prediction results become more accurate as the amount of time series data increases and the time difference between the training data and 2021 decreases. When the corn yield in 2021 is predicted using cumulative data from 1981 to 2020, the results are much better than those obtained using only the data from the first year. The RMSE value gradually decreases from the highest value of 1.42 Mg/ha when using one year of training data to the lowest value of 0.96 Mg/ha when using training data from 1981 to 2020, whereas the R2 value steadily increases. Therefore, it is concluded that RMSE is negatively correlated with the size of the training data, while R2 is positively correlated with the size of the training data.

Based on the corn yield problem in rain-fed corn growing areas in the United States, this study experimentally demonstrates the superior performance of the proposed CNN-BiLSTM-Attention model in corn yield prediction. The above experiments mainly focus on the optimization of the model and pay less attention to the geographical characteristics of the planting areas. However, in actual agricultural production, the yield may be affected by several additional factors, such as the latitude and longitude of each region and regional planting habits. Hence, the data of the eight states in the study area are divided by state, yield prediction experiments are conducted for each state separately to obtain more accurate yield prediction results, and the reasons for the different prediction results are analyzed. The specific experimental results are shown in table 5.

As can be seen from table 5, the individual prediction experiments for each state give different results from the corn yields predicted for the entire Corn Belt. The prediction accuracy for Iowa and Wisconsin exceeds the overall prediction accuracy, with the best result (Iowa) decreasing the RMSE by 0.097 Mg/ha compared with the overall prediction result. However, the prediction accuracy for the remaining six states is lower than that of the overall prediction, with the worst result (Missouri) increasing the RMSE by 0.458 Mg/ha. These state-level increases and decreases in error can be explained by localized factors.

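Table 5. Yield prediction results for different states.

| State | RMSE (Mg/ha) | R2 | MSE (Mg2/ha2) |
|---|---|---|---|
| MN | 1.015 | 0.862 | 1.030 |
| WI | 0.945 | 0.794 | 0.893 |
| MI | 1.170 | 0.685 | 1.369 |
| IA | 0.860 | 0.870 | 0.739 |
| IL | 1.191 | 0.788 | 1.419 |
| IN | 1.130 | 0.719 | 1.278 |
| OH | 0.968 | 0.762 | 0.938 |
| MO | 1.415 | 0.607 | 2.000 |
| ALL | 0.957 | 0.850 | 0.916 |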

As an example, in Michigan in 2020, the average annual precipitation was 881.4 mm, which is below the region's long-term average. The region suffers from uneven distribution of precipitation throughout the year, with high precipitation in the first half of the year and low precipitation in the second half of the year. This has led to flooding in some areas in the spring, but drought in some areas in the latter half of the year, and drought can lead to a decline in crop yields and economic losses for farmers in these areas.

With the overall yield forecasts, it is difficult for the model to capture such anomalies in a small area; whereas, with the separate experiment for Michigan, the model prediction accuracy is greatly improved, which confirms the necessity of localized yield forecasting experiments. Through such localized yield prediction experiments, the model can capture different climatic characteristics of different regions, such as excessive rainfall, drought, extreme heat, and other factors, as a means of improving the accuracy of yield prediction.

As shown in figure 7, the predicted values obtained from separate prediction experiments in Missouri, Iowa, Minnesota, and Wisconsin are fitted to the true yield values. Among them, the linear fit for Missouri is poor: it has significantly more outliers than the other three states, and most of the outliers lie above the fitted line, which implies that the model overestimates the real yield in the Missouri prediction experiment. In contrast, in the overall prediction, the model neutralizes some of the effects caused by the disaster data, and the yield anomalies caused by extreme weather are not reflected.

Figure 7. Plot of linear fit of yield forecasts for different states.

Iowa, Minnesota, and Wisconsin show a better linear fit. The R2 values of these three states are 0.870, 0.862, and 0.794, and their corresponding RMSE values are 0.860 Mg/ha, 1.015 Mg/ha, and 0.945 Mg/ha, which are the best three sets of predicted results among the eight states. All three states are in the Upper Midwest region of the United States and share many similarities in climate characteristics. Iowa, Minnesota, and Wisconsin all experience abundant precipitation throughout the year, with the most precipitation occurring during the summer months. Mean annual precipitation ranges from about 760 to 890 mm in Iowa and southern Wisconsin and from about 640 to 760 mm in Minnesota and northern Wisconsin, and all three states have a humid continental climate with cold winters and warm summers. Overall, the climates of Iowa, Minnesota, and Wisconsin are characterized by regular year-round temperatures and high precipitation, and these states are among the regions with fewer heat and drought hazards, thus allowing for better yield prediction results.

Conclusions

A crop yield prediction model based on CNN-BiLSTM combined with an attention mechanism is constructed in this study. Meteorological data, such as temperature and precipitation, in a corn growing area are used as raw data and converted into cumulative climate parameters (GDD, KDD, and PRCP). Then, the cumulative climate parameters are utilized as inputs for yield prediction of local corn. The experimental results show that the prediction results of the proposed model are better than those of existing algorithms. Compared with LSTM, BiLSTM pays more attention to the backward and forward correlation of time series data, so that more accurate time series features are extracted. When CNN is used to extract the spatial features before the time-series features are extracted, it can effectively reduce the interference of redundant information in the data with the BiLSTM network. Combined with the attention mechanism, which can assign weights to different features, the model can effectively solve the problem of semantic dilution at the front end of long sequences. Together, the three modules improve the performance of the proposed method for corn yield prediction.

The CNN-BiLSTM-Attention model is compared with several established yield prediction models, namely CNN-RNN, EWA, and DeepCropNet, and the experimental results show that the CNN-BiLSTM-Attention model in this paper achieves the best yield prediction results. In the experiments, the data set is divided into several groups at five-year spans to verify the validity of the data and of the model for time series prediction. Finally, separate experiments in different regions obtained a higher accuracy than the overall prediction. This result confirms the necessity of local yield prediction experiments, and the application of the yield prediction model to actual regional production can make an important contribution to local crop cultivation.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (71971002), Major Projects of Anhui Province, China (202003a06020016), and Key Research and Development Programs of Anhui Province, China (202004a07020050).

References

Aghighi, H., Azadbakht, M., Ashourloo, D., Shahrabi, H. S., & Radiom, S. (2018). Machine learning regression techniques for the silage maize yield prediction using time-series images of Landsat 8 OLI. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 11(12), 4563-4577. https://doi.org/10.1109/JSTARS.2018.2823361

Breiman, L. (2001). Random forests. Mach. Learn., 45(1), 5-32. https://doi.org/10.1023/A:1010933404324

Butler, E. E., & Huybers, P. (2015). Variations in the sensitivity of US maize yield to extreme temperatures by region and growth phase. Environ. Res. Lett., 10(3), 034009. https://doi.org/10.1088/1748-9326/10/3/034009

Dai, J. G., Jiang, N., Xue, J. L., Zhang, G. S., & He, X. L. (2021). Method for predicting cotton yield based on CNN-BiLSTM. Trans. CSAM, 37(17), 152-159. https://doi.org/10.11975/j.issn.1002-6819.2021.17.017

Filippi, P., Jones, E. J., Wimalathunge, N. S., Somarathna, P. D., Pozza, L. E., Ugbaje, S. U., ... Bishop, T. F. (2019). An approach to forecast grain crop yield using multi-layered, multi-farm data sets and machine learning. Precis. Agric., 20(5), 1015-1029. https://doi.org/10.1007/s11119-018-09628-4

Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Comput., 12(10), 2451-2471. https://doi.org/10.1162/089976600300015015

High Plains Regional Climate Center (HPRCC). (2020). County level data. Retrieved from https://hprcc.unl.edu/datasets.php?set=CountyData

Jiang, H., Hu, H., Zhong, R., Xu, J., Xu, J., Huang, J., ... Lin, T. (2020). A deep learning approach to conflating heterogeneous geospatial data for corn yield estimation: A case study of the US Corn Belt at the county level. Global Change Biol., 26(3), 1754-1766. https://doi.org/10.1111/gcb.14885

Khaki, S., Wang, L., & Archontoulis, S. V. (2020). A CNN-RNN framework for crop yield prediction. Front. Plant Sci., 10. https://doi.org/10.3389/fpls.2019.01750

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. https://doi.org/10.1038/nature14539

Lin, T., Zhong, R., Wang, Y., Xu, J., Jiang, H., Xu, J., ... Li, H. (2020). DeepCropNet: A deep spatial-temporal learning framework for county-level corn yield estimation. Environ. Res. Lett., 15(3), 034016. https://doi.org/10.1088/1748-9326/ab66cb

Liu, J. M., He, X. T., Wang, P. X., & Huang, X. J. (2019). Early prediction of winter wheat yield with long time series meteorological data and random forest method. Trans. CSAE, 35(6), 158-166. https://doi.org/10.11975/j.issn.1002-6819.2019.06.019

Lobell, D. B. (2013). The use of satellite data for crop yield gap analysis. Field Crops Res., 143, 56-64. https://doi.org/10.1016/j.fcr.2012.08.008

Luo, X., Zhang, T., & Hong, T. (2001). Technical system and application of precision agriculture. Trans. CSAM, 32(2), 103-106. Retrieved from https://www.nstl.gov.cn/paper_detail.html?id=54cae119b6fd6124867e5ddc41f2f9f7

Olasumbo, M., Tebogo, M., Thomas, M., & Michae, A. (2019). State-of-the-art and recommended developmental strategic objectives of smart agriculture. Smart Agric., 1(1). https://doi.org/10.12133/j.smartag.2019.1.1.201812-SA005

Peng, J. L., Wang, J., Kim, M. J., Jo, M. H., Kim, B. W., & Sung, K. I. (2018). Construction of a yield prediction model for whole crop maize on the basis of climate data in South Korea. Pratacult. Sci., 35(4), 857-866. Retrieved from https://doc.taixueshu.com/journal/20180213caoyekx.html

Shahhosseini, M., Hu, G., & Archontoulis, S. V. (2020). Forecasting corn yield with machine learning ensembles. Front. Plant Sci., 11. https://doi.org/10.3389/fpls.2020.01120

Siami-Namini, S., Tavakoli, N., & Namin, A. S. (2019). The performance of LSTM and BiLSTM in forecasting time series. Proc. 2019 IEEE Int. Conf. on Big Data (Big Data) (pp. 3285-3292). IEEE. https://doi.org/10.1109/BigData47090.2019.9005997

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15(1), 1929-1958.

USDA. (1984). Usual planting and harvesting dates for U.S. field crops. Agriculture Handbook, No. 628. Retrieved from https://swat.tamu.edu/media/90113/crops-typicalplanting-harvestingdates-by-states.pdf

USDA. (2024). Quick stats, 1980-2020. USDA-NASS. Retrieved from http://quickstats.nass.usda.gov

Wang, C. J., Zhao, Q. Z., Ma, Y. J., & Ren, Y. Y. (2019b). Crop identification of drone remote sensing based on convolutional neural network. Trans. CSAM, 50(11), 161-168. Retrieved from https://www.nstl.gov.cn/paper_detail.html?id=bd72135d46a3f3c19921c8770cbf1f3e

Wang, M. (1999). Development of precision agriculture and innovation of engineering technologies. Trans. CSAE, 15(1). Retrieved from http://www.tcsae.org/nygcxb/article/abstract/19990101

Wang, P. X., Hu, Y. J., Li, L., & Xu, L. X. (2020). Estimation of maize yield based on ensemble Kalman filter and random forest for regression. Trans. CSAM, 51(9), 135-143. https://doi.org/10.6041/j.issn.1000-1298.2020.09.016

Wang, P. X., Qi, X., Li, L., Wang, L., & Xu, L. X. (2019a). Estimation of maize yield based on random forest regression. Trans. CSAM, 50(7), 237-245. https://doi.org/10.6041/j.issn.1000-1298.2019.07.026

Wang, T., Lü, C. H., & Yu, B. H. (2010). Assessing the productivity of winter wheat using WOFOST in the Beijing-Tianjing-Hebei Region. J. Nat. Resour., 25(3), 475-487. Retrieved from https://www.nstl.gov.cn/paper_detail.html?id=01c68d30f751847bdeb774cb37f399f8

Xu, Q., Guo, P., & Qi, J. (2020). Construction of SEGT cotton yield estimation model based on UAV image. Trans. CSAE, 36(16), 44-51. Retrieved from https://www.nstl.gov.cn/paper_detail.html?id=b485431f37ad4f22b28669b23172383d

Zhao, H., Wang, L., & Wang, W. J. (2020). Text sentiment analysis based on serial hybrid model of bi-directional long short-term memory and convolutional neural network. J. Comput. Appl., 40(1), 16-22. https://doi.org/10.11772/j.issn.1001-9081.2019060968

Zhou, Y. C., Xu, T. Y., Zhao, W., & Deng, H. B. (2017). Classification and recognition approaches of tomato main organs based on DCNN. Trans. CSAE, 33(15), 219-226. https://doi.org/10.11975/j.issn.1002-6819.2017.15.028

Zhu, Y., Li, Y., Feng, W., Tian, Y., Yao, X., & Cao, W. (2006). Monitoring leaf nitrogen in wheat using canopy reflectance spectra. Can. J. Plant. Sci., 86(4), 1037-1046. https://doi.org/10.4141/p05-157