Method for Zoning Corn Based on the NDVI and the Improved SOM-K-Means Algorithm
Xiaodong Di1, Xi Wang1,*
Published in Journal of the ASABE 66(4): 943-953 (doi: 10.13031/ja.15081). Copyright 2023 American Society of Agricultural and Biological Engineers.
1College of Engineering, Heilongjiang Bayi Agricultural University, Daqing, Heilongjiang, China.
*Correspondence: ndwangxi@163.com
Submitted for review on 3 March 2022 as manuscript number ITSC 15081; approved for publication as a Research Article and as part of the Artificial Intelligence Applied to Agricultural and Food Systems Collection by Associate Editor Dr. Garey Fox and Community Editor Dr. Yiannis Ampatzidis of the Information Technology, Sensors, & Control Systems Community of ASABE on 30 May 2023.
Highlights
- A partitioning method based on the NDVI and the improved SOM-K-means algorithm is proposed.
- The optimal number of partitions is determined according to the DBI, silhouette coefficient and silhouette analysis.
- This method provides a new approach for the real-time variable-rate fertilization of maize.
Abstract. To solve problems such as low nitrogen use efficiency during corn intertillage and topdressing and the presence of spatial differences in corn growth, a method for zoning and division based on the normalized difference vegetation index (NDVI) and the improved self-organizing map (SOM)-K-means algorithm was proposed. First, the GreenSeeker spectrum sensor was utilized to acquire the NDVI of the corn canopy in the vegetative V6-V10 stage during the intertillage period. Second, the acquired data were screened, and the SOM-K-means algorithm was used to perform a cluster analysis of the processed data. Finally, the clustering performance was analyzed. The initial clustering center was acquired with an SOM neural network, the clustering center was used in K-means clustering, and zoning was performed. The optimal number of zones was 4 according to the Davies-Bouldin Index (DBI), the silhouette coefficient, and a silhouette analysis of the differences in corn growth. With four zones, the DBI and the silhouette coefficient were 0.569 and 0.537, respectively. Clusters 1 and 4 and clusters 2 and 3 in the silhouette map displayed similar thicknesses, with large differences between clusters and small differences within clusters. A comparison of the SOM-K-means algorithm, K-means algorithm, and SOM neural network algorithm revealed that the run time of the SOM-K-means clustering algorithm was 4.880 s, its inertia was 9.2118, and the run time of the SOM neural network was 4.715 s. The overall coefficient of variation of corn growth in the unzoned test area was 15.24%, and the overall coefficient of variation of corn growth after zoning was 6.94%. This method provides a new approach for understanding issues related to variable-rate fertilization zoning and online real-time zoning during corn intertillage based on the NDVI.
Keywords. Clustering, Crop growth, NDVI, SOM-K-means algorithm, Zoning.
The scientific and rapid acquisition of crop growth information during the growing season is important for the production management and early yield prediction of corn crops. Currently, in agricultural production in China, when farmland managers adjust field management measures, crop growth uniformity is often evaluated based on planting experience. Due to the influence of environmental conditions, topography, and geomorphology, the variability in corn growth displays spatial consistency. At large scales, crop growth exhibits a sheet-like distribution. However, even at smaller, moderate scales, there is a high degree of spatial heterogeneity. Therefore, it is important for farmland managers to understand field crop growth, the degree of spatial variability in growth, and the specific scheme used for management zone delineation (Delin and Stenberg, 2014). Crop groups with consistent growth are generally obtained through variable-rate fertilization and other methods that maximally increase crop yield and mitigate the environmental load in farmland areas and the hidden costs to society (Hyytiäinen et al., 2011; Rodriguez et al., 2011).
In the past decade, the traditional methods of acquiring crop growth parameters from field sampling or laboratory analysis have shifted to remote sensing methods based on aerial vehicles/low orbital satellites and the characterization of crop growth parameters based on sensors (Nawar et al., 2017; Shi et al., 2020; Guerrero et al., 2021). Field crop sampling can provide accurate information about the various growth parameters of a crop, such as plant height, nitrogen content, the number of leaves, and other parameters, with relatively high precision. However, this method requires considerable manpower and material resources, is time-consuming, and has a low cost performance ratio. These shortcomings can lead to irreversible damage to crops, and the changes implemented are characterized by hysteresis. This approach for data collection and evaluation is unsuitable for the acquisition of growth information for large-scale crops in the Heilongjiang reclamation area. The remote sensing technology developed in the 1960s does not make direct contact with the target object. However, it does provide valuable information on the characteristics of the target. Sensing equipment is utilized to facilitate the collection, processing, and imaging of electromagnetic wave information radiated and reflected by the target object for use in the identification, monitoring, and analysis of the target object (Akinbile and Liu, 2022; Bai et al., 2017; Panda et al., 2010). Due to its considerable advantages of providing real-time information, generating minimal damage, and providing a wide scope of coverage, remote sensing technology has been broadly applied to monitor crop growth (Moran et al., 1997). In addition, scholars have proposed the utilization of spectral detection technology to acquire the normalized difference vegetation index (NDVI) of the crop canopy to evaluate crop growth for targeted zoning and topdressing. 
This method offers advantages such as a simple operation and timeliness (Honkavaara et al., 2013; Reynolds et al., 2015). At present, the sensors used to detect the NDVI of crop canopies are mainly satellite-borne canopy spectral sensors, airborne canopy spectral sensors, and vehicle-borne canopy spectral sensors, which are categorized based on their different carrying platforms. Satellite-borne canopy spectral sensors and airborne canopy spectral sensors use sunlight as the light source to acquire the agricultural crop NDVI over large areas (Shi et al., 2020). However, the products produced by satellite-borne canopy spectral sensors are temporally variable and cannot always meet the requirements of crop field management during the growing season. Although airborne spectral sensors can provide relatively high-resolution spectral images, they are extremely susceptible to the effects of weather conditions and are limited in cases with variable topdressing (Cao et al., 2012; Liu et al., 2019; Sankaran et al., 2015). The vehicle-borne canopy spectral sensor GreenSeeker carries its own light source, is robust to various weather conditions, and is effective during the day and at night.
Inman et al. (2008) examined the relationship between NDVI management division and the relative maize yield by acquiring remote sensing images by aircraft in the early growing season of corn. The results showed that the early NDVI of corn had the potential to be used for crop management. Therefore, the NDVI is also often used to divide farmland management areas. Nahry et al. (2011) derived the NDVI based on remote sensing and geographic information technology to divide a corn crop into four management zones and tested the effects of fertilizer applications prescribed for each zone. The results showed that the NDVI of corn based on remote sensing image detection could be used to determine the optimal amount of chemical fertilizer used in several areas (23.566 tons/test area), and the data were used to illustrate that the NDVI was significantly correlated with the crop yield. The NDVI reflects the biomass of crops, and it can be used to precisely define management areas. Honkavaara et al. (2013) utilized drones that carried passive spectral image sensors to acquire the NDVI data for a wheat canopy, generated a global distribution map of the nitrogen content of wheat based on an NDVI model, performed weight matching to estimate the fertilization level of each plot based on a grid map classification method, the yield over time, and the total amount of fertilization, and established a management zoning distribution map.
Fu (2017) studied winter wheat and performed management zoning based on soil and remote sensing image data. Two methods for delimiting the management zones of three types of crops were evaluated: fuzzy C-means clustering, fuzzy C-means clustering with integrated spatial location information, and Lark's method. The results showed that the fuzzy C-means clustering method with integrated location information based on remote sensing images was the best choice for crop management zoning. Liu et al. (2019) used an object-oriented multiscale segmentation method to synthesize the management areas, which were delimited by the NDVI into the four phases of the crop growth period in the same year, and applied Moran’s index in the evaluation. The results showed that the zoning precision of the NDVI with multiple phases was higher than the precision of interpolation based on soil organic matter. Santos et al. (2019) proposed a system that used the K-means clustering algorithm to delimit management areas based on the crop NDVI captured by unmanned aerial vehicles to achieve the real-time, rapid, and accurate division of farmland management areas. Liu and Wang (2019) utilized the NDVI of corn determined by drone remote sensing and ground remote sensing to establish an optimized zoning method for management areas based on the NDVI, plant height, and biomass of corn. Then, reasonable management zones were established using the NDVI based on a clustering algorithm.
Traditional crop sampling and zoning methods are impractical and involve sample collection, transportation, preparation, and other processes. Those methods are prone to errors, expensive, time-consuming, and slow, and they require operators with abundant experience (Zhang et al., 2014; Guerrero et al., 2021). In addition, they release chemicals into the environment, although on-site and online measurement modes can overcome these shortcomings. However, when satellite remote sensing and drones acquire NDVI and other spectral data, there remain numerous shortcomings that limit the applicability of the acquired data. Multisource satellite images acquired by the utilization of satellite platforms can be used to extract the phenotypic parameters of crops; however, due to the limited temporal resolution of data collection, it is difficult to apply these images in analyses of small areas and for high-frequency dynamic monitoring. Data collection with airborne platforms is costly and requires extensive technical maintenance efforts; it is also limited by the intensity of sunlight and the influence of the cruise duration of drones (Shi et al., 2020). One future development direction is the avoidance of cumbersome offline processing operations and the formation of a system that performs decision-making to enhance variable-rate fertilization after real-time online zoning.
In this study, corn is selected for investigation, and a zoning mode based on the NDVI of the corn canopy detected by a vehicle-borne spectral sensor is investigated. A zoning and division method based on NDVI data and the improved self-organizing map (SOM)-K-means algorithm is proposed to solve problems related to high demands for manpower and material resources, high time consumption, and the damaging nature and hysteresis of field investigations in traditional management zoning.
Materials and Methods
Study Area
The field test date was June 18, 2019, and the field test and NDVI data collection for the corn canopy were conducted at the No. 11 plot of the 17th operating post in the 4th management area of Zhaoguang Farm (126°26'-127°6' east longitude, 47°54'-48°12' north latitude) in Beian city, Heilongjiang Province. The data collection area for this test is shown in figure 1, where the red frame represents the detection area of the spectral sensor. The corn planting mode was double-row planting on the ridge, the ridge spacing was 1.1 m, the working width was 6.6 m, the total area of corn collection in the ridge test area was 9.4 hm², and the growth stage of the corn was 6-10 leaves. This farm was between 240 and 330 m above sea level, and the plot was at mid- to high latitudes. The annual average air temperature was generally 0.5°C, the frost-free period was approximately 120 days, the annual rainfall was 570 mm, and the annual average amount of sunshine was more than 2,700 hours.
Figure 1. Study area.
Data Collection and Preprocessing
The NDVI data collection system for the corn canopy comprised spectral sensors, a vehicle-borne intelligent terminal, a global navigation satellite system (GNSS) receiver, a controller area network (CAN) bus data logger, and other equipment. The NDVI data collection system for the corn canopy based on near-surface remote sensing is shown in figure 2.
Six GreenSeeker spectral sensors were selected. Each spectral sensor could detect the NDVI data for the corn canopy on one ridge. The sensors were produced by the Trimble Company in the United States. The GreenSeeker RT200C model was used, with a standard 12 V DC (11.0 V to 15.5 V) power supply, standard 300 mA current (peak value of 600 mA), and CAN (controller area network) data format. The scope of NDVI measurement was 0.00 to 0.99, and the collection system obtained the mean value of the NDVI data from the six spectral sensors. The NDVI was calculated as follows:
$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$
where NIR is the reflectance value of the near-infrared band and R is the reflectance value of the red light band.
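As a small illustration, the standard NDVI definition, (NIR - R) / (NIR + R), and the averaging of the per-sensor readings can be sketched as follows. The function names and the reflectance values are hypothetical and chosen only for illustration; the sensor firmware performs this calculation internally.

```python
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """NDVI from near-infrared and red reflectance: (NIR - R) / (NIR + R)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + R must be non-zero")
    return (nir - red) / total


def mean_ndvi(readings: Sequence[float]) -> float:
    """Mean of the per-sensor NDVI values (the system averages six sensors)."""
    if not readings:
        raise ValueError("at least one reading is required")
    return sum(readings) / len(readings)


# Hypothetical reflectance values, not taken from the field data.
example = ndvi(nir=0.50, red=0.20)
print(round(example, 3))
```

Because both bands enter as a ratio, the index is bounded between -1 and 1, which is consistent with the 0.00 to 0.99 measurement range reported for the sensor over vegetation.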
The spectral sensor used in this study is commonly used to estimate the nitrogen content of wheat, corn, and other crops, and it is broadly applied in variable-rate fertilization operations for these crops. GreenSeeker is an active sensor equipped with its own light source; therefore, it can obtain measurements at any time regardless of the available light, including at night. GreenSeeker uses two light-emitting diodes, one of which emits red light (671±6 nm) and the other near-infrared light (780±6 nm). Using these two bands, the NDVI of the canopy for corn and other crops is calculated. The sensor is installed at the front end of the tractor at a distance of 0.6 m to 0.8 m from the corn canopy. Considering the execution response time of the NDVI data collection system for the corn canopy and given the uniformity and coefficient of variation of the spectral data for corn, the sampling frequency of the GreenSeeker spectral sensor used in the data collection platform was set to 1 Hz.
The GNSS can provide latitude, longitude, altitude, and travel speed information at any location on Earth. In this study, the travel speed measurement of the GNSS was adopted. The system is characterized by high precision, good real-time performance, and strong antijamming capability, thus providing adequate support for data collection. The Trimble GNSS receiver was used, as it can process multiple signals and be combined with a variety of real-time GNSS differential service equipment; additionally, it is convenient to carry and provides high precision and good reliability.
Based on the current state of the Heilongjiang reclamation area and the actual agricultural production situation, an industrial tablet computer with a 10-inch capacitance screen was adopted. This vehicle-borne computer was equipped with a CAN communication interface to allow plug-and-play operation. This interface was used in detection and automatic control tasks, among others, and the system can operate stably in harsh environments, such as farmland areas, with high reliability and good stability.
Figure 2. NDVI data collection system.
The CAN bus communication mode was used in the data collection system to enable information sharing among equipment, free communication at each node, and enhanced control and coordination of the NDVI data collection system. Therefore, the CAN bus logger was used to record the test data; it contained two independent CAN channels. A total of 12,447 records were collected, and the collection time totaled 3 hours and 41 minutes. Table 1 shows a portion of the data collected by the NDVI data collection system.
Table 1. A portion of the collected NDVI data.

| No. | Date | Time | Longitude | Latitude | Elevation (m) | Speed (km/h) | Mean NDVI |
|---|---|---|---|---|---|---|---|
| 1 | 2019/6/18 | 14:41:10 | 126.6302140 | 48.0369749 | 291.63 | 7.58 | 0.389 |
| 2 | 2019/6/18 | 14:41:11 | 126.6302400 | 48.0369813 | 291.63 | 7.38 | 0.417 |
| 3 | 2019/6/18 | 14:41:12 | 126.6302663 | 48.0369876 | 291.75 | 7.50 | 0.402 |
| 4 | 2019/6/18 | 14:41:13 | 126.6302921 | 48.0369942 | 291.75 | 7.44 | 0.423 |
| 5 | 2019/6/18 | 14:41:14 | 126.6303181 | 48.0370013 | 291.63 | 7.41 | 0.397 |
| 6 | 2019/6/18 | 14:41:15 | 126.6303443 | 48.0370085 | 291.63 | 7.69 | 0.376 |
| 7 | 2019/6/18 | 14:41:16 | 126.6303706 | 48.0370157 | 291.63 | 7.57 | 0.424 |
| 8 | 2019/6/18 | 14:41:17 | 126.6303966 | 48.0370226 | 291.50 | 7.89 | 0.397 |
| 9 | 2019/6/18 | 14:41:18 | 126.6304222 | 48.0370296 | 291.50 | 7.40 | 0.395 |
| 10 | 2019/6/18 | 14:41:19 | 126.6304479 | 48.0370369 | 291.63 | 7.52 | 0.411 |

Data Processing
NDVI Data Clustering
The SOM algorithm was used in this test to optimize the K-means algorithm, and an improved SOM-K-means clustering algorithm was proposed. After the acquired NDVI data for the corn canopy were processed, the SOM-K-means algorithm was used to conduct data clustering and obtain fertilization zoning results.
According to statistical analysis, the overall variation rate of the NDVI in this study area was 15.24%, which satisfied the requirement of regional management (greater than 10%) (Liu et al., 2019). Therefore, regional management was conducted. The NDVI time series data were used to reflect the differences in corn growth, the NDVI data were normalized, and cluster analysis was directly conducted.
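The 10% threshold check can be sketched as below, assuming that the "overall variation rate" is the coefficient of variation (standard deviation divided by the mean, in percent) and that the population standard deviation is used; the text does not state which estimator was applied. The function name is hypothetical, and the sample values are the first five mean NDVI readings from Table 1.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Coefficient of variation in percent: 100 * std / mean.

    Assumption: population standard deviation (pstdev); swap in
    statistics.stdev for the sample estimator.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * statistics.pstdev(values) / mean


# First five mean NDVI readings listed in Table 1.
sample = [0.389, 0.417, 0.402, 0.423, 0.397]
cv = coefficient_of_variation(sample)
needs_zoning = cv > 10.0  # threshold for regional management cited in the text
print(f"CV = {cv:.2f}%, zoning warranted: {needs_zoning}")
```

A five-record excerpt from one pass is far more uniform than the whole field, so this call is only a usage example and says nothing about the 15.24% reported for the full dataset.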
K-Means Algorithm
The K-means clustering algorithm is a commonly used clustering algorithm in data mining. Its separation ability and easy implementation make it applicable to many fields of research (Herrera et al., 2010; Godin et al., 2005; Brentan et al., 2018). Distance was used as the evaluation index for similarity; that is, the smaller the distance between objects was, the higher their similarity and the more likely they were to belong to the same cluster. The final goal was to obtain compact and independent clusters of highly similar objects within clusters and dissimilar objects in different clusters. The steps for the K-means algorithm are as follows:
1. Any K samples from the input sample set are selected as the initial clustering centers $a = \{a_1, a_2, \ldots, a_K\}$.
2. For each sample $x_i$ in the data set, its distance to the K clustering centers is calculated, and division is performed based on the smallest distance from the clustering centers.
3. For each category $j$, the corresponding clustering center $a_j = \frac{1}{|c_j|} \sum_{x \in c_j} x$ (that is, the center of mass of all samples in this category, where $c_j$ denotes the set of samples assigned to category $j$) is recalculated.
4. Steps 2 and 3 are repeated until a certain stopping condition is reached (e.g., the maximum number of iterations or the minimum error variance), at which point the algorithm is terminated.
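The steps above can be sketched for scalar NDVI samples as follows. This is a minimal illustration, not the implementation used in the study; the function name, the stopping tolerance, and the rule that an empty cluster keeps its previous center are assumptions.

```python
import random


def kmeans_1d(values, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means on scalar samples (e.g., NDVI readings)."""
    rng = random.Random(seed)
    # Step 1: pick K samples as the initial clustering centers.
    centers = rng.sample(list(values), k)
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Step 3: recompute each center as the mean of its members
        # (assumption: an empty cluster keeps its previous center).
        new_centers = [
            sum(members) / len(members) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        # Step 4: stop once the centers no longer move.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in values]
    return centers, labels


# Hypothetical readings forming two well-separated groups.
centers, labels = kmeans_1d([0.10, 0.11, 0.12, 0.90, 0.91, 0.92], k=2)
print(sorted(centers), labels)
```

Changing `seed` changes the initial centers drawn in Step 1, which is the source of the run-to-run variability that K-means is known for.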
The advantages of this algorithm are that it is fast and simple and offers rapid convergence; the disadvantages are that the number of clustering centers K must be specified in advance and that it is particularly sensitive to the selection of the initial K value. Moreover, for different initial clustering centers, the algorithm may generate different clustering results, it is prone to convergence to the local optimum, and a small amount of data in the sample set can have a considerable impact on the final clustering effect.
SOM Neural Network Algorithm
An SOM is a neural network with unsupervised training that was proposed by Kohonen (Kohonen, 1997; Alok et al., 2017; Luo et al., 2018; Vesanto and Alhoniemi, 2000). An SOM simulates the self-organized feature mapping functions of the brain and nervous system to automatically classify input patterns according to learned rules (Teles et al., 2015). In an unsupervised situation, self-organized learning is conducted based on input patterns, with repeated adjustment of the connection weight coefficients to best reflect the relationships among the input samples. The classification results are expressed in the competition layer. A typical topological structure of an SOM neural network is shown in figure 3.
Figure 3. Typical topological structure of an SOM neural network.
The steps of the SOM algorithm (Singh and Dixit, 2013) and the training process of the SOM network algorithm are as follows.
If the input is an n-dimensional vector, let x = [x1, x2, …, xn]T, and establish a two-dimensional grid with m output nodes. The connection weight value between the i-th input neuron node and the j-th output neuron node is wij, and the training process for the algorithm is as follows:
Step 1: Initialization of the connection weight values. For all initial weight values wij, a random value in the interval [0, 1] is selected, with the only restriction being that the wij values are different from each other.
Step 2: Input a sample pattern to the network. An n-dimensional vector x is selected from the sample input space at a certain probability to represent the activation pattern applied to the grid.
Step 3: Calculate the distance at moment t. The distance from the input vector at moment t to all output nodes (the definition of the Euclidean distance is used in this study) is calculated as follows:
$$d_j = \sqrt{\sum_{i=1}^{n} \left( x_i(t) - w_{ij}(t) \right)^2}, \quad j = 1, 2, \ldots, m,$$
where $x_i(t)$ is the value of the i-th component of the input vector at moment t.
Step 4: Select the neuron i(x) that wins the competition. The node that generates the smallest $d_j$ is selected as the best-matching neuron:
$$i(x) = \arg\min_{j} d_j.$$
Neuron i(x) is the winning neuron.
Step 5: Adjust the connection weight vector for the output node. The weight vector of the neuron is adjusted with the following updating formula:
$$w_j(t+1) = w_j(t) + \eta(t)\, h_{j,i(x)}(t) \left[ x(t) - w_j(t) \right] \qquad (2)$$
where $\eta(t)$ is the learning efficiency, $0 < \eta(t) < 1$, which decreases monotonically with time t, thereby ensuring the convergence of the learning process, and $h_{j,i(x)}(t)$ is the neighborhood function around the winning neuron, as determined by the Gauss neighborhood function:
$$h_{j,i(x)}(t) = \exp\left( -\frac{\lVert r_j - r_{i(x)} \rVert^2}{2\sigma^2} \right) \qquad (3)$$
where
$r_j$ and $r_{i(x)}$ = positions of the output nodes j and i(x), respectively
$\sigma$ = scope of the neighborhood
$h_{j,i(x)}(t)$ = monotonically decreasing function of the distance between the two points.
To obtain the best result, $\eta(t)$ and $h_{j,i(x)}(t)$ both vary dynamically in the learning process.
Step 6: Repeat Steps 2 through 5 until the training of the SOM neural network is completed after all samples are learned. The final network topography obtained approximately describes the distribution of the input vectors.
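The training loop described in Steps 1 to 6 can be sketched for scalar inputs as follows. Several choices here are assumptions made for illustration and are not taken from the study: the output nodes are arranged on a line (a 1 x m grid) rather than a two-dimensional grid, the learning rate and neighborhood width decay linearly, and all names and default values are hypothetical. The update used is the usual Kohonen rule, w_j <- w_j + eta * h * (x - w_j), with a Gaussian neighborhood h.

```python
import math
import random


def train_som_1d(values, m, epochs=10, eta0=0.5, seed=0):
    """Train a 1 x m SOM on scalar samples and return the m node weights."""
    rng = random.Random(seed)
    # Step 1: random initial weights in [0, 1].
    weights = [rng.random() for _ in range(m)]
    sigma0 = max(m / 2.0, 1.0)  # assumed initial neighborhood width
    total_steps = epochs * len(values)
    step = 0
    for _ in range(epochs):
        order = list(values)
        rng.shuffle(order)
        for x in order:  # Step 2: present one sample
            decay = 1.0 - step / total_steps  # assumed linear decay
            eta = eta0 * decay  # stays within (0, 1)
            sigma = max(sigma0 * decay, 1e-3)
            # Step 3: distance from the sample to every output node.
            dists = [abs(x - w) for w in weights]
            # Step 4: the winning neuron has the smallest distance.
            winner = dists.index(min(dists))
            # Step 5: move the winner and its grid neighbors toward x.
            for j in range(m):
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                weights[j] += eta * h * (x - weights[j])
            step += 1  # Step 6: repeat for the next sample
    return weights


# Hypothetical NDVI-like readings.
nodes = train_som_1d([0.20, 0.22, 0.35, 0.37, 0.41, 0.43, 0.60, 0.62], m=4)
print([round(w, 3) for w in nodes])
```

Because each update is a convex combination of the old weight and the sample, weights initialized in [0, 1] stay in [0, 1] whenever the inputs do, which suits normalized NDVI data.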
Improved SOM-K-Means Algorithm
The SOM algorithm can automatically cluster input patterns without needing to specify the number of categories in advance, but in certain cases, some neurons may never win during training, leading to inaccurate classification results. During SOM neural network training, the set training pace is positively correlated with the final effect of clustering, which results in a relatively long network convergence time. In contrast, the advantage of the K-means algorithm is its fast convergence speed, but the initial clustering centers and the size of K are very difficult to determine in advance. Based on the steps of the SOM algorithm and the K-means algorithm and their respective advantages and disadvantages, the two algorithms are combined to optimize the K-means algorithm.
In NDVI spectral datasets, the amount of data directly affects the clustering efficiency of the algorithm. In an SOM neural network, the scale of the weight value matrix in the output layer affects the degree of complexity of the algorithm, but if the matrix is too simple, imprecise clustering information may be produced. Therefore, in cases with clusters with abundant data, although the precision of the results must be ensured, the degree of complexity and convergence time of the algorithm must be minimized.
Therefore, the SOM-K-means algorithm is proposed in this study, and the flowchart of the SOM-K-means algorithm is shown in figure 4. This combined clustering algorithm maintains the self-organization characteristics of the SOM approach and the efficiency of K-means clustering, thus overcoming the overly long convergence time of the SOM method and the influence of the selection of the initial clustering centers in K-means clustering (Balakrishnan et al., 1994; Gutkin et al., 2011). The K-means algorithm has the advantages of a fast speed, a simple algorithm, and the ability to effectively process large datasets. Combining this algorithm with an SOM network can increase the accuracy of K-means clustering and reduce the number of nodes in the SOM output layer, thereby forming a twofold clustering method. The K-means approach is affected by the selection of the initial centers and by noise in the data. After the SOM neural network is combined with the K-means algorithm, the SOM network is used to perform the initial clustering to obtain the cluster centers, which are used as the initial cluster centers in the K-means algorithm. Then, the K-means algorithm is used to cluster the data. Thus, the advantages of the two algorithms are combined.
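The two-stage idea, a short SOM pass that supplies the initial centers followed by K-means refinement, can be sketched for scalar NDVI samples as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the SOM uses a 1 x k line of nodes with linearly decaying learning rate and neighborhood width, an empty K-means cluster keeps its previous center, and all function names and defaults are hypothetical.

```python
import math
import random


def som_initial_centers(values, k, epochs=5, eta0=0.5, seed=0):
    """Stage 1: train a 1 x k SOM and return its weights as initial centers."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(k)]
    total_steps = epochs * len(values)
    step = 0
    for _ in range(epochs):
        order = list(values)
        rng.shuffle(order)
        for x in order:
            decay = 1.0 - step / total_steps  # assumed linear decay
            eta = eta0 * decay
            sigma = max(k / 2.0 * decay, 1e-3)
            winner = min(range(k), key=lambda j: abs(x - weights[j]))
            for j in range(k):
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                weights[j] += eta * h * (x - weights[j])
            step += 1
    return weights


def kmeans_from(values, centers, max_iter=100, tol=1e-9):
    """Stage 2: K-means started from the given centers."""
    centers = list(centers)
    k = len(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in values:
            clusters[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        # An empty cluster keeps its previous center (assumption).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in values]
    return centers, labels


def som_kmeans(values, k, seed=0):
    """SOM supplies the initial centers; K-means refines them."""
    return kmeans_from(values, som_initial_centers(values, k, seed=seed))


# Hypothetical NDVI-like readings.
centers, labels = som_kmeans([0.20, 0.21, 0.22, 0.60, 0.61, 0.62], k=2)
print(centers, labels)
```

Only a few SOM epochs are run because the SOM stage needs to place the centers roughly, not converge; the K-means stage then finishes the job quickly from that starting point.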
Figure 4. Flowchart of the SOM-K-means algorithm.
Clustering Performance Evaluation Indicators
The methods used to assess the effectiveness of clustering can be divided into two types: performance measurement and distance calculation methods. Clustering performance metrics are also referred to as effectiveness indicators, which can be external or internal. Internal indicators utilize the inherent features and magnitudes of data sets to evaluate the results of a clustering algorithm. In this study, two commonly used internal indicators for performance measurement, namely the Davies-Bouldin Index (DBI) and the silhouette coefficient, are used to evaluate clustering performance.
Davies-Bouldin Index
The DBI, also referred to as the classification accuracy index, was proposed by Davies and Bouldin (1979) to assess the advantages and disadvantages of clustering algorithms. The purpose of the DBI is to measure the mean value of the maximum similarity of each cluster; a small DBI represents a high degree of separation among clusters, a short distance between objects within a cluster, and an overall good clustering effect. The specific calculation of the DBI is as follows:
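$$R_{ij} = \frac{s_i + s_j}{d_{ij}} \qquad (4)$$
where
$R_{ij}$ = similarity measure between clusters i and j, as defined by the DBI
$s_i$ = average distance from the data in cluster i to the center of mass of the cluster
$d_{ij}$ = distance between the centers of mass of clusters i and j.
Therefore, the DBI is defined as follows:
$$\mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij} \qquad (5)$$
where k is the number of clusters.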
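For scalar NDVI samples, the index can be computed directly from the quantities defined above. The sketch below assumes the standard Davies-Bouldin form, in which the similarity of clusters i and j is (s_i + s_j) / d_ij and the index is the mean over clusters of the largest similarity; it also assumes that every cluster is non-empty and that no two centers coincide. The function name is hypothetical.

```python
def davies_bouldin_1d(values, labels, centers):
    """Davies-Bouldin index for scalar samples with given labels and centers."""
    k = len(centers)
    # s_i: average distance from the members of cluster i to its center.
    scatter = []
    for i in range(k):
        members = [x for x, lab in zip(values, labels) if lab == i]
        scatter.append(sum(abs(x - centers[i]) for x in members) / len(members))
    total = 0.0
    for i in range(k):
        # Largest similarity R_ij = (s_i + s_j) / d_ij over all other clusters.
        total += max(
            (scatter[i] + scatter[j]) / abs(centers[i] - centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two tight, well-separated hypothetical clusters give a small index.
dbi = davies_bouldin_1d([0.0, 2.0, 10.0, 12.0], [0, 0, 1, 1], [1.0, 11.0])
print(dbi)
```

In this example each cluster has scatter 1 and the centers are 10 apart, so the similarity is 2 / 10 for both clusters; moving the clusters closer together or spreading their members raises the index.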
Silhouette Coefficient
The silhouette coefficient (SC) is a metric for evaluating whether the clustering effect is good or poor. It was first proposed by Peter J. Rousseeuw (1987). The SC combines the degree of cohesion and the degree of separation. It can be used to evaluate the impact on the clustering results of different algorithms or methods within algorithms based on the same original data. The SC is a measure of whether the clustering result is reasonable and effective, where a high SC indicates a good clustering effect. The steps for calculating the SC of the samples are as follows:
- Calculate the average distance ai from sample i to other samples in the same cluster. A small ai indicates that sample i should be added to this cluster. ai reflects the degree of intracluster dissimilarity of sample i.
- Calculate the average distance bij from sample i to all samples in another cluster Cj, which is the degree of dissimilarity between sample i and cluster Cj. The degree of intercluster dissimilarity of sample i is defined as bi = min{bi1, bi2, …, bik}.
- The SC of sample i is defined according to the degree of intracluster dissimilarity ai and the degree of intercluster dissimilarity bi of sample i:
$$s(i) = \frac{b_i - a_i}{\max\{a_i, b_i\}} \qquad (6)$$
- Determine whether the sample clustering result is reasonable based on s(i).
The mean value of s(i) for all samples is referred to as the SC of the clustering results. When the SC of the clustering results alone is not sufficient for determining whether the number of clusters is sufficient, a visualization method can be used, e.g., the silhouette analysis mode for a silhouette graph. Silhouette analysis can be used to study the separation distance between clusters. A silhouette graph shows the distance between each point in a cluster and the points in a neighboring cluster, thereby providing a method for visualizing evaluation parameters (such as the number of clusters). An SC near 1 indicates that sample i is far from the neighboring clusters, and the clustering effect is reasonable at this time. When the value is close to -1, then sample i should be classified into another cluster; when the value is 0, sample i is situated at or very close to the decision boundary between two neighboring clusters.
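The per-sample calculation can be sketched for scalar NDVI samples as follows, using the standard silhouette definition s(i) = (b_i - a_i) / max(a_i, b_i). The function names are hypothetical, at least two clusters are assumed, and a sample that is alone in its cluster is given a score of 0 by convention. The pairwise loop is quadratic in the number of samples, so it is meant for illustration rather than for the full field dataset.

```python
def silhouette_samples_1d(values, labels):
    """Silhouette score s(i) of every scalar sample."""
    n = len(values)
    cluster_ids = sorted(set(labels))
    scores = []
    for i in range(n):
        # a_i: mean distance to the other samples of the same cluster.
        own = [
            abs(values[i] - values[j])
            for j in range(n)
            if j != i and labels[j] == labels[i]
        ]
        if not own:
            scores.append(0.0)  # convention for single-sample clusters
            continue
        a = sum(own) / len(own)
        # b_i: smallest mean distance to the samples of any other cluster.
        b = min(
            sum(abs(values[i] - values[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in cluster_ids
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_coefficient_1d(values, labels):
    """SC of the clustering: mean of s(i) over all samples."""
    scores = silhouette_samples_1d(values, labels)
    return sum(scores) / len(scores)


# Hypothetical data: two pairs of points, far apart.
print(silhouette_coefficient_1d([0.0, 2.0, 10.0, 12.0], [0, 0, 1, 1]))
```

Sorting the per-sample scores within each cluster and plotting them as horizontal bars yields the silhouette graph used for the visual analysis.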
Results
Number of Clusters Determined by Silhouette Analysis
Silhouette analysis was used to select the number of clusters, as shown by the silhouette graphs in figure 5. The silhouette graphs show that five clusters were a comparatively poor choice for the given NDVI data because there were clusters in the fourth category with lower-than-average silhouette scores. In addition, the size of the clusters is indicated by the thickness of the silhouette graphs; with two clusters, cluster 1 was greater in quantity because the three corresponding subclusters formed a large cluster. Moreover, with four clusters, clusters 1 and 4 and clusters 2 and 3 displayed similar thicknesses. Therefore, under the premise of ensuring large differences between the clusters and small differences within the clusters, K=4 was determined to be the optimal number of clusters.
Figure 5. Silhouette maps: (a), (b), (c), and (d) are the silhouette maps with two, three, four, and five clusters, respectively; in the figures, the red dotted line indicates the average silhouette score for the corresponding number of clusters.
Comparison of the SOM Neural Network, K-Means, and SOM-K-Means Clustering Algorithms
The SCs and DBI values of the SOM, K-means, and SOM-K-means algorithms were determined, and a line graph was drawn to find the optimal SC and DBI and compare the three algorithms, as shown in figure 6.
In figure 6a, the SC of the SOM neural network reaches an optimum with two clusters, but the DBI is the largest, and the overall coefficient of variation is 9.8% (only 0.2% lower than the value that satisfies the difference requirements for regional management). Therefore, it is still necessary to further conduct regional management. With three, four, or five clusters, the SC values are all lower than those of the SOM-K-means clustering algorithm, and with seven clusters, the SOM method yields the largest SC, but its DBI is the smallest. Figure 6b shows that with two to seven clusters, the overall DBI of the SOM neural network is larger than that of the SOM-K-means algorithm, and the clustering effect is poor. Therefore, based on the use of the two clustering effect evaluation indicators to evaluate the clustering effect of the SOM neural network and SOM-K-means algorithms, the SOM-K-means algorithm clearly performs better than the SOM neural network algorithm.
(a) SC (b) DBI Figure 6. Comparison of the SCs and DBIs of the three algorithms.
We compared the performance of the SOM and SOM-K-means algorithms, as shown in figure 7. For the SOM neural network, the run time needed to ensure a good clustering effect is governed mainly by the number of training iterations: the larger the data volume and the number of iterations, the longer the run time. For the two clustering methods examined in this study, although the data volume was relatively large, the number of training iterations was low; therefore, the run time of the SOM-K-means algorithm was shorter than the times of the other methods. We performed clustering with the NDVI data; SOM-K-means was used first, followed by the SOM neural network, and the results were plotted. The NDVI dataset included 12,650 records; with 10 training iterations in the SOM neural network, one step in the input layer, and four steps in the output layer, the run time was 40.094 s. With one training iteration, the run time of the SOM neural network algorithm was 4.715 s, and its inertia value was 10.8909. The run time of the improved SOM-K-means algorithm was 4.880 s, and its inertia value was 9.2118. The SOM-K-means algorithm computed rapidly on a large amount of data and did not fall into a local optimum caused by improper initial value selection, which would have produced a poor clustering effect.
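The idea of seeding K-means with SOM-derived centers, and of reporting run time and inertia for the result, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the SOM is a hand-written one-dimensional map with linearly decaying learning rate and neighbourhood width (both assumed), the functions `som_centers_1d` and `som_kmeans` are hypothetical names, and the data are synthetic.

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def som_centers_1d(values, n_nodes=4, n_iter=1, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny 1 x n_nodes SOM on scalar data and return sorted node weights."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    w = np.linspace(x.min(), x.max(), n_nodes)  # initial weights span the data range
    grid = np.arange(n_nodes)
    total = n_iter * len(x)
    step = 0
    for _ in range(n_iter):
        for xi in rng.permutation(x):
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighbourhood width
            bmu = np.argmin(np.abs(w - xi))           # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma**2))
            w += lr * h * (xi - w)                    # pull neighbours toward the sample
            step += 1
    return np.sort(w)


def som_kmeans(values, n_clusters=4, seed=0):
    """Seed K-means with SOM node weights; return (labels, centers, inertia)."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    init = som_centers_1d(values, n_nodes=n_clusters, seed=seed).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, init=init, n_init=1).fit(X)
    return km.labels_, km.cluster_centers_.ravel(), km.inertia_


# Synthetic stand-in for NDVI readings (NOT the study's data).
rng = np.random.default_rng(1)
ndvi = np.clip(rng.normal(0.40, 0.06, size=2000), 0.198, 0.891)

t0 = time.perf_counter()
labels, centers, inertia = som_kmeans(ndvi)
elapsed = time.perf_counter() - t0
print(f"centers={np.round(centers, 3)}, inertia={inertia:.4f}, time={elapsed:.3f} s")
```

Because the initial centers are supplied explicitly, a single K-means run (`n_init=1`) is enough, which is what removes the dependence on random restarts. Timings and inertia from this sketch are not comparable to the values reported in the text.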
(a) SOM-K-means clustering results and clustering centers (b) SOM neural network clustering results and clustering centers Figure 7. Comparison of the SOM neural network and SOM-K-means algorithms. The NDVI clusters obtained with the SOM-K-means algorithm were 0.1980-0.3220, 0.3230-0.3790, 0.3800-0.4370, and 0.4380-0.8910. The algorithm run time was 4.880 s, and the inertia value was 9.2118. The NDVI data for the SOM neural network algorithm were divided into clusters from 0.1980-0.3510, 0.3520-0.3890, 0.3900-0.4260, and 0.4270-0.8910. Additionally, the algorithm run time was 4.715 s, and the inertia value was 10.8909.
Figure 8. Distribution map of variations in corn growth. Visual processing of the zoning results was conducted for the test data based on the number of clusters determined. A point on the map in a single color represents the numerical value of the NDVI, and colored clusters are 0.198-0.322, 0.323-0.379, 0.380-0.437, and 0.438-0.891.
Establishment of the Growth Variation Zoning Map
Based on the aforementioned analysis, the SOM-K-means algorithm was employed to generate a distribution map representing the variations in corn growth, as shown in figure 8. The optimization process showed that four clusters best captured the spatial differences in corn growth. This map serves as a basis for variable-rate fertilization decisions, enabling precise, on-demand fertilizer application targeted to each zone.
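As an illustration of how the four NDVI ranges reported for figure 8 could be turned into zone labels for mapping, the sketch below bins readings with NumPy. It assumes the upper bound of each range is inclusive and that readings falling between two listed ranges (for example, 0.3225) go to the higher zone; the function name `assign_zone` is hypothetical.

```python
import numpy as np

# Upper NDVI bounds of zones 1-3, taken from the ranges reported for figure 8.
ZONE_UPPER_BOUNDS = np.array([0.322, 0.379, 0.437])


def assign_zone(ndvi):
    """Map NDVI readings to zone numbers 1-4 (upper bounds inclusive)."""
    ndvi = np.asarray(ndvi, dtype=float)
    # right=True gives bins[i-1] < x <= bins[i], so 0.322 stays in zone 1.
    return np.digitize(ndvi, ZONE_UPPER_BOUNDS, right=True) + 1


readings = np.array([0.198, 0.322, 0.323, 0.379, 0.380, 0.437, 0.438, 0.891])
zones = assign_zone(readings)
print(zones)  # prints [1 1 2 2 3 3 4 4]
```

The zone numbers can then be joined to the position of each reading to color a map such as figure 8.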
Discussion
The NDVI has been used to characterize the degree of correlation between the nitrogen content of corn in the early growth stage and parameters such as the plant height and biomass of the crop. Based on this information, the NDVI was applied for the variable topdressing of corn, and crop growth zoning was conducted. Four was the optimal number of zones obtained based on the clustering performance evaluation indicators; this result was consistent with the findings of Chen et al. (2021) and Liu et al. (2019). However, the algorithm that Liu et al. (2019) used in management zoning had certain disadvantages, as the initial clustering centers in K-means clustering were randomly selected, which had a considerable impact on the final result and run time. The shortcomings of K-means clustering can be avoided by acquiring the initial clustering centers with the SOM neural network and then running the standard K-means clustering algorithm. Moreover, Liu and Wang (2019) used K-means clustering and achieved an overall coefficient of variation of 12%, while the improved algorithm in this study achieved an overall coefficient of variation of only 6.94% after clustering.
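The comparison of variation before and after zoning can be illustrated with a short calculation. The text does not state how the overall coefficient of variation after zoning is pooled across zones, so this sketch assumes a size-weighted mean of the per-zone values; the function names, the synthetic data, and the reuse of the figure 8 zone bounds are all assumptions made for illustration.

```python
import numpy as np


def cv_percent(values):
    """Coefficient of variation (sample std / mean) as a percentage."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


def zoned_cv_percent(values, zones):
    """Size-weighted mean of per-zone CVs (an ASSUMED pooling rule)."""
    values = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    total = 0.0
    for z in np.unique(zones):
        members = values[zones == z]
        total += len(members) * cv_percent(members)
    return total / len(values)


# Synthetic readings for illustration only (NOT the study's data).
rng = np.random.default_rng(0)
ndvi = np.clip(rng.normal(0.40, 0.06, size=1000), 0.198, 0.891)
zones = np.digitize(ndvi, [0.322, 0.379, 0.437], right=True) + 1

unzoned = cv_percent(ndvi)
zoned = zoned_cv_percent(ndvi, zones)
print(f"unzoned CV: {unzoned:.2f}%")
print(f"zoned CV:   {zoned:.2f}%")
```

Splitting the readings into narrower NDVI bands lowers the within-zone spread, which is the effect the drop from the unzoned value to the zoned value is meant to capture.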
The clustering performance evaluation indicators SC and DBI were used to evaluate the zoning effect; these metrics express the degree of intracluster cohesion and the degree of intercluster separation for corn during zoning, thus effectively expressing population differences. Chen et al. (2021) also used the silhouette coefficient when evaluating clustering performance. They reported that when the data volume was greater than 8000, the lowest silhouette coefficient was 0.543, indicating good clustering performance. Through an improved algorithm, when the number of partitions was 4, the silhouette coefficient was only 0.537, indicating better clustering performance than that of Chen et al. (2021). The NDVI acquisition and management zoning methods in this study are based on near-surface remote sensing and therefore differ from those in other studies, such as management zoning based on drone remote sensing and the traditional preparation of physical maps, which can involve complex processes such as large-scale crop sampling. Because this study employs near-surface remote sensing, the results are minimally affected by the soil reflection that influences drone-mounted sensors, and the precision is high. Due to the large data volume, limited online processing capabilities of terminals, and high sensor prices, there are still many challenges associated with achieving large-scale and accurate real-time fertilization. One future research direction for variable-rate fertilization may be multisensor information fusion. In addition, crop growth and soil attributes detected by sensors could be combined to improve clustering quality and efficiency. The data collection system used in this study is based on six GreenSeeker sensors, each of which monitors the corn on one ridge. The cost of large-scale implementation may be high for this approach.
Future research directions may include reducing the number of GreenSeeker sensors needed, for example, using four or two GreenSeekers to achieve the same zoning effect while reducing costs. To lay a foundation for exploring methods for online, real-time zoning and variable-rate fertilization, one must keep data collection synchronized with crop growth cycles, reduce errors, decrease lag, abandon complex operational processes, create an intelligent variable-rate fertilization process, and enhance online zoning with real-time variables.
In the test analysis, corn growth displayed spatial differences associated with factors such as geographic location, sunlight, and water stress. In the study by Lawton (1946), crop growth at the edges of the plot was better than at other locations. However, although the crops at the edges of the test plot in this study received sufficient air flow and sunlight, growth in these areas was comparatively poor, possibly because of the terrain elevation and damage from excessive pesticide spraying. The test plot had a comparatively low elevation in the upper-right topographical position, and the maximum elevation difference of the plot reached 3 m, which led to corn growth being affected by water stress; therefore, corn growth in the lower-left corner was better than that in the upper-right corner. There was a depression in the middle of the test plot, where the topography and water limitations stressed the crop; growth there was therefore also relatively poor. This result illustrated that the NDVI zoning results based on the SOM-K-means algorithm effectively expressed spatial differences in corn growth.
Conclusions
With the northeastern agricultural area in China as an example, an improved SOM-K-means clustering algorithm model is proposed based on the SOM neural network clustering algorithm and K-means clustering algorithm for the division of corn growth. To verify the effectiveness of the algorithm, a comparative analysis of the three algorithms was conducted using the DBI and SC indicators. The results show that the improved SOM-K-means clustering algorithm yields a faster convergence speed and shorter run time overall than other methods, and the effects of intergroup and intragroup clustering on growth were significantly better than those for other clustering algorithms. The inertia of the SOM-K-means approach was 9.2118, with a run time of 4.880 seconds, with notable advantages in maize clustering. Moreover, through silhouette analysis, it was ultimately determined that the SOM-K-means algorithm yielded a reasonable management zoning scheme when the number of clusters was 4. Therefore, the SOM-K-means method was suitable for dividing corn management areas, with a total coefficient of variation of 6.94% after division. This indicated the feasibility and rationality of the spatial clustering and corn growth partitioning methods based on the SOM-K-means algorithm. This approach provides a reference for the development of real-time, online, and intelligent variable fertilization partitioning methods based on big data.
Acknowledgments
We express our gratitude for the funding provided by China’s “13th Five-Year Plan” National Key Research and Development Project (No. 2016YFD020060802) and the Heilongjiang Province Farms and Land Reclamation Administration Project (No. HKKY190504). We thank the editors and anonymous reviewers for their helpful suggestions, which improved the quality of this article.
References
Akinbile, D. S., & Liu, Z. (2022). Daily burned area mapping for prescribed rangeland burning in the Flint Hills region using MODIS data. J. ASABE, 65(5), 1097-1105. https://doi.org/10.13031/ja.14611
Alok, A. K., Saha, S., & Ekbal, A. (2017). Semi-supervised clustering for gene-expression data in multiobjective optimization framework. Int. J. Mach. Learn. Cybern., 8(2), 421-439. https://doi.org/10.1007/s13042-015-0335-8
Bai, G., Blecha, S., Ge, Y., Walia, H., & Phansak, P. (2017). Characterizing wheat response to water limitation using multispectral and thermal imaging. Trans. ASABE, 60(5), 1457-1466. https://doi.org/10.13031/trans.11967
Balakrishnan, P. V., Cooper, M. C., Jacob, V. S., & Lewis, P. A. (1994). A study of the classification capabilities of neural networks using unsupervised learning: A comparison with K-means clustering. Psychometrika, 59(4), 509-525. https://doi.org/10.1007/BF02294390
Brentan, B., Meirelles, G., Luvizotto, E., & Izquierdo, J. (2018). Hybrid SOM+k-Means clustering to improve planning, operation and management in water distribution systems. Environ. Model. Softw., 106, 77-88. https://doi.org/10.1016/j.envsoft.2018.02.013
Cao, Q., Cui, Z., Chen, X., Khosla, R., Dao, T. H., & Miao, Y. (2012). Quantifying spatial variability of indigenous nitrogen supply for precision nitrogen management in small scale farming. Precis. Agric., 13(1), 45-61. https://doi.org/10.1007/s11119-011-9244-3
Chen, H., Wang, X., Zhang, W., Wang, X. Z., Di, X. D., & Qi, L. Q. (2021). A new soybean NDVI data-based partitioning algorithm for fertilization management zoning. Appl. Ecol. Env. Res., 19(2), 1391-1405. https://doi.org/10.15666/aeer/1902_13911405
Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-1(2), 224-227. https://doi.org/10.1109/TPAMI.1979.4766909
Delin, S., & Stenberg, M. (2014). Effect of nitrogen fertilization on nitrate leaching in relation to grain yield response on loamy sand in Sweden. Eur. J. Agron., 52, 291-296. https://doi.org/10.1016/j.eja.2013.08.007
Fu, Y. (2017). Remote sensing data based crop growth parameters retrieval and crop management zone delineation research. PhD diss. Zhejiang, China: Zhejiang University, College of Environment and Resources.
Godin, N., Huguet, S., & Gaertner, R. (2005). Integration of the Kohonen’s self-organising map and k-means algorithm for the segmentation of the AE data collected during tensile tests on cross-ply composites. NDT & E Int., 38(4), 299-309. https://doi.org/10.1016/j.ndteint.2004.09.006
Guerrero, A., De Neve, S., & Mouazen, A. M. (2021). Chapter One - Current sensor technologies for in situ and on-line measurement of soil nitrogen for variable rate fertilization: A review. In D. L. Sparks (Ed.), Advances in Agronomy (Vol. 168, pp. 1-38). Academic Press. https://doi.org/10.1016/bs.agron.2021.02.001
Gutkin, R., Green, C. J., Vangrattanachai, S., Pinho, S. T., Robinson, P., & Curtis, P. T. (2011). On acoustic emission for failure investigation in CFRP: Pattern recognition and peak frequency analyses. Mech. Syst. Sig. Process., 25(4), 1393-1407. https://doi.org/10.1016/j.ymssp.2010.11.014
Herrera, M., Canu, S., Karatzoglou, A., Pérez-García, R., & Izquierdo, J. (2010). An approach to water supply clusters by semi-supervised learning. 5th Int. Congress on Environmental Modelling and Software.
Honkavaara, E., Saari, H., Kaivosoja, J., Pölönen, I., Hakala, T., Litkey, P., ... Pesonen, L. (2013). Processing and assessment of spectrometric, stereoscopic imagery collected using a lightweight UAV spectral camera for precision agriculture. Remote Sens., 5(10), 5006-5039. https://doi.org/10.3390/rs5105006
Hyytiäinen, K., Niemi, J. K., Koikkalainen, K., Palosuo, T., & Salo, T. (2011). Adaptive optimization of crop production and nitrogen leaching abatement under yield uncertainty. Agric. Syst., 104(8), 634-644. https://doi.org/10.1016/j.agsy.2011.06.006
Inman, D., Khosla, R., Reich, R., & Westfall, D. G. (2008). Normalized difference vegetation index and soil color-based management zones in irrigated maize. Agron. J., 100(1), 60-66. https://doi.org/10.2134/agronj2007.0020
Kohonen, T., Schroeder, M. R., & Huang, T. S. (1997). Self-Organizing Maps. Springer Berlin Heidelberg.
Lawton, K. (1946). The influence of soil aeration on the growth and absorption of nutrients by corn plants. Soil Sci. Soc. Am. J., 263-268.
Liu, H., & Wang, X. (2019). Assessing NDVI spatial pattern related to management zones. Appl. Ecol. Env. Res., 17(3). https://doi.org/10.15666/aeer/1703_62696285
Liu, H., Bao, Y., & Xu, M. (2019). Comparison of precision management zoning methods in black soil area based on SOM and NDVI. Trans. CSAE, 35(13), 177-183.
Luo, Q., Guo, C., Zhang, Y. J., Cai, Y., & Liu, G. (2018). Algorithms designed for compressed-gene-data transformation among gene banks with different references. BMC Bioinf., 19(1), 230. https://doi.org/10.1186/s12859-018-2230-2
Moran, M. S., Inoue, Y., & Barnes, E. M. (1997). Opportunities and limitations for image-based remote sensing in precision crop management. Remote Sens. Environ., 61(3), 319-346. https://doi.org/10.1016/S0034-4257(97)00045-X
Nahry, A. H. E., Ali, R. R., & Barody, A. A. E. (2011). An approach for precision farming under pivot irrigation system using remote sensing and GIS techniques. Agric. Water Manag., 98(4), 517-531.
Nawar, S., Corstanje, R., Halcro, G., Mulla, D., & Mouazen, A. M. (2017). Chapter Four - Delineation of soil management zones for variable-rate fertilization: A review. In D. L. Sparks (Ed.), Advances in agronomy (Vol. 143, pp. 175-245). Academic Press. https://doi.org/10.1016/bs.agron.2017.01.003
Panda, S. S., Panigrahi, S., & Ames, D. P. (2010). Crop yield forecasting from remotely sensed aerial images with self-organizing maps. Trans. ASABE, 53(2), 323-338. https://doi.org/10.13031/2013.29563
Reynolds, A. G., Brown, R., Kotsaki, E., & Lee, H.-S. (2015). Utilization of proximal sensing technology (greenseeker) to map variability in Ontario vineyards. Proc. 19th Int. Symp. GiESCO, (pp. 593-597).
Rodriguez, H. G., Popp, J., Gbur, E., & Chaubey, I. (2011). Environmental and economic impacts of reducing total phosphorous runoff in an agricultural watershed. Agric. Syst., 104(8), 623-633. https://doi.org/10.1016/j.agsy.2011.06.005
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65. https://doi.org/10.1016/0377-0427(87)90125-7
Sankaran, S., Khot, L. R., Espinoza, C. Z., Jarolmasjed, S., Sathuvalli, V. R., Vandemark, G. J., ... Pavek, M. J. (2015). Low-altitude, high-resolution aerial imaging systems for row and field crop phenotyping: A review. Eur. J. Agron., 70, 112-123. https://doi.org/10.1016/j.eja.2015.07.004
Santos, S. G., Melo, J. C., Constantino, R. G., & Brito, A. V. (2019). A solution for vegetation analysis, separation and geolocation of management zones using aerial images by UAVs. Proc. 2019 IX Brazilian Symp. on Computing Systems Engineering (SBESC), (pp. 1-8). https://doi.org/10.1109/SBESC49506.2019.9046079
Shi, Y., Zhu, Y., Wang, X., Sun, X., Ding, Y., Cao, W., & Hu, Z. (2020). Progress and development on biological information of crop phenotype research applied to real-time variable-rate fertilization. Plant Methods, 16(1), 11. https://doi.org/10.1186/s13007-020-0559-9
Singh, M. P., & Dixit, R. S. (2013). Optimization of stochastic networks using simulated annealing for the storage and recalling of compressed images using SOM. Eng. Appl. Artif. Intell., 26(10), 2383-2396. https://doi.org/10.1016/j.engappai.2013.07.003
Teles, L. O., Fernandes, M., Amorim, J., & Vasconcelos, V. (2015). Video-tracking of zebrafish (Danio rerio) as a biological early warning system using two distinct artificial neural networks: Probabilistic neural network (PNN) and self-organizing map (SOM). Aquat. Toxicol., 165, 241-248. https://doi.org/10.1016/j.aquatox.2015.06.008
Vesanto, J., & Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE Trans. Neural Networks, 11(3), 586-600. https://doi.org/10.1109/72.846731
Zhang, Z., Lu, X., Lu, N., Chen, J., Li, X. W., & Feng, B. (2014). Defining agricultural management zones using remote sensing and GIS techniques for drip-irrigated cotton fields. Trans. CSAM, 45(7), 125-132.