Soil Fertility Classification Using Machine Learning-Based Approach
Date
2024
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Agriculture is the main source of income and livelihood in many countries. Sustainable agricultural development requires prompt assessment of soil fertility and the application of the right fertilizers. However, traditional laboratory analysis of soil samples makes timely estimation of soil fertility difficult. Therefore, this research aims to develop a reliable Machine Learning (ML)-based classifier that classifies soil fertility as LOW, MEDIUM, or HIGH and prescribes fertilizers based on the classification result. A soil fertility classification approach was proposed that applies ML to laboratory chemical parameters: Electrical Conductivity (EC), Organic Carbon (OC), potential of hydrogen (pH), boron (B), copper (Cu), iron (Fe), manganese (Mn), phosphorus (P), potassium (K), sulphur (S), and zinc (Zn). The classifiers used in this study included Random Forest (RF), bagging, Boosted Regression Tree (BRT), J48 Decision Tree (J48), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM). The experiments were conducted with a split dataset (75% of the data for training and 25% for testing) and with 10-fold cross-validation. The tree-based RF classifier outperformed the other classifiers, achieving an accuracy of 99.99% under both 10-fold cross-validation and the split-dataset evaluation.

To avoid the need for laboratory analysis and to obtain site-specific soil parameters, this research used Sentinel-2 spectral data to determine EC, pH, OC, and nitrogen (N). The generated dataset was labeled using several clustering methods (canopy, density-based, expectation-maximization, farthest-first, fuzzy C-means, and k-means), and the results were compared with manual labeling. Among these, canopy clustering achieved the highest labeling accuracy, 75.99%. Building on this, the proposed labeling method uses canopy-centered fuzzy C-means clustering.
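The abstract names canopy-centered fuzzy C-means clustering without spelling out how the two methods are combined. A plausible reading, assumed here rather than taken from the thesis, is that canopy clustering supplies the initial cluster centers and fuzzy C-means then refines them. The sketch below follows that assumption; the function names, the tight threshold `t2`, the fuzzifier `m=2.0`, and the rule of keeping the first `n_clusters` canopy centers are illustrative choices, and the usage runs on synthetic points, not soil data.

```python
import numpy as np


def canopy_centers(X, t2):
    """Pick canopy centers: take the first remaining point as a new center and
    discard every remaining point within the tight threshold t2. (The loose
    threshold T1 of canopy clustering only decides overlapping canopy
    membership, so it is not needed just to choose centers.)"""
    if t2 < 0:
        raise ValueError("t2 must be non-negative")
    remaining = np.arange(len(X))
    centers = []
    while remaining.size:
        seed = X[remaining[0]]
        centers.append(seed)
        dist = np.linalg.norm(X[remaining] - seed, axis=1)
        remaining = remaining[dist > t2]  # the seed itself (distance 0) is dropped
    return np.array(centers)


def fuzzy_c_means(X, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means, started from the given centers."""
    V = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        # D[k, i]: distance from sample k to center i, floored to avoid 0-division
        D = np.fmax(np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2), 1e-12)
        # membership u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        Um = U ** m
        # center update: membership-weighted mean of the samples
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < tol:
            break
    return U, V


def canopy_centered_fcm(X, t2, n_clusters, m=2.0):
    """Hypothetical combination: canopy centers seed fuzzy C-means."""
    centers = canopy_centers(X, t2)
    if len(centers) < n_clusters:
        raise ValueError("t2 too large: fewer canopies than requested clusters")
    U, V = fuzzy_c_means(X, centers[:n_clusters], m=m)
    return U.argmax(axis=1), V  # hard label = cluster with highest membership


# Toy usage on synthetic 2-D data (three well-separated blobs), not soil data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.1, size=(20, 2)) for loc in (0.0, 2.0, 4.0)])
labels, centers = canopy_centered_fcm(X, t2=0.7, n_clusters=3)
```

Cluster indices are arbitrary, so they would still have to be mapped to LOW, MEDIUM, and HIGH (for example, by inspecting the cluster centers) before the labels can be compared with manual labeling.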
The proposed canopy-centered fuzzy C-means clustering method achieved the highest labeling accuracy, 78.42%. Furthermore, the performance of several ML-based classifiers, namely NB, SVM, J48, and RF, was compared on datasets labeled with the different clustering approaches. The RF classifier achieved the highest classification accuracy, 99.69%, on the dataset labeled with the proposed approach under 10-fold cross-validation.

To determine the best fertilizer for a given soil, a new fertilizer prescription approach was proposed. It uses ensemble filter-based feature selection to classify soil fertility and prescribe the appropriate fertilizer, and it was tested on two datasets from regions with different climatic conditions. Tree-based classifiers (classification and regression tree, extra tree, reduced error pruning tree, and RF), as well as NB and SVM, were compared on the first dataset using the relevant soil parameters. The RF classifier with the relevant soil parameters was the most accurate, achieving 99.96% accuracy on dataset-1 and 99.90% accuracy on dataset-2.

A soil fertility classifier and fertilizer prescription approach based on 2D Convolutional Neural Networks (CNNs) was also proposed. The experiments were conducted on a split dataset with kernel sizes from 3×3 to 7×7 and input grid sizes from 11×11 to 13×13. With an 11×11 input grid and a 3×3 kernel, the classifier achieved an accuracy of 97.24% and a kappa statistic of 0.0938. To further improve the accuracy, the training data was oversampled using the Synthetic Minority Oversampling Technique (SMOTE). With oversampling, the approach achieved an accuracy of 97.52% and a kappa statistic of 0.1397 with a 12×12 input grid and a 3×3 kernel.

A 1D-CNN-based soil fertility classification approach was developed to simplify the 2D-CNN-based classifier.
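The 2D-CNN experiments above oversample only the training data with SMOTE, which targets class imbalance by synthesizing new minority-class samples. As an illustration of the standard SMOTE step, here is a minimal NumPy sketch: each synthetic sample is a random point on the segment between a minority sample and one of its `k` nearest minority-class neighbours. The function name `smote`, the choice `k=3`, and the toy data are assumptions for illustration; in practice the `SMOTE` class from imbalanced-learn (`imblearn.over_sampling.SMOTE`, via `fit_resample`) is the usual tool.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from the minority-class rows X_min."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # pairwise distances inside the minority class; a sample is not its own neighbour
    D = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    neighbours = np.argsort(D, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                   # random minority samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                            # position along the segment
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Toy imbalanced training set (synthetic): 95 majority rows, 5 minority rows.
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(0.0, 1.0, (95, 3)), rng.normal(3.0, 1.0, (5, 3))])
y_train = np.array([0] * 95 + [1] * 5)

X_min = X_train[y_train == 1]
X_new = smote(X_min, n_new=90, k=3, rng=0)
X_bal = np.vstack([X_train, X_new])
y_bal = np.concatenate([y_train, np.ones(90, dtype=int)])
```

Oversampling is applied after the train/test split so that synthetic points, which are interpolated from training rows, never leak into the test set.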
To improve the performance of the 1D-CNN model, the dataset was normalized using Min-Max normalization, and the training data was oversampled using SMOTE. The proposed approach was compared with soil fertility classifiers based on the Extreme Learning Machine (ELM) and the Multi-Layer Perceptron (MLP). With normalization and SMOTE, the proposed approach achieved an accuracy of 97.90% and a kappa statistic of 0.2358.

A new method that classifies soil fertility and prescribes fertilizers using symbolic deterministic finite automata was proposed to overcome the limitations of traditional ML-based classifiers, which require large, unbiased datasets and are prone to errors. The proposed method was compared with ML-based classifiers using data from Sentinel-2 satellite imagery and laboratory-measured soil health data from Belgaum district. The soil health data consisted of two sets: one with four soil parameters (Soil-health-1 dataset) and the other with twelve soil parameters (Soil-health-2 dataset). The new approach classified soil fertility with 100% accuracy on the Sentinel-2 and Soil-health-1 datasets and with 98.37% accuracy on the Soil-health-2 dataset.

Because satellite revisits to a specific site are infrequent, this study used soil sensors to collect real-time values of EC, pH, N, P, and K. The collected real-time data was tested using saved ML-based classifiers, namely Classification and Regression Tree (CART), J48, RF, Reduced Error Pruning (REP), NB, and SVM, which had been trained on the Soil-health dataset of Belgaum district. On the real-time test data, the RF and REP classifiers achieved the highest test accuracy of 100%.
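The abstract does not describe the automaton used in the symbolic deterministic finite automata approach. The toy sketch below only shows what classifying with a DFA can look like: each soil parameter is first discretized to a symbol, and the DFA reads the symbol string and maps its final state to a fertility class. The alphabet `L`/`M`/`H`, the score-tracking states, the transition rule, the state-to-class mapping, and the cut-offs passed to the hypothetical `to_symbol` helper are all assumptions, not the thesis's design.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy DFA; alphabet, states and transitions are illustrative only.
ALPHABET = ("L", "M", "H")  # level of one soil parameter: low / medium / high
STATES = range(-2, 3)       # net score clamped to [-2, 2]; 0 is the start state


def build_transitions() -> Dict[Tuple[int, str], int]:
    """Explicit transition table: H raises the score, L lowers it, M keeps it."""
    step = {"L": -1, "M": 0, "H": 1}
    return {(s, a): max(-2, min(2, s + step[a])) for s in STATES for a in ALPHABET}


TRANSITIONS = build_transitions()
STATE_LABEL = {-2: "LOW", -1: "LOW", 0: "MEDIUM", 1: "HIGH", 2: "HIGH"}


def to_symbol(value: float, low: float, high: float) -> str:
    """Discretize one measurement with hypothetical (low, high) cut-offs."""
    return "L" if value < low else ("H" if value > high else "M")


def classify(symbols: Iterable[str]) -> str:
    """Run the DFA over the symbol string and label the final state."""
    state = 0
    for a in symbols:
        if a not in ALPHABET:
            raise ValueError(f"symbol {a!r} not in alphabet {ALPHABET}")
        state = TRANSITIONS[(state, a)]
    return STATE_LABEL[state]


# Hypothetical readings and cut-offs, for illustration only.
symbols = [to_symbol(v, low=2.0, high=8.0) for v in (9.0, 9.5, 5.0)]
print(symbols, classify(symbols))  # ['H', 'H', 'M'] HIGH
```

Because every transition is an explicit table entry, each decision can be traced state by state and no training data is needed to fit the automaton; the trade-off is that the symbols, cut-offs, and transitions must be designed by hand.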
Keywords
Classification, Convolutional Neural Networks, Feature Selection, Fertilizer Prescription, Machine Learning, Precision Agriculture, Soil Fertility
