Agriculture remains a cornerstone of economic stability and food security for many nations around the world. However, plant diseases caused by pathogens such as fungi, bacteria, and viruses continue to threaten crop productivity, leading to severe financial losses and food supply issues. Early and precise detection of these diseases is critical for timely intervention and for preserving both yield quality and quantity. This study applies machine learning techniques to automate the classification of plant leaf images into healthy and diseased categories. Because symptoms of infection are most commonly observed on the leaves, the research concentrates on leaf imagery. A preprocessed dataset of leaf images from multiple plant species, affected by various diseases, was used for training and evaluation. Several machine learning algorithms, including Random Forest, Naive Bayes, and XGBoost, were applied and assessed using performance metrics such as accuracy and confusion matrices. The primary goal was to identify the model that offers the most reliable and scalable solution for real-time plant disease detection. Among the models tested, XGBoost exhibited the highest classification accuracy, reinforcing its potential in agricultural monitoring systems.

This research underlines the importance of technological advancements, particularly in machine learning, in transforming traditional farming practices. It paves the way for intelligent, automated plant health monitoring tools that could help farmers mitigate crop losses and adopt more sustainable agricultural practices.
Introduction
Context & Problem
India, with its large population, faces a dual crisis: food shortages and rising food prices. A major contributor is plant leaf disease, often caused by fungi and bacteria, which reduces crop yields and can damage soil. Traditional inspection methods often detect diseases too late, by which point crop failure may already be widespread. Early and accurate detection using machine learning (ML) is therefore crucial to prevent losses and support sustainable agriculture.
Objective
The research aims to apply and compare supervised ML models—Random Forest (RF), Naive Bayes (NB), and XGBoost—to detect and classify plant leaf diseases using image data. The goal is to determine which model provides the most accurate and reliable predictions to assist farmers in timely disease detection and treatment.
Dataset
Source: Public datasets like PlantVillage
Size: 5,000+ images of healthy and diseased leaves
Classes: Multiple disease types + healthy leaves
Plants: Tomato, Potato, Bell Pepper, etc.
Image Features: RGB format with visible symptoms like blight, mold, spots
Preprocessing Steps
Image resizing to 128×128 pixels
Normalization of pixel values to the range [0, 1]
Data augmentation (flip, rotate, zoom) to increase diversity
Label encoding to convert classes into numeric form
Train-test split: 80% training, 20% testing (stratified for class balance)
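The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses synthetic random images in place of real leaf photos, a simple nearest-neighbour resize, hypothetical class names, and only flip and rotation as augmentations (zoom is omitted for brevity). The source does not say whether augmentation happens before or after the split; the sketch assumes it is applied to the training split only, so that the test set contains no augmented copies of training images.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

TARGET_SIZE = 128  # side length stated in the preprocessing steps


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize of an (H, W, 3) array to (size, size, 3)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]


def preprocess(image):
    """Resize, then scale 8-bit pixel values into [0, 1]."""
    return resize_nearest(image).astype(np.float32) / 255.0


def augment(image):
    """Return the image, its horizontal flip, and a 90-degree rotation."""
    return [image, np.fliplr(image), np.rot90(image)]


# Synthetic stand-in data: 40 random RGB images of varying size, 4 classes.
rng = np.random.default_rng(0)
class_names = ["healthy", "early_blight", "late_blight", "leaf_mold"]  # hypothetical
raw_images = [
    rng.integers(0, 256, size=(int(rng.integers(150, 300)),
                               int(rng.integers(150, 300)), 3), dtype=np.uint8)
    for _ in range(40)
]
raw_labels = [class_names[i % 4] for i in range(40)]

X = np.stack([preprocess(img) for img in raw_images])
y = LabelEncoder().fit_transform(raw_labels)  # class names -> integers

# Stratified 80/20 split keeps class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Augment the training split only (an assumption, see the note above).
X_train_aug = np.stack([a for img in X_train for a in augment(img)])
y_train_aug = np.repeat(y_train, 3)  # each label repeated once per variant

print(X_train_aug.shape, X_test.shape)
```

In a real pipeline, an image library would normally handle resizing with proper interpolation; the NumPy version here only keeps the sketch free of extra dependencies.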
Models Compared
1. Random Forest (RF)
Ensemble decision trees (bagging)
High robustness and accuracy
Accuracy: 92.5%
2. XGBoost
Gradient boosting model
Most accurate and efficient
Accuracy: 94.1%
3. Naive Bayes (NB)
Simple probabilistic model
Fast but less accurate
Accuracy: 79.6%
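A comparison of this kind might be set up as in the sketch below. Several things here are assumptions rather than details from the study: the features are synthetic (standing in for flattened or extracted image features), the Naive Bayes variant is taken to be Gaussian, all hyperparameters are illustrative, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so that the example needs only one library. The xgboost package's XGBClassifier follows the same fit/predict interface and could be substituted. The accuracies printed by this sketch say nothing about the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data standing in for leaf-image features (healthy vs diseased).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Fit every model on the same split and score it on the same held-out data.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")
```

Using one shared split for all three models keeps the comparison like-for-like, which is the property the reported accuracies rely on.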
Evaluation Metrics
Accuracy
Precision (avoids false positives)
Recall (avoids false negatives)
F1-Score (balance between precision and recall)
ROC-AUC
Confusion Matrix: TP, TN, FP, FN
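For reference, the standard binary-classification definitions of these metrics in terms of the confusion-matrix counts are given below. The source does not state how the metrics were averaged across multiple disease classes, so the formulas assume the two-class (healthy vs diseased) case.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}
                         {\text{Precision} + \text{Recall}}
\end{align}
```

ROC-AUC is the area under the curve of true positive rate, $TP / (TP + FN)$, plotted against false positive rate, $FP / (FP + TN)$, as the decision threshold varies.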
Results Overview
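Model         | Accuracy | Precision | Recall | F1-Score | ROC-AUC
--------------|----------|-----------|--------|----------|--------
XGBoost       | 94.1%    | 94.8%     | 92.9%  | 93.8%    | 0.97
Random Forest | 92.5%    | 93.0%     | 91.8%  | 92.4%    | 0.95
Naive Bayes   | 79.6%    | 78.5%     | 80.2%  | 79.3%    | 0.82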
XGBoost outperformed all other models in every metric.
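As a quick sanity check, the reported F1-scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below copies the percentages from the Results Overview; the helper name f1_from is illustrative.

```python
# (precision, recall, reported F1) in percent, copied from the Results Overview.
reported = {
    "XGBoost": (94.8, 92.9, 93.8),
    "Random Forest": (93.0, 91.8, 92.4),
    "Naive Bayes": (78.5, 80.2, 79.3),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {}
for model, (precision, recall, f1_reported) in reported.items():
    recomputed[model] = f1_from(precision, recall)
    print(f"{model}: recomputed {recomputed[model]:.1f}%, reported {f1_reported}%")
```

For all three models the recomputed value rounds to the reported one, so the precision, recall, and F1 columns are mutually consistent.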
Confusion Matrix Highlights
Model         | TP    | TN    | FP    | FN
--------------|-------|-------|-------|------
XGBoost       | 9,400 | 9,750 | 350   | 500
Random Forest | 9,350 | 9,500 | 600   | 550
Naive Bayes   | 8,800 | 8,200 | 1,750 | 1,100
XGBoost: Best at correctly identifying both healthy and diseased leaves
Naive Bayes: Highest false positives and negatives → less reliable
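Summary metrics can be derived directly from confusion-matrix counts, as sketched below using the counts listed above and the standard binary definitions. The function name metrics_from_counts is illustrative. Note that the values derived this way (for example, roughly 95.8% accuracy for XGBoost) are not identical to the percentages in the Results Overview (94.1%), so the counts and the summary percentages should be compared with care.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts copied from the Confusion Matrix Highlights.
counts = {
    "XGBoost": dict(tp=9400, tn=9750, fp=350, fn=500),
    "Random Forest": dict(tp=9350, tn=9500, fp=600, fn=550),
    "Naive Bayes": dict(tp=8800, tn=8200, fp=1750, fn=1100),
}

derived = {name: metrics_from_counts(**c) for name, c in counts.items()}
for name, m in derived.items():
    print(name, {key: round(value, 4) for key, value in m.items()})
```

The model ranking from the derived values matches the ranking in the Results Overview: XGBoost first, Random Forest second, Naive Bayes last.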
Conclusion
In this research, three supervised machine learning models (Naive Bayes, Random Forest, and XGBoost) were trained and compared for plant leaf disease detection. The labeled leaf image dataset was preprocessed and augmented to improve model performance and generalization. Of the tested models, XGBoost performed best in overall accuracy and ROC-AUC, which indicates its ability to handle the non-linear relationships and noisy data common in agricultural environments.

This research presents an efficient machine learning method for detecting leaf diseases in crops, based on a real-world dataset and several classification methods. The comparison of evaluation metrics and confusion matrices supports the model's potential to assist farmers in diagnosing disease at an early stage. The suggested framework can be further enhanced by extending the dataset and using state-of-the-art deep learning architectures to improve detection accuracy. The system offers a promising base for smart agricultural solutions that enhance productivity and sustainability.

An AI-based crop leaf disease detection system was designed and tested in this research using various machine learning algorithms and performance metrics. The system showed adequate classification, with a precision of 60%, a recall of 75%, and an F1 score of 67%, together with confusion matrix results that showed strength in correctly identifying diseased leaves while pointing to room for improvement in reducing false alarms. These results support the system's ability to help farmers detect and manage leaf diseases at an early stage, avoiding crop loss and improving farm productivity. The balanced performance of the model reflects its usability in real-life situations where accurate and timely identification of disease is important.

For future research, expanding the dataset with more diverse leaf images, tuning hyperparameters, and adopting more sophisticated deep learning architectures could further increase detection accuracy and lower misclassification rates. The system presented here offers a solid platform for building resilient smart agriculture solutions that support sustainable farming.