Plant diseases remain a major threat to global food security, causing heavy crop losses every year. Detecting these diseases early and accurately is crucial for improving crop yield. Manual inspection and CNN-based approaches have shown promise, but CNNs struggle to capture long-range relationships and overall context in leaf images, which limits the quality of the representations they learn. To address these problems, this paper proposes a plant leaf disease detection method based on the Vision Transformer (ViT) model. In contrast to most models, which rely on convolutional neural networks for image feature extraction, our model uses a vision transformer whose self-attention mechanism learns global feature information and therefore extracts more informative features.
The model has four components: image preprocessing, image patching, feature learning (transformer encoder), and classification (fully connected layer). Experimental results show that the proposed method reaches a classification accuracy of 99.8%, which is higher than most existing deep learning approaches. In addition, confusion matrix analysis indicates a minimal error rate for each disease category. The transformer-based model holds promise for practical application in precision agriculture. It is capable of detecting plant diseases automatically in real time. We also believe this approach could lead to efficient agricultural image analysis systems for low-computing-power environments, provided that the results prove reproducible and generalize to other settings.
Introduction
Accurate plant disease detection is increasingly important for ensuring global food security, especially as agricultural productivity faces challenges from population growth, climate change, and declining crop efficiency. Plant diseases are responsible for nearly 40% of annual crop losses worldwide, making early and accurate diagnosis essential for sustainable agriculture. Traditional diagnosis relies on manual inspection by plant pathology experts, which is time-consuming, labor-intensive, and sometimes inconsistent because it depends on human judgment. These limitations make traditional approaches unsuitable for large-scale farming environments.
With the advancement of Artificial Intelligence (AI) and deep learning, automated plant disease detection systems have become highly effective. Convolutional Neural Networks (CNNs) have been widely used for image classification tasks because they can detect local visual features such as edges, textures, and disease patterns in leaf images. However, CNNs have limitations in capturing long-range spatial relationships and global contextual information, which can reduce performance in real-world agricultural conditions where disease symptoms may appear across different parts of the leaf.
To overcome these limitations, the study proposes the use of Vision Transformer (ViT) models for plant disease detection. Unlike CNNs, Vision Transformers use self-attention mechanisms that allow the model to learn both local and global relationships within an image. The ViT model divides images into smaller patches and processes them as sequences, enabling it to understand long-distance dependencies and complex disease patterns more effectively. Work published between 2017 and 2026 reports that transformer-based and hybrid models outperform traditional CNNs in plant disease detection tasks.
The literature review explains the evolution of plant disease detection methods from conventional machine learning techniques such as Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) to deep learning models like AlexNet, VGG, and ResNet. While CNNs achieved high accuracy in controlled environments, they struggled with global context understanding. Transformer-based models such as Vision Transformers and Swin Transformers improved performance by capturing long-range dependencies and global image context. Hybrid models combining CNNs and transformers further enhanced robustness and accuracy in real-world agricultural settings.
The study also highlights recent trends in lightweight and mobile-friendly Vision Transformers, which reduce computational costs while maintaining high accuracy. Additional techniques such as GAN-based data augmentation and attention mechanisms further improve model generalization and reliability. Although transformer models require large datasets and high computational resources, ongoing research aims to overcome these challenges and enable practical deployment in precision agriculture.
The methodology of the proposed system includes image preprocessing, patch generation, patch embedding, transformer encoding, and classification. The PlantVillage dataset is used, containing images of healthy and diseased plant leaves. Images are resized to 224×224 pixels, normalized, and augmented using rotation, flipping, zooming, and brightness adjustments to improve generalization and reduce overfitting.
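The preprocessing and augmentation steps can be illustrated with a minimal sketch in plain Python. It is not the pipeline used in the study: the image is assumed to be a single-channel nested list of 8-bit values, the normalization constants (mean 0.5, standard deviation 0.5) and the brightness range are placeholder assumptions, rotation is restricted to 90 degrees for simplicity, and resizing and zooming are omitted. All function names are hypothetical.

```python
import random


def normalize(image, mean=0.5, std=0.5):
    """Scale 8-bit pixels to [0, 1], then standardize (mean/std are assumed values)."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Multiply every pixel by `factor` and clamp to the 8-bit range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, rng):
    """Apply a random flip, a random 90-degree rotation, and a brightness change."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = rotate90(image)
    # Brightness range 0.8 to 1.2 is an illustrative assumption.
    return adjust_brightness(image, rng.uniform(0.8, 1.2))


if __name__ == "__main__":
    tiny = [[0, 255], [128, 64]]
    augmented = augment(tiny, random.Random(0))
    print(normalize(augmented))
```

Augmentation is applied to the raw 8-bit image and normalization afterwards, so the brightness clamp operates on pixel values rather than on standardized values.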
In the Vision Transformer architecture, images are divided into smaller non-overlapping patches (such as 16×16 pixels). These patches are flattened into vectors, embedded into a latent space, and combined with positional encoding to preserve spatial information. The embedded sequences are processed through transformer encoder layers using Multi-Head Self-Attention (MHSA) and Feed Forward Networks (FFN). Unlike CNNs, the self-attention mechanism allows the model to understand relationships between distant image regions, making it highly effective for detecting scattered disease symptoms.
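For a 224×224 RGB image and 16×16 patches, the patch grid is 14×14, which gives 196 patches of 16·16·3 = 768 values each; adding the classification token gives a sequence of 197 tokens. The sketch below illustrates patch extraction and single-head scaled dot-product self-attention in plain Python. It is a simplification rather than the study's implementation: images are single-channel nested lists, the query, key, and value projections are taken to be the identity (a real encoder learns them), and the patch embedding, positional encoding, multiple heads, and feed-forward network are omitted. All function names are hypothetical.

```python
import math


def vit_sequence_shape(image_size=224, patch=16, channels=3):
    """Return (number of patches, values per flattened patch)."""
    return (image_size // patch) ** 2, patch * patch * channels


def patchify(image, patch):
    """Split an H x W image into flattened non-overlapping patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    return [
        [image[top + r][left + c] for r in range(patch) for c in range(patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (a simplification)."""
    d = len(tokens[0])
    output = []
    for query in tokens:
        # Every token is scored against every other token, however far apart.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in tokens]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return output


if __name__ == "__main__":
    print(vit_sequence_shape())
    image = [[4 * r + c for c in range(4)] for r in range(4)]
    tokens = [[float(x) for x in p] for p in patchify(image, 2)]
    print(self_attention(tokens)[0])
```

Because each output token is a weighted average over all tokens in the sequence, a patch at one edge of the leaf can draw on information from a patch at the opposite edge within a single layer.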
The classification process uses a special classification token ([CLS]) whose output is passed through a fully connected layer followed by a softmax to predict disease categories. The model is trained using the Cross-Entropy Loss function with the AdamW optimizer. Performance evaluation metrics include accuracy, precision, recall, and F1-score.
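The softmax and cross-entropy computations used in classification can be written out for a single sample as follows. This is a minimal sketch in plain Python: the logits are made-up values standing in for the classification head's output, and the function names are hypothetical. The AdamW update itself is not shown.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class index `target`."""
    return -math.log(softmax(logits)[target])


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # illustrative scores for three classes
    probs = softmax(logits)
    predicted = probs.index(max(probs))
    print(predicted, cross_entropy(logits, target=0))
```

The loss is small when the model assigns high probability to the correct class and grows as that probability falls, which is what the optimizer minimizes during training.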
The results demonstrate outstanding performance of the Vision Transformer model. After training for 15 epochs, the model achieved approximately 99.8% accuracy with continuously decreasing loss values, indicating effective learning and minimal overfitting. The confusion matrix shows very few misclassifications, with healthy leaf classification accuracy around 98% and disease class accuracy between 97–98%. Precision, recall, and F1-scores are close to 1.00, confirming highly reliable disease classification.
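For reference, accuracy, precision, recall, and F1-score can all be derived from a confusion matrix, as in the sketch below. The 2×2 matrix is a toy example chosen for easy arithmetic and does not reproduce the confusion matrix of this study. Rows are assumed to be true classes and columns predicted classes, and the function name is hypothetical.

```python
def metrics_from_confusion(cm):
    """Return (accuracy, per-class list of (precision, recall, f1)).

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually another class
        fn = sum(cm[i]) - tp                       # actually i, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, per_class


if __name__ == "__main__":
    toy = [[8, 2],   # true class 0: 8 correct, 2 predicted as class 1
           [1, 9]]   # true class 1: 1 predicted as class 0, 9 correct
    accuracy, per_class = metrics_from_confusion(toy)
    print(accuracy, per_class)
```

Reporting the per-class values alongside overall accuracy shows whether a high aggregate score hides weaker performance on individual disease categories.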
The discussion concludes that Vision Transformers provide superior performance compared to conventional CNNs due to their ability to capture global contextual information and complex disease patterns. The proposed model demonstrates strong accuracy, stability, and suitability for real-time plant disease detection systems. By integrating transformer-based models into automated agricultural tools, farmers can detect diseases early, reduce crop losses, improve resource management, and increase overall agricultural productivity, contributing significantly to precision agriculture and global food security.
Conclusion
This study applied a Vision Transformer (ViT) model to plant disease identification. By using self-attention mechanisms, the model overcomes limitations of conventional CNNs and achieves an accuracy of 99.8% together with high precision, recall, and F1-scores. As a high-performing and easy-to-use tool for early detection, it supports the practice of precision agriculture. Future research will aim to reduce the model's high computational demands and to incorporate additional data types such as environmental and climatic variables. ViT-based models thus offer a sustainable, cutting-edge approach to strengthening food security across the globe.