This paper presents a novel multimodal deep learning framework for maple plant disease detection that integrates visual and semantic information. Traditional plant disease detection systems rely primarily on visual features extracted from leaf images, which often leads to misclassification when disease symptoms look alike. To address this limitation, the proposed approach combines EfficientNet-based convolutional neural networks for visual feature extraction with transformer-based language models, including BERT and FLAN-T5, for semantic feature encoding. A Multilayer Perceptron (MLP)-based fusion mechanism integrates the visual and textual features, enabling cross-modal learning. The proposed model is evaluated on a balanced dataset of 2,000 maple leaf images and associated disease descriptions. Experimental results show that the multimodal framework achieves an accuracy of 94.8%, outperforming the vision-only model (89.2%) and the text-only models (81.6%–83.4%). Ablation studies and comparative analysis confirm the contribution of multimodal fusion and transformer-based semantic encoding. The proposed framework provides a robust and scalable solution for intelligent plant disease detection and has potential applications in smart agriculture systems.
Introduction
Plant diseases cause major agricultural losses, and traditional manual diagnosis is slow, subjective, and error-prone. Deep learning models such as CNNs (e.g., VGGNet, ResNet, EfficientNet) have improved image-based disease detection, but they still struggle with visually similar diseases and environmental variations.
To overcome these limitations, this study proposes a multimodal framework that integrates three components:
EfficientNet for extracting visual features from leaf images
Transformer models (BERT and FLAN-T5) for extracting semantic information from textual disease descriptions
MLP-based feature fusion to combine both modalities for final classification
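The three components listed above can be illustrated with a minimal, dependency-free sketch of the fusion step. It assumes that fusion is done by concatenating the two feature vectors and passing them through a single hidden layer; the paper states only that an MLP is used, so the concatenation, the layer sizes, the random weights, and every function name here are hypothetical. The random input vectors stand in for EfficientNet and BERT/FLAN-T5 embeddings, which are not computed here.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax over a list of logits."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(visual, textual, params):
    """Assumed fusion: concatenate both modalities, then apply a small MLP."""
    fused = list(visual) + list(textual)
    hidden = relu(linear(fused, *params["hidden"]))
    return softmax(linear(hidden, *params["out"]))


# Toy dimensions chosen for illustration only (not taken from the paper).
VISUAL_DIM, TEXT_DIM, HIDDEN_DIM, NUM_CLASSES = 8, 6, 5, 4

rng = random.Random(0)
params = {
    "hidden": init_layer(rng, VISUAL_DIM + TEXT_DIM, HIDDEN_DIM),
    "out": init_layer(rng, HIDDEN_DIM, NUM_CLASSES),
}

# Placeholder embeddings standing in for the image and text encoders.
visual_features = [rng.random() for _ in range(VISUAL_DIM)]
text_features = [rng.random() for _ in range(TEXT_DIM)]

probs = fuse_and_classify(visual_features, text_features, params)
print(probs)
```

With untrained weights the output probabilities are meaningless; the sketch shows only how the two feature vectors reach a single classifier head.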
The model is trained and evaluated on a dataset of maple leaf images with corresponding disease descriptions. Experimental results show that the multimodal system achieves higher accuracy (94.8%) than image-only or text-only approaches, particularly for diseases that look visually similar.
The literature review traces the evolution from traditional image processing to deep learning and then to multimodal learning. It identifies two key gaps: reliance on visual-only methods and weak fusion strategies. The proposed work addresses both by fusing visual and semantic features in a learned MLP layer.
Conclusion
This paper presented a novel multimodal deep learning framework for maple plant disease detection by integrating visual and semantic information. The proposed approach addresses the limitations of traditional vision-based methods, which often struggle to distinguish between diseases with similar visual characteristics.
The framework combines EfficientNet-based convolutional neural networks for visual feature extraction with transformer-based language models, including BERT and FLAN-T5, for semantic feature encoding. These heterogeneous features are effectively fused using a Multilayer Perceptron (MLP), enabling the model to learn complex cross-modal relationships.
Experimental evaluation on a balanced dataset of 2,000 maple leaf images demonstrated that the proposed multimodal model outperforms unimodal approaches. Specifically, the model achieved an accuracy of 94.8%, compared with 89.2% for the vision-only EfficientNet model and 81.6%–83.4% for the text-only models. The gain of 5.6 percentage points over the vision-only model, and of 11.4 to 13.2 percentage points over the text-only models, highlights the effectiveness of integrating semantic knowledge with visual features.
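The gains over the unimodal baselines can be checked with simple arithmetic on the accuracies reported above. The short sketch below uses only those reported figures; the variable names are illustrative, and the relative error reduction is a derived quantity that the paper does not report.

```python
# Accuracies (in percent) as reported in the text.
multimodal = 94.8
vision_only = 89.2
text_only = (81.6, 83.4)  # range reported for the text-only models

# Absolute gains in percentage points.
gain_vs_vision = round(multimodal - vision_only, 1)
gains_vs_text = [round(multimodal - acc, 1) for acc in text_only]

# Share of the vision-only model's errors removed by the multimodal model.
error_reduction = ((100 - vision_only) - (100 - multimodal)) / (100 - vision_only)

print(f"Gain over vision-only: {gain_vs_vision} points")
print(f"Gain over text-only: {min(gains_vs_text)} to {max(gains_vs_text)} points")
print(f"Relative error reduction vs. vision-only: {error_reduction:.1%}")
```

Expressed this way, the multimodal model removes roughly half of the vision-only model's errors, which is a more informative comparison than the raw accuracy difference when the baseline is already high.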
The confusion matrix analysis showed strong diagonal dominance, indicating high classification accuracy across all disease classes. Minor misclassifications were primarily observed between visually similar diseases such as anthracnose and tar spot. The ablation study further confirmed that both semantic information and feature fusion play a critical role in improving performance.
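To make the notion of diagonal dominance concrete, the sketch below computes overall accuracy, per-class recall, and the most frequently confused class pair from a confusion matrix. The counts are invented for illustration and are not the paper's results; the three-class layout and the "other" label are likewise hypothetical, with only anthracnose and tar spot named in the text.

```python
# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# These counts are made up for illustration; they are NOT the paper's results.
labels = ["anthracnose", "tar_spot", "other"]
matrix = [
    [46, 3, 1],
    [4, 45, 1],
    [1, 0, 49],
]


def overall_accuracy(m):
    """Fraction of samples on the diagonal."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def per_class_recall(m):
    """Diagonal count divided by the row total, for each true class."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


def is_diagonally_dominant(m):
    """True if each class is predicted correctly more often than it is missed."""
    return all(m[i][i] > sum(m[i]) - m[i][i] for i in range(len(m)))


def most_confused_pair(m, names):
    """Return (true_label, predicted_label, count) for the largest off-diagonal cell."""
    count, i, j = max(
        (m[i][j], i, j)
        for i in range(len(m))
        for j in range(len(m))
        if i != j
    )
    return names[i], names[j], count


accuracy = overall_accuracy(matrix)
recalls = per_class_recall(matrix)
dominant = is_diagonally_dominant(matrix)
worst = most_confused_pair(matrix, labels)

print(f"accuracy = {accuracy:.3f}")
print(dict(zip(labels, recalls)))
print("diagonally dominant:", dominant)
print("most confused:", worst)
```

Applied to the real confusion matrix, the same functions would show whether the anthracnose and tar spot cells account for most of the remaining errors.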
Overall, the results demonstrate that multimodal learning provides a robust and effective solution for plant disease detection. The proposed framework enhances classification accuracy, reduces ambiguity, and improves generalization capability, making it suitable for real-world agricultural applications.