Automatic flower classification plays a significant role in agriculture, biodiversity monitoring, medicinal plant identification, and smart farming applications. However, fine-grained flower recognition remains challenging due to high intra-class similarity, inter-class overlap, background complexity, and illumination variations. Traditional single Convolutional Neural Network (CNN) architectures often fail to capture both global semantic information and dense local discriminative features effectively. In this paper, we propose a Hybrid Attention-Based Deep Ensemble Network (HADE-Net) for multi-class flower recognition by integrating EfficientNetV2-S and DenseNet201 with an adaptive attention-based feature fusion mechanism. Unlike conventional ensemble approaches that rely on static feature concatenation or Support Vector Machine (SVM)-based classification, the proposed method employs learnable attention gating for dynamic feature weighting and Generalized Mean (GeM) pooling for improved feature aggregation. A two-stage transfer learning strategy with selective fine-tuning, label smoothing, and Test-Time Augmentation (TTA) is further applied to improve robustness and generalization. Experimental evaluation on the Kaggle Flower Recognition dataset containing five flower classes demonstrates that the proposed model achieves 95.10% validation accuracy and 98.50% Top-2 accuracy, outperforming conventional CNN and prior ensemble-based approaches. The results confirm that attention-driven hybrid architectures significantly enhance classification performance and robustness in fine-grained image recognition tasks.
Introduction
This paper develops an improved deep learning approach for flower species classification, a task made difficult by the high visual similarity between species and by the limitations of manual identification.
Traditional CNN-based models perform well in image classification, but single architectures struggle with fine-grained tasks like flower recognition. Although previous ensemble models (e.g., combining EfficientNet and DenseNet) achieved around 95% accuracy, they are limited by static feature fusion and lack adaptability.
To address these issues, the proposed method introduces a Hybrid Attention-Based Deep Ensemble Network (HADE-Net) that incorporates dynamic attention-based feature fusion, Generalized Mean (GeM) pooling, and Test-Time Augmentation (TTA). These enhancements improve feature extraction, adaptability, and classification robustness.
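The two aggregation steps named above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the gate parameters `gate_w` and `gate_b`, the toy descriptor size, and the assumption that both backbone outputs have already been projected to the same length are all hypothetical choices made for illustration. GeM pooling computes (mean(x^p))^(1/p) over the spatial positions of each channel, and the gate produces one softmax weight per branch.

```python
import numpy as np


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean pooling over the spatial axes of a (C, H, W) map.

    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    clipped = np.clip(feature_map, eps, None)  # GeM assumes positive activations
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)


def attention_fuse(feat_a, feat_b, gate_w, gate_b):
    """Fuse two equal-length descriptors with a two-way softmax gate.

    gate_w (shape (2, 2*D)) and gate_b (shape (2,)) are hypothetical
    parameters that would be learned jointly with the classifier.
    """
    joint = np.concatenate([feat_a, feat_b])
    weights = softmax(gate_w @ joint + gate_b)  # one weight per branch, sums to 1
    fused = weights[0] * feat_a + weights[1] * feat_b
    return fused, weights


rng = np.random.default_rng(0)
dim = 8  # toy descriptor size; real backbones emit much larger descriptors

# Random stand-ins for the two backbone feature maps (assumed already
# projected to the same channel count).
map_a = rng.random((dim, 7, 7))  # plays the role of the EfficientNetV2-S branch
map_b = rng.random((dim, 7, 7))  # plays the role of the DenseNet201 branch

desc_a = gem_pool(map_a)
desc_b = gem_pool(map_b)

gate_w = rng.normal(scale=0.1, size=(2, 2 * dim))  # untrained, illustrative only
gate_b = np.zeros(2)
fused, weights = attention_fuse(desc_a, desc_b, gate_w, gate_b)
```

Because the gate weights depend on the input descriptors, the contribution of each branch can change from image to image, which is the contrast with static concatenation drawn above.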
The literature review highlights the evolution from handcrafted feature methods to deep learning models such as AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet, and EfficientNet. It also discusses advances in ensemble learning, attention mechanisms, and pooling strategies, all contributing to improved image classification performance. However, existing methods still lack adaptive fusion and advanced feature integration.
Experimental results show that the proposed model achieves 95.10% validation accuracy and 98.50% Top-2 accuracy, with strong class-wise performance, especially for visually distinct flowers like dandelions and sunflowers. Some misclassification occurs between similar classes like roses and tulips. Overall, the model demonstrates improved accuracy, stability, and generalization compared to baseline methods.
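For readers unfamiliar with the Top-2 metric, the sketch below shows how Top-1 and Top-2 accuracy can be computed from class scores. The scores and labels are made-up toy values, not the paper's data, and `top_k_accuracy` is a hypothetical helper name.

```python
import numpy as np


def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores per row
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(np.mean(hits))


# Toy scores for 4 images over 5 classes (illustrative values only).
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true class 0: correct at Top-1
    [0.10, 0.35, 0.45, 0.05, 0.05],  # true class 1: second guess, counted only at Top-2
    [0.05, 0.05, 0.10, 0.60, 0.20],  # true class 3: correct at Top-1
    [0.40, 0.30, 0.05, 0.05, 0.20],  # true class 4: third guess, missed at Top-2
])
labels = np.array([0, 1, 3, 4])

top1 = top_k_accuracy(probs, labels, k=1)  # 0.5
top2 = top_k_accuracy(probs, labels, k=2)  # 0.75
```

A gap between the two metrics, as in the second toy row, is what one would expect when errors are concentrated in a pair of visually similar classes: the correct label is often the model's second choice.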
Conclusion
This paper presents a Hybrid Attention-Based Deep Ensemble Network for flower recognition. By integrating EfficientNetV2-S and DenseNet201 with attention-based fusion and GeM pooling, the proposed model achieves 95.10% validation accuracy and 98.50% Top-2 accuracy. Compared to previous ensemble models, the proposed method provides improved feature representation and robustness.
References
[1] M. F. Rabbi et al., “An Ensemble-based Deep Learning Model for Multi-class Flower Recognition,” 2023.
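[2] M. Tan and Q. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” ICML, 2019.
[3] K. He et al., “Deep Residual Learning for Image Recognition,” CVPR, 2016.
[4] G. Huang et al., “Densely Connected Convolutional Networks,” CVPR, 2017.
[5] D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” ICLR, 2015.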
[6] N. Kumar et al., “Leafsnap: A Computer Vision System for Automatic Plant Species Identification,” ECCV, 2012.
[7] M. Tan and Q. Le, “EfficientNetV2: Smaller Models and Faster Training,” ICML, 2021.
[8] Z.-H. Zhou, “Ensemble Methods: Foundations and Algorithms,” 2012.
[9] A. Vaswani et al., “Attention Is All You Need,” NeurIPS, 2017.
[10] S. Woo et al., “CBAM: Convolutional Block Attention Module,” ECCV, 2018.
[11] F. Radenović et al., “Fine-tuning CNN Image Retrieval with No Human Annotation,” IEEE TPAMI, 2019.
[12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., “Going Deeper with Convolutions,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
[14] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” International Conference on Learning Representations (ICLR), 2015.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems (NeurIPS), 2012, pp. 1097–1105.