Fine-grained image classification on small-resolution datasets such as CIFAR-100 remains challenging because of high inter-class similarity and limited visual detail. To examine how deep learning models cope with these constraints, this work benchmarks three popular CNN architectures (ResNet-18, GoogLeNet, and EfficientNet) under strictly identical conditions. All three were trained from scratch on CIFAR-100 with uniform data pre-processing and augmentation, optimization settings, and evaluation metrics to ensure a fair comparison. Performance is evaluated in terms of test accuracy, precision, recall, and F1-score, and training stability and generalization behavior are also examined. Experimental results show that ResNet-18 achieved the highest test accuracy, EfficientNet offered a good trade-off between accuracy and computational efficiency, and GoogLeNet performed worst owing to optimization instability on small images. These findings clarify how architectural differences drive performance gaps on fine-grained classification tasks and provide a practical baseline for selecting among CNN models given accuracy requirements and resource constraints.
Introduction
Image classification is central to computer vision applications like object detection, autonomous driving, and medical imaging. Convolutional Neural Networks (CNNs) dominate this field due to their ability to hierarchically learn from raw pixel data. However, challenges arise when handling fine-grained, low-resolution datasets like CIFAR-100, where classes have subtle visual differences and images are only 32×32 pixels.
CIFAR-100's fine-grained classes and low resolution make it a strong benchmark for evaluating generalization under constrained visual information.
CNN Architectures Compared
ResNet-18: Uses residual connections for stable gradient flow and deeper networks.
GoogLeNet: Employs Inception modules for multi-scale feature extraction.
EfficientNet: Uses compound scaling to balance depth, width, and resolution for accuracy-efficiency trade-offs.
These models were trained from scratch under identical conditions to isolate architectural effects.
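To make the first of these designs concrete, the residual connection at the heart of ResNet-18 can be sketched in Keras. This is a minimal illustration; the filter counts and input size below are illustrative assumptions, not the exact configuration of our trained network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Basic ResNet block: two 3x3 convs plus an identity/projection shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut with a 1x1 conv when the shape changes.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    # The addition is the residual connection that stabilizes gradient flow.
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = tf.keras.Input(shape=(32, 32, 3))  # CIFAR-100 image size
x = layers.Conv2D(64, 3, padding="same")(inputs)
x = residual_block(x, 64)
model = tf.keras.Model(inputs, x)
```

Because the shortcut lets gradients bypass the convolutional path, deeper stacks of such blocks remain trainable, which is the property exploited by ResNet-18.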
Methodology
Training conditions were kept uniform across models: optimizer (Adam), batch size, loss function (categorical cross-entropy), augmentation (random crop, flip, normalization), and number of training epochs.
Experiments were run in a Google Colab GPU environment using TensorFlow/Keras.
No pretrained weights or aggressive augmentation were used to ensure pure architecture comparison.
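The uniform setup above can be sketched as follows. The learning rate and the pad-then-crop augmentation are assumptions consistent with common CIFAR practice, not a verbatim record of our configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Shared augmentation pipeline: pad-then-crop (random crop) and horizontal flip.
augment = tf.keras.Sequential([
    layers.ZeroPadding2D(4),       # 32x32 -> 40x40
    layers.RandomCrop(32, 32),     # random 32x32 crop at training time
    layers.RandomFlip("horizontal"),
])

def compile_model(model, lr=1e-3):
    """Apply the identical optimizer and loss used for all three architectures."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Applying the same `augment` pipeline and `compile_model` call to each network keeps the comparison focused on the architectures themselves.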
Experimental Results
Model          Train Accuracy   Test Accuracy
ResNet-18      96%              65%
EfficientNet   82%              62%
GoogLeNet      52%              42%
ResNet-18: Highest test accuracy, smooth training curves, slight overfitting.
EfficientNet: Strong accuracy, moderate overfitting; benefits from heavier augmentation.
GoogLeNet: Lowest accuracy, unstable training, sensitive to learning rate and regularization.
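Alongside accuracy, the evaluation used precision, recall, and F1-score. A minimal sketch of the macro-averaged versions of these metrics, assuming integer label arrays for ground truth and predictions, is:

```python
import numpy as np

def macro_scores(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall, and F1 from predicted class labels."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    # Macro averaging weights all 100 CIFAR-100 classes equally.
    return np.mean(precisions), np.mean(recalls), np.mean(f1s)
```

Macro averaging is the natural choice here because CIFAR-100 is class-balanced, so every fine-grained class contributes equally to the reported score.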
Insights
ResNet-based architectures consistently outperform others on CIFAR-100 due to effective gradient flow and representational capacity.
EfficientNet performs well but requires pretraining or strong augmentation for optimal results.
GoogLeNet’s multi-scale design is less effective on low-resolution, fine-grained datasets.
Architectural differences alone significantly impact accuracy, stability, and generalization, highlighting the importance of model selection for fine-grained tasks in resource-constrained settings.
Key Takeaway:
For fine-grained, low-resolution image classification on CIFAR-100, ResNet-18 is the most reliable architecture, offering the best trade-off between accuracy and stability, followed closely by EfficientNet, while GoogLeNet lags due to sensitivity to resolution and weak generalization.
Conclusion
This study compared three widely used CNN architectures—ResNet-18, EfficientNet-B0, and GoogLeNet—under identical training conditions to identify the most effective model for fine-grained image classification on the CIFAR-100 dataset. Literature consistently reports that ResNet-based models achieve stronger accuracy than EfficientNet-B0 and GoogLeNet when trained without heavy augmentation or transfer learning. Our experimental results aligned with these findings: ResNet-18 achieved the highest test accuracy (65%), followed by EfficientNet (62%), while GoogLeNet performed the lowest (42%).
By combining insights from prior research and our own implementation, it is clear that ResNet-18 is the most reliable and best-performing architecture among the three for CIFAR-100 in standard training settings. Its residual connections allow stable optimization and better generalization compared to the other models. EfficientNet provides a good balance of accuracy and efficiency, while GoogLeNet, despite being lightweight, struggles on fine-grained classes without extensive tuning. Overall, ResNet-18 stands out as the recommended model when aiming for strong performance on small-resolution, fine-grained image classification tasks.
References
[1] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Technical Report, University of Toronto, 2009. (CIFAR-100 dataset)
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016. (ResNet)
[3] S. Zagoruyko and N. Komodakis, “Wide Residual Networks,” British Machine Vision Conference (BMVC), 2016. (ResNet variants on CIFAR-100)
[4] M. S. Hanif et al., “Competitive Residual Neural Networks for Image Classification,” 2020. (ResNet improvements on CIFAR)
[5] C. Szegedy et al., “Going Deeper with Convolutions,” CVPR, 2015. (GoogLeNet)
[6] N. Sharma, V. Jain, and A. Mishra, “An Analysis of Convolutional Neural Networks for Image Classification,” Procedia Computer Science, 2018. (GoogLeNet CIFAR-100 Analysis)
[7] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” International Conference on Machine Learning (ICML), 2019. (EfficientNet)
[8] P. Jain et al., “Performance Evaluation of EfficientNet Models on CIFAR-10 and CIFAR-100,” 2022. (EfficientNet-B0/B1 CIFAR-100 Results)
[9] X. Li and S. Kim, “Comparative Evaluation of CNN Architectures on CIFAR-10 and CIFAR-100,” 2021. (ResNet vs GoogLeNet)
[10] A. Kumar and S. Reddy, “Performance Study of EfficientNet and ResNet Models for Fine-Grained Image Classification,” 2023. (EfficientNet vs ResNet on CIFAR-100)