Fine-grained image classification on small-resolution datasets such as CIFAR-100 remains challenging because of high inter-class similarity and limited visual detail. To examine how deep learning models cope with these constraints, this work benchmarks three popular CNN architectures (ResNet-18, GoogLeNet, and EfficientNet) under strictly identical conditions. All three were trained from scratch on CIFAR-100 with uniform data pre-processing and augmentation, optimization settings, and evaluation metrics to ensure a fair comparison. Performance is evaluated in terms of test accuracy, precision, recall, and F1-score, and training stability and generalization behavior are also examined. Experimental results show that ResNet-18 achieved the highest test accuracy, EfficientNet offered a good trade-off between accuracy and computational efficiency, and GoogLeNet performed worst owing to optimization instability on small images. These findings clarify how architectural differences drive performance gaps on fine-grained classification tasks and provide a practical baseline for selecting among CNN models given accuracy requirements and resource constraints.
Introduction
Image classification is central to computer vision applications like object detection, autonomous driving, and medical imaging. Convolutional Neural Networks (CNNs) dominate this field due to their ability to hierarchically learn from raw pixel data. However, challenges arise when handling fine-grained, low-resolution datasets like CIFAR-100, where classes have subtle visual differences and images are only 32×32 pixels.
CIFAR-100's fine-grained classes and low resolution make it a strong benchmark for evaluating generalization under constrained visual information.
CNN Architectures Compared
ResNet-18: Uses residual connections for stable gradient flow and deeper networks.
GoogLeNet: Employs Inception modules for multi-scale feature extraction.
EfficientNet: Uses compound scaling to balance depth, width, and resolution for accuracy-efficiency trade-offs.
These models were trained from scratch under identical conditions to isolate architectural effects.
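To make the first of these designs concrete, the residual connection at the heart of ResNet-18 can be sketched in Keras. This is a minimal illustration; the filter counts and input size below are illustrative assumptions, not the exact configuration of our trained network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Basic ResNet block: two 3x3 convs plus an identity/projection shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut with a 1x1 conv when the shape changes.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    # The addition is the residual connection that stabilizes gradient flow.
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = tf.keras.Input(shape=(32, 32, 3))  # CIFAR-100 image size
x = layers.Conv2D(64, 3, padding="same")(inputs)
x = residual_block(x, 64)
model = tf.keras.Model(inputs, x)
```

Because the shortcut lets gradients bypass the convolutional path, deeper stacks of such blocks remain trainable, which is the property exploited by ResNet-18.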
Methodology
Training conditions were kept uniform across models: optimizer (Adam), batch size, loss function (categorical cross-entropy), augmentation (random crop, flip, normalization), and number of training epochs.
Experiments were run in a Google Colab GPU environment using TensorFlow/Keras.
No pretrained weights or aggressive augmentation were used to ensure pure architecture comparison.
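The uniform setup above can be sketched as follows. The learning rate and the pad-then-crop augmentation are assumptions consistent with common CIFAR practice, not a verbatim record of our configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Shared augmentation pipeline: pad-then-crop (random crop) and horizontal flip.
augment = tf.keras.Sequential([
    layers.ZeroPadding2D(4),       # 32x32 -> 40x40
    layers.RandomCrop(32, 32),     # random 32x32 crop at training time
    layers.RandomFlip("horizontal"),
])

def compile_model(model, lr=1e-3):
    """Apply the identical optimizer and loss used for all three architectures."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Applying the same `augment` pipeline and `compile_model` call to each network keeps the comparison focused on the architectures themselves.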
Experimental Results
Model          Train Accuracy   Test Accuracy
ResNet-18      96%              65%
EfficientNet   82%              62%
GoogLeNet      52%              42%
ResNet-18: Highest test accuracy, smooth training curves, slight overfitting.
EfficientNet: Strong accuracy, moderate overfitting; benefits from heavier augmentation.
GoogLeNet: Lowest accuracy, unstable training, sensitive to learning rate and regularization.
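Alongside accuracy, the evaluation used precision, recall, and F1-score. A minimal sketch of the macro-averaged versions of these metrics, assuming integer label arrays for ground truth and predictions, is:

```python
import numpy as np

def macro_scores(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall, and F1 from predicted class labels."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    # Macro averaging weights all 100 CIFAR-100 classes equally.
    return np.mean(precisions), np.mean(recalls), np.mean(f1s)
```

Macro averaging is the natural choice here because CIFAR-100 is class-balanced, so every fine-grained class contributes equally to the reported score.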
Insights
ResNet-based architectures consistently outperform others on CIFAR-100 due to effective gradient flow and representational capacity.
EfficientNet performs well but requires pretraining or strong augmentation for optimal results.
GoogLeNet’s multi-scale design is less effective on low-resolution, fine-grained datasets.
Architectural differences alone significantly impact accuracy, stability, and generalization, highlighting the importance of model selection for fine-grained tasks in resource-constrained settings.
Key Takeaway:
For fine-grained, low-resolution image classification on CIFAR-100, ResNet-18 is the most reliable architecture, offering the best trade-off between accuracy and stability, followed closely by EfficientNet, while GoogLeNet lags due to sensitivity to resolution and weak generalization.
Conclusion
This study compared three widely used CNN architectures—ResNet-18, EfficientNet-B0, and GoogLeNet—under identical training conditions to identify the most effective model for fine-grained image classification on the CIFAR-100 dataset. Literature consistently reports that ResNet-based models achieve stronger accuracy than EfficientNet-B0 and GoogLeNet when trained without heavy augmentation or transfer learning. Our experimental results aligned with these findings: ResNet-18 achieved the highest test accuracy (65%), followed by EfficientNet (62%), while GoogLeNet performed the lowest (42%).
By combining insights from prior research and our own implementation, it is clear that ResNet-18 is the most reliable and best-performing architecture among the three for CIFAR-100 in standard training settings. Its residual connections allow stable optimization and better generalization compared to the other models. EfficientNet provides a good balance of accuracy and efficiency, while GoogLeNet, despite being lightweight, struggles on fine-grained classes without extensive tuning. Overall, ResNet-18 stands out as the recommended model when aiming for strong performance on small-resolution, fine-grained image classification tasks.
References
[1] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Technical Report, University of Toronto, 2009. (CIFAR-100 dataset)
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016. (ResNet)
[3] S. Zagoruyko and N. Komodakis, “Wide Residual Networks,” British Machine Vision Conference (BMVC), 2016. (ResNet variants on CIFAR-100)
[4] M. S. Hanif et al., “Competitive Residual Neural Networks for Image Classification,” 2020. (ResNet improvements on CIFAR)
[5] C. Szegedy et al., “Going Deeper with Convolutions,” CVPR, 2015. (GoogLeNet)
[6] N. Sharma, V. Jain, and A. Mishra, “An Analysis of Convolutional Neural Networks for Image Classification,” Procedia Computer Science, 2018. (GoogLeNet CIFAR-100 Analysis)
[7] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” International Conference on Machine Learning (ICML), 2019. (EfficientNet)
[8] P. Jain et al., “Performance Evaluation of EfficientNet Models on CIFAR-10 and CIFAR-100,” 2022. (EfficientNet-B0/B1 CIFAR-100 Results)
[9] X. Li and S. Kim, “Comparative Evaluation of CNN Architectures on CIFAR-10 and CIFAR-100,” 2021. (ResNet vs GoogLeNet)
[10] A. Kumar and S. Reddy, “Performance Study of EfficientNet and ResNet Models for Fine-Grained Image Classification,” 2023. (EfficientNet vs ResNet on CIFAR-100)