Pneumonia remains a serious health problem worldwide, especially for children under the age of five. According to the WHO, more than 740,000 children in this age group died from pneumonia in 2019, with most of these deaths occurring in regions where proper diagnostic facilities are limited. Chest X-rays are the most common tool for detecting pneumonia, but the accuracy of this method depends heavily on the radiologist's experience, and early diagnosis is often unavailable in low-resource regions.
Because of this, deep learning has gained attention as a possible route to automated pneumonia detection. Many models have been proposed, but it is still unclear which one works best on a low-cost, low-power device that can be deployed in rural settings. In this work, we compare several deep learning models, including a simple Sequential CNN, VGG16, ViT, MobileViT Hybrid, EfficientNetV2-S, and MobileNetV3-Small, by running and evaluating them on a Raspberry Pi 4B. The goal is to find a model that delivers good accuracy while remaining efficient enough to run on a small device that can support healthcare in underserved areas.
Introduction
This study addresses the challenge of pneumonia detection from chest X-rays (CXRs) in low-resource settings, where early diagnosis is critical but trained radiologists and diagnostic tools may be scarce. It explores deep learning (DL) models for automated detection, emphasizing the need for architectures that are both accurate and efficient enough to run on low-cost devices like a Raspberry Pi 4B.
Key Points:
Problem and Motivation
Pneumonia is a leading cause of mortality, especially in children under five.
Rapid, automated detection can assist healthcare workers in rural or low-resource areas.
Models must balance diagnostic accuracy with edge deployability (low latency, small model size, high throughput).
Literature Review
Early efforts relied on conventional CNNs; newer work uses lightweight CNNs, hybrid CNN-Transformer models, and Vision Transformers (ViTs) to improve performance and efficiency.
Studies show lightweight models like MobileNet and hybrid architectures like MobileViT are promising for edge deployment.
Datasets such as CheXpert, ChestX-ray14, and Kermany are widely used benchmarks.
Prior research emphasizes that accuracy alone is insufficient; deployment metrics like latency, memory usage, and throughput are critical.
Dataset
5,856 CXR images from the Guangzhou Women and Children's Medical Center (Guangzhou, China), labeled as Normal or Pneumonia.
Data were split into training, validation, and test sets, with preprocessing (resizing, normalization) and augmentation (rotation, flipping, zoom) applied; a minimal sketch of such a pipeline follows.
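The sketch below illustrates the resize/normalize/augment pipeline described above using TensorFlow/Keras utilities and a Kermany-style folder layout. The directory names, image size, batch size, and augmentation strengths are illustrative assumptions, not the paper's exact settings.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)   # assumed input resolution; the paper's exact size may differ
BATCH_SIZE = 32         # assumed batch size

# Kermany-style directory layout: chest_xray/{train,val}/{NORMAL,PNEUMONIA}
train_ds = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode="binary")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/val", image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode="binary")

# Normalization plus the augmentations listed above: rotation, flipping, zoom.
normalize = tf.keras.layers.Rescaling(1.0 / 255)
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomZoom(0.1),
])

train_ds = train_ds.map(
    lambda x, y: (augment(normalize(x), training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.map(lambda x, y: (normalize(x), y)).prefetch(tf.data.AUTOTUNE)
```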
Methodology
Six models were evaluated: a Sequential CNN (baseline), a VGG16 Autoencoder, ViT, MobileNetV3-Small, EfficientNetV2-S, and MobileViT Hybrid.
All models were trained on a Tesla P100 GPU with a common procedure (Adam optimizer, binary cross-entropy loss, class weighting, and learning-rate decay); an illustrative training sketch appears after this section.
Models were converted to TensorFlow Lite (TFLite) and benchmarked on a Raspberry Pi 4B for inference latency, throughput (FPS), model size, and Pi-side accuracy (see the conversion and benchmarking sketch below).
A weighted scoring system combined accuracy, latency/FPS, and model size to rank deployment readiness (a scoring sketch is also included below).
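First, a hedged sketch of the training setup referenced above (Adam, binary cross-entropy, class weighting, learning-rate decay). The build_model() helper, the hyperparameter values, and the reuse of train_ds/val_ds from the dataset sketch are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
import tensorflow as tf

def build_model():
    # Stand-in for any of the six architectures (shown here: a tiny Sequential CNN).
    return tf.keras.Sequential([
        tf.keras.layers.Input((224, 224, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_model()

# Learning-rate decay (schedule values are illustrative).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall()])

# Class weights to counter the Normal/Pneumonia imbalance.
# train_ds / val_ds come from the preprocessing sketch above.
labels = np.concatenate([y.numpy() for _, y in train_ds]).ravel()
neg, pos = np.sum(labels == 0), np.sum(labels == 1)
class_weight = {0: len(labels) / (2.0 * neg), 1: len(labels) / (2.0 * pos)}

model.fit(train_ds, validation_data=val_ds, epochs=20, class_weight=class_weight)
```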
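Next, a minimal sketch of the TFLite conversion and the on-device latency/throughput measurement. The file name, the number of timed runs, and the use of random input are assumptions; the paper's actual benchmarking harness is not reproduced here.

```python
import time
import numpy as np
import tensorflow as tf

# 1) Convert the trained Keras model (from the training sketch) to TFLite on the workstation.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("pneumonia_model.tflite", "wb") as f:
    f.write(converter.convert())

# 2) On the Raspberry Pi 4B, load the model with the TFLite interpreter and time inference.
#    (tflite_runtime.interpreter.Interpreter can replace tf.lite.Interpreter on the Pi.)
interpreter = tf.lite.Interpreter(model_path="pneumonia_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

dummy = np.random.rand(*inp["shape"]).astype(np.float32)  # stand-in for a preprocessed CXR
latencies = []
for _ in range(200):  # illustrative number of timed runs
    start = time.perf_counter()
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
    latencies.append(time.perf_counter() - start)

mean_ms = 1000.0 * float(np.mean(latencies))
print(f"mean latency: {mean_ms:.1f} ms | throughput: {1000.0 / mean_ms:.1f} FPS")
```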
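Finally, the weighted scoring can be expressed as a normalized linear combination of the four metrics. The weights and normalization caps below are illustrative assumptions, since the paper's exact weighting scheme is not restated here.

```python
# Hedged sketch of a weighted deployment-readiness score.
# Higher accuracy and FPS are better; lower latency and model size are better.

def deployment_score(acc, latency_ms, fps, size_mb,
                     w_acc=0.5, w_lat=0.2, w_fps=0.15, w_size=0.15,
                     max_latency_ms=100.0, max_fps=50.0, max_size_mb=100.0):
    """Combine accuracy, latency, throughput, and size into one score in [0, 1].
    Weights and normalization caps are assumptions used for illustration."""
    acc_term = acc                                   # already in [0, 1]
    lat_term = max(0.0, 1.0 - latency_ms / max_latency_ms)
    fps_term = min(fps / max_fps, 1.0)
    size_term = max(0.0, 1.0 - size_mb / max_size_mb)
    return (w_acc * acc_term + w_lat * lat_term +
            w_fps * fps_term + w_size * size_term)

# Example call with placeholder values of the kind reported in Results:
print(deployment_score(acc=0.90, latency_ms=25.0, fps=40.0, size_mb=10.0))
```

Computing this score from each model's measured accuracy, latency, FPS, and size then yields the deployment-readiness ranking referenced in the Results.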
Results
On the standard test data, the VGG16 Autoencoder and MobileViT Hybrid showed strong accuracy (~83% and ~81%, respectively), with high recall (~98%).
Raspberry Pi deployment:
MobileNetV3-Small and MobileViT Hybrid had the lowest latency (22–29 ms) and the highest throughput (34–45 FPS).
ViT and the other large models were too large to run efficiently on the Pi.
VGG16 had high Pi-side accuracy (91.86%) but was slower and larger.
Weighted scoring balanced accuracy, efficiency, and size to determine the most practical model for real-world deployment.
Conclusion
This study evaluated six deep learning models for pneumonia detection, considering both classification performance and deployment feasibility on the Raspberry Pi 4B. While models such as the VGG16 Autoencoder and ViT achieve high accuracy, their larger size and slower inference limit practical deployment. Lightweight models such as MobileNetV3-Small and MobileViT Hybrid offer a better balance, and based on our weighted scoring of accuracy, latency, throughput, and model size, MobileViT Hybrid is identified as the most suitable for edge deployment.
Future work could explore model optimization techniques such as pruning, quantization, and knowledge distillation to further reduce latency and memory footprint; a minimal post-training quantization sketch is given below. Incorporating attention-based pre-processing or multi-modal patient data, and evaluating other low-cost hardware platforms, could further enhance the robustness and real-world applicability of pneumonia screening systems.
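As one concrete direction, the sketch below shows post-training full-integer quantization with the TFLite converter. The representative-dataset generator, input shape, and file name are assumptions used only for illustration; calibration should use real preprocessed CXR batches rather than random data.

```python
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Yield ~100 calibration batches; random data stands in for preprocessed CXRs here.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # trained Keras model
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force full-integer quantization for maximum size/latency savings on the Pi.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("pneumonia_model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```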