Skin cancer is one of the most common types of cancer globally, and early recognition is essential for a better patient prognosis. In this paper, we built a machine learning-based image classification system that classifies dermoscopic skin lesion images into seven diagnostic categories. We used the publicly available HAM10000 dataset and preprocessed it to support model optimization and reduce overfitting, applying image resizing, normalization, and data augmentation. We also analyzed the class distribution to quantify the data imbalance and inform the training strategy. To evaluate classification performance, we conducted experiments with four models: SVM, CNN, VGG16, and ResNet50. The SVM achieved an accuracy of approximately 80%, while the CNN improved this to 92%. The VGG16 model further increased the accuracy to 94%. ResNet50 outperformed all other models, achieving the highest accuracy of 95%. Our results demonstrate that deep learning models, particularly ResNet50 and VGG16, are highly effective for skin lesion classification and have significant potential to support early skin cancer diagnosis and aid healthcare professionals in clinical decision-making.
Introduction
Skin cancer is a widespread and potentially deadly disease for which early diagnosis significantly improves treatment outcomes. Manual diagnosis by dermatologists can be time-consuming and subjective, so AI and deep learning systems have been developed to classify skin lesions from images automatically, improving diagnostic speed and accuracy.
This study uses the HAM10000 dataset, consisting of over 10,000 dermoscopic images across seven lesion types. After preprocessing and data augmentation to address class imbalance, four models were trained and compared: Support Vector Machine (SVM), Convolutional Neural Network (CNN), ResNet50, and VGG16.
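The preprocessing steps named above (resizing, normalization, and augmentation) can be illustrated with a minimal NumPy sketch. It is not the pipeline used in this study: the 224x224 target size (the default input size of ImageNet-pretrained ResNet50 and VGG16), nearest-neighbour resizing, scaling to [0, 1], and the choice of random flips and 90-degree rotations as augmentations are all assumptions, and the function names are hypothetical.

```python
import numpy as np

# Assumed target size (hypothetical choice): 224x224 matches the default input
# of ImageNet-pretrained ResNet50/VGG16; the paper does not state its size.
TARGET_SIZE = (224, 224)


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize of an (H, W, C) array to size=(height, width)."""
    h, w = image.shape[:2]
    new_h, new_w = size
    # Map each output row/column back to its nearest source row/column.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols[None, :]]


def normalize(image):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0


def augment(image, rng):
    """Randomly flip and rotate by a multiple of 90 degrees (assumed augmentations).

    Dermoscopic images have no canonical orientation, so these transforms
    change the pixel layout without changing the lesion label.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]  # vertical flip
    k = int(rng.integers(0, 4))  # number of 90-degree turns
    return np.rot90(image, k=k, axes=(0, 1))


def preprocess(image, rng=None, train=False):
    """Resize and normalize one image; augment it only for training."""
    out = normalize(resize_nearest(image))
    if train:
        out = augment(out, rng if rng is not None else np.random.default_rng())
    return out
```

In practice a library resizer with proper interpolation would usually replace `resize_nearest`. Augmentation is applied only to training images so that validation and test accuracy are measured on unmodified data.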
The CNN achieved about 93% accuracy and generalized well.
ResNet50 (using transfer learning) performed best, with over 95% accuracy, which we attribute to its depth and residual connections.
VGG16 (also based on transfer learning) achieved comparably high accuracy (~94.6%).
The SVM performed worst (~80%), even when using features extracted from CNN layers.
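The residual connections credited for ResNet50's performance add a block's input back to its output, following He et al.'s formulation y = F(x) + x. The NumPy sketch below is a simplification: real ResNet50 blocks are bottleneck blocks of 1x1, 3x3, and 1x1 convolutions with batch normalization, which are replaced here by two fully connected layers, and the function names are hypothetical.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Simplified residual block: y = ReLU(F(x) + x) with F(x) = ReLU(x W1) W2.

    Dense weights stand in for ResNet50's convolutions (an assumption made to
    keep the sketch short). x: (batch, d), w1: (d, h), w2: (h, d).
    """
    f = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(f + x)     # identity shortcut adds the input back
```

If the residual branch learns weights near zero, the block simply passes its (non-negative, post-ReLU) input through unchanged. He et al. argue that this easy fallback to the identity is what keeps very deep networks trainable.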
Challenges remain, including dataset imbalance, visual similarity between lesion types, and the high computational cost of deep models, which limits real-time clinical use. Nonetheless, these AI models show promise for assisting early, reliable skin cancer diagnosis by automating lesion classification from images.
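Besides augmentation, a common way to counter dataset imbalance is to measure the class distribution and weight the training loss by inverse class frequency. The sketch below is illustrative and not necessarily the strategy used in this study; it assumes labels are HAM10000 diagnosis codes such as `nv`, `mel`, or `df` (any hashable labels work), and the helper names are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a fraction of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * n_class_samples).

    Rare classes get weights above 1 and the majority class below 1, so a
    weighted loss penalises mistakes on rare lesion types more heavily.
    This matches scikit-learn's class_weight="balanced" formula.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Ratio between the largest and smallest class sizes."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)
```

With the toy labels `["nv"] * 6 + ["mel"] * 3 + ["df"]`, the weights are about 0.56 for `nv`, 1.11 for `mel`, and 3.33 for `df`, and the imbalance ratio is 6.0. Keras `Model.fit` accepts such weights through its `class_weight` argument, keyed by integer class index rather than by class name.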
Conclusion
This paper demonstrates the successful use of machine learning and deep learning models for the automatic classification of skin lesions. By leveraging the HAM10000 dataset, preprocessing the data appropriately (including addressing class imbalance), and applying different training strategies, we were able to optimize the models during training. The models differed in performance: the SVM provided a baseline, while the CNN and VGG16 showed noteworthy improvements. ResNet50 achieved the highest accuracy, which suggests superior feature extraction, generalization to unseen data, and regularization. We have therefore shown that deep learning techniques, especially transfer learning with pretrained models such as ResNet50 applied to a well-curated dataset such as HAM10000, can be a valuable assistive tool for early skin cancer detection, enabling faster and more accurate diagnoses by medical practitioners.