Classification and Prediction of Genetic Disorder

Authors: G K SHARMILA, Gari Sai Ganesh, Nandakumar, Sathish Kumar Gali

DOI Link: https://doi.org/10.22214/ijraset.2025.71742

Abstract

This work aims to predict genetic disorders in children by analyzing medical records that contain vital health information. As the population grows, genetic disorders, which are frequently caused by chromosomal abnormalities or DNA mutations, are becoming more common. Early detection through genetic testing, especially during pregnancy, is essential for prevention and treatment and can reduce the prevalence of inherited diseases. By applying machine learning techniques to extensive datasets, the work aims to increase diagnostic accuracy, raise awareness of the value of genetic testing, and support prompt medical interventions that lower the child mortality associated with these disorders.

Acknowledgment

The team expresses gratitude to their Principal, Dr. T. Hanumantha Reddy, HOD Dr. H. Girisha, and faculty member G.K. Sharmila for their support, guidance, and encouragement throughout the project.

1. Introduction & Background

Genetic disorders result from DNA abnormalities and include single-gene mutations, chromosomal abnormalities, and complex (multifactorial) conditions. As such conditions grow more common, early diagnosis is essential. However, many cases go undetected because of limited awareness of, and access to, genetic testing, particularly in prenatal care.

The project proposes using machine learning (ML) to predict genetic disorders from medical data, helping improve early detection, support preventive healthcare, and enhance patient outcomes.

2. Problem Statement

Traditional diagnostics are limited and often inaccessible. The goal is to build a machine learning model that can predict the likelihood of genetic disorders in children using historical and clinical data, assisting in earlier and more informed medical decisions.

3. Objectives

Understand genetic disorders (types, causes, patterns).
Enable early diagnosis and risk assessment.
Build predictive ML models to identify patterns in medical/genomic data.
Address ethical concerns (e.g., privacy, consent, discrimination).
Enhance public health through targeted genetic screening and counseling.

4. Literature Survey

The survey references key works in:

Preprocessing medical data (Zhang et al. [1])
Predictive ML models (Smith et al. [2], Clark & Harris [3])
Ensemble and deep learning methods (Patel et al. [6], Gupta et al. [9])
Practical clinical implementation (Walker et al. [10])
Feature selection & interpretability (Davis & White [4], SHAP, LIME)

These studies show that combining data preprocessing, ensemble learning, and interpretability techniques enhances prediction accuracy and model trust.

5. Proposed Methodology (Phases)

a) Data Collection

Gather diverse, reliable medical/genomic datasets, including demographics, family history, and clinical results.

b) Data Preprocessing

Clean, normalize, and transform data. Handle missing values, encode categories, detect outliers, and select important features.
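The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's pipeline: the field names (age, blood_cell_count, gender) and the toy records are hypothetical, and the choices of median/mode imputation, min-max scaling, and one-hot encoding are common defaults assumed here. Outlier detection and feature selection are omitted for brevity.

```python
from collections import Counter
from statistics import median

# Hypothetical toy records; field names are illustrative, not the project's schema.
RECORDS = [
    {"age": 4.0, "blood_cell_count": 4.8, "gender": "male"},
    {"age": None, "blood_cell_count": 5.1, "gender": "female"},
    {"age": 9.0, "blood_cell_count": None, "gender": None},
    {"age": 6.0, "blood_cell_count": 4.5, "gender": "female"},
]
NUMERIC = ["age", "blood_cell_count"]

def impute(records, numeric, categorical):
    """Fill missing numeric values with the column median, categorical ones with the mode."""
    fills = {}
    for col in numeric:
        fills[col] = median(r[col] for r in records if r[col] is not None)
    for col in categorical:
        fills[col] = Counter(r[col] for r in records if r[col] is not None).most_common(1)[0][0]
    return [{k: (fills[k] if v is None else v) for k, v in r.items()} for r in records]

def min_max_scale(records, numeric):
    """Rescale each numeric column to [0, 1]; a constant column maps to 0."""
    out = [dict(r) for r in records]
    for col in numeric:
        lo = min(r[col] for r in records)
        hi = max(r[col] for r in records)
        span = hi - lo
        for r in out:
            r[col] = 0.0 if span == 0 else (r[col] - lo) / span
    return out

def one_hot(records, col):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[col] for r in records})
    out = []
    for r in records:
        row = {k: v for k, v in r.items() if k != col}
        for c in categories:
            row[f"{col}={c}"] = 1 if r[col] == c else 0
        out.append(row)
    return out

clean = one_hot(min_max_scale(impute(RECORDS, NUMERIC, ["gender"]), NUMERIC), "gender")
print(clean[1])
```

In a real pipeline the medians, modes, and min/max values would be computed on the training split only and then reused on validation and test data, so that no information leaks from held-out records.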

c) Model Selection

Use multiple ML models:

Decision Trees
Random Forest
Support Vector Machines
Neural Networks
XGBoost (gradient-boosted tree ensemble)

Each model is trained and evaluated with cross-validation, so that its performance estimate does not depend on a single train/test split.
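A minimal sketch of k-fold cross-validation follows, assuming each candidate model is wrapped as a pair of fit/predict callables (a hypothetical interface, chosen so the sketch needs no third-party libraries). The stand-in model is a majority-class baseline, a useful floor that the Decision Tree, Random Forest, SVM, neural network, and XGBoost candidates should beat. With imbalanced disorder classes, stratified folds (for example scikit-learn's `StratifiedKFold`) that preserve class proportions are usually preferable to the plain shuffled folds used here.

```python
import random
from collections import Counter
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat for every fold."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return scores

# Stand-in model: always predict the most common training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = [[i] for i in range(10)]
y = [1] * 7 + [0] * 3
scores = cross_validate(fit_majority, predict_majority, X, y, k=5)
print(f"fold accuracies: {scores}, mean: {mean(scores):.2f}")
```

Any real model can be plugged in by supplying its own fit/predict pair; the fold logic stays the same.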

d) Model Evaluation

Metrics: Accuracy, Precision, Recall, F1-score, and the ROC curve with its AUC. A confusion matrix is used to analyze false positives and false negatives.
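The metrics listed above can be computed directly from the confusion-matrix counts. The sketch below treats one class as "positive" (for a multi-class disorder label, it can be applied one class at a time, one-vs-rest). AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is equivalent to the area under the ROC curve; the quadratic pairwise loop is only meant for small illustrations.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted a disorder that is not present
        elif t == positive:
            fn += 1  # missed a disorder that is present
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0]))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In a screening setting a false negative (a missed disorder) is often costlier than a false positive, which is one reason to report recall alongside accuracy rather than accuracy alone.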

e) Feature Importance

Apply SHAP or LIME to explain individual predictions, making the model's outputs interpretable and transparent for medical professionals.
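SHAP and LIME are provided by separate third-party libraries. To stay dependency-free, the sketch below uses a different and simpler model-agnostic technique, permutation importance: shuffle one feature column, re-score the model, and record how much accuracy drops. Unlike SHAP or LIME it gives a global ranking of features rather than a per-prediction explanation. The toy data and model are hypothetical.

```python
import random

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled, one feature at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data: feature 0 determines the label, feature 1 is noise the model ignores.
X = [[i / 19, (i * 7) % 5] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(x):
    return 1 if x[0] > 0.5 else 0

print(permutation_importance(model, X, y))
```

Feature 1 scores exactly zero because the model never reads it, while shuffling feature 0 destroys most of the model's accuracy.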

f) Model Deployment

Build a web/mobile interface for clinicians to input data and receive real-time predictions. Include a feedback loop for continual improvement.
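One way to keep the deployment layer thin is a framework-agnostic handler that validates the request, calls the model, and returns a status code with a JSON body, plus a function that logs clinician-confirmed outcomes for the feedback loop. This is a sketch under assumptions: the required field names, the in-memory feedback log, and the `model` callable are all hypothetical placeholders, since the paper does not specify the interface. A web framework route would simply pass the raw request body to `handle_predict`.

```python
import json

# Hypothetical input schema: the project's actual form fields are not specified.
REQUIRED_FIELDS = ("age", "maternal_gene", "paternal_gene")

# Hypothetical stand-in for a persistent feedback table used when retraining.
FEEDBACK_LOG = []

def handle_predict(body, model):
    """Validate a JSON request body, run the model, return (HTTP status, JSON text)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return 400, json.dumps({"error": "missing fields", "fields": missing})
    features = [payload[f] for f in REQUIRED_FIELDS]
    return 200, json.dumps({"prediction": model(features)})

def record_feedback(features, predicted, confirmed):
    """Log a clinician-confirmed outcome; return how many logged predictions were wrong."""
    FEEDBACK_LOG.append({"features": features, "predicted": predicted, "confirmed": confirmed})
    return sum(entry["predicted"] != entry["confirmed"] for entry in FEEDBACK_LOG)

def toy_model(features):
    # Hypothetical rule used only to exercise the handler.
    return "positive" if features[1] else "negative"

print(handle_predict('{"age": 3, "maternal_gene": 1, "paternal_gene": 0}', toy_model))
```

A rising count of logged disagreements is one simple signal that the model should be retrained on the accumulated feedback.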

g) Data Privacy & Security

Comply with regulations such as HIPAA. Use encryption and secure storage to protect sensitive medical data.
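Alongside encryption, a common complementary safeguard is pseudonymization: dropping direct identifiers and replacing the patient ID with a keyed token, so that records can still be linked across visits without storing the raw ID. The sketch below uses HMAC-SHA256 from the Python standard library. It is not encryption (the token cannot be decrypted) and it does not by itself establish HIPAA compliance; the list of identifier fields is a hypothetical example.

```python
import hashlib
import hmac
import secrets

# Hypothetical set of direct identifiers; a real policy would follow the
# applicable regulation rather than this short list.
DIRECT_IDENTIFIERS = ("name", "phone", "address", "email")

def pseudonymize(patient_id, key):
    """Keyed, repeatable token for a patient ID (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def deidentify(record, key):
    """Drop direct identifiers and replace patient_id with its pseudonym."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = pseudonymize(record["patient_id"], key)
    return out

# The key must live apart from the data (e.g. in a secrets manager):
# anyone holding it can re-link tokens to known patient IDs.
KEY = secrets.token_bytes(32)
print(deidentify({"patient_id": "P-001", "name": "Jane Doe", "age": 4}, KEY))
```

Using a keyed HMAC rather than a plain hash matters here: patient IDs are often short and guessable, so an unkeyed SHA-256 of an ID could be reversed by brute force.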

6. System Functional Specifications

A. User Functions

Registration/Login: Secure sign-up; passwords are hashed before being stored in SQLite.
Prediction Interface: Input medical/genetic data to receive ML-based predictions.
Results Page: Displays predicted disorder type, subtype, and description.
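The paper states that passwords are hashed but not which scheme is used. A minimal sketch with the standard library's salted PBKDF2-HMAC-SHA256 is shown below; the storage string format and the iteration count are assumptions. If the web layer happens to be Flask, Werkzeug's `generate_password_hash` and `check_password_hash` provide equivalent functionality.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # PBKDF2 work factor (assumed); tune to the deployment hardware

def hash_password(password, iterations=ITERATIONS):
    """Return 'pbkdf2_sha256$<iterations>$<salt hex>$<hash hex>' for storage."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"

def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    scheme, iterations, salt_hex, digest_hex = stored.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Storing the iteration count with each hash lets the work factor be raised later without invalidating existing accounts.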

B. Prediction System

The backend uses a pre-trained model (likely a Random Forest) to analyze inputs and return disorder predictions.
Results are mapped to readable labels and shown clearly to users for easy understanding.
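The mapping from a model's numeric class output to the type, subtype, and description shown on the Results Page can be kept in a single lookup table. The sketch below uses hypothetical placeholder labels, since the actual class encoding depends on the training data. With a scikit-learn model loaded from a pickle file (only ever unpickle trusted files), `model.predict` would supply the class index and `model.predict_proba` the optional probabilities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DisorderInfo:
    disorder_type: str
    subtype: str
    description: str

# Hypothetical label table; real names depend on how the training labels were encoded.
LABELS = {
    0: DisorderInfo("Type A", "Subtype A1", "Placeholder description for class 0."),
    1: DisorderInfo("Type A", "Subtype A2", "Placeholder description for class 1."),
    2: DisorderInfo("Type B", "Subtype B1", "Placeholder description for class 2."),
}

def to_result(class_index, probabilities=None):
    """Turn a predicted class index (and optional class probabilities) into display fields."""
    info = LABELS.get(class_index)
    if info is None:
        raise ValueError(f"unknown class index: {class_index}")
    result = {"type": info.disorder_type, "subtype": info.subtype,
              "description": info.description}
    if probabilities is not None:
        result["confidence"] = round(max(probabilities), 3)
    return result

print(to_result(0, [0.7, 0.2, 0.1]))
```

Raising on an unknown index, rather than showing a blank result, surfaces a mismatch between the deployed model and the label table immediately.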

7. Database & File Structure

Database (SQLite)

Users Table: Stores user credentials and contact info.
Predictions Table: Stores input data, prediction results, and descriptions linked to the user.
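The two tables described above can be sketched as a SQLite schema with a foreign key from predictions to users. Column names are assumptions (the paper lists only the kinds of data stored). Note that SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is issued on each connection.

```python
import sqlite3

# Hypothetical column names; the paper describes the tables' contents, not their exact schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    email         TEXT,
    phone         TEXT
);
CREATE TABLE IF NOT EXISTS predictions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    input_json  TEXT NOT NULL,
    result      TEXT NOT NULL,
    description TEXT,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn

def save_prediction(conn, user_id, input_json, result, description):
    """Insert one prediction row inside a transaction and return its id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO predictions (user_id, input_json, result, description) "
            "VALUES (?, ?, ?, ?)",
            (user_id, input_json, result, description),
        )
    return cur.lastrowid

def history(conn, username):
    """All stored results for one user, oldest first."""
    return conn.execute(
        "SELECT p.result, p.description FROM predictions p "
        "JOIN users u ON u.id = p.user_id WHERE u.username = ? ORDER BY p.id",
        (username,),
    ).fetchall()
```

Parameterized queries (the `?` placeholders) keep user-supplied values out of the SQL text, which prevents SQL injection through the input forms.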

File Structure Highlights

Organized for clarity, maintainability, and scalability.
Includes model files (GNB.pkl), helper scripts (create_help.py), and HTML pages for the user interface.

Conclusion

A novel strategy is being developed to predict hereditary genetic disorders using machine learning models, with the aim of providing more accurate and reliable predictions than traditional diagnostic methods. The system processes patient data through preprocessing, feature engineering, and model training to deliver precise outcomes. It has significant potential for the early diagnosis, prevention, and management of genetic disorders, contributing to improved patient outcomes and better healthcare decision-making. The system continues to evolve by integrating feedback and refining its models iteratively, which keeps it relevant and effective over time. By providing healthcare professionals with insightful data, the initiative shows how data-driven approaches can transform healthcare, particularly genetic testing. Early prediction of genetic disorders, before birth or in childhood, opens new opportunities for preventive care, allowing timely interventions and informed treatment plans.

Although the project demonstrates significant potential, it also faces challenges, such as the need for comprehensive, high-quality datasets and the difficulty of refining machine learning models with real-world data. As healthcare technologies advance and access to medical data increases, these hurdles can be addressed. Overall, this system represents a significant step toward harnessing AI to enhance diagnostic accuracy, support preventive measures, and improve global healthcare outcomes.

References

[1] L. Zhang, P. Li, and S. Wang, “A review on data preprocessing techniques in medical data mining,” IEEE Transactions on Medical Imaging, vol. 40, no. 6, pp. 1450-1465, Jun. 2021.
[2] J. Smith, A. Brown, and R. Wilson, “A machine learning approach for predicting genetic disorders,” Journal of Medical Genetics, vol. 45, no. 3, pp. 234-240, Mar. 2022.
[3] R. Clark and F. Harris, “Predictive models for hereditary diseases using ensemble learning techniques,” Proceedings of the IEEE International Conference on Data Science and Machine Learning, pp. 450-455, Apr. 2022.
[4] N. Davis and T. White, “Feature selection methods in predictive healthcare models,” Health Informatics Journal, vol. 27, no. 4, pp. 350-360, Nov. 2022.
[5] T. Nguyen, L. Green, and J. Parker, “Big data and machine learning in genetic disorder prediction,” IEEE Access, vol. 10, pp. 55890-55902, Dec. 2022.
[6] M. Patel, R. Sharma, and D. Kumar, “Application of deep learning in genetic disorder prediction,” International Journal of Health Informatics, vol. 18, no. 2, pp. 112-120, Feb. 2023.
[7] C. Johnson, K. Patel, and H. Lee, “Machine learning models for predicting genetic mutations in children,” IEEE Transactions on Bioinformatics, vol. 39, no. 2, pp. 98-104, Feb. 2023.
[8] S. Thompson and J. Moore, “Genetic testing for rare diseases: A predictive model approach,” Journal of Biomedical Informatics, vol. 35, no. 5, pp. 601-612, May 2023.
[9] A. Gupta, M. Desai, and K. Singh, “Improving genetic disorder diagnosis using machine learning,” International Journal of Artificial Intelligence in Medicine, vol. 15, no. 1, pp. 65-75, Jan. 2024.
[10] J. Walker, P. Hall, and M. Rogers, “Implementing machine learning for genetic disorder prediction in clinical settings,” Journal of Healthcare Technology, vol. 10, no. 3, pp. 180-192, Mar. 2024.
[11] T. Elavarasi and P. Mariappan, “A deep learning approach to detect genetic based disease in pregnancy period.”
[12] A. Sangeetha and B. Ananthi, “Genetic algorithm for feature selection to improve heart disease prediction by support vector machine,” vol. 7, no. 1, Jan. 2020.
[13] S. Singh, G. Shukla, R. Agrawal, C. Dhule, S. Allabun, M. S. Alqahtani, M. Othman, M. Abbas, and B. O. Soufiene, “Enhancing genomic disorder prediction through Feynman concordance and interpolated nearest centroid techniques,” National Library of Medicine, 2024.
[14] S. V. K. Rathod and B. Maruthiram, “Advance genome disorder prediction model empowered with machine learning,” IJCRT, vol. 12, no. 7, Jul. 2024, ISSN: 2320-2882.
[15] A. C. J. W. Janssens and C. M. van Duijn, “Genome-based prediction of common diseases: Methodological considerations for future research,” Genome Medicine, 2009.
[16] A. Raza, F. Rustam, H. U. R. Siddiqui, I. de la Torre Diez, B. Garcia-Zapirain, E. Lee, and I. Ashraf, “Predicting genetic disorder and types of disorder using chain classifier approach,” National Library of Medicine, 2022.
[17] M. Atta-Ur-Rahman, M. Zubair, M. U. Nasir, M. Gollapalli, M. A. Saleem, S. Mehmood, M. A. Khan, and A. Mosavi, “Advance genome disorder prediction model empowered with deep learning,” 2022.
[18] V. P. Sudha and V. M S, “Deep learning based prediction of autism spectrum disorder using codon encoding of gene sequences,” Journal Name, vol. 9, no. 1, Oct. 2019, ISSN: 2249-8958.
[19] T. T. Gayathri, “Analysis of genomic sequences for prediction of cancerous cells using wavelet technique,” Apr. 2017.
[20] V. Sudha, D. Girijamma, and S. Pragati, “Classification of health disorder based on DNA technology,” International Research Journal of Engineering and Technology (IRJET), vol. 4, no. 5, May 2017, p-ISSN: 2395-0072.
[21] A. D. Bele, V. K. Suryawanshi, R. A. Sharma, and M. N. Deore, “Cancer disease prediction using machine learning over big data,” International Research Journal of Engineering and Technology (IRJET), vol. 7, no. 3, Mar. 2020, p-ISSN: 2395-0072.
[22] J. Hemavathy, B. Jaya, T. Ananthi, and R. Thomas, “Disease identification using proteins values and regulatory modules,” International Research Journal of Engineering and Technology (IRJET), vol. 5, no. 3, Mar. 2018.
[23] S. M. Singh and D. B. Hanchate, “Improving disease prediction by machine learning,” International Research Journal of Engineering and Technology (IRJET), vol. 5, no. 6, Jun. 2018.

Copyright

Copyright © 2025 G K SHARMILA, Gari Sai Ganesh, Nandakumar, Sathish Kumar Gali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET71742

Publish Date : 2025-05-28

ISSN : 2321-9653

Publisher Name : IJRASET
