Statistical and Exploratory Analysis of Student Academic Performance Using Socio-Demographic Factors

Authors: Jitendra Kumar Gupta, Abhinav Shukla, Vanita Jain, Ayush Kumar Agrawal

DOI Link: https://doi.org/10.22214/ijraset.2026.78059

Abstract

Educational data analysis plays a significant role in evaluating student achievement and improving academic decision-making processes. This study proposes a structured Student Performance Analysis Framework that integrates mathematical modeling, percentage computation, grading classification, and descriptive statistical evaluation. The dataset consists of student scores in Mathematics, Reading, and Writing, from which overall academic percentage is computed using an average-based formulation. A systematic grading function is applied to categorize performance into standardized grade levels. Statistical analysis indicates an overall mean percentage of 67.77%, reflecting moderate academic performance across the dataset. Among the subjects, Reading records the highest average score (69.16), followed by Writing (68.05) and Mathematics (66.08). Standard deviation analysis reveals moderate variability, particularly in Mathematics and Writing. Visualization of subject-wise averages supports comparative interpretation of performance trends. The proposed framework provides a transparent and scalable analytical model that can assist educational institutions in monitoring student progress and identifying areas requiring targeted academic intervention.

Introduction

Educational data analysis plays an important role in modern academic systems by helping institutions understand student learning patterns, performance gaps, and the impact of socio-demographic factors. Factors such as parental education, nutrition, gender, and participation in preparatory programs can influence academic outcomes. This study uses Educational Data Mining (EDM) and statistical techniques to analyze student performance and support data-driven academic planning.

The dataset used in this study contains 1000 student records with demographic attributes and scores in Mathematics, Reading, and Writing. Because the dataset has no missing values, every record can be used in the analysis as it stands. The main objectives of the study are to analyze subject-wise performance, develop a percentage-based grading system, examine socio-demographic influences, and visualize academic trends.

A mathematical performance model is developed to convert raw scores into a standardized academic performance indicator. Each student’s overall percentage is calculated as the arithmetic mean of the scores in Mathematics, Reading, and Writing, giving equal weight to all subjects. A grading function then assigns each student a performance level, ranging from Outstanding (O) to Fail (F), based on this percentage. The mean and standard deviation are used to summarize overall performance and to measure variability among students. Additionally, a pass–fail rule requires a minimum score of 35 in each subject to pass.
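The percentage, grading, and pass–fail rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names (`math`, `reading`, `writing`), the intermediate grade labels, and every grade boundary in `GRADE_BANDS` are hypothetical, since the text names only the endpoints O and F and the per-subject pass mark of 35. Treating a failed subject as an automatic F is likewise an assumption about how the two rules combine.

```python
from typing import Dict

PASS_MARK = 35  # per-subject minimum stated in the text

# Hypothetical grade boundaries as (lower bound, label). The source names only
# the endpoints Outstanding (O) and Fail (F), not the cut-offs between them.
GRADE_BANDS = [(90, "O"), (80, "A"), (70, "B"), (60, "C"), (50, "D"), (35, "E")]


def percentage(scores: Dict[str, float]) -> float:
    """Equal-weight arithmetic mean of the subject scores (each out of 100)."""
    return sum(scores.values()) / len(scores)


def passed(scores: Dict[str, float]) -> bool:
    """A student passes only if every subject meets the minimum score."""
    return all(score >= PASS_MARK for score in scores.values())


def grade(scores: Dict[str, float]) -> str:
    """Letter grade from the percentage; assumed to be F if any subject is failed."""
    if not passed(scores):
        return "F"
    pct = percentage(scores)
    for lower, label in GRADE_BANDS:
        if pct >= lower:
            return label
    return "F"
```

Under these assumed bands, a student scoring 72, 90, and 88 has a percentage of about 83.33 and receives an A, while a student scoring 30, 95, and 95 has a percentage of about 73.33 but is graded F because the Mathematics score is below 35.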

The proposed methodology follows four stages: data collection, data preprocessing, percentage and grading computation, and statistical analysis. Since all scores are within the valid range and no missing values exist, minimal preprocessing was required. Descriptive statistics were then applied to summarize the dataset.
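The preprocessing check implied above (no missing values, all scores within the valid range) might look like the sketch below. The field names and the 0 to 100 range are assumptions made for illustration; the text does not state the exact bounds or column names, and the sample rows are invented rather than taken from the study's dataset.

```python
from typing import Dict, List, Optional

SUBJECTS = ("math", "reading", "writing")  # hypothetical field names


def validate(records: List[Dict[str, Optional[float]]],
             low: float = 0, high: float = 100) -> List[str]:
    """Return a list of problems; an empty list means no cleaning is needed."""
    problems = []
    for i, rec in enumerate(records):
        for subject in SUBJECTS:
            value = rec.get(subject)
            if value is None:
                problems.append(f"row {i}: {subject} is missing")
            elif not low <= value <= high:
                problems.append(f"row {i}: {subject}={value} outside [{low}, {high}]")
    return problems


# Illustrative rows, not taken from the study's dataset.
clean = [{"math": 72, "reading": 90, "writing": 88}]
dirty = [{"math": 105, "reading": None, "writing": 60}]
```

Here `validate(clean)` returns an empty list, which corresponds to the situation the text describes, whereas `validate(dirty)` reports one out-of-range score and one missing score.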

The statistical analysis shows that Reading has the highest average score (69.16), followed by Writing (68.05) and Mathematics (66.08). The overall average percentage is 67.77, and the three subject means lie within about three points of one another, indicating relatively balanced performance across subjects. Standard deviation values of around 14–15 suggest moderate variability in student performance. Mathematics and Writing show slightly higher variability than Reading, indicating greater differences in performance levels among students in those subjects.
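As a rough sketch of how these descriptive statistics are obtained, the snippet below computes subject means, standard deviations, and the overall mean percentage using Python's `statistics` module. The score lists are invented for illustration and are not the study's data, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which form was used. The last line applies the same averaging to the rounded subject means reported above.

```python
from statistics import mean, pstdev

# Illustrative scores, not taken from the study's dataset.
scores = {
    "math":    [60, 75, 45, 90],
    "reading": [70, 80, 50, 95],
    "writing": [65, 78, 48, 93],
}

subject_mean = {name: mean(values) for name, values in scores.items()}
# Population standard deviation; whether the study used the population or
# the sample form is not stated, so this choice is an assumption.
subject_std = {name: pstdev(values) for name, values in scores.items()}

# Per-student percentage: equal-weight mean of the three subject scores.
percentages = [mean(row) for row in zip(*scores.values())]
overall_mean = mean(percentages)

# With equal weights and no missing scores, the mean of the per-student
# percentages equals the mean of the subject means.
assert abs(overall_mean - mean(list(subject_mean.values()))) < 1e-9

# The same identity applied to the rounded subject means reported in the text.
reported = (66.08 + 69.16 + 68.05) / 3
```

The value of `reported` is about 67.76, which agrees with the stated overall mean of 67.77 to within the rounding of the subject means.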

Visualization of the results confirms that students perform slightly better in Reading, while Mathematics has the lowest average score, though the differences between subjects are small. Overall, the study demonstrates that combining mathematical modeling, statistical analysis, and visualization provides a systematic way to evaluate student performance and can be extended to predictive analytics and machine learning for educational improvement.
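The subject-wise comparison can be illustrated with a dependency-free text bar chart built from the averages reported above. This is only a sketch: the text does not describe the plotting tool used in the study, and the helper `text_bar_chart` and its parameters are hypothetical names introduced here.

```python
# Subject averages as reported in the text.
averages = {"Mathematics": 66.08, "Reading": 69.16, "Writing": 68.05}


def text_bar_chart(values, width=40, scale_max=100.0):
    """Render one line per label: name, a bar proportional to the value, the value."""
    label_width = max(len(name) for name in values)
    lines = []
    for name, value in values.items():
        bar = "#" * round(width * value / scale_max)
        lines.append(f"{name:<{label_width}} | {bar} {value:.2f}")
    return "\n".join(lines)


chart = text_bar_chart(averages)
print(chart)
```

Because the bars are scaled against the full 0 to 100 range, the three bars differ by only one or two characters, which mirrors the observation that the differences between subjects are small.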

Conclusion

The analysis reveals that demographic and socio-economic factors significantly influence academic outcomes. Students completing test preparation courses and those with higher parental education levels demonstrate better academic performance. Nutritional factors, indicated by lunch type, also contribute to learning effectiveness. The statistical grading model provides a structured approach for academic evaluation and monitoring.


Copyright

Copyright © 2026 Jitendra Kumar Gupta, Abhinav Shukla, Vanita Jain, Ayush Kumar Agrawal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET78059

Publish Date : 2026-03-09

ISSN : 2321-9653

Publisher Name : IJRASET
