The application of data mining techniques to educational datasets is gaining increasing attention due to the growing availability of student-related information. However, organizing and interpreting this data effectively poses a significant challenge because of its high dimensionality and complexity. This study explores the use of the Linear Support Vector Classifier (Linear-SVC), known for its computational efficiency and robustness, alongside SVM and Naive Bayes, in categorizing educational data. The model's output can reveal actionable insights into student performance, offering valuable support for real-time academic assessments. It also holds potential for informing future strategies related to student admissions and selection processes in higher education institutions.
Introduction
The growing accumulation of student data offers valuable insights but presents challenges for manual analysis. Automated classification techniques, especially in educational data mining (EDM), help analyze such data efficiently to support decision-making and improve educational outcomes. Among various machine learning methods, the Linear Support Vector Classifier (Linear-SVC) is favored for text classification because of its simplicity, speed, and computational efficiency, even though it is sometimes less accurate than more complex models.
This study focuses on using Linear-SVC to classify student-related data and compares its performance with other models like Support Vector Machines (SVM). SVMs are powerful classifiers that separate data into categories by finding the optimal dividing boundary, and use kernel functions to handle complex, nonlinear data.
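The difference between a linear boundary and a kernel-based one can be illustrated with a minimal sketch. It assumes a two-feature input, labels in {-1, +1}, and hand-picked (not fitted) values for the weights `w`, bias `b`, and kernel width `gamma`; none of these come from the study's data. The RBF kernel is shown only as one common choice of kernel function.

```python
import math

def linear_decision(w, b, x):
    """Signed score w.x + b; its sign says which side of the boundary x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2), a similarity used for nonlinear data."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y * f(x)); zero once x is beyond the margin."""
    return max(0.0, 1.0 - y * linear_decision(w, b, x))

# Illustrative parameters, chosen by hand rather than learned (assumption)
w, b = [1.0, -1.0], 0.0
point = [2.0, 0.5]

score = linear_decision(w, b, point)
label = 1 if score >= 0 else -1
```

A linear classifier scores a point with the plain dot product in `linear_decision`; a kernel SVM instead compares the point with training examples through a function such as `rbf_kernel`, which is what lets it bend the boundary around nonlinear data at extra computational cost.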
The practical problem addressed involves classifying students into reservation quota categories (such as SC, ST, OBC) based on attributes like sub-caste and academic scores, using a dataset of 152 entries from the University of Rajasthan’s Centre of Converging Technologies. The dataset includes various categorical attributes. The goal is to evaluate how effectively these models can classify students according to reservation criteria, aiding in fair educational resource allocation.
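Because a linear classifier operates on numeric vectors, categorical attributes have to be encoded before training. The sketch below shows one common approach, one-hot encoding, in plain Python. The column names (`sub_caste`, `grade_band`) and the placeholder values are hypothetical and do not reproduce the actual dataset; the encoding scheme itself is an assumption, since the paper does not state how attributes were prepared.

```python
def fit_one_hot(records, columns):
    """Collect the sorted distinct values seen in each categorical column."""
    return {col: sorted({rec[col] for rec in records}) for col in columns}

def one_hot(record, vocab):
    """Encode one record as a flat 0/1 vector, one slot per (column, value) pair."""
    vector = []
    for col, values in vocab.items():
        vector.extend(1 if record[col] == v else 0 for v in values)
    return vector

# Hypothetical records: names and values are placeholders, not real data
records = [
    {"sub_caste": "group_a", "grade_band": "high", "quota": "SC"},
    {"sub_caste": "group_b", "grade_band": "mid", "quota": "ST"},
    {"sub_caste": "group_c", "grade_band": "high", "quota": "OBC"},
    {"sub_caste": "group_a", "grade_band": "low", "quota": "SC"},
]

feature_columns = ["sub_caste", "grade_band"]
vocab = fit_one_hot(records, feature_columns)

# Feature matrix: one 0/1 row per student
X = [one_hot(rec, vocab) for rec in records]

# Target: quota category mapped to an integer class index
classes = sorted({rec["quota"] for rec in records})
y = [classes.index(rec["quota"]) for rec in records]
```

The resulting `X` and `y` have the shape that a multi-class linear classifier expects: one row of numeric features per student and one integer class label per row.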
Conclusion
The results suggest that the Linear-SVC algorithm is a suitable classifier for this small dataset. However, other classifiers, such as Support Vector Machine (SVM) or K-Nearest Neighbor (K-NN), may still increase the percentage of correctly classified records while taking less time. In future work, the same process should be applied to larger datasets, with performance compared against other classifiers. Mass estimation could also be used for similarity measures when processing data in parallel.