Accurate salary prediction supports informed decision-making across industries. This study applies five models (Decision Tree, Logistic Regression, Random Forest, XGBoost, and LightGBM) to predict salary levels through binary classification, and evaluates them using accuracy, precision, recall, and F1-score. Ensemble learning improves prediction accuracy, stability, and reliability relative to single models, with XGBoost reaching the highest accuracy (about 83%). The results offer practical guidance for salary prediction and human resource management, and suggest that the same ensemble approach can be applied to other classification tasks. The study also highlights how model selection and the choice of evaluation metrics affect performance.
Introduction
Accurate salary prediction is crucial for promoting transparency, fairness, and efficiency in the labor market. Recent research leverages machine learning (ML) techniques using socio-demographic and professional features to model and forecast salaries, revealing disparities and enabling data-driven strategies for equity. Ensemble methods and deep learning models, such as Random Forest, XGBoost, LightGBM, and Deep Neural Networks, have demonstrated superior accuracy and robustness compared to traditional approaches.
This study preprocesses a dataset covering demographics, experience, and job details, recategorizes salaries into three classes (low, medium, high), and applies five classification models: Decision Tree, Logistic Regression, Random Forest, XGBoost, and LightGBM. Performance is evaluated using accuracy, precision, recall, and F1-score.
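The four evaluation metrics can be illustrated with a minimal Python sketch. It assumes macro averaging across classes, which the study does not specify, and the function name `macro_metrics` and the sample labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Assumes at least one sample. Macro averaging is an assumption of
    this sketch, not a detail taken from the study.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed

    def safe_div(a, b):
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for illustration only (not the study's data).
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "low"]
report = macro_metrics(y_true, y_pred)
```

Macro averaging weights every salary class equally, which may be preferable when the classes are imbalanced; a weighted average would instead follow the class frequencies.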
Results show that ensemble methods significantly outperform individual models. XGBoost achieves the highest accuracy (about 83%) together with the highest precision, recall, and F1-score, making it the most effective model for salary prediction. Random Forest and Decision Tree also perform well, while Logistic Regression lags behind because, as a linear model, it has limited ability to capture non-linear patterns.
Visual analyses of confusion matrices and ROC curves confirm these findings, highlighting the strength of gradient boosting techniques in modeling complex relationships in salary data. Overall, integrating ensemble learning improves predictive accuracy and fairness in salary forecasting, supporting better human resource decisions.
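For readers unfamiliar with how a ROC curve is built, the following Python sketch sweeps the decision threshold over predicted scores and integrates the resulting curve. It assumes binary labels (1 for the positive class) with at least one example of each class; the function names and the sample scores are hypothetical and are not taken from the study.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Sort by descending score so each step admits the next-highest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so a tie yields one diagonal segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for illustration only (not the study's data).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)
```

An area close to 1 means the model ranks positive examples above negative ones almost always, while an area near 0.5 corresponds to random ranking.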
Conclusion
In conclusion, among the four models compared for binary salary classification (Logistic Regression, Decision Tree, Random Forest, and XGBoost), XGBoost clearly stands out as the most effective. Logistic Regression, being a linear model, falls short in handling complex patterns, resulting in lower accuracy and a less favourable ROC curve. The Decision Tree improves on this by capturing non-linear relationships, which yields fewer misclassifications. Random Forest improves the results further by combining multiple decision trees to reduce overfitting and increase stability, leading to a well-balanced and accurate model. XGBoost nevertheless outperforms all others, with the highest accuracy, precision, and recall and the best ROC curve. Its gradient boosting mechanism fits each new tree to the errors of the trees before it, so predictions are refined step by step. XGBoost is therefore the most accurate and reliable model for this binary classification task.
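The idea of learning from earlier mistakes can be sketched in a few lines of Python. This is a deliberately simplified illustration that assumes squared-error loss and one-split trees (stumps) on a single feature; it is not XGBoost's actual algorithm, which adds regularization and second-order gradient information. All names and data below are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(xs, ys, rounds=20, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, preds


# Hypothetical data: a pattern that no single threshold can separate.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 0, 0, 1, 1]
base, stumps, preds = boost(xs, ys)
initial_mse = mse(ys, [base] * len(ys))
final_mse = mse(ys, preds)
```

Each round corrects part of what the previous rounds got wrong, so the training error after boosting is lower than that of the constant starting prediction. The learning rate controls how much of each correction is applied.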