The rise of online job platforms has been accompanied by an increasing number of fraudulent job postings, posing significant risks to job seekers in terms of privacy, financial security, and mental well-being. This paper presents an AI-powered Fake Job Detection system designed to automatically distinguish between genuine and fraudulent job postings using a supervised machine learning approach. We employ the publicly available fake_job_postings.csv dataset, which contains both structured and unstructured data, including job titles, locations, descriptions, and company profiles. After extensive preprocessing, including text cleaning, handling of missing values, and TF-IDF feature extraction for the unstructured text, we trained and evaluated multiple classification models: Random Forest, Support Vector Machine (SVM), Logistic Regression, and XGBoost. Additionally, we implemented an ensemble Voting Classifier to leverage the strengths of the individual models. Among the individual models, the Random Forest classifier achieved the highest accuracy, precision, recall, and F1-score. The final system reports a "fakeness score", the predicted probability that a posting is fraudulent, offering users a clear and interpretable output. Furthermore, we developed a web-based interface that enables users to input job postings and receive real-time predictions on their legitimacy. Our system not only improves trust in online recruitment but also offers a scalable solution to combat job fraud using AI and natural language processing. The results highlight the potential of machine learning in mitigating online employment scams and provide a foundation for future enhancements, such as multilingual support and adaptive learning based on evolving fraud patterns.
Introduction
In the digital era, online job portals have simplified recruitment but also led to a rise in fraudulent job postings, which can cause financial loss, data theft, and phishing attacks. Manual verification of such postings is inefficient due to the volume and complexity of job data, necessitating automated detection systems.
This study proposes a Fake Job Detection system using supervised machine learning and natural language processing (NLP). The model is trained on the publicly available fake_job_postings.csv dataset, which includes structured features (e.g., job title, location, salary) and unstructured text (e.g., job description, company profile). Data preprocessing involved cleaning, handling missing values, label encoding, and NLP techniques like tokenization, stop-word removal, and TF-IDF vectorization.
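The text-side preprocessing described above (tokenization, stop-word removal, TF-IDF weighting) can be illustrated with a minimal standard-library sketch. It is not the implementation used in this study: the stop-word list, the smoothed IDF formula, the function names, and the two example postings are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "for", "in", "is", "we", "you"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return (sorted vocabulary, one {term: weight} dict per document)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return sorted(idf), vectors


# Hypothetical postings, not taken from the dataset.
docs = [
    "Earn money fast, no experience needed, send bank details",
    "Software engineer needed for the backend team",
]
vocab, vectors = tfidf(docs)
```

In this toy corpus, "needed" occurs in both postings and therefore receives a lower weight than "money", which occurs in only one. This is the property that lets TF-IDF highlight wording specific to a posting.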
Multiple machine learning models—Logistic Regression, SVM, Random Forest, and XGBoost—were trained, with Random Forest achieving the best standalone performance. An ensemble Voting Classifier further improved results. The system outputs a "fakeness score," indicating the probability of a job posting being fake.
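As a rough sketch of how an ensemble can turn per-model outputs into a single fakeness score, the snippet below averages each model's predicted probability of the fake class (soft voting). The text does not state whether the Voting Classifier used soft or hard voting, so soft voting is an assumption here, as are the function names, the 0.5 threshold, and the probability values.

```python
def fakeness_score(model_probs, weights=None):
    """Soft vote: weighted mean of per-model P(fake) values, each in [0, 1]."""
    if not model_probs:
        raise ValueError("need at least one model probability")
    names = list(model_probs)
    if weights is None:
        # Equal weights by default (assumption).
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return sum(weights[name] * model_probs[name] for name in names) / total


def label(score, threshold=0.5):
    """Map a fakeness score to a class label using a hypothetical threshold."""
    return "fake" if score >= threshold else "genuine"


# Hypothetical per-model probabilities for one posting (not reported results).
probs = {
    "logistic_regression": 0.62,
    "svm": 0.70,
    "random_forest": 0.91,
    "xgboost": 0.85,
}
score = fakeness_score(probs)
verdict = label(score)
```

Averaging probabilities rather than counting votes keeps the output continuous, which is what allows the score to be shown to users as a degree of suspicion rather than a bare yes/no label.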
A web interface was developed to allow users to input job details and receive real-time predictions, enhancing safety and trust in online recruitment. The ensemble model achieved an accuracy of 96.8%, an F1-score of 0.965, and an AUC-ROC of 0.981.
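For readers unfamiliar with these metrics, the following sketch shows how accuracy, precision, recall, F1-score, and AUC-ROC are computed from labels and scores. The labels and scores are toy values and do not reproduce the results reported here. The function names and the 0.5 decision threshold are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Chance that a random fake posting outscores a random genuine one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and fakeness scores (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike accuracy and F1, AUC-ROC does not depend on the chosen threshold, which is why it is reported separately.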
Literature Review: Prior approaches relied on rule-based methods or traditional ML models, which lacked scalability. Modern approaches use NLP, ensemble learning, and probability-based scores for transparency and better performance.
Methodology: The research followed a structured pipeline: data acquisition → preprocessing → feature engineering → model development → ensemble integration → evaluation → deployment. Extensive testing (unit, integration, functional, edge-case, and performance) ensured reliability.
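The first stages of this pipeline can be sketched as a chain of small functions, each consuming the output of the previous one. This is an illustration only, not the project code: the column names, the "unknown" placeholder for missing values, the stage names, and the two inline records are assumptions.

```python
import csv
import io

# Two hypothetical records; the column names are assumed for illustration.
RAW_CSV = """title,location,employment_type,description,fraudulent
Data Analyst,"US, NY",Full-time,Analyze sales data,0
Home Typist,,,Earn cash daily from home,1
"""


def acquire(source):
    """Data acquisition: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def preprocess(rows):
    """Preprocessing: trim whitespace and fill missing categorical values."""
    cleaned = []
    for row in rows:
        row = {key: (value or "").strip() for key, value in row.items()}
        for col in ("location", "employment_type"):
            row[col] = row[col] or "unknown"
        cleaned.append(row)
    return cleaned


def engineer_features(rows):
    """Feature engineering: label-encode a category and merge the text fields."""
    categories = sorted({r["employment_type"] for r in rows})
    code = {category: index for index, category in enumerate(categories)}
    return [
        {
            "text": f'{r["title"]} {r["description"]}'.lower(),
            "employment_type_code": code[r["employment_type"]],
            "label": int(r["fraudulent"]),
        }
        for r in rows
    ]


def run_pipeline(source, stages):
    """Apply each stage in order, passing the result to the next stage."""
    data = source
    for stage in stages:
        data = stage(data)
    return data


features = run_pipeline(RAW_CSV, [acquire, preprocess, engineer_features])
```

Keeping each stage as a separate function makes it straightforward to unit-test stages in isolation and to append later stages such as model training and evaluation.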
Future Improvements: Suggestions include real-time data updates, transformer-based NLP (BERT), deep learning models (RNN/LSTM), user behavior analytics, multilingual support, and browser/mobile integrations to enhance scalability, accuracy, and usability.
Conclusion
In this research, we presented a comprehensive machine learning-based approach for detecting fake job postings, addressing a growing concern in today’s digital recruitment landscape. By following a structured workflow (data acquisition, exploratory data analysis, preprocessing, feature engineering, model development, and deployment), we built a detection system capable of distinguishing between genuine and fraudulent job advertisements. Our implementation of various classification algorithms, combined with ensemble techniques such as a Voting Classifier, enhanced the accuracy and reliability of the model. The results demonstrate that leveraging natural language processing (NLP) and machine learning can effectively mitigate the risks associated with fake job listings, thereby contributing to a safer online job-seeking environment. Future work may involve integrating real-time data streams, improving the model with deep learning methods, and deploying the system as a scalable web application accessible to job seekers. Overall, our study highlights the potential of AI-driven solutions in combating digital employment fraud and promoting trust in online job markets.