Online recruitment sites have made job searching more efficient. However, their growing use has also led to an increase in fraudulent job postings, which can cause financial loss and privacy risks for job seekers. This paper proposes an intelligent system for detecting fraudulent job advertisements using transformer-based language models and deep learning techniques. A comprehensive dataset was created by combining job postings from three different sources to enhance diversity and reflect real-world scenarios. Exploratory data analysis showed a significant class imbalance between legitimate and fraudulent jobs. To address this imbalance, SMOBD (Synthetic Minority Oversampling Based on Samples Density), a variant of the Synthetic Minority Over-sampling Technique (SMOTE), was used to create synthetic samples of the minority class. The Bidirectional Encoder Representations from Transformers (BERT) model was used to obtain contextual embeddings from job descriptions. These embeddings were then fed as input features to a deep neural network classifier that performs the fraud detection. Experimental results show that the proposed system achieved an accuracy of 86.6% and a recall of 80.9%, a good result for fraudulent job posting detection given the highly imbalanced data. A Streamlit-based interface was also created so that users can enter job descriptions and immediately identify possible scams. The proposed system offers a viable way to improve security and trust on online recruitment websites and helps users identify fraudulent job postings more efficiently.
Introduction
This research presents an AI-based Online Job Fraud Detection System that uses BERT (Bidirectional Encoder Representations from Transformers) and a Deep Neural Network (DNN) to identify fraudulent job postings. The increasing popularity of online recruitment platforms has also led to a rise in fake job advertisements that deceive job seekers through scams, requests for personal information, or fraudulent fees. Traditional detection methods based on manual filtering and conventional machine learning algorithms often fail to capture the contextual meaning of job descriptions, making them less effective against sophisticated scams.
To overcome these limitations, the proposed system combines BERT-based contextual text embeddings with a deep neural network classifier for improved semantic understanding of job descriptions. Since fraudulent postings are much fewer than legitimate ones, the study addresses the class imbalance problem using the SMOTE-SMOBD oversampling technique, which generates synthetic minority-class samples and improves the model's ability to detect fraudulent jobs. The dataset is created by merging multiple public job posting datasets from different countries, including the Fake Job Postings, US Job Postings, and Pakistan Job Postings datasets, resulting in greater diversity and better model generalization.
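The core idea behind SMOTE-style oversampling can be illustrated with a minimal sketch. It shows only the basic SMOTE interpolation step (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours); the density-based weighting that distinguishes SMOBD is not reproduced here. The function name `smote_oversample`, its parameters, and the toy 2-D vectors standing in for BERT embeddings are assumptions made for illustration, not the implementation used in this study.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by plain SMOTE interpolation.

    Illustrative sketch only: SMOBD's density-based weighting is omitted.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D vectors standing in for embeddings of fraudulent postings.
fraud_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(fraud_embeddings, n_new=6, k=2, seed=42)
```

Because each synthetic point is a convex combination of two real minority samples, it always lies between existing fraudulent examples rather than in arbitrary regions of the embedding space.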
The methodology includes data preprocessing, text cleaning, BERT-based feature extraction, class balancing with SMOTE-SMOBD, and classification using a deep neural network with fully connected layers, dropout, early stopping, and learning rate scheduling. The trained model is deployed as a Streamlit web application, allowing users to input job descriptions and receive real-time predictions indicating whether a posting is legitimate or fraudulent, along with confidence scores.
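The two ends of this pipeline, text cleaning before feature extraction and turning the classifier output into a label with a confidence score, can be sketched as follows. This is a hedged illustration: the cleaning rules, the 0.5 decision threshold, and the helper names `clean_text` and `label_with_confidence` are assumptions and are not taken from the study's implementation.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def clean_text(raw):
    """Decode HTML entities, drop tags and URLs, and collapse whitespace."""
    text = html.unescape(raw)
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def label_with_confidence(fraud_probability, threshold=0.5):
    """Map a predicted fraud probability to (label, confidence).

    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if fraud_probability >= threshold:
        return "fraudulent", fraud_probability
    return "legitimate", 1.0 - fraud_probability


posting = "<p>Earn $500/day &amp; work from home!</p>\n\nApply: https://example.com/apply"
cleaned = clean_text(posting)
label, confidence = label_with_confidence(0.9)
```

In a deployed application, the cleaned text would be passed to the embedding model and classifier, and the resulting probability would be shown to the user through a function like `label_with_confidence`.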
The model is evaluated using accuracy, precision, recall, F1-score, and confusion matrix analysis. It achieves an accuracy of 86.6%, a recall of 80.9%, a precision of 8.5%, and an F1-score of 15.4%. Precision is low: at 8.5%, roughly one in twelve postings flagged as fraudulent is actually fraudulent. The high recall nevertheless shows that the model identifies most fraudulent job postings, which is particularly important in fraud detection, where the priority is to minimize missed scams. Overall, the proposed BERT and SMOTE-SMOBD framework provides an effective solution for enhancing trust and security in online recruitment by improving the detection of fraudulent job advertisements while offering a practical real-time web-based application.
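The relationship between these metrics can be checked with the standard definitions. The sketch below recomputes the F1-score from the reported precision and recall. It also evaluates a set of confusion-matrix counts that are purely hypothetical, chosen only because they yield similar ratios; they are not the counts from this study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported values: precision 8.5%, recall 80.9%.
reported_f1 = f1_from(0.085, 0.809)

# Average number of flagged postings per truly fraudulent one.
flagged_per_true_fraud = 1 / 0.085

# Hypothetical counts (not from the paper) with similar ratios:
# 17 true positives, 183 false positives, 4 false negatives.
example_precision, example_recall, example_f1 = precision_recall_f1(17, 183, 4)
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, which is why an F1-score near 15% accompanies a recall above 80%.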
Conclusion
This study proposed a transformer-based system that detects fraudulent job postings using BERT contextual embeddings and a deep neural network classifier.
To enhance diversity and better reflect real recruitment scenarios, the dataset was compiled by merging several real-world job posting datasets. Preprocessing involved cleaning the text and ensuring data consistency.
Because fraudulent job postings are far fewer than legitimate ones, class imbalance was addressed with SMOTE-based oversampling. The SMOBD variant was used to create synthetic minority-class samples, which improves the representation of fraudulent job postings in the training data and the model's ability to learn from them.
The experimental results indicate that the proposed model is effective at detecting fraudulent job postings. The model achieved an accuracy of 0.866 and a recall of 0.809, which means that most fraudulent job postings were correctly identified. In a fraud detection system, high recall is especially valuable because every missed fraudulent posting exposes job seekers to financial loss and to misuse of their personal information.
Using BERT to capture the context of job descriptions is an improvement over traditional machine learning approaches. The Streamlit-based web application lets users enter job descriptions and receive real-time predictions of whether a job posting is legitimate or fraudulent.
The proposed system is a practical and efficient method for detecting job scams on online recruitment websites. The findings suggest that combining a transformer-based model with SMOTE-SMOBD data balancing improves fraud detection and mitigates the effects of the imbalanced class distribution.
In future work, the system's performance could be improved by adding new datasets, tuning hyperparameters, and testing other transformer architectures such as RoBERTa or DistilBERT. The system could also be extended into a browser extension or integrated into job portals to give users scam warnings in real time.
References
[1] S. Vidros, C. Kolias, G. Kambourakis, and L. Akoglu, “Automatic detection of online recruitment frauds: Characteristics, methods, and a public dataset,” Future Internet, vol. 9, no. 1, pp. 1–14, Mar. 2017.
[2] S. Dutta and S. K. Bandyopadhyay, “Fake job recruitment detection using machine learning approach,” International Journal of Engineering Trends and Technology, vol. 68, no. 4, pp. 48–53, Apr. 2020.
[3] B. Alghamdi and F. Alharby, “An intelligent model for online recruitment fraud detection,” Journal of Information Security, vol. 10, no. 3, pp. 155–176, 2019.
[4] S. Lal, R. Jaiswal, N. Sardana, A. Verma, A. Kaur, and R. Mourya, “ORFDetector: Ensemble learning-based online recruitment fraud detection,” Proceedings of the 12th International Conference on Contemporary Computing (IC3), 2021.
[5] N. Nasser, M. Habiba, and A. Rauf, “Online recruitment fraud detection using artificial neural networks,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 5, pp. 321–330, May 2021.
[6] M. Habiba, A. Yousaf, and M. Imran, “A comparative study of machine learning and deep learning algorithms for fake job detection,” International Journal of Data Mining and Knowledge Management Process, vol. 11, no. 3, pp. 25–37, 2021.
[7] R. Nindyati and A. Nugraha, “Indonesian Employment Scam Detection Dataset (IESD): Context-based behavioral features for online job scam detection,” Indonesian Journal of Computing and Cybersecurity, vol. 5, no. 2, pp. 75–85, 2023.
[8] N. Akram, R. Irfan, A. S. Al-Shamayleh, A. Kousar, A. Qaddos, M. Imran, and A. Akhunzada, “Online Recruitment Fraud (ORF) Detection Using Deep Learning Approaches,” IEEE Access, vol. 12, pp. 109388–109405, Aug. 2024.
[9] Fake Job Postings Dataset (EMSCAD), Available: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
[10] US Job Postings Dataset, Available: https://www.kaggle.com/datasets/promptcloud/us-job-postings
[11] Pakistan Job Postings Dataset, Available: https://www.kaggle.com/datasets/umerhaddii/pakistan-job-postings