The increasing spread of misinformation on Twitter calls for effective classification models that can distinguish real from fake content. This research evaluates several machine learning models, namely Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), and K-Nearest Neighbors (KNN), for classifying Twitter data. To improve model accuracy and efficiency, several hyperparameter optimization techniques are employed: Grid Search, Random Search, Bayesian Optimization, and a Genetic Algorithm. A tuning approach based on Bayesian Optimization with Hyperband (BOHB) is proposed to optimize classification performance while reducing computational cost. Experimental results show that SVM achieves the highest accuracy (99%) and outperforms the other models across the key performance metrics. The findings highlight the effectiveness of BOHB in improving misinformation detection and its potential as a robust, scalable tool for social media content verification.
Introduction
The rapid growth of social media, especially Twitter, has transformed how information spreads, but it has also made misinformation and fake news easier to propagate. Detecting fake news efficiently on such platforms is critical, yet traditional manual or rule-based methods do not scale. Machine learning models such as SVM, Logistic Regression, Random Forest, and KNN offer promising automated solutions for classifying tweets as real or fake. However, challenges such as high-dimensional data and class imbalance make careful model tuning essential, so this work applies hyperparameter optimization methods, namely Grid Search, Random Search, Bayesian Optimization, Genetic Algorithms, and the proposed BOHB approach, to improve accuracy and reduce training time. This research aims to develop a robust, scalable framework that combines these models and optimization techniques to combat misinformation on Twitter.
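The difference between the two simplest tuning strategies listed above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the search space `SEARCH_SPACE` and the scoring function `toy_validation_score` are hypothetical stand-ins for a real cross-validated classifier.

```python
import itertools
import math
import random

# Hypothetical search space for an SVM-style classifier (illustrative values only).
SEARCH_SPACE = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "kernel": ["linear", "rbf"],
}


def toy_validation_score(params):
    """Hypothetical stand-in for cross-validated accuracy; NOT a real model.

    It peaks at C = 1.0 with a linear kernel, purely for illustration.
    """
    penalty = abs(math.log10(params["C"]))
    bonus = 0.02 if params["kernel"] == "linear" else 0.0
    return 0.95 + bonus - 0.01 * penalty


def grid_search(space, score_fn):
    """Evaluate every combination in the grid and keep the best one."""
    keys = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(space, score_fn, n_trials, seed=0):
    """Evaluate a fixed number of randomly sampled combinations."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    print(grid_search(SEARCH_SPACE, toy_validation_score))
    print(random_search(SEARCH_SPACE, toy_validation_score, n_trials=4))
```

Grid Search evaluates all 10 combinations in this space, whereas Random Search spends a fixed budget (`n_trials`) on sampled combinations; the gap between the two grows quickly as more hyperparameters are added to the grid.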
The literature review highlights various studies that used machine learning and NLP techniques for fake news detection, sentiment analysis, and hate speech detection on Twitter, emphasizing the effectiveness of models like Random Forest and SVM. Several works discuss preprocessing, feature engineering, and hybrid approaches that integrate textual and visual data to improve classification performance. Ensemble models and advanced preprocessing strategies also show promise in enhancing accuracy across different sentiment and misinformation detection tasks.
The methodology section describes the pipeline: dataset description, text preprocessing, feature extraction with TF-IDF, and classification with multiple machine learning algorithms. Hyperparameter tuning with several optimization strategies is then applied to improve model performance. The research uses a benchmark fake and real news dataset of over 44,000 articles and aims to identify the most effective model for misinformation detection based on accuracy, precision, recall, and F1-score.
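The preprocessing, TF-IDF, and evaluation steps described above can be illustrated with a small, dependency-free sketch. It rests on assumptions the paper does not state: a minimal cleaning routine (lowercasing, removing URLs, @mentions, and non-letters, plus a tiny hypothetical stop-word list), raw term counts weighted by the smoothed IDF ln((1 + n) / (1 + df)) + 1 with L2 normalisation (the default behaviour of scikit-learn's `TfidfVectorizer`), and label `1` meaning fake news.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on"}


def preprocess(text):
    """Lowercase, drop URLs, @mentions and non-letters, then remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Uses raw term counts and the smoothed IDF ln((1 + n) / (1 + df)) + 1,
    then L2-normalises each document vector.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    `positive=1` assumes fake news is encoded as label 1 (hypothetical encoding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `preprocess("BREAKING: The vaccine is a hoax! https://t.co/x @user")` yields `['breaking', 'vaccine', 'hoax']`. In a full pipeline the TF-IDF vectors would feed the classifiers, and the metrics would be computed on a held-out test split.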
Conclusion
This research proposes Bayesian Optimization with Hyperband (BOHB) as an advanced hyperparameter tuning approach to improve the classification of Twitter data as real or fake. BOHB combines the probabilistic model of Bayesian Optimization with the adaptive resource allocation of Hyperband, enabling an efficient search for well-performing hyperparameters while reducing computational cost. With BOHB tuning, the models achieve strong performance: SVM attains 99% accuracy, and Random Forest shows substantial improvement. These results indicate that BOHB is effective for refining ML models for misinformation detection, balancing accuracy and computational efficiency. Future work can explore integrating it with deep learning models for real-time analysis.
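The resource-allocation idea behind BOHB can be made concrete. BOHB (Falkner et al., 2018) keeps Hyperband's brackets of successive halving, which give small budgets to many configurations and larger budgets to the survivors, and replaces Hyperband's uniform random sampling with a model-based sampler built from kernel density estimates of well-performing configurations. The sketch below implements only the Hyperband half, with uniform sampling; `sample_config` is the hook where a BOHB-style model would plug in. The search space, `toy_loss`, and the abstract "budget" (for example, training iterations or a data subset size) are hypothetical, not the paper's setup.

```python
import math
import random


def hyperband(sample_config, evaluate, max_budget=27, eta=3, seed=0):
    """Plain Hyperband with a pluggable sampler (BOHB would swap in a model).

    sample_config(rng) -> config; evaluate(config, budget) -> validation loss.
    Returns (best_config, best_loss, best_budget, n_evaluations).
    """
    rng = random.Random(seed)
    # s_max = floor(log_eta(max_budget)), computed with integers to avoid float error.
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:
        s_max += 1
    best = (None, float("inf"), None)
    n_evals = 0
    for s in range(s_max, -1, -1):
        # Bracket s: start n configurations at budget r, then successive halving.
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))
        r = max_budget / eta ** s
        configs = [sample_config(rng) for _ in range(n)]
        for i in range(s + 1):
            budget = r * eta ** i
            losses = [evaluate(c, budget) for c in configs]
            n_evals += len(configs)
            for c, loss in zip(configs, losses):
                if loss < best[1]:
                    best = (c, loss, budget)
            # Promote the best 1/eta of this rung to the next, larger budget.
            keep = len(configs) // eta
            ranked = sorted(zip(losses, range(len(configs))))
            configs = [configs[j] for _, j in ranked[:keep]]
    return best[0], best[1], best[2], n_evals


def sample_config(rng):
    # Hypothetical search space: log10 of an SVM-style C parameter, sampled uniformly.
    return {"log10_C": rng.uniform(-3.0, 3.0)}


def toy_loss(config, budget):
    # Hypothetical objective, NOT a real model: loss shrinks with budget, lowest near C = 1.
    return 0.01 * config["log10_C"] ** 2 + 1.0 / budget


if __name__ == "__main__":
    print(hyperband(sample_config, toy_loss, max_budget=27, eta=3))
```

With `max_budget=27` and `eta=3`, this runs four brackets and 69 evaluations in total; the most aggressive bracket starts 27 configurations at budget 1 and promotes the best third at each rung, which is where the savings over evaluating every configuration at full budget come from.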