Humor detection in text is an important problem in natural language processing, with implications for applications such as content filtering, sentiment analysis, and emotion detection. This study presents a state-of-the-art model for humor detection in social media text, built with contemporary machine learning (ML) and deep learning (DL) methods. The goal is to classify text as humorous or non-humorous. Classical ML models, namely Random Forest (RF), Logistic Regression (LR), and Gradient Boosting Machine (GBM), reached accuracies of 90.63%, 90%, and 84.85%, respectively. The DL models CNN and LSTM outperformed the classical ML methods, reaching 94.43% and 91.5%, respectively. To improve performance further, RoBERTa, a pre-trained transformer model, was fine-tuned for humor detection. Using contextual embeddings, RoBERTa outperformed all of the other models, reaching an accuracy of 98.92% on the Kaggle 200k short jokes dataset. Explainable AI techniques such as LIME were also used to make the predictions interpretable and to identify the linguistic features that influence them. The paper documents the performance gains of DL models, particularly transformer-based models, over other models on the humor detection task. It also discusses their use in improving the accuracy and credibility of AI systems in real-world applications such as content moderation and social media analysis, and sets a new benchmark for automatic humor detection.
Introduction
The rapid growth of social media has led to massive amounts of unstructured natural language data, which is used for tasks like sentiment analysis. However, humor detection remains a challenging problem due to its reliance on figurative language, cultural context, and mixed emotions that can confuse automated systems.
This study focuses on detecting humor in short social media texts by framing it as a binary classification task (humor vs. non-humor). It uses a large, balanced dataset of 200,000 short jokes and compares classical machine learning models (Random Forest, Logistic Regression, Gradient Boosting) with deep learning models (CNN, LSTM) and a fine-tuned transformer model, RoBERTa. RoBERTa achieved the highest accuracy (up to 98.92%) due to its ability to capture deep contextual and linguistic nuances.
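As a rough illustration of the binary framing described above, the following sketch trains a tiny logistic regression classifier on bag-of-words counts. It is not the pipeline used in the study: the feature representation, the toy sentences, the learning rate, and the function names (`featurize`, `train_logreg`, `predict`) are all assumptions made for the example, and it uses only the Python standard library so that it runs anywhere.

```python
import math
from collections import Counter


def featurize(text):
    """Bag-of-words counts over lowercase whitespace tokens (assumed features)."""
    return Counter(text.lower().split())


def train_logreg(texts, labels, epochs=200, lr=0.1):
    """Fit logistic regression with per-example gradient descent on log loss."""
    weights, bias = {}, 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of humor
            err = p - y                      # gradient of log loss w.r.t. z
            bias -= lr * err
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) - lr * err * c
    return weights, bias


def predict(text, weights, bias):
    """Return 1 (humorous) if the linear score is positive, else 0."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return 1 if z > 0 else 0


# Toy data invented for this example: 1 = humorous, 0 = non-humorous.
texts = [
    "why did the chicken cross the road",
    "knock knock who is there",
    "the meeting starts at noon",
    "quarterly revenue rose last year",
]
labels = [1, 1, 0, 0]

weights, bias = train_logreg(texts, labels)
predictions = [predict(t, weights, bias) for t in texts]
```

On real data, the same interface (text in, 0 or 1 out) is what allows the classical, deep learning, and transformer models to be compared on equal terms.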
The study also employs Particle Swarm Optimization (PSO) to fine-tune model hyperparameters, further improving performance. Explainable AI techniques like LIME and SHAP are integrated to enhance transparency and interpretability, helping to understand which features influence humor detection.
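To make the idea of PSO-based hyperparameter search concrete, here is a minimal sketch of a standard particle swarm minimizer. The source does not state which hyperparameters were tuned or which PSO settings were used, so everything here is an assumption: the inertia and acceleration coefficients are commonly used defaults, and `toy_validation_error` is a hypothetical stand-in for "1 minus validation accuracy" of a real model.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over the box `bounds` = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's personal best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]   # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (gbest_pos[d] - pos[i][d]))
                # Move, then clamp back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def toy_validation_error(params):
    """Hypothetical objective; a real one would train and validate a model.

    params = [log10(learning_rate), dropout]. The minimum is placed at
    log10(lr) = -3 and dropout = 0.3 purely for illustration.
    """
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2


best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(-5.0, -1.0), (0.0, 0.8)])
```

In practice each call to the objective would involve a full training run, so the number of particles and iterations is usually kept small.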
Results show that the CNN, especially after PSO optimization, performs strongly at 94.43% accuracy, but RoBERTa remains the top performer overall. The XAI methods provide insight into model decisions, which supports trust in the predictions.
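The kind of insight that perturbation-based explanation offers can be sketched with a simple leave-one-word-out (occlusion) analysis. This is a simpler relative of LIME, not LIME itself: LIME samples many perturbed inputs and fits a weighted linear surrogate, whereas this sketch only removes one word at a time. The classifier `humor_score` is a hypothetical stand-in with hand-picked cue words; a real analysis would call the trained model instead.

```python
def humor_score(text):
    """Hypothetical stand-in classifier returning a pseudo-probability of humor.

    It only counts a few hand-picked cue words so the example runs
    without any trained model or external dependency.
    """
    cues = {"why": 0.2, "knock": 0.3, "chicken": 0.25, "walks": 0.15}
    words = text.lower().split()
    return min(1.0, 0.05 + sum(cues.get(w, 0.0) for w in words))


def word_importance(text, predict):
    """Score each word by how much the prediction drops when it is removed."""
    words = text.split()
    base = predict(text)
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((word, base - predict(reduced)))
    # Largest drop first: these words support the "humorous" label most.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = word_importance("why did the chicken cross the road", humor_score)
top_words = [word for word, score in ranking if score > 0]
```

With this toy scorer, `top_words` contains only the cue words that appear in the sentence, which mirrors how an explanation highlights the tokens that drive a prediction.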
Key contributions:
Combining ML, DL, and transformer models for humor detection.
Incorporating Explainable AI for interpretability.
This research offers an in-depth comparative study of humor detection models, with emphasis on machine learning and deep learning methods applied to social media datasets. Conventional machine learning algorithms such as Logistic Regression (LR), Random Forest (RF), and Gradient Boosting Machine (GBM) reached a moderate level of accuracy. Deep learning models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks outperformed the classical ones, as they capture both spatial and sequential relationships in text more effectively. The combination of RoBERTa with CNN achieved superior results compared to all other models, attaining the highest accuracy and F1-score. This model's superiority can be attributed to its context-aware tokenization, its ability to model long-range dependencies, and the advantages gained from pre-training on extensive datasets. Explainable AI (XAI) helped identify the predominant linguistic patterns related to humor classification, which improves the explainability and interpretability of the models. The transformer-based models presented here proved highly capable at humor classification and outperformed conventional ML/DL approaches, achieving higher accuracy and generalizing better.
Although the RoBERTa-based classification model proved more effective at humor detection, there remain several directions for further improvement. Multimodal analysis that combines text, images, and audio could give a better understanding of humor than textual cues alone. Another approach is to employ few-shot and zero-shot learning methods to improve performance on datasets with scarce labeled examples, enabling the model to generalize better across humor styles and language differences.
Another important direction is cross-linguistic and cultural adaptation, as humor varies significantly across languages and cultural contexts. Extending the study to multilingual humor detection using pre-trained multilingual transformers can broaden the model's applicability. Furthermore, real-time deployment of humor detection models in social media applications requires optimizing computational efficiency without compromising accuracy. A further important aspect is addressing bias and fairness in humor classification, so that the model does not reinforce stereotypes or unintended biases in humor recognition. Advances in these areas can make humor detection systems more robust, interpretable, and applicable to real-world scenarios.