Forcible land grabs have become a chronic governance problem in Tamil Nadu, with landowners often induced to sign transfer deeds through coercion, forgery, or official compulsion. The existing e-Governance verification process relies on manual verification of physical presence, proof of identity, and wet signatures, none of which captures behavioural signs of coercion. No previous risk assessment framework focuses on Tamil-speaking landowners or accounts for the linguistic context of present-day informal Tamil. This study introduces a multimodal behavioural analytics framework that adds a consent verification phase to the current land registration pipeline. The framework records video of the registration interaction and extracts the audio track from it. A CNN scores emotional stress from the video stream, and a Transformer-based NLP model scores vocal distress from the audio stream. The two independent scores are then combined by weighted late fusion into a three-level coercion risk rating (Good, Average, or Poor). Experiments on an original dataset of Tamil-language land transactions yield a classification accuracy of 91.4%, with a precision of 0.903 and a recall of 0.924. The system is designed to augment the registration pipeline as an additional accountability layer, not to replace the existing process.
Introduction
This paper proposes a multimodal AI system to detect coerced consent in Tamil Nadu land registration. It addresses a critical limitation of current e-Governance systems, which can verify signatures but cannot tell whether consent is freely given or forced.
Core Problem
Land registration in India is vulnerable to coercion through threats, pressure, or manipulation, but existing digital systems cannot detect this because they only capture formal consent. This leads to legal disputes that are difficult to resolve with current verification methods.
Proposed Solution
The authors introduce a real-time decision-support system that runs as an overlay on the existing land registration workflow. It analyzes a short video of the registrant using two modalities:
Facial emotion analysis (CNN/ResNet-based) to detect stress-related expressions
Tamil speech analysis (Wav2Vec + Tamil-BERT) to detect linguistic and behavioral markers of coercion
These outputs are combined using a weighted late-fusion model to produce a final coercion risk score, which assists (but does not replace) the registration officer’s decision.
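The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights (0.45 facial, 0.55 speech), the two thresholds, and the names `fuse_scores` and `FusionResult` are all hypothetical, chosen only to reflect that speech is weighted slightly higher. The sketch also assumes both module scores lie in [0, 1], that a higher score means higher coercion risk, and that "Poor" denotes the highest-risk level.

```python
from dataclasses import dataclass

# Hypothetical weights: the text states only that speech is weighted slightly higher.
FACIAL_WEIGHT = 0.45
SPEECH_WEIGHT = 0.55

# Hypothetical cut-offs on the fused score (not taken from the paper).
AVERAGE_THRESHOLD = 0.35
POOR_THRESHOLD = 0.65


@dataclass(frozen=True)
class FusionResult:
    risk_score: float  # fused coercion risk in [0, 1]
    risk_level: str    # "Good", "Average", or "Poor"


def fuse_scores(facial_score: float, speech_score: float) -> FusionResult:
    """Combine two independent module scores by weighted late fusion."""
    for name, value in (("facial_score", facial_score),
                        ("speech_score", speech_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Late fusion: each modality is scored separately, then the scores are mixed.
    risk_score = FACIAL_WEIGHT * facial_score + SPEECH_WEIGHT * speech_score

    if risk_score >= POOR_THRESHOLD:
        level = "Poor"
    elif risk_score >= AVERAGE_THRESHOLD:
        level = "Average"
    else:
        level = "Good"
    return FusionResult(risk_score, level)


if __name__ == "__main__":
    print(fuse_scores(facial_score=0.2, speech_score=0.7))
```

Because the two modalities are fused only at the score level, either module can be retrained or replaced without touching the other. In practice the weights and thresholds would be tuned on validation data.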
Key Modules
Facial Emotion Module: Extracts stress indicators (fear, anger, disgust) from video frames.
Tamil Speech Module: Converts speech to text and detects coercion-related patterns in Tamil language using NLP models.
Fusion Engine: Combines facial and speech scores (speech weighted slightly higher) into a 3-level risk classification: Good, Average, Poor.
Decision System: Flags cases for approval, delayed review, or human intervention.
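As a rough illustration of the Decision System, the sketch below maps each risk level to one advisory action. The one-to-one pairing (Good to approval, Average to delayed review, Poor to human intervention) is an assumption inferred from the order in which the levels and actions are listed. The names `Action` and `recommend_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    APPROVE = "approval"
    DELAYED_REVIEW = "delayed review"
    HUMAN_INTERVENTION = "human intervention"


# Assumed pairing of risk levels and actions; the text lists both sets
# but does not spell out the mapping.
ROUTING = {
    "Good": Action.APPROVE,
    "Average": Action.DELAYED_REVIEW,
    "Poor": Action.HUMAN_INTERVENTION,
}


def recommend_action(risk_level: str) -> Action:
    """Return the advisory action for a risk level.

    The result is a recommendation only; the registration officer
    keeps the final decision.
    """
    try:
        return ROUTING[risk_level]
    except KeyError:
        raise ValueError(f"unknown risk level: {risk_level!r}") from None


if __name__ == "__main__":
    for level in ("Good", "Average", "Poor"):
        print(level, "->", recommend_action(level).value)
```

Keeping the routing in a plain lookup table makes the policy easy to audit and to change without retraining any model.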
Dataset & Training
A new dataset of 312 simulated Tamil land registration interactions was created with legal expert annotation.
Scenarios range from voluntary consent to severe coercion.
The models are built on ResNet-50, Wav2Vec 2.0, and Tamil-BERT variants.
Results
The proposed system achieves 91.4% accuracy, outperforming:
Facial-only model (~79.8%)
Speech-only model (~83.2%)
Existing multimodal baseline (~85.6%)
Fusion raises accuracy by roughly 6 to 12 percentage points over these alternatives, which suggests that facial and speech signals complement each other.
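The size of the improvement can be worked out directly from the accuracies listed above. The short script below is illustrative arithmetic only. It treats the approximate baseline figures as exact, and the names `compare` and `BASELINES` are hypothetical. For each baseline it reports the gain in percentage points and the relative reduction in error rate.

```python
# Accuracies (in percent) as reported in the Results section;
# the baseline values are approximate in the source.
FUSED_ACCURACY = 91.4
BASELINES = {
    "facial_only": 79.8,
    "speech_only": 83.2,
    "multimodal_baseline": 85.6,
}


def compare(fused: float, baseline: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative error reduction in percent)."""
    gain = fused - baseline
    baseline_error = 100.0 - baseline
    error_reduction = 100.0 * gain / baseline_error
    return round(gain, 1), round(error_reduction, 1)


if __name__ == "__main__":
    for name, accuracy in BASELINES.items():
        gain, reduction = compare(FUSED_ACCURACY, accuracy)
        print(f"{name}: +{gain} points, {reduction}% fewer errors")
```

With the reported figures, the gains are 11.6, 8.2, and 5.8 points, which correspond to error-rate reductions of about 57%, 49%, and 40% respectively.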
Key Findings
Speech provides stronger coercion signals than facial expressions in this setting.
Multimodal fusion reduces false detections and improves robustness.
Some misclassifications occur for elderly participants, attributed to natural variation in their speech and emotional expression.
Conclusion
The proposed system uses a multimodal behavioural analytics framework to detect coercion within the e-Governance land registration process in Tamil Nadu. It combines a CNN for facial emotion detection with a transformer-based NLP module trained on a specialized Tamil dataset, and merges their outputs through a weighted late-fusion mechanism. The system achieves 91.4% accuracy and a macro F1-score of 0.913, significantly outperforming unimodal systems and general multimodal models.
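As a quick consistency check, the reported F1-score can be compared with the harmonic mean of the precision (0.903) and recall (0.924) given in the abstract. This is only an approximation. A macro F1-score is the average of per-class F1 values, so it need not equal the harmonic mean of aggregate precision and recall exactly.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.903 \times 0.924}{0.903 + 0.924}
    = \frac{1.668744}{1.827}
    \approx 0.913
```

The result agrees with the reported value of 0.913 to three decimal places.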
The system is designed as an advisory overlay that produces risk scores and interpretations to support, but never replace, the human decision-making of the supervising registration officer. This design ensures that no new automated decision-making authority is introduced into the existing legal process and that the officer retains final discretion over each case.
This research demonstrates the feasibility of integrating an AI-based behavioural verification system with existing land registration infrastructure in Tamil Nadu, and it highlights the importance of localized training for achieving better performance than multilingual models. We hope that pilot deployment at select Sub-Registrar offices will pave the way for wider adoption and improved fairness in the land registration process.