Summarizing Indian legal documents such as court judgments and orders presents significant difficulties due to their complexity and length. Recent advances in Natural Language Processing (NLP) pave the way to overcome this challenge. This paper presents the design, implementation, and performance evaluation of the summarization of Indian legal texts using domain-specific transformer models. We begin the work with an introduction to domain-specific transformer models for the summarization of legal texts. Through this work, we use pre-trained transformer models fine-tuned on various Indian court judgments to generate concise summaries categorized into facts, arguments, judgments, analysis, and statutes, ensuring readability. Key components of the work include fine-tuned transformer models for sentence selection, categorization, and paraphrasing, as well as Google's Gemini model for assisting users with their inquiries. The work aims to help users access these complex texts and to reduce the time required to research them.
Introduction
Indian legal documents are complex and difficult for non-expert users to understand, requiring specialist knowledge and time-consuming effort. Existing summarization techniques often miss key details, highlighting the need for better tools to improve readability and accessibility.
This work proposes an AI-based Indian legal text summarizer using domain-specific transformer models, notably the InCaseLaw BERT model, to extract and categorize important sentences from legal documents. These extracted sections are then paraphrased for clarity using a ChatGPT-based paraphraser. Additionally, the system integrates Google Gemini, a large language model chatbot, to answer user queries interactively about the legal content.
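To make the categorization step concrete, the sketch below shows how a sequence-classification checkpoint fine-tuned on rhetorical-role labels could assign each extracted sentence to one of the five summary sections using the Hugging Face transformers API. The checkpoint path models/incaselaw-rhetorical and the label order are placeholders for illustration, not the exact artifacts produced in this work.

```python
# Minimal sketch of the sentence-categorization step, assuming a
# sequence-classification head fine-tuned on rhetorical-role labels.
# The checkpoint path "models/incaselaw-rhetorical" and the label set
# below are placeholders, not the authors' released artifacts.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["facts", "arguments", "judgment", "analysis", "statutes"]

tokenizer = AutoTokenizer.from_pretrained("models/incaselaw-rhetorical")
model = AutoModelForSequenceClassification.from_pretrained(
    "models/incaselaw-rhetorical", num_labels=len(LABELS)
)
model.eval()

def categorize(sentences):
    """Assign each sentence to one of the summary sections."""
    inputs = tokenizer(
        sentences, padding=True, truncation=True, max_length=512,
        return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    predictions = logits.argmax(dim=-1).tolist()
    return [(s, LABELS[p]) for s, p in zip(sentences, predictions)]

if __name__ == "__main__":
    sample = [
        "The appellant filed the suit for recovery of possession.",
        "Section 138 of the Negotiable Instruments Act, 1881 applies.",
    ]
    for sentence, label in categorize(sample):
        print(f"[{label}] {sentence}")
```

The categorized sentences can then be grouped by label to form the section-wise summary before the paraphrasing stage.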
The system aims to:
Improve accessibility to complex legal information by generating structured, section-wise summaries.
Provide interactive legal assistance via a conversational AI interface.
Support legal research, education, and user-friendly navigation of legal texts.
The methodology involves a pipeline of PDF parsing, sentence extraction with fine-tuned BERT models, paraphrasing for readability, and conversational query support using Google Gemini. The backend is built with Python and Hugging Face transformers, and the frontend uses React.
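A condensed view of that pipeline is sketched below, assuming pypdf for PDF parsing, NLTK for sentence splitting, and the google-generativeai client for the Gemini-backed query step; these library choices, the function names, and the model name are illustrative and are not a verbatim excerpt of the implementation.

```python
# Sketch of the document pipeline under these assumptions: pypdf for PDF
# text extraction, NLTK for sentence splitting, and the google-generativeai
# client for the Gemini-backed Q&A step. Library choices and the model name
# are illustrative, not necessarily those used in the deployed system.
import google.generativeai as genai
import nltk
from pypdf import PdfReader

nltk.download("punkt", quiet=True)

def extract_sentences(pdf_path):
    """Parse a judgment PDF and return its sentences."""
    reader = PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    return nltk.sent_tokenize(text)

def answer_query(summary, question, api_key):
    """Ask Gemini a question grounded in the generated summary."""
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-pro")  # assumed model name
    prompt = (
        "You are a legal assistant. Using only the summary below, "
        f"answer the user's question.\n\nSummary:\n{summary}\n\n"
        f"Question: {question}"
    )
    return model.generate_content(prompt).text
```

The extracted sentences feed the classifier shown earlier, and the resulting section-wise summary is what the conversational interface grounds its answers in.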
Evaluation metrics (F1 and ROUGE scores) indicate strong performance in sentence classification and summary quality. Challenges include sentence boundary detection errors, computational load for large documents, and scarcity of annotated datasets.
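As a reference for how those two metrics can be computed, the snippet below uses scikit-learn's f1_score for the sentence-classification step and the rouge_score package for summary quality; the labels and summaries shown are toy data, not results reported in this paper.

```python
# Illustrative metric computation for the two evaluation axes mentioned
# above: macro F1 for sentence categorization and ROUGE for summary quality.
# The example labels and summaries are toy data, not results from the paper.
from sklearn.metrics import f1_score
from rouge_score import rouge_scorer

# Sentence-category predictions vs. gold labels (toy example).
y_true = ["facts", "statutes", "judgment", "analysis"]
y_pred = ["facts", "statutes", "analysis", "analysis"]
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))

# ROUGE between a reference summary and a generated summary (toy example).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "The court dismissed the appeal and upheld the conviction.",
    "The appeal was dismissed and the conviction was upheld by the court.",
)
for name, result in scores.items():
    print(name, round(result.fmeasure, 3))
```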
Future improvements plan to add automated document classification, multilingual support, case similarity analysis, community validation, and mobile app development to enhance usability and reach.
Conclusion
In this paper, we have presented how a transformer-based model can be used to generate structured and concise summaries from large Indian legal documents. We evaluated the system with both F1 and ROUGE scores to ensure its ability to provide accurate summaries. While the system displays promising results, further refinements are required to improve its efficiency and effectiveness. Future work will focus on improving the model by expanding datasets, adding multilingual and multi-document support, and developing a mobile application to improve accessibility.