Summarizing Indian legal documents such as court judgments and orders presents significant difficulties due to their complexity and length. Recent advances in Natural Language Processing (NLP) pave the way to overcome this challenge. This paper presents the design, implementation, and performance evaluation of the summarization of Indian legal texts using domain-specific transformer models. We begin the work with an introduction to domain-specific transformer models for the summarization of legal texts. Through this work, we use pre-trained transformer models fine-tuned on various Indian court judgments to generate concise summaries categorized into facts, arguments, judgments, analysis, and statutes, ensuring readability. Key components of the work include fine-tuned transformer models for sentence selection, categorization, and paraphrasing, as well as Google's Gemini model for assisting users with their inquiries. The work aims to help users access these complex texts and to reduce the time required to research them.
Introduction
Indian legal documents are complex and difficult for non-expert users to understand, requiring specialist knowledge and time-consuming effort. Existing summarization techniques often miss key details, highlighting the need for better tools to improve readability and accessibility.
This work proposes an AI-based Indian legal text summarizer using domain-specific transformer models, notably the InCaseLaw BERT model, to extract and categorize important sentences from legal documents. These extracted sections are then paraphrased for clarity using a ChatGPT-based paraphraser. Additionally, the system integrates Google Gemini, a large language model chatbot, to answer user queries interactively about the legal content.
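To make the categorization step concrete, the sketch below shows how a sequence-classification checkpoint fine-tuned on rhetorical-role labels could assign each extracted sentence to one of the five summary sections using the Hugging Face transformers API. The checkpoint path models/incaselaw-rhetorical and the label order are placeholders for illustration, not the exact artifacts produced in this work.

```python
# Minimal sketch of the sentence-categorization step, assuming a
# sequence-classification head fine-tuned on rhetorical-role labels.
# The checkpoint path "models/incaselaw-rhetorical" and the label set
# below are placeholders, not the authors' released artifacts.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["facts", "arguments", "judgment", "analysis", "statutes"]

tokenizer = AutoTokenizer.from_pretrained("models/incaselaw-rhetorical")
model = AutoModelForSequenceClassification.from_pretrained(
    "models/incaselaw-rhetorical", num_labels=len(LABELS)
)
model.eval()

def categorize(sentences):
    """Assign each sentence to one of the summary sections."""
    inputs = tokenizer(
        sentences, padding=True, truncation=True, max_length=512,
        return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    predictions = logits.argmax(dim=-1).tolist()
    return [(s, LABELS[p]) for s, p in zip(sentences, predictions)]

if __name__ == "__main__":
    sample = [
        "The appellant filed the suit for recovery of possession.",
        "Section 138 of the Negotiable Instruments Act, 1881 applies.",
    ]
    for sentence, label in categorize(sample):
        print(f"[{label}] {sentence}")
```

The categorized sentences can then be grouped by label to form the section-wise summary before the paraphrasing stage.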
The system aims to:
Improve accessibility to complex legal information by generating structured, section-wise summaries.
Provide interactive legal assistance via a conversational AI interface.
Support legal research, education, and user-friendly navigation of legal texts.
The methodology involves a pipeline of PDF parsing, sentence extraction with fine-tuned BERT models, paraphrasing for readability, and conversational query support using Google Gemini. The backend is built with Python and Hugging Face transformers, and the frontend uses React.
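A condensed view of that pipeline is sketched below, assuming pypdf for PDF parsing, NLTK for sentence splitting, and the google-generativeai client for the Gemini-backed query step; these library choices, the function names, and the model name are illustrative and are not a verbatim excerpt of the implementation.

```python
# Sketch of the document pipeline under these assumptions: pypdf for PDF
# text extraction, NLTK for sentence splitting, and the google-generativeai
# client for the Gemini-backed Q&A step. Library choices and the model name
# are illustrative, not necessarily those used in the deployed system.
import google.generativeai as genai
import nltk
from pypdf import PdfReader

nltk.download("punkt", quiet=True)

def extract_sentences(pdf_path):
    """Parse a judgment PDF and return its sentences."""
    reader = PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    return nltk.sent_tokenize(text)

def answer_query(summary, question, api_key):
    """Ask Gemini a question grounded in the generated summary."""
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-pro")  # assumed model name
    prompt = (
        "You are a legal assistant. Using only the summary below, "
        f"answer the user's question.\n\nSummary:\n{summary}\n\n"
        f"Question: {question}"
    )
    return model.generate_content(prompt).text
```

The extracted sentences feed the classifier shown earlier, and the resulting section-wise summary is what the conversational interface grounds its answers in.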
Evaluation metrics (F1 and ROUGE scores) indicate strong performance in sentence classification and summary quality. Challenges include sentence boundary detection errors, computational load for large documents, and scarcity of annotated datasets.
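As a reference for how those two metrics can be computed, the snippet below uses scikit-learn's f1_score for the sentence-classification step and the rouge_score package for summary quality; the labels and summaries shown are toy data, not results reported in this paper.

```python
# Illustrative metric computation for the two evaluation axes mentioned
# above: macro F1 for sentence categorization and ROUGE for summary quality.
# The example labels and summaries are toy data, not results from the paper.
from sklearn.metrics import f1_score
from rouge_score import rouge_scorer

# Sentence-category predictions vs. gold labels (toy example).
y_true = ["facts", "statutes", "judgment", "analysis"]
y_pred = ["facts", "statutes", "analysis", "analysis"]
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))

# ROUGE between a reference summary and a generated summary (toy example).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "The court dismissed the appeal and upheld the conviction.",
    "The appeal was dismissed and the conviction was upheld by the court.",
)
for name, result in scores.items():
    print(name, round(result.fmeasure, 3))
```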
Future improvements plan to add automated document classification, multilingual support, case similarity analysis, community validation, and mobile app development to enhance usability and reach.
Conclusion
In this paper, we have presented how a transformer-based model can be used to generate structured and concise summaries from large Indian legal documents. We evaluated the system with both F1 and ROUGE scores to ensure its ability to provide accurate summaries. While the system displays promising results, further refinements are required to improve its efficiency and effectiveness. Future work will focus on improving the model by expanding datasets, adding multilingual and multi-document support, and developing a mobile application to improve accessibility.