Summarization (SUMM)
SUMM automates the generation of a gist of a legal case document, capturing the critical aspects of the case.
Type of Task | Text Generation |
---|---|
Dataset | In-Abs (Shukla et al., 2022) |
Language | English |
No. of documents | 7,130 |
Type of summary | Abstractive |
Evaluation Metrics | ROUGE-1, ROUGE-2, ROUGE-L F1, BERT-SCORE |
Task Motivation and Description
Going through legal documents, which often span tens of pages, is a time-consuming and cumbersome activity; a summary of each document could help legal practitioners and make their workflow more efficient.
The task of summarization involves generating a gist (of a legal document) that captures the critical aspects of the case.
Summarization is a standard task in NLP; however, in the case of the legal domain, there are a few additional challenges, such as:
(i) case documents are generally very lengthy, and thus the summaries are long too;
(ii) large-scale summarization datasets are difficult to build since it is expensive to gather expert annotations.
Summarization could be extractive (selecting the important sentences) or abstractive (generating the gist). In our setting, summarization is an abstractive generation task.
Dataset
We collected Supreme Court of India judgments from the website of the Legal Information Institute of India, which provides free and non-profit access to databases of Indian law.
Abstractive summaries (also called headnotes) are available for some of these cases; we include 7,130 such case documents, together with their headnotes/summaries, in the dataset.
We reserve 100 randomly-selected document-summary pairs for evaluation, and the remaining 7,030 pairs are used for training.
The dataset is named In-Abs (Indian legal documents Abstractive summarization) and was released in our previous work (Shukla et al., 2022).
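The train/evaluation split described above can be sketched as follows; the helper name and random seed are illustrative, not taken from the paper:

```python
import random

def split_dataset(pairs, n_eval=100, seed=13):
    """Randomly reserve n_eval document-summary pairs for evaluation;
    the remaining pairs form the training set."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[n_eval:], shuffled[:n_eval]

# Toy usage with integer ids standing in for document-summary pairs.
train_ids, eval_ids = split_dataset(range(7130))
```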
Dataset Format
Each document (json) has the following format:
```
Dict{
    'id': string              // case identifier
    'num_doc_tokens': int     // number of words in the full document
    'num_summ_tokens': int    // number of words in the summary
    'document': List(string)  // sentences of the case document
    'summary': List(string)   // sentences of the summary
}
```
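A record in this format can be parsed and flattened back into plain text as follows; the toy record's field values are made up for illustration:

```python
import json

# A toy record following the schema above (the field values are made up).
raw = json.dumps({
    "id": "case_0001",
    "num_doc_tokens": 9,
    "num_summ_tokens": 6,
    "document": ["The appellant filed a suit.", "The High Court dismissed it."],
    "summary": ["Suit dismissed by the High Court."],
})

def load_case(raw_json: str) -> dict:
    """Parse one In-Abs record and rebuild plain text from the sentence lists."""
    doc = json.loads(raw_json)
    doc["full_text"] = " ".join(doc["document"])
    doc["gold_summary"] = " ".join(doc["summary"])
    return doc

case = load_case(raw)
```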
Task Evaluation
We use standard summarization metrics: ROUGE-1, ROUGE-2, and ROUGE-L F1-scores, and BERT-SCORE (Zhang et al., 2020).
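For reference, ROUGE-L F1 reduces to a longest-common-subsequence computation over token sequences. A minimal pure-Python sketch (whitespace tokenization, no stemming; the official scorers apply more preprocessing):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """Sentence-level ROUGE-L F1 on whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)
```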
Baseline Models
We apply both extractive and abstractive methods on the dataset.
Extractive Methods
We apply the following extractive techniques:
(i) CaseSummarizer (Legal-specific, Unsupervised)
(ii) DSDR (Open domain, Unsupervised)
(iii) Gist (Legal-specific, Supervised)
(iv) SummaRuNNer (Open domain, Supervised)
To adapt the abstractive gold-standard summaries for these extractive methods, we use the technique suggested by Narayan et al. (2018).
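Such adaptation schemes typically label sentences greedily: starting from an empty set, repeatedly add the document sentence that most improves the score against the abstractive summary, stopping when no sentence helps. A sketch of this idea, using a simple unigram-F1 stand-in scorer (the actual labelling in the paper uses ROUGE-based scoring):

```python
def unigram_f1(reference: str, candidate: str) -> float:
    """Stand-in scorer: unigram-overlap F1 (the real oracle uses ROUGE)."""
    ref, cand = set(reference.split()), set(candidate.split())
    overlap = len(ref & cand)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def greedy_oracle(doc_sents, abstractive_summary, score_fn=unigram_f1, max_sents=3):
    """Greedily select the document sentence that most improves the score
    against the abstractive summary; stop when no sentence adds gain."""
    selected, best = [], 0.0
    while len(selected) < max_sents:
        gains = [(score_fn(abstractive_summary,
                           " ".join(doc_sents[j] for j in sorted(selected + [i]))), i)
                 for i in range(len(doc_sents)) if i not in selected]
        if not gains:
            break
        score, idx = max(gains)
        if score <= best:   # no remaining sentence improves the score
            break
        best, selected = score, selected + [idx]
    return sorted(selected)
```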
Abstractive Methods
We also apply the following abstractive techniques:
(i) BART (Open domain)
(ii) Legal-Pegasus (Legal-specific)
(iii) Legal-LED (Legal-specific)
While Legal-LED can accommodate long inputs (a 16,384-token limit), the same is not true for the other models. To overcome this problem, we split each document into equal-sized chunks (each smaller than the model's input length limit) and pass each chunk through the model; the per-chunk summaries are concatenated to form the final summary. To convert the overall gold-standard summary into chunk-wise summaries, we follow the approach of Gidiotis and Tsoumakas (2020). All the models are fine-tuned on the summarization dataset.
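The chunk-and-concatenate step can be sketched as below; the chunk size and the `summarize_chunk` callable are placeholders for a model's actual input limit and generation call:

```python
def chunk_tokens(tokens, chunk_size):
    """Split a long token sequence into equal-sized chunks that each fit
    within a model's input limit (the last chunk may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def summarize_long_doc(tokens, summarize_chunk, chunk_size=1024):
    """Summarize each chunk independently and concatenate the pieces;
    summarize_chunk stands in for a fine-tuned model's generate call."""
    parts = [summarize_chunk(c) for c in chunk_tokens(tokens, chunk_size)]
    return " ".join(parts)

# Toy usage: a fake "model" that just reports each chunk's length.
demo = summarize_long_doc(list(range(10)), lambda c: f"<{len(c)}>", chunk_size=4)
```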
Results
Algorithm | ROUGE-1 | ROUGE-2 | ROUGE-L | BERT-Score |
---|---|---|---|---|
DSDR | 0.485 | 0.222 | 0.270 | 0.848 |
CaseSummarizer | 0.454 | 0.229 | 0.279 | 0.843 |
SummaRuNNer | 0.493 | 0.255 | 0.274 | 0.849 |
Gist | 0.471 | 0.238 | 0.308 | 0.842 |
BART | 0.495 | 0.249 | 0.330 | 0.851 |
Legal-Pegasus | 0.488 | 0.252 | 0.341 | 0.851 |
Legal-LED | 0.471 | 0.235 | 0.332 | 0.856 |