Rhetorical Role Prediction (RR)

The RR prediction task aims to segment a court case document into topically coherent units such as Facts, Arguments, and Rulings.

Type of Task: Multi-class Text Classification
Dataset: Malik et al. (2022)
Language: English
No. of sentences: 21,184
No. of labels: 12 + “None”
Evaluation Metric: macro-F1

Task Motivation and Description

As pointed out earlier, legal documents are typically long (on average about 4,000 tokens) and highly unstructured; the relevant information is spread throughout the document. If a legal document could be segmented into topically coherent units (such as facts, arguments, precedents, and statutes), this would help downstream applications. Each such topically coherent unit is referred to as a Rhetorical Role (RR).

Given a legal document, the task of RR prediction involves assigning an RR label to each sentence. We target 13 RR labels: Fact, Issue, Argument (Respondent), Argument (Petitioner), Statute, Dissent, Precedent Relied Upon, Precedent Not Relied Upon, Precedent Overruled, Ruling By Lower Court, Ratio Of The Decision, Ruling By Present Court, and None.
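To make the input/output format concrete, here is a toy example in Python (the sentences and their labels are invented for illustration):

# Input: a document as an ordered list of sentences.
sentences = [
    "The appellant filed an appeal before this Court on 4 March 2010.",
    "The question is whether the levy of penalty was justified.",
    "Section 271(1)(c) of the Income Tax Act provides for such a penalty.",
]

# Output: one RR label per sentence, drawn from the 13-label set above.
labels = ["Fact", "Issue", "Statute"]

assert len(sentences) == len(labels)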

RR prediction is a foundational task: it structures the information in a document and supports downstream applications related to document understanding, information extraction, and retrieval.

Dataset

We use the dataset created in our previous work Malik et al. (2022). The dataset consists of 21,184 sentences from legal documents (in English) in two domains: competition law (CL) and income tax (IT). The sentences are annotated with the 13 RRs by six legal experts (from a reputed law school). We follow the same train/dev/test split as in Malik et al. (2022).

Dataset Format

Each document (JSON) has the following format:

Dict{
  'id': string  // case identifier
  'text': List(string)  // case document sentences
  'labels': List(ClassLabel)  // list of labels (rhetorical roles) corresponding to each sentence
  'expert_1': Dict{ // annotations by expert_1
    'primary': List(string) // primary rhetorical role of each sentence
    'secondary': List(string) // secondary rhetorical role of each sentence
    'tertiary': List(string)  // tertiary rhetorical role of each sentence
    'overall': List(string) // final rhetorical role after considering primary, secondary and tertiary
  }
  'expert_2': Dict{...} // similar to expert_1
  'expert_3': Dict{...} // similar to expert_1
}
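The following is a minimal sketch of reading one document in this format, assuming each document is stored as a standalone JSON file (the file name below is hypothetical; the actual layout depends on the released corpus):

import json

# Hypothetical path to a single case document.
with open("rr_document.json", encoding="utf-8") as f:
    doc = json.load(f)

# One gold label per sentence.
assert len(doc["text"]) == len(doc["labels"])
for sentence, label in zip(doc["text"], doc["labels"]):
    print(f"{label}: {sentence}")

# Per-expert annotations: 'overall' is the expert's final role after
# reconciling their primary, secondary, and tertiary choices.
expert_roles = doc["expert_1"]["overall"]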

Task Evaluation

The task is evaluated using the standard macro-F1 score, i.e., the unweighted mean of the per-label F1 scores over all 13 labels, so each label contributes equally regardless of its frequency.
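A minimal sketch of computing the metric with scikit-learn (the gold and predicted labels below are illustrative):

from sklearn.metrics import f1_score

y_true = ["Fact", "Fact", "Issue", "Statute", "None"]
y_pred = ["Fact", "Issue", "Issue", "Statute", "None"]

# average="macro": unweighted mean of the per-label F1 scores.
print(f1_score(y_true, y_pred, average="macro"))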

Baseline Models

For the task of RR prediction, in our previous work (Malik et al., 2022) we experimented with various models, e.g., BERT, Legal-BERT, BiLSTM-CRF, and a Multi-Task Learning (MTL) based model. We do not discuss each of these models here and instead refer the reader to Malik et al. (2022). Below, we briefly describe the best-performing model, which we use as the baseline.

We pose rhetorical role prediction as a sequence prediction task: the input to the baseline model is a sequence of sentences, and the output is an RR label for each sentence. The baseline model is based on the intuition that RRs tend to maintain some inertia from one sentence to the next, so changes in RR labels are not abrupt but smooth. In particular, as the baseline model, we use the Multi-Task Learning (MTL) model of Malik et al. (2022). The main task of the MTL model is to predict the RR label of the current sentence; the auxiliary task is to predict whether the RR label changes between the previous sentence and the current one. The auxiliary task thus signals to the main task when a new RR label should be predicted when moving from one sentence to the next. Further details about the results can be found in Malik et al. (2022).
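The sketch below illustrates this two-head MTL setup in PyTorch under simplifying assumptions: it replaces the CRF layer of the actual MTL-BiLSTM-CRF baseline with per-sentence softmax heads, and the layer sizes and auxiliary-loss weight alpha are illustrative choices, not those of Malik et al. (2022).

import torch
import torch.nn as nn

class MTLRoleTagger(nn.Module):
    """Shared BiLSTM encoder over sentence embeddings with two task heads."""
    def __init__(self, emb_dim=768, hidden=256, num_labels=13):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.role_head = nn.Linear(2 * hidden, num_labels)  # main task
        self.shift_head = nn.Linear(2 * hidden, 2)          # auxiliary task

    def forward(self, sent_embs):
        # sent_embs: (batch, num_sentences, emb_dim), e.g., BERT sentence vectors
        h, _ = self.encoder(sent_embs)
        return self.role_head(h), self.shift_head(h)

model = MTLRoleTagger()
sent_embs = torch.randn(1, 10, 768)  # toy embeddings for 10 sentences
role_logits, shift_logits = model(sent_embs)

# Toy targets: an RR label per sentence, plus whether the label changed
# from the previous sentence (the first sentence has no predecessor).
roles = torch.randint(0, 13, (1, 10))
shifts = torch.zeros(1, 10, dtype=torch.long)
shifts[:, 1:] = (roles[:, 1:] != roles[:, :-1]).long()

ce = nn.CrossEntropyLoss()
alpha = 0.5  # illustrative weight for the auxiliary loss
loss = ce(role_logits.flatten(0, 1), roles.flatten()) \
       + alpha * ce(shift_logits.flatten(0, 1), shifts.flatten())
loss.backward()

In the actual baseline, a CRF layer on top of the BiLSTM models label transitions jointly across the sentence sequence rather than scoring each sentence independently.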

Results

Model                          mF1 (CL)   mF1 (IT)   mF1 (CL+IT)
BiLSTM-CRF (sent2vec)            0.61       0.59       0.65
BiLSTM-CRF (BERT embeddings)     0.63       0.63       0.63
LSP-BiLSTM-CRF (BERT-SC)         0.68       0.65       0.67
MTL-BiLSTM-CRF (BERT-SC)         0.69       0.70       0.70

(mF1: macro-F1; CL: competition law; IT: income tax; CL+IT: both domains combined.)