Prior Case Retrieval (PCR)

PCR requires identifying relevant prior cases from a set of candidates, given a query case document.

Type of Task: Text Retrieval
Dataset: IL-PCR (Joshi et al., 2023a)
Language: English
No. of documents: 7,070
Evaluation Metric: micro-F1@K

Task Motivation and Description

When drafting a legal document, legal experts (judges and lawyers) cite previous cases to support their arguments and reasoning. Traditionally, they have relied on their own expertise to recall relevant precedents; however, with the exponentially growing number of cases, it has become practically impossible to keep track of them all. Automating this process can ease the workload of legal experts and expedite the justice delivery process.

Given a query document (with its citations removed), the task of Prior Case Retrieval (PCR) is to automatically retrieve from the candidate pool the legal documents that are relevant to (and hence cited in) the query document.
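
As a rough sketch, the task can be viewed as ranking the candidate pool with some relevance scorer and returning the top-K candidates. The scoring function below is a hypothetical placeholder, not a model from the paper:

from typing import Callable, Dict, List

def retrieve_prior_cases(
    query_text: str,
    candidates: Dict[str, str],              # candidate id -> candidate text
    score_fn: Callable[[str, str], float],   # hypothetical relevance scorer
    k: int = 5,
) -> List[str]:
    # Rank all candidates against the query and return the top-K candidate IDs.
    ranked = sorted(candidates, key=lambda cid: score_fn(query_text, candidates[cid]), reverse=True)
    return ranked[:k]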

Note that, in contrast to the standard notion of relevance in information retrieval (semantic similarity), relevance in the legal domain is mainly about similar factual situations and applicable legal precedents. Hence, PCR requires a model to reason about legal documents by following the chain of facts and arguments presented, much like a judge.

Dataset

For the PCR task we created the Indian Legal Prior Case Retrieval (IL-PCR) corpus. The corpus (7,070 documents) was created by scraping legal documents (available in the public domain) from the IndianKanoon website. Names of individuals and organizations are anonymized to avoid introducing biases into the models. Each query case is annotated with ground-truth labels indicating the relevant candidate cases.

Dataset Format

Each document (JSON) has the following format:

Dict{
  'id': string  // case identifier
  'text': List(string)  // full case document sentences
  'relevant_candidates': List(string) // list of relevant candidate document IDs
}
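
A minimal loading sketch in Python, assuming the corpus is stored as one JSON file per case (the directory path below is hypothetical):

import json
from pathlib import Path

CORPUS_DIR = Path("il_pcr/documents")  # hypothetical location of the corpus files

def load_documents(corpus_dir: Path) -> dict:
    # Load every JSON case file into a dict keyed by case id.
    documents = {}
    for path in corpus_dir.glob("*.json"):
        with path.open(encoding="utf-8") as f:
            doc = json.load(f)
        documents[doc["id"]] = doc
    return documents

docs = load_documents(CORPUS_DIR)
print(len(docs), "documents loaded")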

Task Evaluation

Micro-averaged F1 at K (micro-F1@K) is used as the evaluation metric. The model assigns a relevance score to each candidate with respect to the query case, and a candidate is predicted as cited if it appears among the top-K ranked candidates.
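
A sketch of the metric as described above (the official evaluation script may differ in details):

from typing import Dict, List

def micro_f1_at_k(predictions: Dict[str, List[str]],   # query id -> ranked candidate ids
                  gold: Dict[str, List[str]],           # query id -> cited candidate ids
                  k: int) -> float:
    # Pool true positives, predictions, and gold labels over all queries (micro-averaging).
    tp = retrieved = relevant = 0
    for qid, ranked in predictions.items():
        top_k = set(ranked[:k])
        gold_set = set(gold.get(qid, []))
        tp += len(top_k & gold_set)
        retrieved += len(top_k)
        relevant += len(gold_set)
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0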

Baseline Models

We experimented with various models (including transformer-based ones) for the PCR task; please refer to the original paper for details. Here, we describe the baseline model, which is based on events in the legal document. In our case, an event is defined as a tuple consisting of a predicate (typically a verb) and its corresponding arguments (such as subject, object, etc.). Events are obtained by parsing the legal documents with a dependency parser, and each document is represented by its set of event tuples. The relevance between the query and a candidate is computed by comparing their event-based representations.
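
The sketch below illustrates one way to extract such event tuples with spaCy's dependency parser; it is an approximation and not necessarily the extraction pipeline used in the paper:

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a dependency parser

def extract_events(sentence: str) -> list:
    # Return (subjects, predicate lemma, objects) tuples, one per verb in the sentence.
    events = []
    for token in nlp(sentence):
        if token.pos_ != "VERB":
            continue
        subjects = [c.text for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c.text for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
        events.append((subjects, token.lemma_, objects))
    return events

print(extract_events("The appellant filed a petition before the High Court."))
# Expected output along the lines of: [(['appellant'], 'file', ['petition'])]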

We observed that the event-based models perform best, but even they reach only a micro-F1 score of 39.15, which is relatively low. Surprisingly, transformer-based models do not perform well on the PCR task. For more details, please refer to the original paper.

Results on Event-Filtered Documents

Method              K   μP@K    μR@K    μF1@K
BM25                5   24.26   16.50   19.64
BM25 (Bi-gram)      6   33.69   27.50   30.28
BM25 (Tri-gram)     6   41.35   33.76   37.17
BM25 (Quad-gram)    7   40.12   38.20   39.15
BM25 (Penta-gram)   7   39.57   37.70   38.61
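
The BM25 rows above can be approximated with the rank_bm25 package by treating word n-grams as tokens; this is a sketch of the general idea, not the exact experimental setup:

from rank_bm25 import BM25Okapi

def ngrams(tokens, n):
    # Join consecutive words into n-gram tokens, e.g. bi-grams for n=2.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bm25_rank(query_tokens, candidate_tokens, n=3, k=6):
    # Rank candidate documents against the query using BM25 over word n-grams.
    ids = list(candidate_tokens.keys())
    corpus = [ngrams(toks, n) for toks in candidate_tokens.values()]
    bm25 = BM25Okapi(corpus)
    scores = bm25.get_scores(ngrams(query_tokens, n))
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in ranked[:k]]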