Legal Named Entity Recognition (L-NER)

L-NER aims to identify, in a court case document, the entities belonging to classes of legal interest such as Judge, Appellant, Respondent, etc.

Type of Task: Multi-class Sequence Labeling
Dataset: Joshi et al. (2023b)
Language: English
No. of documents: 105
No. of labels: 12
Evaluation Metric: strict macro-F1

Task Motivation and Description

Named Entity Recognition (NER) is a standard task in NLP (Yadav and Bethard, 2019). In the legal domain, however, the types of named entities of interest differ. For example, one would like to identify the judge, petitioner (appellant), and respondent in a legal document. If one were to run a standard NER system on a legal document, the judge, petitioner, and respondent would all be labeled with a PERSON tag. Hence, a separate task is needed to identify the legal named entities in the documents. L-NER is a foundational task and can be helpful in various applications related to information extraction, knowledge graph creation, legal search, and other legal tasks. L-NER could also be useful for anonymizing the documents used to train legal text understanding models, which can in turn help reduce bias.

Formally, given a legal document, the task of Legal Named Entity Recognition is to identify all mentions of a set of 12 legal entity types, namely Appellant, Respondent, Judge, Appellant Counsel, Respondent Counsel, Court, Authority, Witness, Statute, Precedent, Date, and Case Number.

The NE classes were derived after in-depth discussions with legal experts.
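
For concreteness, the label set can be written down as follows (a minimal sketch in Python; the string names are illustrative, since the released dataset defines its own ClassLabel names):

# The 12 legal entity types (string names here are illustrative;
# the dataset defines its own ClassLabel names).
ENTITY_TYPES = [
    "APPELLANT", "RESPONDENT", "JUDGE", "APPELLANT_COUNSEL",
    "RESPONDENT_COUNSEL", "COURT", "AUTHORITY", "WITNESS",
    "STATUTE", "PRECEDENT", "DATE", "CASE_NUMBER",
]

# Under the B-I-O scheme used for evaluation (see Task Evaluation),
# this yields 2 * 12 + 1 = 25 token-level labels.
BIO_LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]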

Dataset

For the L-NER task, we collected a total of 105 case documents (in English), publicly available at the IndianKanoon website. The collected documents belong to the Supreme Court of India (SCI) and some state High Courts. With the help of legal experts (a law professor and law graduates from a reputed law school), these documents were annotated with the 12 entity types mentioned above.

Dataset Format

Each document (JSON) has the following format:

Dict{
  'id': string  // case identifier
  'text': string  // case contents
  'spans': List(
    Dict{  // each dict represents an entity
      'start': int  // starting char index of entity
      'end': int  // ending char index of entity
      'label': ClassLabel // entity type
    }
  )
}
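
A minimal sketch of reading one such document and printing its entity mentions (the file name is hypothetical; we also assume 'end' is an exclusive character offset and that 'label' is stored as a string):

import json

with open("case_0001.json") as f:  # hypothetical file name
    doc = json.load(f)

print(doc["id"])
for span in doc["spans"]:
    # Assumes 'end' is an exclusive character offset.
    mention = doc["text"][span["start"]:span["end"]]
    print(f"{span['label']:<20} {mention}")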

Task Evaluation

NER can be formulated as a sequence labeling task, where each word receives one of the labels {B-X, I-X, O} as per the popular B-I-O scheme (Yadav and Bethard, 2019), where X stands for any of the legal classes of interest. We use the standard metrics of strict macro-averaged precision, recall, and F1 score for evaluation. The strict score counts a predicted entity as correct only if both the entity boundary and the entity type match the gold annotation.
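
To illustrate, the character-level spans can be projected onto token-level B-I-O tags and scored with an off-the-shelf library such as seqeval, whose strict mode implements exactly this matching criterion (a sketch; the whitespace tokenizer and the constructed example are simplifying assumptions, as the actual pipeline tokenizes with the BERT tokenizer):

from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

def spans_to_bio(text, spans):
    # Whitespace tokenization with character offsets (a simplification).
    tokens, offsets, pos = [], [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        offsets.append((start, start + len(tok)))
        tokens.append(tok)
        pos = start + len(tok)
    tags = ["O"] * len(tokens)
    for sp in spans:  # assumes 'end' is an exclusive char offset
        hits = [i for i, (s, e) in enumerate(offsets)
                if s < sp["end"] and e > sp["start"]]
        for j, i in enumerate(hits):
            tags[i] = ("B-" if j == 0 else "I-") + sp["label"]
    return tokens, tags

# Constructed example (the name and offsets are illustrative only).
_, gold = spans_to_bio("Justice A. K. Sikri delivered the judgment",
                       [{"start": 8, "end": 19, "label": "JUDGE"}])
# gold == ['O', 'B-JUDGE', 'I-JUDGE', 'I-JUDGE', 'O', 'O', 'O']

# mode="strict" credits a prediction only when both the boundary and
# the entity type match exactly; average="macro" averages over types.
print(f1_score([gold], [gold], mode="strict", scheme=IOB2, average="macro"))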

Baseline Models

We perform NER based on token representations generated by BERT-based models. Since the documents in the dataset do not come pre-segmented into sentences or paragraphs, and case documents easily exceed BERT's input length limit, each document must be chunked before being passed to BERT. However, unlike tasks such as text classification, the chunking strategy must avoid splitting true named entities across chunks. We therefore break each chunk at the last stopword (based on NLTK's list of English stopwords) that still satisfies the chunk size limit, on the assumption that stopwords are not expected to be part of entity names.
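
A sketch of this chunking strategy, assuming whitespace-level words and NLTK's English stopword list (the limit is measured in words here for simplicity; the actual limit applies to BERT subword tokens):

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))

def chunk_document(words, max_len=256):
    # Split `words` into chunks of at most `max_len` words, preferring
    # to break just after the last stopword inside the window so that
    # entity names are not split across chunks.
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_len, len(words))
        if end < len(words):
            for i in range(end - 1, start, -1):  # search backwards
                if words[i].lower() in STOPWORDS:
                    end = i + 1  # keep the stopword in this chunk
                    break
        chunks.append(words[start:end])
        start = end
    return chunks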

We experiment with four different BERT encoders: (i) bert-base-uncased, (ii) LegalBERT, (iii) CaseLawBERT, and (iv) InLegalBERT.
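
Each encoder can be plugged into a standard token-classification head, e.g. via HuggingFace Transformers (a sketch; the hub identifiers for LegalBERT, CaseLawBERT, and InLegalBERT are our best guesses and not confirmed by the source):

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hub identifiers are assumptions for illustration.
ENCODERS = {
    "BERT": "bert-base-uncased",
    "LegalBERT": "nlpaueb/legal-bert-base-uncased",
    "CaseLawBERT": "casehold/custom-legalbert",
    "InLegalBERT": "law-ai/InLegalBERT",
}

name = ENCODERS["InLegalBERT"]
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(
    name, num_labels=25)  # 2 * 12 entity types + "O"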

Results

Model         Strict macro-P   Strict macro-R   Strict macro-F1
BERT          86.50 ± 2.12     84.09 ± 3.61     85.22 ± 2.90
LegalBERT     88.62 ± 0.54     85.83 ± 0.75     87.14 ± 0.62
CaseLawBERT   89.96 ± 0.65     86.30 ± 0.42     88.05 ± 0.48
InLegalBERT   93.28 ± 0.23     90.38 ± 0.05     91.78 ± 0.10