Legal Machine Translation (L-MT)

L-MT involves translating a legal snippet (a law article or a paragraph of a case judgment) from English into Indic languages.

Type of Task: Text Generation
Dataset: MILPaC (Mahapatra et al., 2023)
Language: English + 9 Indic Languages
No. of docs: 17,853 pairs
Evaluation Metric: BLEU, GLEU, chrF++

Task Motivation and Description

Although most legal documents in the Indian judiciary are written in English, only about 10% of the general population is comfortable reading and understanding English. Moreover, different states in India have different languages. Thus, there is a need to translate legal documents from English into Indic languages to make the law more accessible to the general public.

The task of legal machine translation involves translating legal text from English to Indic languages.

Machine translation is a standard task in NLP; however, in the case of the Indian legal domain, the major challenge lies in acquiring legal texts written in low-resource Indic languages.

Dataset

MILPaC is based on 10 languages in total: English (EN), Hindi (HI), Bengali (BN), Marathi (MR), Tamil (TA), Gujarati (GU), Telugu (TE), Malayalam (ML), Punjabi (PA) and Oriya (OR).

MILPaC comprises three subsets:

(i) MILPaC-IP: Constructed from a set of primers released by IPTLS, a society of law practitioners. It comprises questions and answers in English and all 9 Indic languages.

(ii) MILPaC-CCI-FAQ: Constructed from a set of FAQ booklets released by the Competition Commission of India (CCI), mostly based on The Competition Act, 2002.

(iii) MILPaC-Acts: Constructed from the official version and translations of Indian parliamentary acts.

The dataset is named MILPaC (Multilingual Indian Legal Parallel Corpora) and was released in our previous work (Mahapatra et al., 2023).

Dataset Format

Each document (JSON) has the following format:

Dict{
  'id': string  // instance identifier
  'src_lang': string  // language of source text
  'src': string // source text
  'tgt_lang': string  // language of target text
  'tgt': string // target text
}
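
As an illustration of the format above, the following is a minimal sketch of how a single instance could be parsed and checked. The field names come from the format description; the function name parse_record, the example id, the language-code values and the example sentence pair are hypothetical and are not taken from the dataset.

```python
import json

# Field names follow the format described above.
REQUIRED_FIELDS = ("id", "src_lang", "src", "tgt_lang", "tgt")


def parse_record(raw: str) -> dict:
    """Parse one JSON instance and check that every documented field is a string."""
    record = json.loads(raw)
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], str):
            raise ValueError(f"field {field!r} must be a string")
    return record


# Hypothetical instance; the id, language codes and texts are illustrative only.
example = json.dumps(
    {
        "id": "example-0001",
        "src_lang": "EN",
        "src": "What is a patent?",
        "tgt_lang": "HI",
        "tgt": "पेटेंट क्या है?",
    },
    ensure_ascii=False,
)

record = parse_record(example)
print(record["src_lang"], "->", record["tgt_lang"])
```

Checking the fields up front makes malformed instances fail early, before they reach a translation or evaluation step.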

Task Evaluation

We use standard machine translation metrics, namely BLEU, GLEU and chrF++, as used in prior work (Mahapatra et al., 2023).
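
To make one of these metrics concrete, the following is a rough sketch of a sentence-level GLEU computation, assuming that GLEU here refers to the Google-BLEU score: matching n-grams of order 1 to 4 are pooled, and the score is the minimum of n-gram precision and recall. Whitespace tokenization and the function names are assumptions made for illustration; the reported numbers may rely on a different tokenizer and on a library implementation.

```python
from collections import Counter


def ngram_counts(tokens, max_n=4):
    """Count all n-grams of order 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Sentence-level GLEU: min(precision, recall) over pooled 1..max_n-grams."""
    # Assumption: whitespace tokenization, for illustration only.
    ref = ngram_counts(reference.split(), max_n)
    hyp = ngram_counts(hypothesis.split(), max_n)
    ref_total = sum(ref.values())
    hyp_total = sum(hyp.values())
    if ref_total == 0 or hyp_total == 0:
        return 0.0
    # Clipped overlap: each n-gram matches at most as often as it occurs in both.
    overlap = sum((ref & hyp).values())
    # min(overlap / hyp_total, overlap / ref_total) equals overlap / max(...)
    return overlap / max(ref_total, hyp_total)


print(sentence_gleu("the cat sat", "the cat"))
```

Because the score takes the minimum of precision and recall, a translation that is too short is penalized in the same way as one that adds unmatched text.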

Baseline Models

We apply both commercial and open-source MT systems on the dataset.

Commercial Systems

We employ the following commercial systems:

(i) Google Cloud Translation - Advanced Edition (v3) system

(ii) Microsoft Azure Cognitive Services (v3) Translation API

(iii) Large Language Models (Davinci-003 and GPT-3.5-turbo-instruct)

Experiments with LLMs were conducted only in a one-shot setting.
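
For illustration, a one-shot prompt can be assembled from a single demonstration pair followed by the text to translate. The sketch below shows one possible layout; the instruction wording, the function name build_one_shot_prompt and the example sentences are hypothetical and do not reproduce the prompt used in the experiments.

```python
def build_one_shot_prompt(src_lang: str, tgt_lang: str,
                          demo_src: str, demo_tgt: str,
                          query_src: str) -> str:
    """Build a one-shot translation prompt: one demonstration, then the query."""
    return (
        f"Translate the following legal text from {src_lang} to {tgt_lang}.\n\n"
        # The single demonstration pair (the "one shot").
        f"{src_lang}: {demo_src}\n"
        f"{tgt_lang}: {demo_tgt}\n\n"
        # The text to translate; the model is expected to continue after the label.
        f"{src_lang}: {query_src}\n"
        f"{tgt_lang}:"
    )


# Hypothetical demonstration and query, for illustration only.
prompt = build_one_shot_prompt(
    src_lang="English",
    tgt_lang="Hindi",
    demo_src="What is a patent?",
    demo_tgt="पेटेंट क्या है?",
    query_src="What is a trademark?",
)
print(prompt)
```

Ending the prompt with the target-language label nudges a completion-style model to produce only the translation.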

Open-source systems

We also apply the following open-source translation systems:

(i) mBART-50 (Cannot be applied to Oriya and Punjabi)

(ii) OPUS (Legal-specific)

(iii) NLLB (Legal-specific)

(iv) IndicTrans

Except for mBART-50, all models chosen for the experiments can translate from English to all 9 Indic languages in the MILPaC dataset.

Results

Microsoft, Google and IndicTrans are the best-performing systems overall. We report the metrics of these systems, averaged across all data subsets and all languages.

Model        BLEU   GLEU   chrF++
Google       28.0   31.2   51.7
Microsoft    28.4   32.0   56.9
IndicTrans   25.5   29.4   54.4