Legal Machine Translation (L-MT)
L-MT involves translating a legal snippet (a law article or a paragraph of a case judgment) from English into Indic languages.
Type of Task | Text Generation |
Dataset | MILPaC (Mahapatra et al., 2023) |
Language | English + 9 Indic Languages |
No. of docs | 17,853 pairs |
Evaluation Metrics | BLEU, GLEU, chrF++ |
Task Motivation and Description
Although most legal documents in the Indian judiciary are written in English, only about 10% of the general population is comfortable reading and understanding English. Moreover, different states in India use different languages. Thus, there is a need to translate legal documents between English and Indic languages to improve the accessibility of law to the general public.
The task of legal machine translation involves translating legal text from English to Indic languages.
Machine translation is a standard task in NLP; however, in the case of the Indian legal domain, the major challenge lies in acquiring legal texts written in low-resource Indic languages.
Dataset
MILPaC is based on 10 languages in total: English (EN), Hindi (HI), Bengali (BN), Marathi (MR), Tamil (TA), Gujarati (GU), Telugu (TE), Malayalam (ML), Punjabi (PA) and Oriya (OR).
MILPaC comprises three subsets:
(i) MILPaC-IP: Constructed from a set of primers released by IPTLS, a society of law practitioners. It comprises questions and answers in English and all 9 Indic languages.
(ii) MILPaC-CCI-FAQ: Constructed from a set of FAQ booklets released by the Competition Commission of India (CCI), mostly based on The Competition Act, 2002.
(iii) MILPaC-Acts: Constructed from the official versions of Indian parliamentary acts and their translations.
The dataset is named MILPaC (Multilingual Indian Legal Parallel Corpora) and was released in our previous work, Mahapatra et al. (2023).
Dataset Format
Each document is a JSON object with the following format:
Dict{
'id': string // instance identifier
'src_lang': string // language of source text
'src': string // source text
'tgt_lang': string // language of target text
'tgt': string // target text
}
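The schema above can be read with the Python standard library. The sketch below assumes the instances are stored one JSON object per line (JSON Lines); the actual file layout is not specified here, and the names `TranslationPair`, `load_pairs` and `group_by_direction` are hypothetical helpers for illustration. Grouping by `(src_lang, tgt_lang)` is convenient if scores are computed per translation direction before averaging.

```python
import json
from collections import defaultdict
from dataclasses import dataclass

FIELDS = ("id", "src_lang", "src", "tgt_lang", "tgt")


@dataclass
class TranslationPair:
    """One instance, mirroring the fields of the schema above (hypothetical helper)."""
    id: str
    src_lang: str
    src: str
    tgt_lang: str
    tgt: str


def load_pairs(lines):
    """Parse one JSON object per non-empty line.

    The one-object-per-line (JSON Lines) layout is an assumption, not a
    documented property of the released files.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Keep only the documented fields; a missing field raises KeyError.
        pairs.append(TranslationPair(**{k: record[k] for k in FIELDS}))
    return pairs


def group_by_direction(pairs):
    """Bucket pairs by (src_lang, tgt_lang), preserving input order within each bucket."""
    groups = defaultdict(list)
    for p in pairs:
        groups[(p.src_lang, p.tgt_lang)].append(p)
    return dict(groups)
```

The language codes in real files may differ from the EN/HI-style codes listed earlier; the grouping works with whatever strings are present.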
Task Evaluation
We use standard machine translation metrics, namely BLEU, GLEU and chrF++, as used in prior work (Mahapatra et al., 2023).
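To make the two n-gram metrics concrete, here is a minimal, simplified sketch of corpus-level BLEU and sentence-level GLEU, taking GLEU to mean the "Google-BLEU" variant (the minimum of n-gram precision and recall), which is an assumption. It uses whitespace tokenization, a single reference and no smoothing, so its numbers will not exactly match standard toolkits. Reported scores should come from an established implementation such as sacrebleu, which also provides chrF++ (chrF with word bigrams), not sketched here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Corpus BLEU on a 0-100 scale: single reference, uniform weights, no smoothing.

    Whitespace tokenization is a simplifying assumption.
    """
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_order + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the clipped n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


def sentence_gleu(hyp, ref, max_order=4):
    """Google-BLEU style GLEU on a 0-100 scale: min(precision, recall) over n-grams 1..max_order."""
    h, r = hyp.split(), ref.split()
    h_counts, r_counts = Counter(), Counter()
    for n in range(1, max_order + 1):
        h_counts.update(ngrams(h, n))
        r_counts.update(ngrams(r, n))
    overlap = sum((h_counts & r_counts).values())  # clipped matches across all orders
    h_total, r_total = sum(h_counts.values()), sum(r_counts.values())
    if h_total == 0 or r_total == 0:
        return 0.0
    return 100.0 * min(overlap / h_total, overlap / r_total)
```

Note that a hypothesis which is a correct but truncated prefix of the reference keeps perfect precision yet is penalized by the brevity penalty in BLEU and by low recall in GLEU.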
Baseline Models
We apply both commercial and open-source MT systems on the dataset.
Commercial Systems
We employ the following commercial systems:
(i) Google Cloud Translation - Advanced Edition (v3) system
(ii) Microsoft Azure Cognitive Services (v3) Translation API
(iii) Large Language Models (Davinci-003 and GPT-3.5-turbo-instruct)
Experiments with LLMs were conducted only in a one-shot setting.
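A one-shot setting means the model sees exactly one solved English-to-target demonstration before the text it must translate. The sketch below only illustrates that structure; the template wording, the function name `build_one_shot_prompt`, and how the demonstration pair is selected are all hypothetical and are not the prompt used in these experiments.

```python
def build_one_shot_prompt(example_src, example_tgt, src, tgt_lang_name, src_lang_name="English"):
    """Assemble a one-shot translation prompt: one solved demonstration, then the query.

    Hypothetical template for illustration only.
    """
    return (
        f"Translate the following legal text from {src_lang_name} to {tgt_lang_name}.\n\n"
        # The single in-context demonstration (source and its reference translation).
        f"{src_lang_name}: {example_src}\n"
        f"{tgt_lang_name}: {example_tgt}\n\n"
        # The query; the model is expected to continue after the final label.
        f"{src_lang_name}: {src}\n"
        f"{tgt_lang_name}:"
    )
```

Ending the prompt with the bare target-language label nudges a completion-style model to emit only the translation.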
Open-source systems
We also apply the following open-source translation systems:
(i) mBART-50 (Cannot be applied to Oriya and Punjabi)
(ii) OPUS (Legal-specific)
(iii) NLLB (Legal-specific)
(iv) IndicTrans
Except for mBART-50, all models chosen for the experiments can translate from English to all 9 Indic languages in the MILPaC dataset.
Results
Microsoft, Google and IndicTrans are the best-performing systems overall. We report the metrics of these models averaged across all data subsets and all languages.
Model | BLEU | GLEU | chrF++ |
---|---|---|---|
Google | 28.0 | 31.2 | 51.7 |
Microsoft | 28.4 | 32 | 56.9 |
IndicTrans | 25.5 | 29.4 | 54.4 |
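"Averaged across all data subsets and all languages" can mean an unweighted mean over (subset, language) cells or a mean weighted by the number of pairs in each cell; the two can differ noticeably when subset sizes are uneven. The sketch below computes both. The nested-dictionary input layout and the name `averages` are assumptions, and which scheme produced the table above is not stated in this section.

```python
def averages(scores, sizes):
    """Return (macro, weighted) averages of one metric over (subset, language) cells.

    scores[subset][lang] is a metric value; sizes[subset][lang] is the number of
    test pairs in that cell (hypothetical input layout).
    """
    cells = [(scores[s][lang], sizes[s][lang]) for s in scores for lang in scores[s]]
    # Macro: every cell counts equally, regardless of how many pairs it holds.
    macro = sum(v for v, _ in cells) / len(cells)
    # Weighted: larger cells pull the average toward their score.
    total = sum(n for _, n in cells)
    weighted = sum(v * n for v, n in cells) / total
    return macro, weighted
```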