In the multiplicative representation, updating a single parameter changes many more entries of the resulting weight matrix than it does under an additive transformation, as illustrated in the figure below. Consequently, fewer parameter updates may be needed to transform the weight matrix into a target matrix, which can lead to faster convergence. We observe this empirically in our experiments.
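This contrast can be reproduced with a small toy example. The sketch below (a NumPy illustration with assumed dimensions, not the paper's figure) perturbs a single entry of an update matrix: added to \(\mathbf W_0\) it changes one weight entry, while applied multiplicatively as \((\mathbf I + \mathbf E)\mathbf W_0\) it changes an entire row.

```python
import numpy as np

# Toy illustration: perturb a single entry of the update matrix and count how
# many entries of the resulting weight matrix change.
rng = np.random.default_rng(0)
d = 8
W0 = rng.standard_normal((d, d))

E = np.zeros((d, d))
E[2, 5] = 0.1                       # a single-parameter update

W_add = W0 + E                      # additive update (LoRA-style)
W_mul = (np.eye(d) + E) @ W0        # multiplicative update (I_+-style)

print("entries changed (additive):      ",
      np.count_nonzero(~np.isclose(W_add, W0)))   # 1
print("entries changed (multiplicative):",
      np.count_nonzero(~np.isclose(W_mul, W0)))   # up to d (a whole row of W0 mixed in)
```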
Convergence time reflects how quickly a model reaches a stable or desirable level of performance during training. To complement the evaluation metrics presented in Table 1, we show in this section that our proposed techniques converge faster than LoRA. We quantify convergence speed using the Area Under the Curve (AUC) of the training-loss curve, where a lower AUC indicates faster convergence. The figure shows the training-loss curves for LoRMA (both \(\mathcal{I}_{+}\) and \(\mathcal{I}_{\pi}\) variants) and LoRA on the CoLA task with the RoBERTa\(_\text{base}\) model; LoRMA exhibits a steeper decline in training loss. The percentage reduction in AUC relative to LoRA for various tasks is summarized in the table, and similar trends were observed for the other tasks as well.
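As an illustration of this convergence metric, the following sketch computes the AUC of a logged training-loss curve with the trapezoidal rule; the step counts and loss values here are hypothetical placeholders, not the paper's numbers.

```python
import numpy as np

def loss_auc(steps, losses):
    """Area under the training-loss curve via the trapezoidal rule.
    A lower AUC means the loss drops (and stays) lower earlier, i.e. faster convergence."""
    steps = np.asarray(steps, dtype=float)
    losses = np.asarray(losses, dtype=float)
    return float(np.sum((losses[:-1] + losses[1:]) / 2.0 * np.diff(steps)))

# Hypothetical logged curves (placeholders, not the paper's data):
steps      = [0, 100, 200, 300, 400]
lora_loss  = [0.68, 0.55, 0.47, 0.42, 0.40]
lorma_loss = [0.68, 0.48, 0.41, 0.38, 0.37]

auc_lora, auc_lorma = loss_auc(steps, lora_loss), loss_auc(steps, lorma_loss)
print(f"AUC reduction vs LoRA: {100 * (auc_lora - auc_lorma) / auc_lora:.1f}%")
```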
As explained earlier, a naive low-rank multiplicative adaptation of \(\mathbf W_0\) has limitations. We verify this empirically; the results are shown in the table below. The experiments were carried out with RoBERTa\(_\text{large}\) on a subset of GLUE tasks, with all hyperparameters and training conditions kept identical apart from the presence or absence of the rank inflation strategies. Further, we evaluate the effectiveness of the proposed rank inflation strategies by monitoring the rank of the matrices throughout training. We observe that these operations successfully break the rank bottleneck, and the matrices remain almost full rank throughout.
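A minimal sketch of this rank check, using random matrices of assumed size and only the \(\mathcal{I}_{+}\) inflation (the \(\mathcal{I}_{\pi}\) variant is omitted here), shows how the identity term breaks the rank bottleneck of the naive product \(\mathbf B \mathbf A \mathbf W_0\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 1024, 8                        # assumed hidden size and adapter rank
W0 = rng.standard_normal((d, d))
B, A = rng.standard_normal((d, r)), rng.standard_normal((r, d))

naive    = (B @ A) @ W0               # naive multiplicative update: rank <= r
inflated = (np.eye(d) + B @ A) @ W0   # I_+ rank inflation breaks the bottleneck

print(np.linalg.matrix_rank(naive))     # 8
print(np.linalg.matrix_rank(inflated))  # ~1024 (almost full rank)
```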
For any technique, let \(\Delta \mathbf{W}\) denote the difference between the final adapted weight matrix and the initial (frozen) weight matrix. We investigate how \(\Delta \mathbf{W}_{\text{LoRA}}\) relates to \(\Delta \mathbf{W}_{\text{LoRMA}_{+}}\) and \(\Delta \mathbf{W}_{\text{LoRMA}_{\pi}}\), using a random matrix as a baseline. To assess the correlation, we employ a variety of metrics, the results of which are summarized in the table below. We use the Frobenius norm \(\left\Vert \cdot \right\Vert_F\) to measure the deviation between the matrices. The cosine similarity of the flattened matrices (\(\texttt{cos}(\cdot, \cdot)\)) and the principal subspace angle \(\Theta_1(\cdot, \cdot)\) between their column spaces measure their alignment. We compute the sum of squared differences between the top-\(r\) singular values \((\cdot, \cdot)_{\mathcal{S}}^r\) and eigenvalues \((\cdot, \cdot)_{\mathcal{E}}^r\) of the two matrices to assess their similarity.
Correlation between \(\Delta \mathbf{W}_{\text{LoRA}}\) and \(\Delta \mathbf{W}_{\text{LoRMA}}\) for RoBERTa\(_\text{large}\). \(\uparrow\)/\(\downarrow\) indicates that a higher/lower value means greater similarity.
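For concreteness, the sketch below shows one way these correlation metrics could be computed with NumPy/SciPy. It is an illustrative implementation under our own assumptions (eigenvalues compared by magnitude, SciPy's largest principal angle taken as \(\Theta_1\)), not the exact script behind the table.

```python
import numpy as np
from scipy.linalg import subspace_angles

def correlation_metrics(dW1, dW2, r=8):
    """Similarity measures between two square weight-update matrices (a sketch)."""
    frob = float(np.linalg.norm(dW1 - dW2))                            # ||dW1 - dW2||_F
    cos = float(np.dot(dW1.ravel(), dW2.ravel()) /
                (np.linalg.norm(dW1.ravel()) * np.linalg.norm(dW2.ravel())))  # cos of flattened matrices
    theta1 = float(subspace_angles(dW1, dW2)[0])                       # largest principal angle (radians)
    s1 = np.linalg.svd(dW1, compute_uv=False)[:r]
    s2 = np.linalg.svd(dW2, compute_uv=False)[:r]
    sing_r = float(np.sum((s1 - s2) ** 2))                             # (.,.)_S^r
    e1 = np.sort(np.abs(np.linalg.eigvals(dW1)))[::-1][:r]             # eigenvalues by magnitude (assumption)
    e2 = np.sort(np.abs(np.linalg.eigvals(dW2)))[::-1][:r]
    eig_r = float(np.sum((e1 - e2) ** 2))                              # (.,.)_E^r
    return frob, cos, theta1, sing_r, eig_r
```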
As can be seen in the table above, the dominant trend is a high correlation of \(\Delta \mathbf{W}_{\text{LoRA}}\) with both \(\Delta \mathbf{W}_{\text{LoRMA}_{+}}\) and \(\Delta \mathbf{W}_{\text{LoRMA}_{\pi}}\), which shows that our multiplicative techniques are able to capture the updates learned by additive LoRA. Additionally, to assess the expressivity of the transformations, we compare the rank of \(\Delta \mathbf{W}\). For LoRA, \(\Delta \mathbf{W} = \mathbf{B} \mathbf{A}\), and hence it is restricted to a low-rank update, whereas for LoRMA\(_{\pi}\) there is no such limitation. We empirically observe the LoRMA\(_{\pi}\) updates to be almost full-rank matrices.
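This rank comparison reduces to a numerical rank check on the total update; a small helper along the following lines (a sketch, with the tolerance left to NumPy's default) is sufficient.

```python
import numpy as np

def update_rank(W_final, W_init, tol=None):
    """Numerical rank of the total update dW = W_final - W_init.
    For LoRA, dW = B @ A and is therefore at most rank r; for LoRMA_pi
    we observe values close to the full dimension of the weight matrix."""
    return int(np.linalg.matrix_rank(W_final - W_init, tol=tol))
```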
@inproceedings{bihany-etal-2025-lorma,
title = "{L}o{RMA}: Low-Rank Multiplicative Adaptation for {LLM}s",
author = "Bihany, Harsh and Patel, Shubham and Modi, Ashutosh",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.527/",
}