Commit 4731379
FEAT Integrate BD-LoRA into PEFT (#2895)
Implements BD-LoRA: Block-Diagonal LoRA for Eliminating Communication
Overhead in Tensor Parallel LoRA Serving
(https://openreview.net/forum?id=1cjLvtFOmL).
With BD-LoRA, the LoRA weights are implemented in a block-diagonal way.
This reduces communication overhead when using tensor parallelism (TP)
and thus enables faster serving.
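To illustrate why a block-diagonal structure removes the communication step, here is a minimal sketch (not the actual PEFT or vLLM implementation; all tensor names and shapes are illustrative assumptions). If the LoRA B matrix is block-diagonal and its blocks are aligned with the output shards of a column-parallel layer, each TP rank can compute its slice of the LoRA update locally:

```python
import torch

# Hypothetical illustration, NOT the PEFT API: a block-diagonal LoRA
# update for a column-parallel linear layer sharded over `tp` ranks.
torch.manual_seed(0)
d_in, d_out, r, tp = 16, 32, 8, 4
rb = r // tp   # per-rank slice of the LoRA rank
ob = d_out // tp  # per-rank slice of the output dimension

A = torch.randn(r, d_in)  # LoRA A, replicated on every rank
B_blocks = [torch.randn(ob, rb) for _ in range(tp)]  # one B block per rank

x = torch.randn(d_in)

# Dense reference: B is block-diagonal, delta = B @ A @ x
B = torch.block_diag(*B_blocks)
delta_dense = B @ (A @ x)

# Tensor-parallel view: rank i only needs rows i*rb:(i+1)*rb of (A @ x),
# so each rank produces its output slice locally; no all-gather of the
# low-rank intermediate is required.
h = A @ x
delta_tp = torch.cat([B_blocks[i] @ h[i * rb:(i + 1) * rb] for i in range(tp)])

assert torch.allclose(delta_dense, delta_tp, atol=1e-5)
```

With a dense B, every rank would need the full low-rank activation `h`, forcing a collective communication per LoRA layer; the block-diagonal constraint is what lets each rank work on its own slice.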
There is an experimental vLLM PR to support this, but it's not merged
(yet): vllm-project/vllm#28136.
16 files changed: 907 additions & 3 deletions
File tree
- examples/bdlora_finetuning
- method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank14-target-mlp-bdlora
- src/peft
- tuners
- lora
- tests