
[Perf] Speed up LoRA Batch Initialization #6961

@lifuhuang

Description

Motivation

prepare_lora_batch is triggered once per forward pass and is one of the main sources of perf overhead from LoRA. Based on a suggestion from @Fridge003, there are some low-hanging fruits for perf optimization, such as eliminating unnecessary CUDA device syncs.

Current status:

| Metric | Baseline | #6960 | + #6994 | + #8940 |
|---|---|---|---|---|
| ITL@P95 | 78.42 ms | 68.24 ms (-13.0%) | 52.51 ms (-33.0%) | 38.40 ms (-51.0%) |
| ITL@P50 | 34.36 ms | 32.85 ms (-4.4%) | 22.68 ms (-34.0%) | 18.30 ms (-46.7%) |
| TTFT@P50 | 91.37 ms | 85.52 ms (-6.5%) | 62.65 ms (-31.4%) | 53.79 ms (-41.1%) |
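
The sync-elimination idea described in the motivation can be sketched as follows. All names here are hypothetical, not the actual sglang code: the pattern is to replace many tiny per-request host-to-device copies, each of which is a separate transfer and a potential implicit sync point, with one batched transfer built on the CPU first.

```python
import torch

# Falls back to CPU so the sketch runs anywhere; the sync cost only
# matters on CUDA devices.
device = "cuda" if torch.cuda.is_available() else "cpu"

def prepare_naive(adapter_ids):
    # One tiny tensor and one copy per request: each .to()/tensor(...,
    # device=...) call is a separate transfer and can stall the stream.
    return [torch.tensor([i], device=device) for i in adapter_ids]

def prepare_batched(adapter_ids):
    # Build the full index tensor on the host, then issue a single
    # asynchronous copy for the whole batch.
    return torch.tensor(adapter_ids, dtype=torch.long).to(
        device, non_blocking=True
    )

ids = [0, 2, 1, 2]
assert torch.equal(torch.cat(prepare_naive(ids)), prepare_batched(ids))
```

The same principle applies to any per-batch metadata (adapter indices, ranks, scaling factors): assemble it host-side and push it in one copy per forward pass rather than one copy per request.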
