[Feature] Support Tensor Parallelism and Weight Slicing for Lora #4239

Closed
aoshen524 wants to merge 264 commits into sgl-project:main from aoshen524:feature/lora

Conversation

Contributor

@aoshen524 aoshen524 commented Mar 9, 2025

Motivation

#3414 reports limited model support compared to the models covered by test_generation_models.py. This PR introduces tensor parallelism and weight slicing for LoRA, along with additional improvements to testing and functionality.

Modifications

  • Implemented tensor parallelism support in LoRA, allowing computations to be distributed efficiently across multiple devices.
  • Introduced LoRA weight slicing and refactored the memory pool to facilitate distributed inference, optimizing memory usage and performance (see the sketch below).
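
For context, here is a minimal sketch of the weight-slicing idea, assuming a standard Megatron-style split where column-parallel layers shard the output dimension and row-parallel layers shard the input dimension. The function names and shapes are illustrative only, not SGLang's actual API:

```python
# Hypothetical sketch: how LoRA A/B matrices could be sliced per TP rank
# so that each rank's LoRA shard matches its base-weight shard.
import torch


def slice_lora_column_parallel(
    lora_a: torch.Tensor,  # [rank, in_features], replicated
    lora_b: torch.Tensor,  # [out_features, rank], sliced along output dim
    tp_rank: int,
    tp_size: int,
):
    """For a column-parallel base layer (output dim sharded), keep LoRA A
    whole on every rank and slice LoRA B along its output dimension."""
    out_features = lora_b.shape[0]
    assert out_features % tp_size == 0, "output dim must divide evenly across TP ranks"
    shard = out_features // tp_size
    lora_b_shard = lora_b[tp_rank * shard : (tp_rank + 1) * shard, :]
    return lora_a, lora_b_shard


def slice_lora_row_parallel(
    lora_a: torch.Tensor,  # [rank, in_features], sliced along input dim
    lora_b: torch.Tensor,  # [out_features, rank], replicated
    tp_rank: int,
    tp_size: int,
):
    """For a row-parallel base layer (input dim sharded), slice LoRA A along
    its input dimension and keep LoRA B whole; the partial outputs are
    combined by the same all-reduce the base layer already performs."""
    in_features = lora_a.shape[1]
    assert in_features % tp_size == 0, "input dim must divide evenly across TP ranks"
    shard = in_features // tp_size
    lora_a_shard = lora_a[:, tp_rank * shard : (tp_rank + 1) * shard]
    return lora_a_shard, lora_b
```

Slicing B for column-parallel modules (e.g. qkv_proj) and A for row-parallel modules (e.g. o_proj) keeps each rank's LoRA delta aligned with its base-weight shard, so no communication is needed beyond what the base layer already does.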

Checklist:

  • Remove the tensor.contiguous() calls used on GPU

@Fridge003 Fridge003 changed the title Feature/lora [Feature] Support Tensor Parallelism and Weight Slicing for Lora Mar 9, 2025
merrymercy and others added 22 commits March 11, 2025 04:05
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>