Feature request
Hey all:
Very glad to see PR #263. However, when I tried it, I found that it uses the `set_adapter` method to switch between LoRA adapters, which means we can only serve a concurrency of 1 at a time. I also see that a LoRA layer can run inference without being merged into the original base model. So could we support a way for the user to dynamically combine the base_model with a lora_adapter, so that concurrent calls to the base_model and different lora_adapters become possible?
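To illustrate the idea, here is a minimal, library-free sketch of why unmerged LoRA allows concurrency: the base weight `W` is never mutated, and each call just adds its own low-rank update `scaling * B @ (A @ x)` on top of `W @ x`. All names (`lora_forward`, `matvec`, the toy adapters) are hypothetical and only demonstrate the math, not the actual PEFT API.

```python
from concurrent.futures import ThreadPoolExecutor

def matvec(M, x):
    # plain matrix-vector product on nested lists
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, x, A=None, B=None, scaling=1.0):
    # base forward pass; W is read-only, so calls are thread-safe
    y = matvec(W, x)
    if A is not None and B is not None:
        # unmerged LoRA path: y += scaling * B @ (A @ x)
        delta = matvec(B, matvec(A, x))
        y = [y_i + scaling * d_i for y_i, d_i in zip(y, delta)]
    return y

W = [[1.0, 0.0], [0.0, 1.0]]          # toy 2x2 base weight
x = [1.0, 2.0]
adapter1 = ([[1.0, 0.0]], [[1.0], [0.0]])  # rank-1 (A: 1x2, B: 2x1)
adapter2 = ([[0.0, 1.0]], [[0.0], [1.0]])

# three requests served concurrently against the same base weight,
# each choosing its own adapter (or none) per call
with ThreadPoolExecutor() as ex:
    base = ex.submit(lora_forward, W, x).result()
    y1 = ex.submit(lora_forward, W, x, *adapter1).result()
    y2 = ex.submit(lora_forward, W, x, *adapter2).result()
```

Because nothing is merged into `W`, no global `set_adapter` switch is needed; the adapter is just an argument to each request.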
Thanks.
Motivation
This would save a lot of money, since one deployed base model could serve requests for many adapters.
Your contribution
NA