
Support multi-adapter concurrent inference #973

@whybeyoung

Description


Feature request

Hey all,

Very glad to see PR #263, but when I tried it I found that it uses the set_adapter method to switch between LoRA adapters, which means requests are serialized: only one adapter can be active at a time, so effective concurrency is 1. Since a LoRA layer can run inference without being merged into the base model, could we support a way for users to dynamically combine the base model with LoRA adapters, so that concurrent calls can be made against the base model and multiple adapters at once?

Thanks.
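To illustrate the limitation, here is a minimal toy sketch (plain Python threading, not the real PEFT API): when the active adapter is mutable shared state on the model, every caller must hold a lock around set_adapter plus the forward pass, so requests for different adapters cannot overlap. The class and adapter names below are hypothetical stand-ins for illustration only.

```python
import threading

class ToyPeftModel:
    """Toy stand-in for a model whose adapter is selected via mutable
    shared state, as set_adapter-style switching does (NOT real PEFT)."""
    def __init__(self):
        self.active_adapter = None
        self.lock = threading.Lock()  # forced serialization point

    def set_adapter(self, name):
        # Mutates global state: affects every in-flight request.
        self.active_adapter = name

    def forward(self, x):
        # Output depends on whichever adapter is active *right now*,
        # not on which adapter the caller asked for.
        return (x, self.active_adapter)

model = ToyPeftModel()
results = []

def infer(adapter_name, x):
    # Without the lock, another thread could call set_adapter between
    # our set_adapter and forward, silently using the wrong adapter.
    # With the lock, results are correct but concurrency collapses to 1.
    with model.lock:
        model.set_adapter(adapter_name)
        results.append(model.forward(x))

threads = [
    threading.Thread(target=infer, args=(f"lora_{i}", i)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every request ran under the adapter it selected, but only one at a time.
assert all(adapter == f"lora_{x}" for x, adapter in results)
```

A per-request adapter argument (rather than model-level state) would remove the need for this lock and allow true concurrent calls.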

Motivation

This would save a lot of money: a single deployed base model could serve many LoRA adapters concurrently, instead of one deployment per adapter.

Your contribution

NA
