Prerequisites
Feature Description
Support reranking API and models.
Motivation
Reranking is a very common technique used alongside embeddings in RAG systems. There are also models where the same model instance can serve both embeddings and reranking, which is a great resource optimisation.
Possible Implementation
Reranking is relatively close to embeddings, and there are models that handle both embedding and reranking, such as bge-m3, which llama.cpp already supports with --embed. I'm guessing that one possible challenge/dilemma is that the OpenAI API schema is used for inference and embeddings, and OpenAI does not offer a rerank API. I think the Jina rerank API is currently the one commonly used in other projects.
I think the actual reranking should not be very complex to implement, as it is closely related to embedding calls.
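To make the request more concrete, here is a rough sketch of what a rerank call could return. It is only an illustration, not a proposal for the actual llama.cpp code: the `rerank` function, the `score_fn` parameter, and the toy word-overlap scorer are all hypothetical names, and the response shape (`results` with `index` and `relevance_score`) is my assumption of a Jina-style schema. A real implementation would get the score from the model instead.

```python
from typing import Callable, Dict, List, Optional


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
    top_n: Optional[int] = None,
) -> Dict[str, List[Dict[str, float]]]:
    """Score every document against the query and return them best-first.

    The response layout is an assumed Jina-style schema, not a confirmed API.
    """
    results = [
        {"index": i, "relevance_score": float(score_fn(query, doc))}
        for i, doc in enumerate(documents)
    ]
    # Highest relevance first; ties keep their original order (stable sort).
    results.sort(key=lambda r: r["relevance_score"], reverse=True)
    if top_n is not None:
        results = results[:top_n]
    return {"results": results}


def toy_overlap_score(query: str, document: str) -> float:
    """Placeholder scorer: fraction of query words that appear in the document.

    Stands in for the model-produced relevance score.
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


response = rerank(
    "what is a panda",
    ["hi there", "the panda is a bear", "a panda"],
    toy_overlap_score,
    top_n=2,
)
```

Swapping `toy_overlap_score` for a function that asks the loaded model for a relevance score would leave the sorting and response-building part unchanged, which is why I expect the server side to be fairly small.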