Update default Embeddings parameters #925

@davidmezzetti

Description

Make the following parameter updates:

  • keyword: accepts a boolean or a string. True uses BM25 (the current behavior); a string selects a scoring method.
  • sparse: new parameter that enables sparse vector scoring. True uses a default sparse vector model; a string sets the model path.
  • dense: new parameter that is an alias for path. True uses the default dense vector model; a string sets the model path.
  • hybrid: True enables BM25 + vector search (the current behavior). A string builds a hybrid index with the given scoring method or sparse vector model path.

With this new pattern, embeddings can be created as follows.

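from txtai import Embeddings

# Keyword index: True defaults to BM25, a string names the scoring method
embeddings = Embeddings(keyword=True)
embeddings = Embeddings(keyword="bm25")

# Sparse vector index: default model or an explicit model path
embeddings = Embeddings(sparse=True)
embeddings = Embeddings(sparse="prithivida/Splade_PP_en_v2")

# Dense vector index: default model or an explicit model path
embeddings = Embeddings(dense=True)
embeddings = Embeddings(dense="sentence-transformers/all-MiniLM-L6-v2")

# Hybrid indexes: sparse + dense, BM25 + dense, or a given sparse model path
embeddings = Embeddings(sparse=True, dense=True)
embeddings = Embeddings(hybrid=True)
embeddings = Embeddings(hybrid="prithivida/Splade_PP_en_v2")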

This change will be fully backwards compatible.
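
The parameter rules above can be read as a small normalization step. The following is only a rough sketch of that reading, not txtai's implementation: `resolve`, the returned dictionary keys, the placeholder default model names and the `KEYWORD_METHODS` set used to tell a scoring method from a model path are all hypothetical and exist purely for illustration.

```python
# Hypothetical placeholders; the real default models are not stated in this issue
DEFAULT_DENSE = "default-dense-model"
DEFAULT_SPARSE = "default-sparse-model"

# Assumed set of scoring method names, used to tell a method from a model path
KEYWORD_METHODS = {"bm25", "tfidf", "sif"}


def resolve(keyword=None, sparse=None, dense=None, hybrid=None, path=None):
    """Illustrative mapping of the proposed parameters to a normalized config."""
    config = {"dense": None, "keyword": None, "sparse": None}

    # dense is an alias for path, so an existing path argument keeps working
    if path:
        config["dense"] = path
    elif dense:
        config["dense"] = DEFAULT_DENSE if dense is True else dense

    # keyword: True means BM25, a string names the scoring method
    if keyword:
        config["keyword"] = "bm25" if keyword is True else keyword

    # sparse: True means the default sparse model, a string is a model path
    if sparse:
        config["sparse"] = DEFAULT_SPARSE if sparse is True else sparse

    # hybrid: always includes dense vectors, plus keyword or sparse scoring
    if hybrid:
        config["dense"] = config["dense"] or DEFAULT_DENSE
        if hybrid is True:
            config["keyword"] = "bm25"
        elif hybrid in KEYWORD_METHODS:
            config["keyword"] = hybrid
        else:
            config["sparse"] = hybrid

    return config
```

Under these assumptions, `resolve(hybrid=True)` yields dense vectors plus BM25, matching the existing hybrid behavior, while `resolve(path="some/model")` is untouched by the new parameters, which is the sense in which the change stays backwards compatible.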
