[DOC] Add docs for missing embedding functions in python and typescript #5864
@@ -0,0 +1,38 @@
---
id: amazon-bedrock
name: Amazon Bedrock
---

# Amazon Bedrock

Chroma provides a convenient wrapper around Amazon Bedrock's embedding API. This embedding function runs remotely on Amazon Bedrock's servers and requires AWS credentials configured via boto3.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `boto3` Python package, which you can install with `pip install boto3`.

```python
import boto3
from chromadb.utils.embedding_functions import AmazonBedrockEmbeddingFunction

session = boto3.Session(profile_name="profile", region_name="us-east-1")
bedrock_ef = AmazonBedrockEmbeddingFunction(
    session=session,
    model_name="amazon.titan-embed-text-v1"
)

texts = ["Hello, world!", "How are you?"]
embeddings = bedrock_ef(texts)
```

The `model_name` argument is optional and lets you choose which Amazon Bedrock embedding model to use. If you omit it, Chroma defaults to `amazon.titan-embed-text-v1`; the example above simply sets that default explicitly.
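For instance, a minimal sketch (reusing the `session` from the example above) that relies on the default model:

```python
# model_name omitted: falls back to the default amazon.titan-embed-text-v1
default_bedrock_ef = AmazonBedrockEmbeddingFunction(session=session)
default_embeddings = default_bedrock_ef(["Hello, world!"])
```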
Contributor: [Documentation] In several of the new documentation files, the example code explicitly sets a parameter to its default value, while the following text describes it as optional. This could be slightly confusing for users, as they might think the parameter is required. To improve clarity, consider rephrasing the explanation to acknowledge that the example shows the default being set explicitly, or remove the parameter from the example to demonstrate that it's optional.
This pattern also appears in:
{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Amazon Bedrock [documentation](https://docs.aws.amazon.com/bedrock/) for more information on available models and configuration.
{% /Banner %}
@@ -0,0 +1,69 @@
---
id: chroma-bm25
name: Chroma BM25
---

# Chroma BM25

Chroma provides a built-in BM25 sparse embedding function. BM25 (Best Matching 25) is a ranking function used to estimate the relevance of documents to a given search query. This embedding function runs locally and does not require any external API keys.

Sparse embeddings are useful for retrieval tasks where you want to match on specific keywords or terms, rather than semantic similarity.

{% Tabs %}

{% Tab label="python" %}

```python
from chromadb.utils.embedding_functions import ChromaBm25EmbeddingFunction

bm25_ef = ChromaBm25EmbeddingFunction(
    k=1.2,
    b=0.75,
    avg_doc_length=256.0,
    token_max_length=40
)

texts = ["Hello, world!", "How are you?"]
sparse_embeddings = bm25_ef(texts)
```

You can customize the BM25 parameters:
- `k`: Controls term frequency saturation (default: 1.2)
- `b`: Controls document length normalization (default: 0.75)
- `avg_doc_length`: Average document length in tokens (default: 256.0)
- `token_max_length`: Maximum token length (default: 40)
- `stopwords`: Optional list of stopwords to exclude
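For reference, these parameters correspond to the standard BM25 scoring function (shown here as background; the exact variant Chroma implements may differ in minor details):

$$
\text{score}(D, Q) = \sum_{q_i \in Q} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k + 1)}{f(q_i, D) + k \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

Here $f(q_i, D)$ is the frequency of term $q_i$ in document $D$, $|D|$ is the document length in tokens, and $\text{avgdl}$ is the average document length (the `avg_doc_length` parameter). A larger `k` lets repeated terms keep contributing before saturating, while `b` controls how strongly long documents are penalized.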
{% /Tab %}

{% Tab label="typescript" %}

```typescript
// npm install @chroma-core/chroma-bm25

import { ChromaBm25EmbeddingFunction } from "@chroma-core/chroma-bm25";

const embedder = new ChromaBm25EmbeddingFunction({
  k: 1.2,
  b: 0.75,
  avgDocLength: 256.0,
  tokenMaxLength: 40,
});

// use directly
const sparseEmbeddings = await embedder.generate(["document1", "document2"]);

// or attach to a collection, so documents are embedded on .add and .query
// (assumes `client` is an existing ChromaClient instance)
const collection = await client.createCollection({
  name: "name",
  embeddingFunction: embedder,
});
```
Comment on lines +42 to +60
Contributor: [Documentation] The TypeScript code snippet uses a
{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
BM25 is a classic information retrieval algorithm that works well for keyword-based search. For semantic search, consider using dense embedding functions instead.
{% /Banner %}
@@ -0,0 +1,63 @@
---
id: chroma-cloud-qwen
name: Chroma Cloud Qwen
---

# Chroma Cloud Qwen

Chroma provides a convenient wrapper around Chroma Cloud's Qwen embedding API. This embedding function runs remotely on Chroma Cloud's servers and requires a Chroma API key. You can get an API key by signing up for an account at [Chroma Cloud](https://www.trychroma.com/).

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `httpx` Python package, which you can install with `pip install httpx`.
Contributor: [Documentation] Fix capitalization: 'python' should be 'Python'.
```python
from chromadb.utils.embedding_functions import ChromaCloudQwenEmbeddingFunction, ChromaCloudQwenEmbeddingModel
import os

os.environ["CHROMA_API_KEY"] = "YOUR_API_KEY"
qwen_ef = ChromaCloudQwenEmbeddingFunction(
    model=ChromaCloudQwenEmbeddingModel.QWEN3_EMBEDDING_0p6B,
    task="nl_to_code"
)

texts = ["Hello, world!", "How are you?"]
embeddings = qwen_ef(texts)
```

You must pass in a `model` argument and a `task` argument. The `task` parameter specifies the task for which embeddings are being generated. You can optionally provide custom `instructions` for both documents and queries.
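As a minimal sketch of typical usage (assuming a running Chroma client and a hypothetical collection name), you can attach the embedding function to a collection so it is applied automatically on `.add` and `.query`:

```python
import chromadb

client = chromadb.Client()  # assumes a local/in-memory client
collection = client.create_collection(
    name="code-snippets",  # hypothetical collection name
    embedding_function=qwen_ef,
)

collection.add(ids=["1", "2"], documents=texts)
results = collection.query(query_texts=["How do I say hello?"], n_results=1)
```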
{% /Tab %}

{% Tab label="typescript" %}

```typescript
// npm install @chroma-core/chroma-cloud-qwen

import { ChromaCloudQwenEmbeddingFunction, ChromaCloudQwenEmbeddingModel } from "@chroma-core/chroma-cloud-qwen";

const embedder = new ChromaCloudQwenEmbeddingFunction({
  apiKeyEnvVar: "CHROMA_API_KEY", // Or set CHROMA_API_KEY env var
  model: ChromaCloudQwenEmbeddingModel.QWEN3_EMBEDDING_0p6B,
  task: "nl_to_code",
});

// use directly
const embeddings = await embedder.generate(["document1", "document2"]);

// or attach to a collection, so documents are embedded on .add and .query
// (assumes `client` is an existing ChromaClient instance)
const collection = await client.createCollection({
  name: "name",
  embeddingFunction: embedder,
});
```

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Chroma Cloud [documentation](https://docs.trychroma.com/) for more information on available models and configuration.
{% /Banner %}
@@ -0,0 +1,63 @@
---
id: chroma-cloud-splade
name: Chroma Cloud Splade
---

# Chroma Cloud Splade

Chroma provides a convenient wrapper around Chroma Cloud's Splade sparse embedding API. This embedding function runs remotely on Chroma Cloud's servers and requires a Chroma API key. You can get an API key by signing up for an account at [Chroma Cloud](https://www.trychroma.com/).

Sparse embeddings are useful for retrieval tasks where you want to match on specific keywords or terms, rather than semantic similarity.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `httpx` Python package, which you can install with `pip install httpx`.
Contributor: [Documentation] Fix capitalization: 'python' should be 'Python'.
```python
from chromadb.utils.embedding_functions import ChromaCloudSpladeEmbeddingFunction, ChromaCloudSpladeEmbeddingModel
import os

os.environ["CHROMA_API_KEY"] = "YOUR_API_KEY"
splade_ef = ChromaCloudSpladeEmbeddingFunction(
    model=ChromaCloudSpladeEmbeddingModel.SPLADE_PP_EN_V1
)

texts = ["Hello, world!", "How are you?"]
sparse_embeddings = splade_ef(texts)
```

The `model` argument is optional. By default, Chroma uses `prithivida/Splade_PP_en_v1`; the example above selects that default explicitly.
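For instance, a minimal sketch that relies on the default model (assuming `CHROMA_API_KEY` is still set as above):

```python
# model omitted: falls back to the default prithivida/Splade_PP_en_v1
default_splade_ef = ChromaCloudSpladeEmbeddingFunction()
default_sparse_embeddings = default_splade_ef(["Hello, world!"])
```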
{% /Tab %}

{% Tab label="typescript" %}

```typescript
// npm install @chroma-core/chroma-cloud-splade

import { ChromaCloudSpladeEmbeddingFunction, ChromaCloudSpladeEmbeddingModel } from "@chroma-core/chroma-cloud-splade";

const embedder = new ChromaCloudSpladeEmbeddingFunction({
  apiKeyEnvVar: "CHROMA_API_KEY", // Or set CHROMA_API_KEY env var
  model: ChromaCloudSpladeEmbeddingModel.SPLADE_PP_EN_V1,
});

// use directly
const sparseEmbeddings = await embedder.generate(["document1", "document2"]);

// or attach to a collection, so documents are embedded on .add and .query
// (assumes `client` is an existing ChromaClient instance)
const collection = await client.createCollection({
  name: "name",
  embeddingFunction: embedder,
});
```

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Chroma Cloud [documentation](https://docs.trychroma.com/) for more information on available models and configuration.
{% /Banner %}
@@ -0,0 +1,45 @@
---
id: nomic
name: Nomic
---

# Nomic

Chroma provides a convenient wrapper around Nomic's embedding API. This embedding function runs remotely on Nomic's servers and requires an API key. You can get an API key by signing up for an account at [Nomic](https://atlas.nomic.ai/).

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `nomic` Python package, which you can install with `pip install nomic`.
Contributor: [Documentation] Fix capitalization: 'python' should be 'Python'.
```python
from chromadb.utils.embedding_functions import NomicEmbeddingFunction
import os

os.environ["NOMIC_API_KEY"] = "YOUR_API_KEY"
nomic_ef = NomicEmbeddingFunction(
    model="nomic-embed-text-v1",
    task_type="search_document",
    query_config={"task_type": "search_query"}
)

texts = ["Hello, world!", "How are you?"]
embeddings = nomic_ef(texts)
```

You must pass in a `model` argument and a `task_type` argument. The `task_type` can be one of:
- `search_document`: Used to encode large documents in retrieval tasks at indexing time
- `search_query`: Used to encode user queries or questions in retrieval tasks
- `classification`: Used to encode text for text classification tasks
- `clustering`: Used for clustering or reranking tasks

The `query_config` parameter allows you to specify a different task type for queries, which is useful when you want to use `search_document` for documents and `search_query` for queries.
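In practice (a minimal sketch, assuming a running Chroma client and a hypothetical collection name), this means documents and queries are embedded with different task types once the function is attached to a collection:

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection(
    name="articles",  # hypothetical collection name
    embedding_function=nomic_ef,
)

# Documents added here are embedded with task_type="search_document"
collection.add(ids=["1", "2"], documents=texts)

# Query text is embedded with task_type="search_query", per query_config
results = collection.query(query_texts=["greeting"], n_results=1)
```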
{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Nomic [documentation](https://docs.nomic.ai/platform/embeddings-and-retrieval/text-embedding) for more information on available models and task types.
{% /Banner %}
@@ -0,0 +1,54 @@
---
id: open-clip
name: OpenCLIP
---

# OpenCLIP

Chroma provides a convenient wrapper around the OpenCLIP library. This embedding function runs locally and supports both text and image embeddings, making it useful for multimodal applications.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on several Python packages:
Contributor: [Documentation] Fix capitalization: 'python' should be 'Python'.
- `open-clip-torch`: Install with `pip install open-clip-torch`
- `torch`: Install with `pip install torch`
- `pillow`: Install with `pip install pillow`

```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
import numpy as np
from PIL import Image

open_clip_ef = OpenCLIPEmbeddingFunction(
    model_name="ViT-B-32",
    checkpoint="laion2b_s34b_b79k",
    device="cpu"
)

# For text embeddings
texts = ["Hello, world!", "How are you?"]
text_embeddings = open_clip_ef(texts)

# For image embeddings
images = [np.array(Image.open("image1.jpg")), np.array(Image.open("image2.jpg"))]
image_embeddings = open_clip_ef(images)

# Mixed embeddings
mixed = ["Hello, world!", np.array(Image.open("image1.jpg"))]
mixed_embeddings = open_clip_ef(mixed)
```

You can pass in optional arguments:
- `model_name`: The name of the OpenCLIP model to use (default: "ViT-B-32")
- `checkpoint`: The checkpoint to use for the model (default: "laion2b_s34b_b79k")
- `device`: Device used for computation, "cpu" or "cuda" (default: "cpu")
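Because text and image embeddings share the same space, a common pattern is cross-modal search over a collection. Here is a minimal sketch (assuming a running Chroma client, a hypothetical collection name, and local image files):

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection(
    name="multimodal",  # hypothetical collection name
    embedding_function=open_clip_ef,
)

# Index images, then retrieve them with a text query
collection.add(
    ids=["img1", "img2"],
    images=[np.array(Image.open("image1.jpg")), np.array(Image.open("image2.jpg"))],
)
results = collection.query(query_texts=["a photo of a dog"], n_results=1)
```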
{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
OpenCLIP is great for multimodal applications where you need to embed both text and images in the same embedding space. Visit [OpenCLIP documentation](https://github.com/mlfoundations/open_clip) for more information on available models and checkpoints.
{% /Banner %}
@@ -0,0 +1,42 @@
---
id: sentence-transformer
name: Sentence Transformer
---

# Sentence Transformer

Chroma provides a convenient wrapper around the Sentence Transformers library. This embedding function runs locally and uses pre-trained models from Hugging Face.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `sentence_transformers` Python package, which you can install with `pip install sentence_transformers`.
Contributor: [Documentation] Fix capitalization: 'python' should be 'Python'.
```python
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

sentence_transformer_ef = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cpu",
    normalize_embeddings=False
)

texts = ["Hello, world!", "How are you?"]
embeddings = sentence_transformer_ef(texts)
```

You can pass in optional arguments:
- `model_name`: The name of the Sentence Transformer model to use (default: "all-MiniLM-L6-v2")
- `device`: Device used for computation, "cpu" or "cuda" (default: "cpu")
- `normalize_embeddings`: Whether to normalize returned vectors (default: False)
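For example, a sketch that swaps in the higher-quality `all-mpnet-base-v2` model and normalizes the returned vectors (useful when comparing embeddings with cosine similarity):

```python
# Heavier but higher-quality model; vectors are normalized on return
mpnet_ef = SentenceTransformerEmbeddingFunction(
    model_name="all-mpnet-base-v2",
    device="cpu",
    normalize_embeddings=True,
)
mpnet_embeddings = mpnet_ef(["Hello, world!"])
```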
For a full list of available models, visit [Hugging Face Sentence Transformers](https://huggingface.co/sentence-transformers) or [SBERT documentation](https://www.sbert.net/docs/pretrained_models.html).

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Sentence Transformers are great for semantic search tasks. Popular models include `all-MiniLM-L6-v2` (fast and efficient) and `all-mpnet-base-v2` (higher quality). Visit [SBERT documentation](https://www.sbert.net/docs/pretrained_models.html) for more model recommendations.
{% /Banner %}