Merged
@@ -0,0 +1,38 @@
---
id: amazon-bedrock
name: Amazon Bedrock
---

# Amazon Bedrock

Chroma provides a convenient wrapper around Amazon Bedrock's embedding API. This embedding function runs remotely on Amazon Bedrock's servers, and requires AWS credentials configured via boto3.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `boto3` Python package, which you can install with `pip install boto3`.

```python
import boto3
from chromadb.utils.embedding_functions import AmazonBedrockEmbeddingFunction

session = boto3.Session(profile_name="profile", region_name="us-east-1")
bedrock_ef = AmazonBedrockEmbeddingFunction(
    session=session,
    model_name="amazon.titan-embed-text-v1"
)

texts = ["Hello, world!", "How are you?"]
embeddings = bedrock_ef(texts)
```

You can pass in an optional `model_name` argument, which lets you choose which Amazon Bedrock embedding model to use. By default, Chroma uses `amazon.titan-embed-text-v1`.

Review comment (Contributor):

[**Documentation**]

In several of the new documentation files, the example code explicitly sets a parameter to its default value, while the following text describes it as optional. This could be slightly confusing for users, as they might think the parameter is required. To improve clarity, consider rephrasing the explanation to acknowledge that the example shows the default being set explicitly, or remove the parameter from the example to demonstrate that it's optional.

For example, you could change this line to something like:
> The `model_name` argument is optional and defaults to `"amazon.titan-embed-text-v1"`. The example above shows how to set it explicitly, but it can be omitted to use the default.

This pattern also appears in:
- `docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-cloud-splade.md` (line 31)
- `docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/text2vec.md` (line 27)

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/amazon-bedrock.md
Line: 30


{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Amazon Bedrock [documentation](https://docs.aws.amazon.com/bedrock/) for more information on available models and configuration.
{% /Banner %}
@@ -0,0 +1,69 @@
---
id: chroma-bm25
name: Chroma BM25
---

# Chroma BM25

Chroma provides a built-in BM25 sparse embedding function. BM25 (Best Matching 25) is a ranking function used to estimate the relevance of documents to a given search query. This embedding function runs locally and does not require any external API keys.

Sparse embeddings are useful for retrieval tasks where you want to match on specific keywords or terms, rather than semantic similarity.
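
To make the keyword-matching behaviour concrete, here is a minimal sketch of how two sparse vectors might be compared. The dictionary representation (token id to weight), the `sparse_dot` helper, and all ids and weights are illustrative assumptions, not Chroma's storage format or scoring code.

```python
from typing import Dict

# Toy sparse vector: token id -> weight (assumed representation, made-up values).
SparseVector = Dict[int, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product over the token ids that both vectors contain."""
    # Iterate over the smaller vector and look each id up in the larger one.
    if len(a) > len(b):
        a, b = b, a
    return sum((weight * b[token_id] for token_id, weight in a.items() if token_id in b), 0.0)


query = {101: 1.0, 205: 0.5}
doc_with_overlap = {101: 0.8, 300: 0.4}      # shares token 101 with the query
doc_without_overlap = {400: 0.9, 500: 0.7}   # shares no tokens with the query

print(sparse_dot(query, doc_with_overlap))
print(sparse_dot(query, doc_without_overlap))
```

Only shared token ids contribute to the score, so a document with no terms in common with the query scores zero regardless of how close it is in meaning.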

{% Tabs %}

{% Tab label="python" %}

```python
from chromadb.utils.embedding_functions import ChromaBm25EmbeddingFunction

bm25_ef = ChromaBm25EmbeddingFunction(
    k=1.2,
    b=0.75,
    avg_doc_length=256.0,
    token_max_length=40
)

texts = ["Hello, world!", "How are you?"]
sparse_embeddings = bm25_ef(texts)
```

You can customize the BM25 parameters:
- `k`: Controls term frequency saturation (default: 1.2)
- `b`: Controls document length normalization (default: 0.75)
- `avg_doc_length`: Average document length in tokens (default: 256.0)
- `token_max_length`: Maximum token length (default: 40)
- `stopwords`: Optional list of stopwords to exclude
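
As a rough illustration of what `k`, `b`, and `avg_doc_length` control, the sketch below computes the textbook BM25 term-frequency component. It assumes the standard formula and omits the IDF factor; the `bm25_term_weight` helper is hypothetical and is not Chroma's implementation.

```python
def bm25_term_weight(
    tf: int,
    doc_length: int,
    k: float = 1.2,
    b: float = 0.75,
    avg_doc_length: float = 256.0,
) -> float:
    """Textbook BM25 term-frequency component (IDF omitted)."""
    if tf == 0:
        return 0.0
    # b scales how strongly documents longer than average are penalized.
    length_norm = 1.0 - b + b * (doc_length / avg_doc_length)
    # k controls saturation: the weight grows with tf but stays below k + 1.
    return tf * (k + 1.0) / (tf + k * length_norm)


# Repeating a term helps, with diminishing returns.
print(bm25_term_weight(tf=1, doc_length=256))
print(bm25_term_weight(tf=5, doc_length=256))

# The same term count in a longer document earns a lower weight.
print(bm25_term_weight(tf=2, doc_length=512))
```

Under this formula, setting `b=0` turns length normalization off entirely, and a larger `k` lets repeated terms keep adding weight for longer before saturating.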

{% /Tab %}

{% Tab label="typescript" %}

```typescript
// npm install @chroma-core/chroma-bm25

import { ChromaBm25EmbeddingFunction } from "@chroma-core/chroma-bm25";

const embedder = new ChromaBm25EmbeddingFunction({
  k: 1.2,
  b: 0.75,
  avgDocLength: 256.0,
  tokenMaxLength: 40,
});

// use directly
const sparseEmbeddings = await embedder.generate(["document1", "document2"]);

// or attach it to a collection so .add and .query embed documents for you
const collection = await client.createCollection({
  name: "name",
  embeddingFunction: embedder,
});
```

Comment on lines +42 to +60 (Contributor):

[**Documentation**]

The TypeScript code snippet uses a `client` variable without defining it, which can be confusing for users. To make the example self-contained and runnable, it's best to include the client initialization. Additionally, using a more descriptive collection name like `"my_collection"` instead of `"name"` would make the example clearer.

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-bm25.md
Line: 60

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
BM25 is a classic information retrieval algorithm that works well for keyword-based search. For semantic search, consider using dense embedding functions instead.
{% /Banner %}
@@ -0,0 +1,63 @@
---
id: chroma-cloud-qwen
name: Chroma Cloud Qwen
---

# Chroma Cloud Qwen

Chroma provides a convenient wrapper around Chroma Cloud's Qwen embedding API. This embedding function runs remotely on Chroma Cloud's servers, and requires a Chroma API key. You can get an API key by signing up for an account at [Chroma Cloud](https://www.trychroma.com/).

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `httpx` python package, which you can install with `pip install httpx`.

Review comment (Contributor):

[**Documentation**]

Fix capitalization: 'python' should be 'Python'.

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-cloud-qwen.md
Line: 14


```python
from chromadb.utils.embedding_functions import ChromaCloudQwenEmbeddingFunction, ChromaCloudQwenEmbeddingModel
import os

os.environ["CHROMA_API_KEY"] = "YOUR_API_KEY"
qwen_ef = ChromaCloudQwenEmbeddingFunction(
    model=ChromaCloudQwenEmbeddingModel.QWEN3_EMBEDDING_0p6B,
    task="nl_to_code"
)

texts = ["Hello, world!", "How are you?"]
embeddings = qwen_ef(texts)
```

You must pass in a `model` argument and `task` argument. The `task` parameter specifies the task for which embeddings are being generated. You can optionally provide custom `instructions` for both documents and queries.

{% /Tab %}

{% Tab label="typescript" %}

```typescript
// npm install chromadb @chroma-core/chroma-cloud-qwen

import { ChromaClient } from "chromadb";
import { ChromaCloudQwenEmbeddingFunction, ChromaCloudQwenEmbeddingModel } from "@chroma-core/chroma-cloud-qwen";

const client = new ChromaClient();

const embedder = new ChromaCloudQwenEmbeddingFunction({
  apiKeyEnvVar: "CHROMA_API_KEY", // name of the environment variable that holds your API key
  model: ChromaCloudQwenEmbeddingModel.QWEN3_EMBEDDING_0p6B,
  task: "nl_to_code",
});

// use directly
const embeddings = await embedder.generate(["document1", "document2"]);

// or attach it to a collection so .add and .query embed documents for you
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: embedder,
});
```

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Chroma Cloud [documentation](https://docs.trychroma.com/) for more information on available models and configuration.
{% /Banner %}
@@ -0,0 +1,63 @@
---
id: chroma-cloud-splade
name: Chroma Cloud Splade
---

# Chroma Cloud Splade

Chroma provides a convenient wrapper around Chroma Cloud's Splade sparse embedding API. This embedding function runs remotely on Chroma Cloud's servers, and requires a Chroma API key. You can get an API key by signing up for an account at [Chroma Cloud](https://www.trychroma.com/).

Sparse embeddings are useful for retrieval tasks where you want to match on specific keywords or terms, rather than semantic similarity.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `httpx` python package, which you can install with `pip install httpx`.

Review comment (Contributor):

[**Documentation**]

Fix capitalization: 'python' should be 'Python'.

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/chroma-cloud-splade.md
Line: 16


```python
from chromadb.utils.embedding_functions import ChromaCloudSpladeEmbeddingFunction, ChromaCloudSpladeEmbeddingModel
import os

os.environ["CHROMA_API_KEY"] = "YOUR_API_KEY"
splade_ef = ChromaCloudSpladeEmbeddingFunction(
    model=ChromaCloudSpladeEmbeddingModel.SPLADE_PP_EN_V1
)

texts = ["Hello, world!", "How are you?"]
sparse_embeddings = splade_ef(texts)
```

You can optionally pass in a `model` argument. By default, Chroma uses `prithivida/Splade_PP_en_v1`.

{% /Tab %}

{% Tab label="typescript" %}

```typescript
// npm install chromadb @chroma-core/chroma-cloud-splade

import { ChromaClient } from "chromadb";
import { ChromaCloudSpladeEmbeddingFunction, ChromaCloudSpladeEmbeddingModel } from "@chroma-core/chroma-cloud-splade";

const client = new ChromaClient();

const embedder = new ChromaCloudSpladeEmbeddingFunction({
  apiKeyEnvVar: "CHROMA_API_KEY", // name of the environment variable that holds your API key
  model: ChromaCloudSpladeEmbeddingModel.SPLADE_PP_EN_V1,
});

// use directly
const sparseEmbeddings = await embedder.generate(["document1", "document2"]);

// or attach it to a collection so .add and .query embed documents for you
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: embedder,
});
```

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Chroma Cloud [documentation](https://docs.trychroma.com/) for more information on available models and configuration.
{% /Banner %}
@@ -0,0 +1,45 @@
---
id: nomic
name: Nomic
---

# Nomic

Chroma provides a convenient wrapper around Nomic's embedding API. This embedding function runs remotely on Nomic's servers, and requires an API key. You can get an API key by signing up for an account at [Nomic](https://atlas.nomic.ai/).

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `nomic` python package, which you can install with `pip install nomic`.

Review comment (Contributor):

[**Documentation**]

Fix capitalization: 'python' should be 'Python'.

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/nomic.md
Line: 14


```python
from chromadb.utils.embedding_functions import NomicEmbeddingFunction
import os

os.environ["NOMIC_API_KEY"] = "YOUR_API_KEY"
nomic_ef = NomicEmbeddingFunction(
    model="nomic-embed-text-v1",
    task_type="search_document",
    query_config={"task_type": "search_query"}
)

texts = ["Hello, world!", "How are you?"]
embeddings = nomic_ef(texts)
```

You must pass in a `model` argument and `task_type` argument. The `task_type` can be one of:
- `search_document`: Used to encode large documents in retrieval tasks at indexing time
- `search_query`: Used to encode user queries or questions in retrieval tasks
- `classification`: Used to encode text for text classification tasks
- `clustering`: Used for clustering or reranking tasks

The `query_config` parameter allows you to specify a different task type for queries, which is useful when you want to use `search_document` for documents and `search_query` for queries.
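
The document/query split described above can be sketched as a small lookup. This is only an illustration of the idea: the `resolve_task_type` helper and its `is_query` flag are hypothetical, and how Chroma applies `query_config` internally is not shown here.

```python
from typing import Dict, Optional


def resolve_task_type(
    task_type: str,
    query_config: Optional[Dict[str, str]],
    is_query: bool,
) -> str:
    """Pick the task type for one embedding call (hypothetical helper)."""
    # Queries use the override from query_config when one is given.
    if is_query and query_config and "task_type" in query_config:
        return query_config["task_type"]
    # Documents, and queries without an override, use the base task type.
    return task_type


config = {"task_type": "search_query"}

print(resolve_task_type("search_document", config, is_query=False))
print(resolve_task_type("search_document", config, is_query=True))
print(resolve_task_type("search_document", None, is_query=True))
```

With this arrangement, documents are embedded as `search_document` at indexing time while queries are embedded as `search_query`, which matches the asymmetric retrieval setup the task types are meant for.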

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Visit Nomic [documentation](https://docs.nomic.ai/platform/embeddings-and-retrieval/text-embedding) for more information on available models and task types.
{% /Banner %}
@@ -0,0 +1,54 @@
---
id: open-clip
name: OpenCLIP
---

# OpenCLIP

Chroma provides a convenient wrapper around the OpenCLIP library. This embedding function runs locally and supports both text and image embeddings, making it useful for multimodal applications.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on several python packages:

Review comment (Contributor):

[**Documentation**]

Fix capitalization: 'python' should be 'Python'.

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/open-clip.md
Line: 14

- `open-clip-torch`: Install with `pip install open-clip-torch`
- `torch`: Install with `pip install torch`
- `pillow`: Install with `pip install pillow`

```python
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
import numpy as np
from PIL import Image

open_clip_ef = OpenCLIPEmbeddingFunction(
    model_name="ViT-B-32",
    checkpoint="laion2b_s34b_b79k",
    device="cpu"
)

# For text embeddings
texts = ["Hello, world!", "How are you?"]
text_embeddings = open_clip_ef(texts)

# For image embeddings
images = [np.array(Image.open("image1.jpg")), np.array(Image.open("image2.jpg"))]
image_embeddings = open_clip_ef(images)

# Mixed embeddings
mixed = ["Hello, world!", np.array(Image.open("image1.jpg"))]
mixed_embeddings = open_clip_ef(mixed)
```

You can pass in optional arguments:
- `model_name`: The name of the OpenCLIP model to use (default: "ViT-B-32")
- `checkpoint`: The checkpoint to use for the model (default: "laion2b_s34b_b79k")
- `device`: Device used for computation, "cpu" or "cuda" (default: "cpu")

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
OpenCLIP is great for multimodal applications where you need to embed both text and images in the same embedding space. Visit [OpenCLIP documentation](https://github.com/mlfoundations/open_clip) for more information on available models and checkpoints.
{% /Banner %}
@@ -0,0 +1,42 @@
---
id: sentence-transformer
name: Sentence Transformer
---

# Sentence Transformer

Chroma provides a convenient wrapper around the Sentence Transformers library. This embedding function runs locally and uses pre-trained models from Hugging Face.

{% Tabs %}

{% Tab label="python" %}

This embedding function relies on the `sentence_transformers` python package, which you can install with `pip install sentence_transformers`.

Review comment (Contributor):

[**Documentation**]

Fix capitalization: 'python' should be 'Python'.

File: docs/docs.trychroma.com/markdoc/content/integrations/embedding-models/sentence-transformer.md
Line: 14


```python
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

sentence_transformer_ef = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cpu",
    normalize_embeddings=False
)

texts = ["Hello, world!", "How are you?"]
embeddings = sentence_transformer_ef(texts)
```

You can pass in optional arguments:
- `model_name`: The name of the Sentence Transformer model to use (default: "all-MiniLM-L6-v2")
- `device`: Device used for computation, "cpu" or "cuda" (default: "cpu")
- `normalize_embeddings`: Whether to normalize returned vectors (default: False)
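
To show what normalization changes in practice, here is a small sketch using plain Python lists. It assumes that normalizing means scaling each vector to unit L2 length; the `l2_normalize` and `dot` helpers and the toy vectors are illustrative and not part of Chroma or Sentence Transformers.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Two toy vectors pointing in the same direction with different magnitudes.
a = [3.0, 4.0]
b = [6.0, 8.0]

# The raw dot product depends on magnitude.
print(dot(a, b))

# After normalization the dot product equals the cosine similarity,
# which is approximately 1.0 for vectors with the same direction.
print(round(dot(l2_normalize(a), l2_normalize(b)), 6))
```

If the vectors are normalized, ranking by dot product and ranking by cosine similarity give the same order, which is worth keeping in mind when choosing a distance metric for a collection.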

For a full list of available models, visit [Hugging Face Sentence Transformers](https://huggingface.co/sentence-transformers) or [SBERT documentation](https://www.sbert.net/docs/pretrained_models.html).

{% /Tab %}

{% /Tabs %}

{% Banner type="tip" %}
Sentence Transformers are great for semantic search tasks. Popular models include `all-MiniLM-L6-v2` (fast and efficient) and `all-mpnet-base-v2` (higher quality). Visit [SBERT documentation](https://www.sbert.net/docs/pretrained_models.html) for more model recommendations.
{% /Banner %}