milvus-io
diff --git a/assets/full-text-search.png b/assets/full-text-search.png
Binary file: 17.6 KB → 17.6 KB
diff --git a/site/en/userGuide/search-query-get/full-text-search.md b/site/en/userGuide/search-query-get/full-text-search.md
Lines changed: 35 additions & 26 deletions
@@ -16,43 +16,45 @@ By integrating full text search with semantic-based dense vector search, you can
 
 </div>
 
-## Overview
+## BM25 implementation
 
-Full text search simplifies the process of text-based searching by eliminating the need for manual embedding. This feature operates through the following workflow:
+Milvus provides full text search powered by BM25, a relevance scoring function widely used in information retrieval. Milvus integrates BM25 into its search workflow to return text results ranked by relevance.
 
-1. **Text input**: You insert raw text documents or provide query text without any need for manual embedding.
+Full text search in Milvus follows the workflow below:
 
-1. **Text analysis**: Milvus uses an [analyzer](analyzer-overview.md) to tokenize input text into individual, searchable terms.
+1. **Raw text input**: You insert text documents or provide a query as plain text; no embedding model is required.
 
-1. **Function processing**: The built-in function receives tokenized terms and converts them into sparse vector representations.
+1. **Text analysis**: Milvus uses an [analyzer](analyzer-overview.md) to tokenize your text into terms that can be indexed and searched.
 
-1. **Collection store**: Milvus stores these sparse embeddings in a collection for efficient retrieval.
+1. **BM25 function processing**: A built-in BM25 function converts these terms into sparse vector representations used for BM25 scoring.
 
-1. **BM25 scoring**: During a search, Milvus applies the BM25 algorithm to calculate scores for the stored documents and ranks matched results based on relevance to the query text.
+1. **Collection store**: Milvus stores the resulting sparse embeddings in a collection for fast retrieval and ranking.
+
+1. **BM25 relevance scoring**: At search time, Milvus applies BM25 to score each stored document against the query terms and returns the results ranked by relevance.
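
The following is a minimal, illustrative sketch of Okapi BM25 scoring in plain Python, not Milvus's implementation. It assumes whitespace tokenization in place of a real analyzer, the commonly used parameter values `k1=1.2` and `b=0.75`, and the `ln(1 + (N - n + 0.5) / (n + 0.5))` IDF variant; the names `tokenize` and `bm25_scores` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a Milvus analyzer: lowercase, split on whitespace.
    return text.lower().split()


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Return an Okapi BM25 score for each tokenized document against a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequencies within this document
        score = 0.0
        for term in query:
            freq = tf[term]
            if freq == 0:
                continue  # a term missing from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized via b.
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "information retrieval ranks documents by relevance",
    "milvus supports dense and sparse vectors",
    "bm25 is a ranking function used in information retrieval",
]
docs = [tokenize(text) for text in corpus]
query = tokenize("information retrieval")

scores = bm25_scores(docs, query)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

Both matching documents contain each query term once, yet the shorter first document ranks above the longer third one: BM25's length normalization (controlled by `b`) reduces term-frequency contributions in longer documents. The second document shares no terms with the query and scores zero.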
 
 ![Full Text Search](../../../../assets/full-text-search.png)
 
 To use full text search, follow these main steps:
 
-1. [Create a collection](full-text-search.md#Create-a-collection-for-full-text-search): Set up a collection with necessary fields and define a function to convert raw text into sparse embeddings.
+1. [Create a collection](full-text-search.md#Create-a-collection-for-BM25-full-text-search): Set up the required fields and define a BM25 function that converts raw text into sparse embeddings.
 
 1. [Insert data](full-text-search.md#Insert-text-data): Ingest your raw text documents to the collection.
 
-1. [Perform searches](full-text-search.md#Perform-full-text-search): Use query texts to search through your collection and retrieve relevant results.
+1. [Perform searches](full-text-search.md#Perform-full-text-search): Use plain-text queries to retrieve results ranked by BM25 relevance.
 
-## Create a collection for full text search
+## Create a collection for BM25 full text search
 
-To enable full text search, create a collection with a specific schema. This schema must include three necessary fields:
+To enable BM25-powered full text search, define a schema with the required fields, add a BM25 function that generates sparse vectors, configure an index, and then create the collection.
 
-- The primary field that uniquely identifies each entity in a collection.
+### Define schema fields
 
-- A `VARCHAR` field that stores raw text documents, with the `enable_analyzer` attribute set to `True`. This allows Milvus to tokenize text into specific terms for function processing.
+Your collection schema must include at least three required fields:
 
-- A `SPARSE_FLOAT_VECTOR` field reserved to store sparse embeddings that Milvus will automatically generate for the `VARCHAR` field.
+- **Primary field**: Uniquely identifies each entity in the collection.
 
-### Define the collection schema
+- **Text field** (`VARCHAR`): Stores raw text documents. You must set `enable_analyzer=True` so that Milvus can tokenize the text for BM25 relevance ranking. By default, Milvus uses the [`standard` analyzer](standard-analyzer.md) for text analysis. To configure a different analyzer, refer to [Analyzer Overview](analyzer-overview.md).
 
-First, create the schema and add the necessary fields:
+- **Sparse vector field** (`SPARSE_FLOAT_VECTOR`): Stores sparse embeddings automatically generated by the BM25 function.
 
 <div class="multipleCode">
     <a href="#python">Python</a>
@@ -72,9 +74,11 @@ client = MilvusClient(
 
 schema = client.create_schema()
 
-schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True)
-schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=1000, enable_analyzer=True)
-schema.add_field(field_name="sparse", datatype=DataType.SPARSE_FLOAT_VECTOR)
+schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True) # Primary field
+# highlight-start
+schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=1000, enable_analyzer=True) # Text field
+schema.add_field(field_name="sparse", datatype=DataType.SPARSE_FLOAT_VECTOR) # Sparse vector field; no dim required for sparse vectors
+# highlight-end
 ```
 
 ```java
@@ -197,15 +201,19 @@ export schema='{
     }'
 ```
 
-In this configuration,
+In the preceding configuration:
 
 - `id`: serves as the primary key and is automatically generated with `auto_id=True`.
 
-- `text`: stores your raw text data for full text search operations. The data type must be `VARCHAR`, as `VARCHAR` is Milvus string data type for text storage. Set `enable_analyzer=True` to allow Milvus to tokenize the text. By default, Milvus uses the `standard`[ analyzer](standard-analyzer.md) for text analysis. To configure a different analyzer, refer to [Analyzer Overview](analyzer-overview.md).
+- `text`: stores your raw text data for full text search operations. The data type must be `VARCHAR`, which is the Milvus data type for storing strings.
 
 - `sparse`: a vector field reserved to store internally generated sparse embeddings for full text search operations. The data type must be `SPARSE_FLOAT_VECTOR`.
 
-Now, define a function that will convert your text into sparse vector representations and then add it to the schema:
+### Define the BM25 function
+
+The BM25 function converts tokenized text into sparse vectors that support BM25 scoring.
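
As a rough, hypothetical illustration of what "sparse vector" means here, the sketch below turns text into a mapping from term IDs to term frequencies. The whitespace tokenizer and the CRC32-based term IDs are assumptions for illustration only; this page does not specify how Milvus assigns term IDs or exactly what values its BM25 function stores.

```python
import zlib
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a Milvus analyzer: lowercase, split on whitespace.
    return text.lower().split()


def text_to_sparse(text):
    """Map raw text to a sparse vector {term_id: term_frequency} (illustrative only)."""
    counts = Counter(tokenize(text))
    # CRC32 gives a stable 32-bit ID per term (unlike Python's salted hash()).
    # Collisions between distinct terms are possible in principle; this sketch ignores them.
    return {zlib.crc32(term.encode("utf-8")): freq for term, freq in counts.items()}


vec = text_to_sparse("BM25 scores text and BM25 ranks text")
print(sorted(vec.values()))  # [1, 1, 1, 2, 2]
```

Only terms that actually occur in the text get an entry, which is what makes the representation sparse: the space of possible term IDs is huge, but each document stores just a handful of non-zero dimensions.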
+
+Define the function and add it to your schema:
 
 <div class="multipleCode">
     <a href="#python">Python</a>
@@ -220,6 +228,7 @@ bm25_function = Function(
     name="text_bm25_emb", # Function name
     input_field_names=["text"], # Name of the VARCHAR field containing raw text data
     output_field_names=["sparse"], # Name of the SPARSE_FLOAT_VECTOR field reserved to store generated embeddings
+    # highlight-next-line
     function_type=FunctionType.BM25, # Set to `BM25`
 )
 
@@ -304,7 +313,7 @@ export schema='{
    </tr>
    <tr>
      <td><p><code>name</code></p></td>
-     <td><p>The name of the function. This function converts your raw text from the <code>text</code> field into searchable vectors that will be stored in the <code>sparse</code> field.</p></td>
+     <td><p>The name of the function. This function converts your raw text from the <code>text</code> field into BM25-compatible sparse vectors that will be stored in the <code>sparse</code> field.</p></td>
    </tr>
    <tr>
      <td><p><code>input_field_names</code></p></td>
@@ -316,19 +325,19 @@ export schema='{
    </tr>
    <tr>
      <td><p><code>function_type</code></p></td>
-     <td><p>The type of the function to use. Set the value to <code>FunctionType.BM25</code>.</p></td>
+     <td><p>The type of the function to use. Must be <code>FunctionType.BM25</code>.</p></td>
    </tr>
 </table>
 
 <div class="alert note">
 
-For collections with multiple `VARCHAR` fields requiring text-to-sparse-vector conversion, add separate functions to the collection schema, ensuring each function has a unique name and `output_field_names` value.
+If multiple `VARCHAR` fields require BM25 processing, define **one BM25 function per field**, each with a unique name and output field.
 
 </div>
 
 ### Configure the index
 
-After defining the schema with necessary fields and the built-in function, set up the index for your collection. To simplify this process, use `AUTOINDEX` as the `index_type`, an option that allows Milvus to choose and configure the most suitable index type based on the structure of your data.
+After defining the schema with necessary fields and the built-in function, set up the index for your collection.
 
 <div class="multipleCode">
     <a href="#python">Python</a>