[RFC] Tracking Search Pipeline Execution #16705

@junweid62

Is your feature request related to a problem? Please describe

With the expansion of search pipeline processors, tracking data transformations and understanding how data flows through complex processors is becoming challenging. The introduction of ML inference processors, which can manipulate model inputs and outputs, increases the need for a tool to visualize and debug the flow of data across these processors. Such functionality would aid troubleshooting, help optimize pipeline configurations, and provide transparency into the end-to-end transformation of search requests and responses.

As search pipeline processors grow in complexity (see Related Issue), there is an increasing need to:

  1. Track how data flows and transforms through each processor.
  2. Debug data transformations and pinpoint any failures within the pipeline.
  3. View the end-to-end pipeline execution for both the request and response sides of a search.

This capability would also be valuable for frontend plugins like the Flow Framework, helping users configure and test complex ingest and search pipelines.

Describe the solution you'd like

Adding a verbose_pipeline Parameter to the Search Request [Preferred]

Overview

In this approach, the verbose_pipeline parameter is introduced as a query parameter in the search request URL. When used in conjunction with the search_pipeline parameter, it activates a debugging mode, allowing detailed tracking of search pipeline processor execution without requiring a new API or changes to the Explain API.

[Figure: search request flow diagram]

Pros

  1. Minimal Changes to Existing Workflow:

    • No need for a new API endpoint; the debugging functionality is seamlessly integrated into the existing search request.
  2. Backward Compatibility:

    • The verbose_pipeline parameter is optional and defaults to false. Existing search requests remain unaffected unless explicitly updated to include verbose_pipeline=true.
  3. Alignment with OpenSearch Design:

    • Consistent with the design of existing search features, such as the profile query parameter.

Cons

  1. Performance Impact:

    • Activating verbose mode may slightly increase computational load due to additional processor-level logging. The mode is intended primarily for debugging. Integrating it with the existing search backpressure mechanism would let the system manage resource usage dynamically, preserving stability while still allowing detailed debugging during low-load periods.

Example Request

GET /my_index/_search?search_pipeline=my_debug_pipeline&verbose_pipeline=true

Example Response

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 50,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1.0,
        "_source": { "field": "value" }
      }
    ]
  },
  "processor_result": [
    {
      "processor": "filter_query",
      "status": "success",
      "execution_time": 3,
      "input": { "query": { "match_all": {} } },
      "output": { "query": { "filtered_query": { "match_all": {} } } }
    },
    {
      "processor": "collapse",
      "status": "success",
      "execution_time": 5,
      "input": { "hits": [...] },
      "output": { "collapsed_hits": [...] }
    }
  ]
}
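
A client could condense the proposed processor_result array into a short per-processor summary. The following is a minimal sketch that assumes the response shape shown in the example above; summarize_trace is a hypothetical helper name and is not part of any OpenSearch client.

```python
def summarize_trace(response: dict) -> list:
    """Return one summary line per entry in the proposed `processor_result` array."""
    lines = []
    # Under this proposal the array is only present when verbose_pipeline=true,
    # so a missing key is treated as "no trace" rather than an error.
    for step, entry in enumerate(response.get("processor_result", []), start=1):
        lines.append(
            f"{step}. {entry['processor']}: {entry['status']} "
            f"({entry['execution_time']} ms)"
        )
    return lines


# Trimmed copy of the example response (hits and shard info omitted).
example = {
    "took": 15,
    "processor_result": [
        {"processor": "filter_query", "status": "success", "execution_time": 3},
        {"processor": "collapse", "status": "success", "execution_time": 5},
    ],
}

for line in summarize_trace(example):
    print(line)
```

Because the helper tolerates a missing processor_result key, the same code path can handle responses from requests that did not enable verbose mode.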

Common Fields for All Processors

Each entry in processor_result, regardless of processor type, will include the following common fields:

  • processor: The name or type of the processor (e.g., filter_query, collapse).
  • status: Indicates whether the processor completed successfully (success) or encountered an error (failure).
  • execution_time: The time taken by the processor to execute, in milliseconds.
  • input: The input data provided to the processor. The structure of this field varies depending on the processor type.
  • output: The transformed data output by the processor. The structure of this field varies depending on the processor type.
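
The common fields listed above could be modeled on the client side roughly as follows. This is only a sketch under the assumption that every entry carries all five fields, as the proposal states; the ProcessorResult class and first_failure helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional

# The five common fields named in this proposal.
REQUIRED_FIELDS = ("processor", "status", "execution_time", "input", "output")


@dataclass
class ProcessorResult:
    processor: str
    status: str          # "success" or "failure"
    execution_time: int  # milliseconds
    input: Any           # structure varies by processor type
    output: Any          # structure varies by processor type

    @classmethod
    def from_dict(cls, raw: dict) -> "ProcessorResult":
        """Build an instance, rejecting entries that lack a common field."""
        missing = [name for name in REQUIRED_FIELDS if name not in raw]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return cls(**{name: raw[name] for name in REQUIRED_FIELDS})


def first_failure(results: list) -> Optional[ProcessorResult]:
    """Return the first entry whose status is not "success", or None."""
    return next((r for r in results if r.status != "success"), None)
```

Keeping input and output untyped reflects the statement above that their structure depends on the processor type.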

Request Processor Fields

For processors that handle the incoming search request:

  • input: The original search request before processing (e.g., the query, filters, and other parameters).
  • output: The modified search request after this processor has applied its transformations.

Example:

{
  "processor": "filter_query",
  "status": "success",
  "execution_time": 3,
  "input": { "query": { "match_all": {} } },
  "output": { "query": { "filtered_query": { "match_all": {} } } }
}

Search Phase Result Processor Fields

For processors that handle intermediate results during the search phase:

  • input: The set of search hits or results passed into this processor.
  • output: The modified or filtered set of search hits after the processor has completed its operation.

Example:

{
  "processor": "normalization-processor",
  "status": "success",
  "execution_time": 5,
  "input": {
    "hits": [
      { "_index": "my_index", "_id": "1", "_score": 1.0, "_source": { "field": "value1" } },
      { "_index": "my_index", "_id": "2", "_score": 0.9, "_source": { "field": "value2" } }
    ]
  },
  "output": {
    "hits": [
      { "_index": "my_index", "_id": "1", "_score": 1.0, "_source": { "field": "value1" } }
    ]
  }
}
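
For hit-based processors, comparing the document IDs in input and output is one way to see what a processor changed. The sketch below assumes both fields contain a hits array whose items have an _id, as in the example above; diff_hits is a hypothetical helper, not an existing API.

```python
def diff_hits(entry: dict) -> dict:
    """Compare document IDs between a processor's input and output hits."""
    before = [hit["_id"] for hit in entry["input"]["hits"]]
    after = [hit["_id"] for hit in entry["output"]["hits"]]
    surviving_in_input_order = [doc_id for doc_id in before if doc_id in after]
    surviving_in_output_order = [doc_id for doc_id in after if doc_id in before]
    return {
        "removed": [doc_id for doc_id in before if doc_id not in after],
        "added": [doc_id for doc_id in after if doc_id not in before],
        # True when the surviving documents changed relative order.
        "reordered": surviving_in_input_order != surviving_in_output_order,
    }


# Trimmed copy of the normalization-processor example (only _id is needed).
entry = {
    "processor": "normalization-processor",
    "input": {"hits": [{"_id": "1"}, {"_id": "2"}]},
    "output": {"hits": [{"_id": "1"}]},
}

print(diff_hits(entry))
```

The same comparison applies to response processors such as reranking, where the interesting signal is the reordered flag rather than removed documents.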

Response Processor Fields

For processors that handle the final search response:

  • input: The raw search response from the previous phase or processor.
  • output: The final transformed response to be returned to the client.

Example:

{
  "processor": "rerank",
  "status": "success",
  "execution_time": 4,
  "input": { "hits": [ ... ] },
  "output": { "hits": [ ... ] }
}

Verbose Mode Support Across Search Pipeline Configurations

Verbose mode is designed to work with every way of specifying a search pipeline, so debugging output is consistent regardless of the method chosen. The supported configurations are:

  1. Default Search Pipeline
PUT /my_index/_settings
{
  "index.search.default_pipeline": "my_pipeline"
}

GET /my_index/_search?verbose_pipeline=true
  2. Specified Search Pipeline by ID
GET /my_index/_search?search_pipeline=my_pipeline&verbose_pipeline=true
  3. Ad-Hoc (Temporary) Search Pipeline
POST /my_index/_search?verbose_pipeline=true
{
  "query": {
    "match": { "text_field": "some search text" }
  },
  "search_pipeline": {
    "request_processors": [
      {
        "filter_query": {
          "query": { "term": { "visibility": "public" } }
        }
      }
    ],
    "response_processors": [
      {
        "collapse": {
          "field": "category"
        }
      }
    ]
  }
}
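
The three configurations above differ only in where the pipeline is named, which a client helper could capture as follows. This is a sketch that assumes the proposed verbose_pipeline query parameter; build_verbose_search is a hypothetical function, and it uses POST for all three cases (the search endpoint accepts both GET and POST) so that a query body can always be sent.

```python
from typing import Optional, Union
from urllib.parse import urlencode


def build_verbose_search(
    index: str,
    query: dict,
    pipeline: Optional[Union[str, dict]] = None,
) -> tuple:
    """Return (method, path, body) for a search with verbose_pipeline enabled.

    pipeline=None -> rely on the index's index.search.default_pipeline setting
    pipeline=str  -> reference a stored pipeline by ID in the URL
    pipeline=dict -> define an ad-hoc pipeline inline in the request body
    """
    params = {}
    body = {"query": query}
    if isinstance(pipeline, str):
        params["search_pipeline"] = pipeline
    elif isinstance(pipeline, dict):
        body["search_pipeline"] = pipeline
    # Proposed parameter from this RFC; it is always set by this helper.
    params["verbose_pipeline"] = "true"
    return "POST", f"/{index}/_search?{urlencode(params)}", body


method, path, body = build_verbose_search(
    "my_index", {"match_all": {}}, pipeline="my_pipeline"
)
print(method, path)
```

The returned tuple can be handed to whatever HTTP client is already in use; the helper itself performs no network calls.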

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels: RFC, Search, enhancement, v2.19.0

Status: Done