[Feature Request] Make batch ingestion automatic, not a parameter on _bulk #14283

@andrross

Description

Is your feature request related to a problem? Please describe

A new batch method was added to the o.o.ingest.Processor interface in #12457 that allows ingest processors to operate on multiple documents simultaneously, instead of one by one. For certain processors, this allows for much faster and more efficient processing. However, a new batch_size parameter was also added to the _bulk API with a default value of 1. This means that in order to benefit from batch processing in any of my ingest processors, I have to do at minimum two things: determine how many documents to include in my _bulk request, and determine the optimal value for this batch_size parameter. I must also change all my ingestion tooling to support and specify this new batch_size parameter.
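For illustration, opting into batching today looks something like the request below; the index name, document contents, and batch_size value are made up for the example:

```
POST /_bulk?batch_size=5
{ "index": { "_index": "my-index", "_id": "1" } }
{ "text": "first document" }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "text": "second document" }
```

The point of this issue is that the `?batch_size=5` part should not be the caller's responsibility.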

Describe the solution you'd like

I want the developers of my ingestion processors to determine good defaults for how they want to handle batches of documents, and then I can see increased performance with no change to my ingestion tooling by simply updating to the latest software. I acknowledge I may still have to experiment with finding the optimal number of documents to include in each _bulk request (this is the status quo for this API and not specific to ingest processors). Also, certain ingest processors may define expert-level configuration options to further optimize if necessary, but I expect the defaults to work well most of the time and to almost always be better than the performance I saw before batching was implemented.

Related component

Indexing

Additional context

I believe this can be implemented as follows:

  • [required] Increase the default value of batch_size from 1 to Integer.MAX_VALUE. This means that by default the entire content of my bulk request will be passed to each ingest processor. However, the default implementation of batchExecute just operates on one document at a time, so unless my ingest processor is updated to leverage the new batchExecute method, I will see exactly the same behavior that existed previously.
  • [optional] Emit a deprecation warning if batch_size is specified in the _bulk API
  • [optional] Remove the functionality of the batch_size parameter in the _bulk API on main
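To make the first point concrete, the fallback relationship between execute and batchExecute can be sketched with a simplified, self-contained stand-in for the o.o.ingest.Processor interface. The real interface works with IngestDocumentWrapper objects and asynchronous handlers; the types and signatures here are illustrative assumptions, not the actual OpenSearch API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchExecuteSketch {
    // Simplified stand-in for o.o.ingest.Processor (illustrative only).
    interface Processor {
        String execute(String doc);

        // Default batch path: falls back to per-document execute, so a
        // processor that never overrides this behaves exactly as it did
        // before batching existed, even if the whole bulk request is
        // passed in as one batch.
        default List<String> batchExecute(List<String> docs) {
            List<String> results = new ArrayList<>();
            for (String doc : docs) {
                results.add(execute(doc));
            }
            return results;
        }
    }

    // A processor that overrides batchExecute to handle the whole batch
    // at once (e.g. a single model-inference call for all documents).
    static class UppercaseBatchProcessor implements Processor {
        @Override
        public String execute(String doc) {
            return doc.toUpperCase();
        }

        @Override
        public List<String> batchExecute(List<String> docs) {
            // One batched operation instead of N single operations.
            List<String> results = new ArrayList<>();
            for (String doc : docs) {
                results.add(doc.toUpperCase());
            }
            return results;
        }
    }

    public static void main(String[] args) {
        Processor batched = new UppercaseBatchProcessor();
        List<String> out = batched.batchExecute(List.of("a", "b"));
        if (!out.equals(List.of("A", "B"))) throw new AssertionError(out);

        // A processor that only implements execute still works on batches
        // via the default method, one document at a time.
        Processor single = doc -> doc + "!";
        List<String> out2 = single.batchExecute(List.of("x", "y"));
        if (!out2.equals(List.of("x!", "y!"))) throw new AssertionError(out2);

        System.out.println("ok");
    }
}
```

With this shape, raising the default batch size is safe: processors that never override batchExecute see identical behavior, while processors that do override it get the full batch with no caller-side tuning.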

Additional discussion exists on the original RFC starting here: #12457 (comment)

Metadata

Labels

Indexing (Bulk Indexing and anything related to indexing), enhancement (Enhancement or improvement to existing feature or request)
