Description
Is your feature request related to a problem? Please describe
Today, an indexing request is parsed in IndexShard to produce a ParsedDocument instance containing the Lucene field instances along with the _source. This is then handed off to the Engine for indexing. For custom plugins such as the tsdb plugin (#19461), the OpenSearch indexing flow does not need to parse the document. Parsing can instead be handled within the plugin (MetricsEngine) for better throughput and CPU utilization.
Benchmarks show an improvement of ~18% in ingestion throughput (wps/core) and P95 CPU utilization when the default parsing is skipped in favor of custom parsing logic that suits the metrics use case. Under extreme load, the same optimization shows a 57% improvement in wps/core and a 25% improvement in P95 core utilization.
This feature request is to support such custom use cases by allowing them to bypass the default OpenSearch mapping/parsing framework.
Describe the solution you'd like
Introduce a new setting, skip_default_document_parsing, that skips the default index mapping and document parsing in the indexing flow and instead does the following in IndexShard:
    operation = new Engine.Index(
        new Term(IdFieldMapper.NAME, Uid.encodeId(sourceToParse.id())),
        // Only the id, _source, and media type are populated; every other argument is null.
        new ParsedDocument(null, null, sourceToParse.id(), null, null, sourceToParse.source(), sourceToParse.getMediaType(), null),
        seqNo,
        opPrimaryTerm,
        version,
        versionType,
        origin,
        System.nanoTime(),
        autoGeneratedTimeStamp,
        isRetry,
        ifSeqNo,
        ifPrimaryTerm
    );
    return index(engine, operation);
This creates a ParsedDocument containing only the _source and the other required information, such as the document id and media type. Custom engine implementations can then handle document parsing themselves by reading the _source.
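To make the intended control flow concrete, here is a minimal, dependency-free sketch of the branch described above. It is not OpenSearch code: SketchDocument, SkipParsingSketch, prepare, and readSource are hypothetical stand-ins for ParsedDocument, the IndexShard logic, and a custom engine, and the defaultParser argument only models the default mapping/parsing step.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for ParsedDocument: fields is null when default parsing was skipped.
final class SketchDocument {
    final String id;
    final List<String> fields;
    final byte[] source;

    SketchDocument(String id, List<String> fields, byte[] source) {
        this.id = id;
        this.fields = fields;
        this.source = source;
    }
}

final class SkipParsingSketch {
    // Models the branch in the indexing flow (assumption: the setting is read as a boolean).
    // defaultParser stands in for the default mapping/parsing step.
    static SketchDocument prepare(String id, byte[] source, boolean skipDefaultParsing,
                                  Function<byte[], List<String>> defaultParser) {
        if (skipDefaultParsing) {
            // Source-only document: the default parser is never invoked.
            return new SketchDocument(id, null, source);
        }
        return new SketchDocument(id, defaultParser.apply(source), source);
    }

    // Models a custom engine that ignores pre-parsed fields and reads the raw source itself.
    static String readSource(SketchDocument doc) {
        return new String(doc.source, StandardCharsets.UTF_8);
    }
}
```

With the flag enabled, the cost of the default parser is avoided entirely, and the engine is responsible for interpreting the bytes it gets from readSource. With the flag disabled, the flow is unchanged.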
Related component
Indexing
Describe alternatives you've considered
No response
Additional context
No response