Add ingest-attachment support for per document `indexed_chars` limit #31352
Merged
dadoonet merged 3 commits into elastic:6.x from Jun 16, 2018
Today we support a global `indexed_chars` processor parameter. In some cases, however, users would like to set this limit per document.
This used to be supported in the mapper-attachments plugin, which extracted the limit value from a meta field in the document sent for indexing.
We add an option that reads this limit value from the document itself,
through a new setting named `indexed_chars_field`.
This allows running:
```
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars_field" : "size"
      }
    }
  ]
}
```
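As a rough illustration, the same pipeline body can be assembled programmatically. This is only a sketch: the helper name `build_attachment_pipeline` is hypothetical, and setting a global `indexed_chars` alongside `indexed_chars_field` is optional and shown here as an assumption about how one might combine the two.

```python
import json


def build_attachment_pipeline(field, indexed_chars=None, indexed_chars_field=None):
    """Build a body for PUT _ingest/pipeline/<name> (hypothetical helper)."""
    attachment = {"field": field}
    # Global limit applied when a document carries no per-document value.
    if indexed_chars is not None:
        attachment["indexed_chars"] = indexed_chars
    # Name of the document field that holds the per-document limit.
    if indexed_chars_field is not None:
        attachment["indexed_chars_field"] = indexed_chars_field
    return {
        "description": "Extract attachment information. Used to parse pdf and office files",
        "processors": [{"attachment": attachment}],
    }


body = build_attachment_pipeline("data", indexed_chars_field="size")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the `PUT _ingest/pipeline/attachment` request.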
Then index either:
```
PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}
```
This uses the default value (or the one defined by `indexed_chars`).
Or:
```
PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
```
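The way the limit is chosen for the two documents above can be sketched as follows. This is a minimal model of the behaviour described in this PR, not the processor's actual code: `resolve_indexed_chars` is a hypothetical name, and the fallback of 100000 is assumed to be the processor's default `indexed_chars` value.

```python
DEFAULT_INDEXED_CHARS = 100_000  # assumed default of the attachment processor


def resolve_indexed_chars(doc, indexed_chars=DEFAULT_INDEXED_CHARS, indexed_chars_field=None):
    """Return the character limit to apply to one document (hypothetical helper)."""
    if indexed_chars_field is not None:
        # A value present in the document overrides the pipeline-level limit.
        value = doc.get(indexed_chars_field)
        if value is not None:
            return int(value)
    # No field configured, or the document does not carry it.
    return indexed_chars


doc1 = {"data": "BASE64"}
doc2 = {"data": "BASE64", "size": 1000}

print(resolve_indexed_chars(doc1, indexed_chars_field="size"))  # 100000
print(resolve_indexed_chars(doc2, indexed_chars_field="size"))  # 1000
```

Document 1 has no `size` field, so the pipeline-level limit applies; document 2 supplies its own limit.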
Backport of elastic#28977 to the 6.x branch (6.4.0).
Collaborator
Pinging @elastic/es-core-infra
Member
@dadoonet The CI build failed because of a checkstyle violation:
dnhatn added a commit that referenced this pull request Jun 19, 2018
* 6.x:
  Add get stored script and delete stored script to high level REST API
  Increasing skip version for failing test on 6.x
  Skip get_alias tests for 5.x (#31397)
  Fix defaults in GeoShapeFieldMapper output (#31302)
  Test: better error message on failure
  Mute DefaultShardsIT#testDefaultShards test
  Fix reference to XContentBuilder.string() (#31337)
  [DOCS] Adds monitoring breaking change (#31369)
  [DOCS] Adds security breaking change (#31375)
  [DOCS] Backports breaking change (#31373)
  RestAPI: Reject forcemerge requests with a body (#30792)
  Docs: Use the default distribution to test docs (#31251)
  Use system context for cluster state update tasks (#31241)
  [DOCS] Adds testing for security APIs (#31345)
  [DOCS] Removes ML item from release highlights
  [DOCS] Removes breaking change (#31376)
  REST high-level client: add validate query API (#31077)
  Move language analyzers from server to analysis-common module. (#31300)
  Expose lucene's RemoveDuplicatesTokenFilter (#31275)
  [Test] Fix :example-plugins:rest-handler on Windows
  Delete typos in SAML docs (#31199)
  Ensure we don't use a remote profile if cluster name matches (#31331)
  Test: Skip alias tests that failed all weekend
  [DOCS] Fix version in SQL JDBC Maven template
  [DOCS] Improve install and setup section for SQL JDBC
  Add ingest-attachment support for per document `indexed_chars` limit (#31352)
  SQL: Fix rest endpoint names in node stats (#31371)
  [DOCS] Fixes small issue in release notes
  Support for remote path in reindex api Closes #22913
  [ML] Put ML filter API response should contain the filter (#31362)
  Remove trial status info from start trial doc (#31365)
  [DOCS] Added links in breaking changes pages
  [DOCS] Adds links to release notes and highlights
  Docs: Document changes in rest client
  QA: Fix tribe tests to use node selector
  REST Client: NodeSelector for node attributes (#31296)
  LLClient: Fix assertion on windows
  LLClient: Support host selection (#30523)
  Add QA project and fixture based test for discovery-ec2 plugin (#31107)
  [ML] Hold ML filter items in sorted set (#31338)
  [ML] Add description to ML filters (#31330)
Note: this is a backport of #28977, which does not require a review.
Since I forgot to backport it earlier, I'd just like to double-check that CI is happy.
I'll update the PR if CI is unhappy and will then ask for a review.