Expose the Lucene Korean analyzer module in a plugin #30397
Merged
jimczi merged 4 commits into elastic:master on May 4, 2018
This change adds a new plugin called `analysis-nori` that exposes Korean text analysis in Elasticsearch using the new Lucene Korean analyzer module, `nori`. The plugin adds:
* a Korean analyzer: `nori`
* a Korean tokenizer: `nori_tokenizer`
* a part-of-speech stop filter: `nori_part_of_speech`
* a filter that can replace Hanja characters with their Hangul transcription: `nori_readingform`
Collaborator
Pinging @elastic/es-search-aggs
jpountz
approved these changes
May 4, 2018
docs/CHANGELOG.asciidoc
Outdated
got ignored at index time because of the <<ignore-malformed,`ignore_malformed`>>
option. ({pull}30140[#29658])

Plugins::
Contributor
I would put it in Features rather than Plugins. I don't think the fact it is exposed via a plugin matters.
docs/plugins/analysis-nori.asciidoc
Outdated
`discard`::

Decompose compounds and discards the original form (*default*). Example output:
docs/plugins/analysis-nori.asciidoc
Outdated
`mixed`::

Decompose compounds and keeps the original form. Example output:
        } finally {
            reader.close();
        }
    }
Contributor
let's use try-with-resources?
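For context, the suggestion above can be illustrated with a minimal, generic sketch. It is not the plugin's actual code: the class name `TryWithResourcesSketch` and the helper `readLines` are hypothetical, and the only assumption carried over from the diff is that a `Reader` was being closed in a `finally` block. With try-with-resources, the resource declared in the `try` header is closed automatically, whether the body completes normally or throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the analysis-nori plugin.
class TryWithResourcesSketch {

    // Reads every line from the given source. The BufferedReader (and the
    // wrapped source) is closed when the try block exits, so no explicit
    // finally { reader.close(); } is needed.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readLines(new StringReader("first\nsecond\n"));
        System.out.println(lines); // prints [first, second]
    }
}
```

Besides being shorter, this form keeps the original exception if both the body and `close()` fail: the failure from `close()` is attached as a suppressed exception instead of replacing it.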
jimczi added a commit that referenced this pull request on May 4, 2018
jimczi added four more commits that referenced this pull request on May 4, 2018, and two on May 5, 2018
jasontedor added a commit that referenced this pull request on May 6, 2018
* master: (35 commits) DOCS: Correct mapping tags in put-template api DOCS: Fix broken link in the put index template api Add put index template api to high level rest client (#30400) Relax testAckedIndexing to allow document updating [Docs] Add snippets for POS stop tags default value Move respect accept header on no handler to 6.3.1 Respect accept header on no handler (#30383) [Test] Add analysis-nori plugin to the vagrant tests [Docs] Fix bad link [Docs] Fix end of section in the korean plugin docs Expose the Lucene Korean analyzer module in a plugin (#30397) Docs: remove transport_client from CCS role example (#30263) [Rollup] Validate timezone in range queries (#30338) Use readFully() to read bytes from CipherInputStream (#28515) Fix docs Recently merged #29229 had a doc bug that broke the doc build. This commit fixes. Test: remove cluster permission from CCS user (#30262) Add Get Settings API support to java high-level rest client (#29229) Watcher: Remove unneeded index deletion in tests Set the new lucene version for 6.4.0 [ML][TEST] Clean up jobs in ModelPlotIT ...
dnhatn added a commit that referenced this pull request on May 8, 2018
* elastic-master: Watcher: Mark watcher as started only after loading watches (#30403) Pass the task to broadcast actions (#29672) Disable REST default settings testing until #29229 is back-ported Correct wording in log message (#30336) Do not fail snapshot when deleting a missing snapshotted file (#30332) AwaitsFix testCreateShrinkIndexToN DOCS: Correct mapping tags in put-template api DOCS: Fix broken link in the put index template api Add put index template api to high level rest client (#30400) Relax testAckedIndexing to allow document updating [Docs] Add snippets for POS stop tags default value Move respect accept header on no handler to 6.3.1 Respect accept header on no handler (#30383) [Test] Add analysis-nori plugin to the vagrant tests [Docs] Fix bad link [Docs] Fix end of section in the korean plugin docs Expose the Lucene Korean analyzer module in a plugin (#30397) Docs: remove transport_client from CCS role example (#30263) [Rollup] Validate timezone in range queries (#30338) Use readFully() to read bytes from CipherInputStream (#28515) Fix docs Recently merged #29229 had a doc bug that broke the doc build. This commit fixes. Test: remove cluster permission from CCS user (#30262) Add Get Settings API support to java high-level rest client (#29229) Watcher: Remove unneeded index deletion in tests
dnhatn added a commit that referenced this pull request on May 8, 2018
* 6.x: Stop forking javac (#30462) Fix tribe tests Docs: Use task_id in examples of tasks (#30436) Security: Rename IndexLifecycleManager to SecurityIndexManager (#30442) Packaging: Set elasticsearch user to have non-existent homedir (#29007) [Docs] Fix typo in cardinality-aggregation.asciidoc (#30434) Avoid NPE in `more_like_this` when field has zero tokens (#30365) Build: Switch to building javadoc with html5 (#30440) Add a quick tour of the project to CONTRIBUTING (#30187) Add stricter geohash parsing (#30376) Reindex: Use request flavored methods (#30317) Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure. (#30432) Auto-expand replicas when adding or removing nodes (#30423) Silence IndexUpgradeIT test failures. (#30430) Fix line length violation in cache tests Add failing test for core cache deadlock [DOCS] convert forcemerge snippet Update forcemerge.asciidoc (#30113) Added zentity to the list of API extension plugins (#29143) Fix the search request default operation behavior doc (#29302) (#29405) Watcher: Mark watcher as started only after loading watches (#30403) Correct wording in log message (#30336) Do not fail snapshot when deleting a missing snapshotted file (#30332) AwaitsFix testCreateShrinkIndexToN DOCS: Correct mapping tags in put-template api DOCS: Fix broken link in the put index template api Add put index template api to high level rest client (#30400) [Docs] Add snippets for POS stop tags default value Remove entry inadvertently picked into changelog Move respect accept header on no handler to 6.3.1 Respect accept header on no handler (#30383) [Test] Add analysis-nori plugin to the vagrant tests [Rollup] Validate timezone in range queries (#30338) [Docs] Fix bad link [Docs] Fix end of section in the korean plugin docs add the Korean nori plugin to the change logs Expose the Lucene Korean analyzer module in a plugin (#30397) Docs: remove transport_client from CCS role example (#30263) Test: remove cluster permission from CCS user 
(#30262) Watcher: Remove unneeded index deletion in tests fix docs branch version fix lucene snapshot version Upgrade to 7.4.0-snapshot-1ed95c097b (#30357) [ML][TEST] Clean up jobs in ModelPlotIT Watcher: Ensure trigger service pauses execution (#30363) [DOCS] Fixes ordering of changelog sections [DOCS] Commented out empty sections in the changelog to fix the doc build. (#30372) Make RepositoriesMetaData contents unmodifiable (#30361) Change signature of Get Repositories Response (#30333) 6.x Backport: Terms query validate bug (#30319) InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121) Security: reduce garbage during index resolution (#30180) Test: use trial license in qa tests with security [ML] Add integration test for model plots (#30359) SQL: Fix bug caused by empty composites (#30343) [ML] Account for gaps in data counts after job is reopened (#30294) [ML] Refactor DataStreamDiagnostics to use array (#30129) Make licensing FIPS-140 compliant (#30251) Do not load global state when deleting a snapshot (#29278) Don't load global state when only restoring indices (#29239) Tests: Use different watch ids per test in smoke test (#30331) Watcher: Make start/stop cycle more predictable and synchronous (#30118) [Docs] Add term query with normalizer example Adds Eclipse config for xpack licence headers (#30299) Fix message content in users tool (#30293) [DOCS] Removed X-Pack breaking changes page [DOCS] Added security breaking change [DOCS] Fixes link to TLS LDAP info [DOCS] Merges X-Pack release notes into changelog (#30350) [DOCS] Fixes broken links to bootstrap user (#30349) [Docs] Remove errant changelog line Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641) [DOCS] Reorganizes authentication details in Stack Overview (#30280) Tests: Simplify VersionUtils released version splitting (#30322) Fix merging logic of Suggester Options (#29514) ReplicationTracker.markAllocationIdAsInSync may hang if allocation is 
cancelled (#30316) [DOCS] Adds LDAP realm configuration details (#30214) [DOCS] Adds native realm configuration details (#30215) Disable SSL on testing old BWC nodes (#30337) [DOCS] Enables edit links for X-Pack pages Cancelling a peer recovery on the source can leak a primary permit (#30318) SQL: Reduce number of ranges generated for comparisons (#30267) [DOCS] Adds links to changelog sections Convert server javadoc to html5 (#30279) REST Client: Add Request object flavored methods (#29623) Create default ES_TMPDIR on Windows (#30325) [Docs] Clarify `fuzzy_like_this` redirect (#30183) Fix docs of the `_ignored` meta field. Add a new `_ignored` meta field. (#29658) Move repository-azure fixture test to QA project (#30253)