add nori_number token filter in analysis-nori#53583
Conversation
|
Pinging @elastic/es-search (:Search/Analysis) |
|
Pinging @elastic/es-docs (>docs) |
jimczi
left a comment
There was a problem hiding this comment.
Thanks for adding this filter danmuzi ! I left one comment regarding the discard_punctuation option of the tokenizer that was added to handle the number filter correctly.
|
Thanks for your review, @jimczi 👍 |
jimczi
left a comment
There was a problem hiding this comment.
I left two minor comments but the change looks good to me.
But I'm not sure it's right to include the discard_punctuation option in this PR.
Because this PR is for nori_number token filter.
I think it's ok since the discard_punctuation option was added specifically for the number token filter. Let's add both in the same pr, thanks for separating the commits though.
plugins/analysis-nori/src/test/java/org/elasticsearch/index/analysis/NoriAnalysisTests.java
Outdated
Show resolved
Hide resolved
docs/plugins/analysis-nori.asciidoc
Outdated
There was a problem hiding this comment.
Maybe add add a NOTE: to emphasize this part ?
|
Thanks Jim. |
|
@elasticmachine ok to test |
3864137 to
1a4367e
Compare
|
I'm not sure why elasticsearch-ci/2 and elasticsearch-ci/bwc and elasticsearch-ci/default-distro are failed. |
|
Because of the my rebase mistake, the previous Jenkins build history has been lost in this conversation. After the rebase, all Jenkins builds passed. |
|
Thanks for your kind reviews! @jimczi |
This change adds the `nori_number` token filter. It also adds a `discard_punctuation` option in nori_tokenizer that should be used in conjunction with the new filter.
The
KoreanNumberFilterhas included in Nori after Lucene 8.2.0. (LUCENE-8812)However, it isn't supported now in Nori Analysis plugin. (Kuromoji supports
kuromoji_number)It seems to be missing(#30397) because
KoreanNumberFilterdidn't exist at Lucene 7.4.0 that supports Nori first time.This PR is for that.