[BUG] Using synonym filter after hunspell. #16530

@aswad1

Describe the bug

When a synonym filter is applied after the hunspell filter, the expected plural synonyms do not appear in the output. In the configuration below, I have added the following synonyms:

  • stationary
  • stationery
  • stationaries
  • stationeries

PUT /test-index3
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_synonym_graph-replacement_filter": {
          "type": "synonym_graph",
          "synonyms": [
            "stationary, stationery, stationaries, stationeries"
          ]
        },
        "custom_hunspell_stemmer": {
          "type": "hunspell",
          "locale": "en_US"
        }
      },
      "analyzer": {
        "test_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "custom_hunspell_stemmer",
            "custom_synonym_graph-replacement_filter"
          ]
        }
      }
    }
  }
}

While testing, I don't see stationaries and stationeries in the output.

POST /test-index3/_analyze
{
  "analyzer": "test_analyzer",
  "text": "stationary"
}

--
{
  "tokens": [
    {
      "token": "stationery",
      "start_offset": 0,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "stationary",
      "start_offset": 0,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "stationary",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    }
  ]
}
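
To make the gap explicit, the configured synonym set can be compared with the tokens returned above. This is only a minimal sketch: the variable names are illustrative, and the token list is copied by hand from the response rather than fetched from a cluster.

```python
# Synonyms configured in custom_synonym_graph-replacement_filter.
configured = {"stationary", "stationery", "stationaries", "stationeries"}

# (token, type) pairs copied by hand from the _analyze response above.
returned = [
    ("stationery", "SYNONYM"),
    ("stationary", "SYNONYM"),
    ("stationary", "word"),
]

# Distinct terms the analyzer actually emitted.
returned_terms = {token for token, _ in returned}

# Configured synonyms that never show up in the output.
missing = sorted(configured - returned_terms)
print(missing)  # ['stationaries', 'stationeries']
```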

Here is the detailed analysis from OpenSearch:

POST /test-index3/_analyze
{
  "analyzer": "test_analyzer",
  "text": "stationary",
  "explain": true
}

------------------
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "whitespace",
      "tokens": [
        {
          "token": "stationary",
          "start_offset": 0,
          "end_offset": 10,
          "type": "word",
          "position": 0,
          "bytes": "[73 74 61 74 69 6f 6e 61 72 79]",
          "positionLength": 1,
          "termFrequency": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "lowercase",
        "tokens": [
          {
            "token": "stationary",
            "start_offset": 0,
            "end_offset": 10,
            "type": "word",
            "position": 0,
            "bytes": "[73 74 61 74 69 6f 6e 61 72 79]",
            "positionLength": 1,
            "termFrequency": 1
          }
        ]
      },
      {
        "name": "custom_hunspell_stemmer",
        "tokens": [
          {
            "token": "stationary",
            "start_offset": 0,
            "end_offset": 10,
            "type": "word",
            "position": 0,
            "bytes": "[73 74 61 74 69 6f 6e 61 72 79]",
            "keyword": false,
            "positionLength": 1,
            "termFrequency": 1
          }
        ]
      },
      {
        "name": "custom_synonym_graph-replacement_filter",
        "tokens": [
          {
            "token": "stationery",
            "start_offset": 0,
            "end_offset": 10,
            "type": "SYNONYM",
            "position": 0,
            "bytes": "[73 74 61 74 69 6f 6e 65 72 79]",
            "keyword": false,
            "positionLength": 1,
            "termFrequency": 1
          },
          {
            "token": "stationary",
            "start_offset": 0,
            "end_offset": 10,
            "type": "SYNONYM",
            "position": 0,
            "bytes": "[73 74 61 74 69 6f 6e 61 72 79]",
            "keyword": false,
            "positionLength": 1,
            "termFrequency": 1
          },
          {
            "token": "stationary",
            "start_offset": 0,
            "end_offset": 10,
            "type": "word",
            "position": 0,
            "bytes": "[73 74 61 74 69 6f 6e 61 72 79]",
            "keyword": false,
            "positionLength": 1,
            "termFrequency": 1
          }
        ]
      }
    ]
  }
}
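
The explain output is long, so a per-stage summary makes it easier to see where tokens appear. The sketch below assumes only the response shape shown above (`detail.tokenizer` and `detail.tokenfilters`, each with `name` and `tokens`). The helper name `tokens_per_stage` is hypothetical, and the sample is a hand-trimmed copy of the response, not live output.

```python
def tokens_per_stage(explain_response):
    """Return [(stage_name, [(token, type), ...]), ...] in pipeline order."""
    detail = explain_response["detail"]
    # The tokenizer runs first, followed by each token filter in order.
    stages = [detail["tokenizer"]] + detail["tokenfilters"]
    return [
        (stage["name"], [(t["token"], t["type"]) for t in stage["tokens"]])
        for stage in stages
    ]


# Hand-trimmed copy of the explain response above (only token and type kept).
sample = {
    "detail": {
        "tokenizer": {
            "name": "whitespace",
            "tokens": [{"token": "stationary", "type": "word"}],
        },
        "tokenfilters": [
            {
                "name": "lowercase",
                "tokens": [{"token": "stationary", "type": "word"}],
            },
            {
                "name": "custom_hunspell_stemmer",
                "tokens": [{"token": "stationary", "type": "word"}],
            },
            {
                "name": "custom_synonym_graph-replacement_filter",
                "tokens": [
                    {"token": "stationery", "type": "SYNONYM"},
                    {"token": "stationary", "type": "SYNONYM"},
                    {"token": "stationary", "type": "word"},
                ],
            },
        ],
    }
}

summary = tokens_per_stage(sample)
for name, tokens in summary:
    print(name, tokens)
```

In this summary the hunspell stage passes a single `stationary` token through, and the synonym graph stage emits three tokens, none of which is a plural form.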

The hunspell rules and dictionary files are attached.
en-US.aff.txt
en-US.dic.txt

Related component

Other

To Reproduce

N/A

Expected behavior

The screenshot below shows the Solr analysis screen with the synonym graph filter (SGF) highlighted. All the synonyms are displayed under SGF.

Solr-screenshot

Metadata

Labels

  • Other
  • bug (Something isn't working)
  • v2.19.0 (Issues and PRs related to version 2.19.0)
