duplicate terms in name affecting scoring #507

@missinglink

It seems that duplicate tokens in a name cause Elasticsearch to score the result higher. This is due to how TF/IDF scoring works: a term that occurs more often in a field contributes more to the score.
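To make the term-frequency effect concrete, here is a minimal sketch. It assumes the classic Lucene-style term-frequency factor (the square root of the term count) and deliberately ignores IDF, field-length norms, and everything else a real Elasticsearch score includes, so the numbers only illustrate the direction of the effect. The function names and the doubled name are hypothetical, not taken from the linked record.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split, not the real analyzer.
    return text.lower().split()


def tf_only_score(query, name):
    """Sum sqrt(term frequency) over the query terms.

    IDF and length normalisation are omitted on purpose, so this is
    not a real Elasticsearch score.
    """
    counts = Counter(tokenize(name))
    return sum(math.sqrt(counts[term]) for term in tokenize(query))


query = "whole foods market"
single = tf_only_score(query, "Whole Foods Market")
# Hypothetical name field in which some tokens appear twice.
doubled = tf_only_score(query, "Whole Foods Market Whole Foods")

print(single, doubled)
```

Under these assumptions the name with repeated tokens gets the larger term-frequency contribution, which matches the behaviour described above.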

While it's impossible to have exact duplicate names (this is taken care of by pelias/model in a 'post' step), it is possible to have two terms which are very similar such as this:

Screenshot 2019-11-15 at 16 18 39

https://www.openstreetmap.org/way/432890745

I'm opening this issue so I don't forget. We can try to solve this either during import or during search.
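As one possible shape for the import-time option, the sketch below drops repeated tokens from a name before it is indexed. It is only an illustration under simple assumptions (whitespace tokenization, case-insensitive comparison); `dedupe_tokens` is a hypothetical helper, not an existing pelias function, and it does not address near-duplicate names that differ by more than repeated words.

```python
def dedupe_tokens(name: str) -> str:
    """Return the name with repeated tokens removed, keeping first occurrences."""
    seen = set()
    kept = []
    for token in name.split():
        key = token.casefold()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            kept.append(token)  # keep the original casing
    return " ".join(kept)


# Hypothetical input, for illustration only.
print(dedupe_tokens("Whole Foods Market Whole Foods"))
```

A trade-off to keep in mind: removing tokens at import time changes what is stored, whereas a search-time fix would leave the indexed data untouched.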

A query such as /v1/search?text=whole foods market, NY illustrates the issue (although there may be other things at play here).

Screenshot 2019-11-15 at 16 22 28

Screenshot 2019-11-15 at 16 22 16
