BM25: Incorrect scoring function #1828

@Atulgang

Description

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/summarization/bm25.py
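In get_score, the length-normalization term uses "len(document)", which is the length of the query document passed in. It should instead use the length of the corpus document at position "index", i.e. the document being scored.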

    def get_score(self, document, index, average_idf):
        # ... (lines omitted)
            # len(document) below should be the length of the corpus document at `index`
            score += (idf * self.f[index][word] * (PARAM_K1 + 1)
                      / (self.f[index][word] + PARAM_K1 * (1 - PARAM_B + PARAM_B * len(document) / self.avgdl)))
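
To illustrate the proposed change, here is a minimal standalone sketch of the scoring step. It is not gensim's code: the function names, the `corpus` and `idf` arguments, and the constant values are assumptions made for this example. The only point it demonstrates is that the normalization term uses the length of the corpus document being scored rather than the length of the query.

```python
from collections import Counter

# Assumed constants for this sketch; typical BM25 defaults.
PARAM_K1 = 1.5
PARAM_B = 0.75


def bm25_term_score(idf, tf, doc_len, avgdl, k1=PARAM_K1, b=PARAM_B):
    """Contribution of one query term to the score of one corpus document."""
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))


def get_score(query, index, corpus, idf):
    """Score corpus[index] against `query` (hypothetical helper, not gensim's API).

    `corpus` is a list of tokenized documents and `idf` maps word -> IDF weight.
    """
    freqs = Counter(corpus[index])
    avgdl = sum(len(doc) for doc in corpus) / len(corpus)
    doc_len = len(corpus[index])  # length of the scored document, not len(query)
    score = 0.0
    for word in query:
        if word not in freqs:
            continue
        score += bm25_term_score(idf[word], freqs[word], doc_len, avgdl)
    return score


if __name__ == "__main__":
    corpus = [["cat", "sat"], ["cat", "sat", "on", "the", "mat", "today"]]
    idf = {"cat": 1.0}  # made-up IDF value for illustration
    query = ["cat"]
    print(get_score(query, 0, corpus, idf), get_score(query, 1, corpus, idf))
```

With the same query and the same term frequency, the shorter corpus document scores higher than the longer one. If `len(query)` were used instead, both documents would receive the same score, which is the behavior this issue reports.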

Metadata

    Labels

    bug: Issue described a bug
    good first issue: Issue for new contributors (not required gensim understanding + very simple)
