Unreasonable Query Result #2684

@JiaqiLiu

Problem description

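  **The query result seems incorrect: the query is identical to document 1, yet document 0 receives the higher score, and both scores are negative. The code is self-explanatory. Thank you!**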

Steps/code/corpus to reproduce

from gensim.summarization.bm25 import BM25

# Two short documents about constellations.
text1 = "A constellation is a group of stars that are considered to form imaginary outlines or meaningful patterns on the celestial sphere."
text2 = "The 88 modern constellations are formally defined regions of the sky together covering the entire celestial sphere."
text = [text1, text2]

# Tokenize by splitting on single spaces (case and punctuation are kept).
corpus = [text1.split(" "), text2.split(" ")]
print(f'corpus: {corpus}')

# Query with the exact tokens of the second document.
query = text2.split(" ")

bm25 = BM25(corpus)
scores = bm25.get_scores(query)
# Pair each score with its document index and rank from highest to lowest.
scores = [(s, i) for i, s in enumerate(scores)]
scores.sort(key=lambda t: t[0], reverse=True)
print(f'scores:         {scores}')

for s, idx in scores:
    print(f'{s}\t{idx}: {text[idx]}')

Output:

-0.3601521710456333         0: A constellation is a group of stars that are considered to form imaginary outlines or meaningful patterns on the celestial sphere.
-0.44989406787023367     1: The 88 modern constellations are formally defined regions of the sky together covering the entire celestial sphere.
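
A possible explanation for the negative scores, sketched under an assumption rather than taken from the gensim source: if the scorer uses the textbook BM25 IDF term log((N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term, then a two-document corpus cannot produce a positive term weight. The helper name `classic_bm25_idf` is hypothetical and no gensim code is called.

```python
import math


def classic_bm25_idf(corpus_size, doc_freq):
    """Textbook BM25 IDF; assumed here, not read from gensim's source."""
    return math.log((corpus_size - doc_freq + 0.5) / (doc_freq + 0.5))


text1 = "A constellation is a group of stars that are considered to form imaginary outlines or meaningful patterns on the celestial sphere."
text2 = "The 88 modern constellations are formally defined regions of the sky together covering the entire celestial sphere."
corpus = [text1.split(" "), text2.split(" ")]

# Document frequency: the number of documents that contain each token.
doc_freq = {}
for doc in corpus:
    for token in set(doc):
        doc_freq[token] = doc_freq.get(token, 0) + 1

idf = {token: classic_bm25_idf(len(corpus), n) for token, n in doc_freq.items()}
shared = sorted(token for token, n in doc_freq.items() if n == 2)

print("tokens in both documents:", shared)
print("largest IDF in the corpus:", max(idf.values()))
print("smallest IDF in the corpus:", min(idf.values()))
```

Under this formula, a token found in only one of the two documents gets an IDF of log(1.5 / 1.5) = 0, and a token found in both gets log(0.5 / 2.5), which is negative. Every term weight is therefore zero or negative. Whether gensim 3.8.1 computes IDF exactly this way, and how it treats negative values, would need to be checked against its source.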

Versions

Please provide the output of:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)

Output:

macOS-10.14.6-x86_64-i386-64bit
Python 3.8.0 (default, Nov  6 2019, 15:49:01)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.4
SciPy 1.3.3
gensim 3.8.1
FAST_VERSION 0
