
Conversation

@droudy (Contributor) commented Jan 29, 2017

Issue #1105

Uses the average query time over 1000 random queries instead of a single query, and adds a "dry run" before the timed queries. Also fixes a discrepancy where a comment claimed the vector for "army" was being retrieved when the word is actually "science". Benchmarks were run on a 2.4GHz 4-core i7 processor.
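
Roughly, the benchmark boils down to the sketch below (a sketch only, not the notebook's code; the model path, num_trees, topn and helper name are made up, and the attribute/import names follow gensim as of this PR and may differ in later versions):

    # A minimal sketch of the averaged benchmark, not the notebook's exact code.
    import random
    import time

    from gensim.models import Word2Vec
    from gensim.similarities.index import AnnoyIndexer  # location as of this PR

    model = Word2Vec.load("word2vec_model")           # hypothetical pre-trained model
    annoy_index = AnnoyIndexer(model, num_trees=100)  # more trees: slower build, better recall

    def avg_query_time(indexer=None, queries=1000):
        """Average most_similar() wall-clock time over `queries` random query vectors."""
        total = 0.0
        for _ in range(queries):
            vector = model.wv[random.choice(model.wv.index2word)]
            start = time.time()
            model.wv.most_similar([vector], topn=5, indexer=indexer)
            total += time.time() - start
        return total / queries

    # Dry run: pay one-off costs before timing anything.
    avg_query_time(annoy_index, queries=10)
    avg_query_time(queries=10)

    gensim_time = avg_query_time()            # exact (brute-force) search
    annoy_time = avg_query_time(annoy_index)  # approximate search via Annoy
    print("Gensim: %f" % gensim_time)
    print("Annoy: %f" % annoy_time)
    print("Annoy is %.2f times faster on average over 1000 random queries"
          % (gensim_time / annoy_time))

The dry run matters because the first most_similar call pays one-off setup costs (gensim normalises the vectors lazily), which would otherwise skew whichever timing runs first.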

"Gensim: 0.007451029\n",
"Annoy: 0.002149934\n",
"\n",
"Annoy is 3.46570127269 times faster on average over 1000 random queries\n"
@piskvorky (Owner)

The focus and emphasis on such a level of precision is misleading (and unnecessary).

Also, please mention the other factors that affect this number (index size etc.), so people don't go away thinking "annoy is ~3.5x faster than gensim" when in reality it can be anything between 1x and infinity.

@droudy (Contributor, Author)

@piskvorky Should I round to a smaller decimal place or leave the exact figure out completely?

@piskvorky (Owner) commented Jan 29, 2017

I'd say round to a smaller decimal place, plus include a fat disclaimer that this number is by no means "constant" :)

It's completely incidental to this dataset, BLAS setup, Annoy parameters etc. The algos have fundamentally different complexity characteristics.

"('terrorism,', 0.6300898194313049)\n",
"('creditors', 0.6264415979385376)\n"
"('signature', 0.5921074748039246)\n",
"('\"dangerously', 0.5920691192150116)\n",
@piskvorky (Owner) commented Jan 29, 2017

This looks like bad preprocessing. Any reason not to simply use utils.simple_preprocess?
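
(For reference, a quick illustration of gensim's utils.simple_preprocess; the example sentence is made up, not taken from the notebook or its corpus.)

    from gensim.utils import simple_preprocess

    # simple_preprocess lowercases, tokenizes, and drops punctuation, so tokens
    # such as '"dangerously' or 'terrorism,' cannot end up in the vocabulary.
    print(simple_preprocess('He spoke "dangerously close" to the creditors, reportedly.'))
    # ['he', 'spoke', 'dangerously', 'close', 'to', 'the', 'creditors', 'reportedly']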

@tmylk merged commit 6ece162 into piskvorky:develop on Jan 29, 2017
@piskvorky (Owner) commented Jan 30, 2017

This doesn't look right -- I still see `"dangerously` in the notebook as a token, which should never happen with simple_preprocess.

EDIT: disregard, github was showing me only partial changes. Thanks for the fixes 👍

