Description
Querying intermittently takes a long time (18 sec to 3 min)
I am using pymagnitude in one of my projects to load and use GoogleNews-vectors-negative300.bin.
I converted GoogleNews-vectors-negative300.bin to a .magnitude file and load it with Magnitude(). I use pymagnitude to generate word embeddings and then train an ANN model on those embeddings.
On my local machine (details below), I face no issues.
Environments:
Local:
Mac, 32 GB RAM, Docker with CentOS. Very fast: queries take a fraction of a second.
Testing environment:
CentOS, 16 GB RAM. Intermittent slowness: some words take 18 sec to 3 min to query, and the process times out.
Note: I keep my mmap files on a mounted volume, and I have confirmed that they are not being wiped.
Here are the findings for a few words on the testing environment and on local:
Word, Time on Testing Environment
li��n , 0.82 min
ph���m, 0.4 min
al, 1.3 min
On local, the query time for the above keys is much lower, under a second.
On further investigation and profiling of execution time, we observed that most of the time is spent when an OOV (out-of-vocabulary) token is found and the _db_query_similar_keys_vector function is invoked.
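For anyone trying to reproduce this, per-word timings like the ones above can be collected with a small helper. This is only a sketch: `time_queries` is a hypothetical name, and it takes any callable so that it does not depend on pymagnitude being installed. The demo uses a stand-in function instead of a real model.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_queries(query: Callable[[str], object],
                 words: Iterable[str]) -> List[Tuple[str, float]]:
    """Call query(word) for each word and return (word, seconds), slowest first."""
    timings = []
    for word in words:
        start = time.perf_counter()
        query(word)  # result is discarded; only the elapsed time matters here
        timings.append((word, time.perf_counter() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Stand-in for a real lookup: pretends that "al" is a slow OOV token.
    def fake_query(word: str) -> str:
        if word == "al":
            time.sleep(0.05)
        return word

    for word, seconds in time_queries(fake_query, ["the", "al", "cat"]):
        print(f"{word}\t{seconds:.4f} s")
```

With pymagnitude, the callable would be `vectors.query`, where `vectors = Magnitude("path/to/file.magnitude")` (the path is a placeholder). Words that sort to the top of the result are the candidates to check against the vocabulary.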
Sample queries that take a long time:
SELECT
magnitude.*
FROM
magnitude_subword,
magnitude
WHERE
char_ngrams MATCH "\uf000al" OR "al" OR "l" OR "\uf000"
AND magnitude.rowid = magnitude_subword.rowid
ORDER BY
(
(
LENGTH(offsets(magnitude_subword)) - LENGTH(
REPLACE(offsets(magnitude_subword), ' ', '')
)
) + 1
) DESC,
magnitude.key LIKE 'a%'
AND LENGTH(magnitude.key) <= 4 DESC,
magnitude.key LIKE '%';
-- Took 3.8 min to execute
SELECT
magnitude.*
FROM
magnitude_subword,
magnitude
WHERE
char_ngrams MATCH "\uf000ch" OR "ch" OR "h" OR "n" OR "ng" OR "ng\uf000"
AND magnitude.rowid = magnitude_subword.rowid
ORDER BY
(
(
LENGTH(offsets(magnitude_subword)) - LENGTH(
REPLACE(offsets(magnitude_subword), ' ', '')
)
) + 1
) DESC,
magnitude.key LIKE 'a%'
AND LENGTH(magnitude.key) <= 4 DESC,
magnitude.key LIKE '%';
-- Took 2 min to execute
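Since a .magnitude file is a SQLite database, queries like the two above can also be timed outside pymagnitude to check whether the slowness comes from SQLite itself on the testing environment. The sketch below is an assumption about how one might do that: `time_sql` and `query_plan` are hypothetical helpers, and the demo runs against a tiny in-memory table rather than a real .magnitude file.

```python
import sqlite3
import time
from typing import List, Sequence, Tuple


def time_sql(conn: sqlite3.Connection, sql: str,
             params: Sequence = ()) -> Tuple[float, int]:
    """Run a query, fetch every row, and return (elapsed seconds, row count)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - start, len(rows)


def query_plan(conn: sqlite3.Connection, sql: str,
               params: Sequence = ()) -> List[str]:
    """Return the human-readable steps of SQLite's plan for the query."""
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


if __name__ == "__main__":
    # Demo on an in-memory table; the table name mirrors the queries above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE magnitude (key TEXT)")
    conn.executemany("INSERT INTO magnitude VALUES (?)",
                     [("al",), ("apple",), ("cat",)])
    sql = "SELECT * FROM magnitude WHERE key LIKE ?"
    print(time_sql(conn, sql, ("a%",)))
    print(query_plan(conn, sql, ("a%",)))
```

To try this against the real file, the connection could be opened read-only with `sqlite3.connect("file:path/to/file.magnitude?mode=ro", uri=True)` (placeholder path). The MATCH queries above will only run if the Python build of SQLite includes the full-text search extension that `magnitude_subword` relies on.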