Querying takes a long time (18 sec to 3 min) intermittently #90

@shubhamjoshi2130

I am using pymagnitude in one of my projects to load and use GoogleNews-vectors-negative300.bin.

I converted GoogleNews-vectors-negative300.bin to a .magnitude file and load that file with Magnitude(). I use pymagnitude to generate word embeddings and then train an ANN model on those embeddings.

On my local machine (details below) I face no issues.

Environments:

Local:
Mac, 32 GB RAM, Docker with CentOS: very fast, a fraction of a second per query.

Testing environment:
CentOS, 16 GB RAM: intermittent slowness. Querying some words takes 18 sec to 3 min, and the process times out.

Note: I keep my mmap files on a mount, and I have verified that it is not getting wiped.

Here are the findings for a few words on the testing environment and on local:

Word, Time on Testing Environment
li��n , 0.82 min
ph���m, 0.4 min
al, 1.3 min
On local, each of the above keys takes well under a second.

On further investigation, and after profiling execution time, we observed that most of the time is spent when an OOV (out-of-vocabulary) token is found and the _db_query_similar_keys_vector function is invoked.
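To make the in-vocabulary vs. OOV difference easy to reproduce, here is a minimal timing sketch. The helper names `time_lookups` and `slowest` are hypothetical and not part of pymagnitude; the helper only needs a lookup callable and a membership test.

```python
import time
from collections import defaultdict


def time_lookups(words, lookup, is_known):
    """Time lookup(word) for each word, grouped as 'in_vocab' or 'oov'.

    lookup and is_known are caller-supplied callables (assumptions of this
    sketch), so the helper does not depend on any particular library.
    """
    timings = defaultdict(list)
    for word in words:
        # Classify before timing so the membership check is not counted.
        group = "in_vocab" if is_known(word) else "oov"
        start = time.perf_counter()
        lookup(word)
        timings[group].append((word, time.perf_counter() - start))
    return dict(timings)


def slowest(timings, group, n=5):
    """Return the n slowest (word, seconds) pairs in the given group."""
    pairs = timings.get(group, [])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]
```

With pymagnitude this could be called with `lookup=vectors.query` and `is_known=lambda w: w in vectors`, where `vectors = Magnitude(...)` points at the converted file. Running it on both environments with the same word list should show whether only the OOV group is slow.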

Sample queries that take the most time:

SELECT
magnitude.*
FROM
magnitude_subword,
magnitude
WHERE
char_ngrams MATCH "\uf000al" OR "al" OR "l" OR "\uf000"
AND magnitude.rowid = magnitude_subword.rowid
ORDER BY
(
(
LENGTH(offsets(magnitude_subword)) - LENGTH(
REPLACE(offsets(magnitude_subword), ' ', '')
)
) + 1
) DESC,
magnitude.key LIKE 'a%'
AND LENGTH(magnitude.key) <= 4 DESC,
magnitude.key LIKE '%';

-- Took 3.8 min to execute

SELECT
magnitude.*
FROM
magnitude_subword,
magnitude
WHERE
char_ngrams MATCH "\uf000ch" OR "ch" OR "h" OR "n" OR "ng" OR "ng\uf000"
AND magnitude.rowid = magnitude_subword.rowid
ORDER BY
(
(
LENGTH(offsets(magnitude_subword)) - LENGTH(
REPLACE(offsets(magnitude_subword), ' ', '')
)
) + 1
) DESC,
magnitude.key LIKE 'a%'
AND LENGTH(magnitude.key) <= 4 DESC,
magnitude.key LIKE '%';
-- Took 2 min to execute
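Since a .magnitude file is a SQLite database, the queries above can also be timed outside pymagnitude to check whether the slowness comes from SQLite itself on the testing machine. A rough sketch, assuming the file can be opened read-only with Python's sqlite3 module and that the local SQLite build supports the full-text table used by `magnitude_subword`; the helper name `explain_and_time` is hypothetical:

```python
import sqlite3
import time


def explain_and_time(conn, sql, params=()):
    """Return (query plan details, row count, elapsed seconds) for one query."""
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return plan, len(rows), elapsed
```

For example, open the file with `sqlite3.connect("file:<path to .magnitude file>?mode=ro", uri=True)` and pass one of the SELECT statements above as `sql`. Comparing the plan and the elapsed time between local and the testing environment should show whether the same plan is simply running slower there, for instance because of slower reads from the mount.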
