I have an API that returns most_similar_approx results from a Magnitude model. The model is built from the native Word2Vec format with 50 dimensions and 50 trees. The Magnitude model is close to 350 MB and contains approximately 350,000 tokens.
While load testing this API, I observed that performance deteriorates as I increase the topn value for most_similar_approx. I need a high number of similar tokens for downstream activities.
With topn=150 I get a throughput of 500 transactions per second on the API.
Gradually reducing it, I get 800 transactions per second with topn=50 and ~1300 with topn=10.
The server instance is not under any memory/CPU load; I am using a c5.xlarge AWS EC2 instance.
Is there any way I can tune the model to improve performance for a high topn value?
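
To separate the cost of the lookup itself from the API layer, the effect of topn can be measured with a small timing harness like the sketch below. This is only an illustration: `queries_per_second` and its parameters are hypothetical names, and `query_fn` is any callable taking a token and a topn value. For the real model it would be assumed to wrap the lookup, for example `lambda tok, topn: vectors.most_similar_approx(tok, topn=topn)` with `vectors` being the loaded Magnitude model.

```python
import time
from typing import Callable, Dict, Sequence


def queries_per_second(
    query_fn: Callable[[str, int], object],
    tokens: Sequence[str],
    topn_values: Sequence[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return single-threaded queries per second for each topn value.

    query_fn is a hypothetical wrapper around the similarity lookup.
    Each topn value is timed `repeats` times over all tokens and the
    fastest pass is kept, to reduce noise from other processes.
    """
    results: Dict[int, float] = {}
    for topn in topn_values:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for tok in tokens:
                query_fn(tok, topn)
            best = min(best, time.perf_counter() - start)
        # Guard against a zero elapsed time on very fast stand-in functions.
        results[topn] = len(tokens) / best if best > 0 else float("inf")
    return results


if __name__ == "__main__":
    # Stand-in lookup so the sketch runs without the model; replace it
    # with a wrapper around the real most_similar_approx call.
    def stand_in(tok: str, topn: int) -> list:
        return sorted(range(topn * 100), reverse=True)[:topn]

    for topn, qps in queries_per_second(stand_in, ["a", "b", "c"], [10, 50, 150]).items():
        print(f"topn={topn}: {qps:.0f} queries/s")
```

Running this directly on the server, outside the web framework, would show whether the drop from ~1300 to 500 transactions per second comes from the lookup or from serializing larger responses.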