A new too slow scanning callback #1921
Merged
plusvic merged 1 commit into VirusTotal:master on May 25, 2023
Conversation
Member
It looks like the test cases are failing due to a heap overflow detected in https://github.com/VirusTotal/yara/actions/runs/4927239541/jobs/8803939475?pr=1921
Contributor (Author)
I am sorry for the late reply. The PR should be fixed now.
plusvic approved these changes on May 25, 2023
The goal was to create a deterministic way to detect potentially slow scanning caused by low-quality rules.
The first version measured the actual scan speed. However, other factors, such as CPU usage, could influence that measurement.
In this version, I focus instead on indicators derived from the rules themselves.
The first indicator is YARA using 0-length atoms, which effectively means testing the input byte by byte. This problem is partially addressed by the existing warnings about low-quality atoms (the well-known "slowing down scanning" warning). Still, because the heuristics behind those calculations change over time, it is sometimes hard to conclude that this is actually the case.
However, I did not want to generate a callback when the scanned input is relatively small, since the slowdown is then not significant. I tested how slow rules behave on inputs of different sizes, and the slowdown was most notable for files larger than 0.2 MB. For that reason, the callback is generated only for files above that size.
The second indicator is the number of potential matches. If the count exceeds one million, ERROR_TOO_MANY_MATCHES is returned. However, even a count below that bound can indicate that something is wrong.
I tested some additional factors, but these two turned out to be the simplest yet most effective so far.
Example: