Feature Request
Description:
Hello,
I've noticed that when matches overlap, a byte offset is reported only for the beginning of the matched part rather than for each individual match. As a result, the number of matches reported by the --count-matches flag is larger than the number of byte offsets returned by the -o -b flags. I suggest adding a new option, or modifying an existing one, so that users can obtain a byte offset for every match, even when matches overlap.
Reporting all byte offsets for overlapping matches directly would streamline workflows that require a byte offset for every match.
Steps to Reproduce:
Text in a.txt: "012a34"
Pattern: "\p{N}{2}"
Use the regular expression to search for matches in a.txt:
hg -e "\p{N}{2}" -b -o a.txt
The number of matches reported by the --count-matches flag is 3. It would be nice to also be able to obtain three byte offsets (0, 1, and 4 in this example).
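
To make the requested behaviour concrete, here is a minimal Python sketch of what "one byte offset per overlapping match" would look like for the example above. It is only an illustration, not how hg or Hyperscan work internally. Two assumptions: the helper name overlapping_offsets is hypothetical, and since Python's re module does not support \p{N}, the pattern uses the ASCII class [0-9] instead. A zero-width lookahead lets the search restart at every byte, so overlapping matches are all reported.

```python
import re

def overlapping_offsets(pattern: bytes, data: bytes) -> list[int]:
    """Return the start byte offset of every match, including overlapping ones.

    Hypothetical helper for illustration only. The pattern is wrapped in a
    zero-width lookahead so the scan advances one byte at a time instead of
    jumping past each match.
    """
    lookahead = re.compile(b"(?=(?:" + pattern + b"))")
    return [m.start() for m in lookahead.finditer(data)]

data = b"012a34"

# Ordinary (non-overlapping) search: after matching "01" the scan resumes
# at offset 2, so the match "12" starting at offset 1 is never reported.
plain = [m.start() for m in re.finditer(rb"[0-9]{2}", data)]

# Overlapping search: every start position is tried.
overlapping = overlapping_offsets(rb"[0-9]{2}", data)

print(plain)
print(overlapping)
```

With the sample text, the overlapping search yields the three offsets requested above (0, 1, and 4), while the ordinary search yields only two.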
Thank you for considering this feature request. I appreciate your work on enabling regex pattern searches with Hyperscan.
Note: I edited this issue after realizing that the matching mechanism works with a sliding window.
