
Providing Byte-Offsets for Every Match #46

@fabianovasi

Feature Request
Description:

Hello,
I've noticed that when matches overlap, a byte offset is reported only for the beginning of the matched part. As a result, the number of matches reported with the --count-matches flag is larger than the number of byte offsets printed with the -o -b flags. I suggest adding a new option, or modifying an existing one, so that users can obtain a byte offset for every match, even when matches overlap.
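The mismatch between the two counts can be illustrated with a small model. This Python sketch assumes (a guess, not confirmed by the hypergrep source) that -o -b merges overlapping matches into one region and prints only that region's start offset. `merge_overlapping` is a hypothetical helper, and the spans are the three matches of \p{N}{2} in "012a34" from the reproduction steps.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def merge_overlapping(spans: List[Span]) -> List[Span]:
    """Merge overlapping spans into single regions (hypothetical model of -o -b)."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous region: extend it instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The three matches of \p{N}{2} in "012a34": "01", "12", "34".
matches = [(0, 2), (1, 3), (4, 6)]
regions = merge_overlapping(matches)
print(len(matches), [s for s, _ in regions])  # 3 [0, 4]
```

Under this model, three matches collapse into two regions, so only two offsets (0 and 4) would be printed.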

Reporting byte offsets for all overlapping matches directly would simplify workflows that need the offset of every match.

Steps to Reproduce:

Text in a.txt: "012a34"
Pattern: "\p{N}{2}"
Use the regular expression to search for matches in a.txt:

hg -e "\p{N}{2}" -b -o a.txt

Result:

The --count-matches flag reports 3 matches. It would be helpful to also be able to obtain the three corresponding byte offsets (0, 1, and 4 in this example).
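The expected offsets can be checked with a minimal Python sketch. It uses a zero-width lookahead, a common trick for finding overlapping matches with Python's `re` module, and substitutes `[0-9]{2}` for `\p{N}{2}` because `re` has no `\p{...}` classes (so only ASCII digits are covered). `overlapping_match_offsets` is a hypothetical helper, not part of hypergrep. The lookahead reports at most one match per start position, which lines up with "one report per match end" only for fixed-width patterns such as this one.

```python
import re
from typing import List


def overlapping_match_offsets(pattern: bytes, data: bytes) -> List[int]:
    """Return the start byte offset of every match, including overlapping ones."""
    # A zero-width lookahead consumes nothing, so the scanner retries at every
    # byte and reports a match at each position where the pattern fits.
    lookahead = re.compile(rb"(?=(" + pattern + rb"))")
    return [m.start() for m in lookahead.finditer(data)]


data = b"012a34"
# [0-9]{2} stands in for \p{N}{2}: Python's re has no \p{...} classes.
print(overlapping_match_offsets(rb"[0-9]{2}", data))  # [0, 1, 4]
```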

Thank you for considering this feature request. I appreciate your work on enabling regex searches with Hyperscan.

Note: I edited this issue after realizing that the matching mechanism works with a sliding window.
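The sliding-window observation fits how Hyperscan reports matches: its match callback receives the end offset of each match (`to`), while the start offset (`from`) is only computed when a pattern is compiled with `HS_FLAG_SOM_LEFTMOST`. Whether hypergrep uses that flag is not stated here. For a fixed-width pattern such as `\p{N}{2}`, start offsets can be recovered by subtracting the width from each end offset, as in this sketch. `match_end_offsets`, `starts_from_ends`, and `two_digits` are hypothetical helpers, and `two_digits` is an ASCII-only stand-in for `\p{N}{2}`.

```python
from typing import Callable, List


def match_end_offsets(data: bytes, width: int,
                      accepts: Callable[[bytes], bool]) -> List[int]:
    """Slide a fixed-width window over data and record the end offset of each hit,
    mimicking an engine that reports where a match ends rather than where it starts."""
    ends = []
    for end in range(width, len(data) + 1):
        if accepts(data[end - width:end]):
            ends.append(end)
    return ends


def starts_from_ends(ends: List[int], width: int) -> List[int]:
    """For a fixed-width pattern, each match starts exactly `width` bytes before its end."""
    return [end - width for end in ends]


def two_digits(window: bytes) -> bool:
    # ASCII-only stand-in for \p{N}{2}.
    return len(window) == 2 and window.isdigit()


ends = match_end_offsets(b"012a34", 2, two_digits)
print(ends, starts_from_ends(ends, 2))  # [2, 3, 6] [0, 1, 4]
```

For variable-width patterns this subtraction no longer works, which is where start-of-match tracking becomes necessary.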
