@esarp (Collaborator) commented Dec 9, 2025

Implements bytes.startswith in mypyc. It could potentially be made more efficient by not relying on memcmp, but I'm not sure.
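
To make the memcmp remark concrete, here is a minimal Python model of the two comparison strategies; it is not the C code from this PR. `startswith_bulk` compares the whole prefix in one operation (the analogue of a single memcmp call), while `startswith_loop` walks byte by byte and stops at the first mismatch. Both function names are hypothetical and exist only for illustration. Which strategy would be faster in C is left open here, as it is in the description.

```python
def startswith_bulk(data: bytes, prefix: bytes) -> bool:
    # Hypothetical helper: one bulk comparison, analogous to memcmp.
    n = len(prefix)
    return n <= len(data) and data[:n] == prefix


def startswith_loop(data: bytes, prefix: bytes) -> bool:
    # Hypothetical helper: explicit loop that stops at the first mismatch.
    if len(prefix) > len(data):
        return False
    for i in range(len(prefix)):
        if data[i] != prefix[i]:
            return False
    return True


# Both models should agree with the built-in on every pair of samples.
samples = [b"", b"f", b"foo", b"foobar", b"barfoo"]
for data in samples:
    for prefix in samples:
        expected = data.startswith(prefix)
        assert startswith_bulk(data, prefix) == expected
        assert startswith_loop(data, prefix) == expected
```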

Tested with the following benchmark code, which shows a ~6.5x performance improvement for the compiled module compared to standard Python:

import time

def bench(prefix: bytes, a: list[bytes], n: int) -> int:
    i = 0
    for x in range(n):
        for b in a:
            if b.startswith(prefix):
                i += 1
    return i


a = [b"foo", b"barasdfsf", b"foobar", b"ab", b"asrtert", b"sertyeryt"]
n = 5 * 1000 * 1000
prefix = b"foo"

bench(prefix, a, n)

t0 = time.time()
bench(prefix, a, n)
td = time.time() - t0
print(f"{td}s")

Output (interpreted run first, then the compiled module):

$ python /tmp/bench.py
1.0015509128570557s
$ python -c 'import bench'
0.154998779296875s
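
As a quick check on the numbers, the speedup implied by this single run can be computed from the two timings printed above. This assumes the first timing is the interpreted run and the second is the compiled module; the variable names are only illustrative.

```python
interpreted = 1.0015509128570557  # seconds, from `python /tmp/bench.py`
compiled = 0.154998779296875      # seconds, from `python -c 'import bench'`

# Ratio of the two wall-clock times reported above.
speedup = interpreted / compiled
print(f"{speedup:.2f}x")
```

The ratio comes out at roughly 6.5x. It reflects one measurement on one machine, so it is best read as an order-of-magnitude figure.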

@esarp force-pushed the mypycBytesStartswith branch from 72f9ae5 to 2064d9e on December 9, 2025 18:45
@esarp force-pushed the mypycBytesStartswith branch from e8c65fe to 1185d2f on December 9, 2025 18:49

github-actions bot (Contributor) commented Dec 9, 2025

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

const char *self_buf = PyBytes_AS_STRING(self);
const char *subobj_buf = PyBytes_AS_STRING(subobj);

if (subobj_len == 0) {

Contributor review comment:

Maybe this if check should go above the two PyBytes_AS_STRING lines? We can return early without making those calls if the check is true.

Collaborator Author reply:

Good call, updated. I also split the checks around each PyBytes_GET_SIZE call a bit further to optimize for the empty-argument case. It probably won't save much, but I don't think it hurts readability.
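
The ordering discussed in this thread can be sketched in Python as follows. This is only a model of the control flow, assuming the empty-prefix check comes first and the buffers are read last; `startswith_model` is a hypothetical name, and the comments map each step to the C macro it stands in for.

```python
def startswith_model(self_obj: bytes, subobj: bytes) -> bool:
    # Hypothetical model of the check order; not the merged C code.
    subobj_len = len(subobj)   # stands in for PyBytes_GET_SIZE(subobj)
    if subobj_len == 0:
        return True            # every bytes object starts with b""

    self_len = len(self_obj)   # stands in for PyBytes_GET_SIZE(self)
    if subobj_len > self_len:
        return False           # a longer prefix can never match

    # Only now touch the buffers (PyBytes_AS_STRING in C) and compare.
    return self_obj[:subobj_len] == subobj


assert startswith_model(b"foobar", b"")
assert startswith_model(b"foobar", b"foo")
assert not startswith_model(b"fo", b"foo")
assert not startswith_model(b"barfoo", b"foo")
```

With this ordering the empty-argument case returns after a single length lookup, and a too-long prefix returns before any buffer is read.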

# Test empty cases
assert test.startswith(b'')
assert b''.startswith(b'')
assert not b''.startswith(test)

Collaborator review comment:

Please also test with a bytearray (1) as the receiver object and (2) as the argument. That way the slow path is tested as well.

Collaborator Author reply:

Added a few checks to cover those as well.
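
A sketch of what such checks could look like in plain Python, using assumed sample values; these are not the assertions actually added to the PR. In CPython both bytes.startswith and bytearray.startswith accept any bytes-like object as the prefix, so the receiver and the argument can each be a bytearray.

```python
test = b"some string"       # hypothetical sample value
test_ba = bytearray(test)   # same content as a bytearray

# 1) bytearray as the receiver object
assert test_ba.startswith(b"some")
assert not test_ba.startswith(b"string")

# 2) bytearray as the argument
assert test.startswith(bytearray(b"some"))
assert not test.startswith(bytearray(b"string"))

# Empty cases behave the same with bytearray operands
assert test_ba.startswith(bytearray())
assert not bytearray().startswith(test)
```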

@JukkaL merged commit 1cea058 into python:master on Dec 11, 2025
16 checks passed
3 participants