@udoprog commented Aug 7, 2014

  • Avoid 'case' overhead by generating BITWISE_FUNC vs. BITARRAY_FUNC.
  • Change type of zero and one to (unsigned char), since the compiler warned about
    overflows under '-pedantic'. Also declare them 'static' to help the compiler optimize.
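The first bullet is about moving the choice of operator out of the per-byte loop. The macro definitions are not part of this thread, so the following is only a Python analogy, not the patch itself: `bitwise_switch` re-checks the operator for every byte, the way a `case` inside the loop would, while `_make_kernel` builds one loop per operator up front so the loop body does no dispatch. All names here are hypothetical.

```python
import operator


def bitwise_switch(a: bytes, b: bytes, op: str) -> bytes:
    """Re-check the operator on every byte, like a `case` inside the loop."""
    out = bytearray(len(a))
    for i in range(len(a)):
        if op == "&":
            out[i] = a[i] & b[i]
        elif op == "|":
            out[i] = a[i] | b[i]
        elif op == "^":
            out[i] = a[i] ^ b[i]
        else:
            raise ValueError(f"unknown operator {op!r}")
    return bytes(out)


def _make_kernel(fn):
    """Build a loop specialized for one operator (hypothetical stand-in for
    a macro-generated C function)."""
    def kernel(a: bytes, b: bytes) -> bytes:
        # No per-byte dispatch: fn was fixed when the kernel was created.
        return bytes(fn(x, y) for x, y in zip(a, b))
    return kernel


# One specialized kernel per operator; the choice is made once per call.
KERNELS = {
    "&": _make_kernel(operator.and_),
    "|": _make_kernel(operator.or_),
    "^": _make_kernel(operator.xor),
}


def bitwise_specialized(a: bytes, b: bytes, op: str) -> bytes:
    return KERNELS[op](a, b)


print(bitwise_specialized(b"\x0f\xf0", b"\xff\x0f", "&"))  # b'\x0f\x00'
```

In C the payoff is typically larger than this analogy suggests, because a branch-free loop body is also much easier for the compiler to vectorize.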

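This resulted in a ~10x speedup on large bitarrays for me, using the following
test:

```python
import bitarray
import timeit

# Two 50,000-bit operands; only their size matters for this benchmark.
a = bitarray.bitarray(50000)
b = bitarray.bitarray(50000)

def test_and():
    global a  # the in-place operator rebinds `a`, so declare it global
    a &= b

def test_or():
    global a
    a |= b

def test_xor():
    global a
    a ^= b

# timeit.timeit() runs each statement 1,000,000 times by default, so every
# printed figure is the total time in seconds for one million operations.
print(timeit.timeit("test_and()", "from __main__ import test_and"))
print(timeit.timeit("test_or()", "from __main__ import test_or"))
print(timeit.timeit("test_xor()", "from __main__ import test_xor"))
```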

Timings in seconds (`&`, `|`, `^`, in that order):

```
upstream master:
20.3912520409
20.6214001179
20.5252711773

with this patch:
2.11912703514
2.14890694618
2.1437420845
```
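The "~10x" figure can be checked directly from these numbers. This short sketch copies the timings verbatim and relies only on the fact that `timeit.timeit` defaults to `number=1000000` when, as in the benchmark, no count is passed:

```python
# Totals in seconds from the benchmark output above; each is the time for
# timeit.timeit's default number=1,000,000 calls.
BEFORE = {"&": 20.3912520409, "|": 20.6214001179, "^": 20.5252711773}
AFTER = {"&": 2.11912703514, "|": 2.14890694618, "^": 2.1437420845}
CALLS = 1_000_000


def speedups(before: dict, after: dict) -> dict:
    """Ratio of old to new total time for each operator."""
    return {op: before[op] / after[op] for op in before}


for op, ratio in speedups(BEFORE, AFTER).items():
    us_before = BEFORE[op] / CALLS * 1e6  # microseconds per call
    us_after = AFTER[op] / CALLS * 1e6
    print(f"{op}: {us_before:.1f} us -> {us_after:.2f} us per call ({ratio:.1f}x)")
```

Each operator comes out at about 9.6x (roughly 20.5 µs down to 2.1 µs per call), in line with the ~10x quoted.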

About memcpy usage in simd_v16uc_op:

When inspecting the compiler output, I found that memcpy produced the best code
in most cases. On my system it compiles to movdqa instructions that move data to
and from the xmm registers.
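The simd_v16uc_op source is not part of this thread, so here is only a Python model of the access pattern described above, not of the C code: each operand is processed in 16-byte blocks (the width of one xmm register), and the `len % 16` leftover bytes are handled one at a time. `blockwise_op` and `BLOCK` are hypothetical names, and equal-length operands are assumed. Because `&`, `|` and `^` act on every bit independently, combining a whole block at once gives exactly the same bytes as a bytewise loop.

```python
import operator

BLOCK = 16  # bytes per block, the width of one 128-bit xmm register


def blockwise_op(a: bytes, b: bytes, fn) -> bytes:
    """Apply fn to a and b 16 bytes at a time, then finish the tail bytewise."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = bytearray(len(a))
    n_full = len(a) - len(a) % BLOCK
    for i in range(0, n_full, BLOCK):
        # "Load" each 16-byte block as one 128-bit integer, combine, "store" back.
        x = int.from_bytes(a[i:i + BLOCK], "little")
        y = int.from_bytes(b[i:i + BLOCK], "little")
        out[i:i + BLOCK] = fn(x, y).to_bytes(BLOCK, "little")
    for i in range(n_full, len(a)):
        out[i] = fn(a[i], b[i])  # leftover bytes that do not fill a block
    return bytes(out)


# 20 bytes: one full block plus a 4-byte tail; ANDing with 0xFF is a no-op.
print(blockwise_op(bytes(range(20)), b"\xff" * 20, operator.and_) == bytes(range(20)))  # True
```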

@diamondman

This looks really promising. Several operations in bitarray need to be faster, and as long as all tests pass, I support this PR.

@andre-merzky

Hey @udoprog,

This repo has been silent for quite a while, so we created a fork at https://github.com/diamondman/bitarray/.

If you are interested, please feel free to transplant your pull request to the forked repo; we would be very happy to begin the code review and merge it into bitarray before pushing out a new release. If you do not have the time, we would kindly ask your permission to transfer the PR ourselves. If you could reply within the next couple of days, one way or the other, that would be great.

Many thanks!

@diamondman

diamondman#5

@ilanschnell (Owner)

Thank you for your PR, and sorry for the long response time. I recently optimized the bitwise operations using uint64 integers, in a way that is not gcc-specific; see #133.
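The word-at-a-time idea mentioned here can also be modeled in Python. The code from #133 is not shown in this thread, so this is only a sketch with a hypothetical `wordwise_op`, assuming equal-length operands: the 8-byte-aligned prefix of each buffer is viewed as unsigned 64-bit words via `memoryview.cast("Q")`, combined one word at a time, and the remaining `len % 8` bytes are handled individually. In C the same pattern needs only `uint64_t` from `<stdint.h>` rather than compiler-specific vector types, which is presumably what "non-gcc specific" refers to.

```python
import operator


def wordwise_op(a: bytes, b: bytes, fn) -> bytes:
    """Combine a and b as native unsigned 64-bit words, with a bytewise tail."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    head = len(a) - len(a) % 8  # bytes covered by whole 64-bit words
    out = bytearray(len(a))
    # Reinterpret the aligned prefix of each buffer as an array of uint64.
    wa = memoryview(a)[:head].cast("Q")
    wb = memoryview(b)[:head].cast("Q")
    wo = memoryview(out)[:head].cast("Q")
    for i in range(len(wo)):
        wo[i] = fn(wa[i], wb[i])
    for i in range(head, len(a)):
        out[i] = fn(a[i], b[i])  # 0-7 leftover bytes
    return bytes(out)


# 11 bytes: one 64-bit word plus a 3-byte tail; XOR with itself gives zeros.
print(wordwise_op(bytes(range(11)), bytes(range(11)), operator.xor) == bytes(11))  # True
```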
