Hi all! Thank you for developing this library. It is super useful in multiple projects!
I noticed something that might be an issue, but I am not sure.
I have code where I multiply a complex-valued array (`a`) by a real-valued array (`ker`).
Basically, each element of `ker` has to multiply two consecutive values of `a`: the real part and the imaginary part of one complex element.
My code is as follows:
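```cpp
#include <utility>
#include <xsimd/xsimd.hpp>
using batch = xsimd::batch<double>;
// a1, a2: consecutive batches of the complex array a, stored as interleaved reals (re, im, ...)
// ker: one batch of real values; each value scales both the re and im part of one element of a.
// This trick halves the number of loads for ker; it is also why I use a1 and a2 instead of a.
std::pair<batch, batch> func(const batch& a1, const batch& a2, const batch& ker) {
    const auto low  = xsimd::zip_lo(ker, ker);  // k0, k0, k1, k1, ...
    const auto high = xsimd::zip_hi(ker, ker);  // upper half of ker, each value duplicated
    return {a1 * low, a2 * high};               // res0, res1
}
```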
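To make the lane pattern of the `zip_lo`/`zip_hi` trick concrete, here is a small sketch that does not need xsimd. It assumes a 4-lane `batch<double>` (for example AVX) and uses `std::array<double, 4>` as a stand-in; the helpers `dup_lo`, `dup_hi` and `mul` are hypothetical and only mimic `zip_lo(ker, ker)`, `zip_hi(ker, ker)` and `operator*`. `scalar_ref` is the plain loop that the vector code should match.

```cpp
#include <array>
#include <cstddef>

// Assumed batch width: 4 doubles (e.g. one AVX register). std::array stands in
// for xsimd::batch so the sketch compiles without xsimd.
constexpr std::size_t N = 4;
using Batch = std::array<double, N>;

// Same lane pattern as zip_lo(k, k): k0, k0, k1, k1
constexpr Batch dup_lo(const Batch& k) {
    Batch r{};
    for (std::size_t i = 0; i < N / 2; ++i) r[2 * i] = r[2 * i + 1] = k[i];
    return r;
}

// Same lane pattern as zip_hi(k, k): k2, k2, k3, k3
constexpr Batch dup_hi(const Batch& k) {
    Batch r{};
    for (std::size_t i = 0; i < N / 2; ++i) r[2 * i] = r[2 * i + 1] = k[N / 2 + i];
    return r;
}

// Lane-wise product, standing in for the batch operator*.
constexpr Batch mul(const Batch& x, const Batch& y) {
    Batch r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = x[i] * y[i];
    return r;
}

// Scalar reference for real index j: a is interleaved (re, im, re, im, ...),
// so index j belongs to complex element j / 2 and is scaled by ker[j / 2].
constexpr double scalar_ref(const std::array<double, 2 * N>& a, const Batch& ker, std::size_t j) {
    return a[j] * ker[j / 2];
}
```

For `a` = (1, 2), (3, 4), (5, 6), (7, 8) split into `a1` = {1, 2, 3, 4} and `a2` = {5, 6, 7, 8}, and `ker` = {10, 20, 30, 40}, `mul(a1, dup_lo(ker))` gives 10, 20, 60, 80 and `mul(a2, dup_hi(ker))` gives 150, 180, 280, 320, which is what the scalar loop produces. Each `ker` value is loaded once but used for two lanes.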
What I noticed is that, on my machine, the current implementation of `reduce_add` can be optimized. Would it be possible to have a `split` function that returns the low and high halves of a batch? By doing split + add repeatedly instead of calling `reduce_add`, my code becomes 7 times faster.
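A rough sketch of the split + add idea, assuming a hypothetical `split` that returns the lower and upper halves of a batch. This is not an existing xsimd function, and `std::array` stands in for a SIMD register so the example compiles on its own. Each step adds the two halves lane by lane, so an N-lane batch (N a power of two) is reduced in log2(N) half-width additions.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper, NOT an existing xsimd function: returns the lower and
// upper halves of a "batch" (std::array stands in for a SIMD register here).
template <typename T, std::size_t N>
constexpr std::pair<std::array<T, N / 2>, std::array<T, N / 2>> split(const std::array<T, N>& v) {
    static_assert(N % 2 == 0, "split needs an even number of lanes");
    std::array<T, N / 2> lo{}, hi{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        lo[i] = v[i];          // lanes 0 .. N/2-1
        hi[i] = v[N / 2 + i];  // lanes N/2 .. N-1
    }
    return {lo, hi};
}

// Horizontal sum by repeated split + add: each step halves the width,
// so N lanes need log2(N) additions of half-width vectors.
template <typename T, std::size_t N>
constexpr T reduce_add_split(const std::array<T, N>& v) {
    static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
    if constexpr (N == 1) {
        return v[0];
    } else {
        const auto halves = split(v);
        std::array<T, N / 2> sum{};
        for (std::size_t i = 0; i < N / 2; ++i) sum[i] = halves.first[i] + halves.second[i];
        return reduce_add_split(sum);
    }
}
```

For example, `reduce_add_split(std::array<int, 8>{1, 2, 3, 4, 5, 6, 7, 8})` returns 36. On real hardware the halves would come from register operations rather than copies; on AVX, for instance, `_mm256_castpd256_pd128` gives the lower 128 bits of a `__m256d` and `_mm256_extractf128_pd(x, 1)` the upper 128 bits.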
I have pushed the benchmarks here:
https://github.com/DiamonDinoia/cpp-learning/tree/master/xsimd
They give the following results:

| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|------:|-----:|-----:|-------:|-------:|----:|-------:|------:|------:|:----------|
| 6.96 | 143,690,879.59 | 0.6% | 19.00 | 21.47 | 0.885 | 0.00 | 0.0% | 0.01 | `add+store` |
| 2.31 | 432,949,727.65 | 0.6% | 24.00 | 7.11 | 3.374 | 0.00 | 0.0% | 0.01 | `hsum` |
| 3.81 | 262,211,901.24 | 0.1% | 36.00 | 11.75 | 3.064 | 2.00 | 0.0% | 0.01 | `reduce_add` |
| 2.59 | 385,491,672.62 | 0.2% | 20.00 | 7.99 | 2.503 | 0.00 | 0.0% | 0.01 | `union pun` |
| 1.18 | 846,618,297.70 | 0.9% | 17.00 | 3.64 | 4.672 | 0.00 | 0.0% | 0.01 | `double union pun` |

I tweaked master a bit in https://github.com/DiamonDinoia/xsimd/tree/hadd-tweaks and got:

| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|------:|-----:|-----:|-------:|-------:|----:|-------:|------:|------:|:----------|
| 7.00 | 142,933,991.35 | 0.9% | 19.00 | 21.50 | 0.884 | 0.00 | 0.0% | 0.01 | `add+store` |
| 2.27 | 439,741,444.70 | 0.9% | 24.00 | 6.99 | 3.434 | 0.00 | 0.0% | 0.01 | `hsum` |
| 2.99 | 334,267,996.40 | 1.5% | 36.00 | 9.15 | 3.935 | 2.00 | 0.0% | 0.01 | `reduce_add` |
| 2.09 | 478,101,632.03 | 1.2% | 28.00 | 6.44 | 4.346 | 2.00 | 0.0% | 0.01 | `union pun` |
| 1.05 | 956,625,856.43 | 1.6% | 17.00 | 3.21 | 5.289 | 0.00 | 0.0% | 0.01 | `double union pun` |

Thanks,
Marco