Closed
Description
While experimenting with the library on the new Apple MacBook Air with M1, I found that the native arm64 build is slower than the x86_64 build running under emulation.
After some investigation, I found that the difference comes from the arm64 build not using intrinsics. A small change to the CMake header-detection script and to the `clang.h`/`gcc.h` files, so that they include `arm_neon.h` instead of `x86intrin.h` when compiling for arm64, greatly improved performance (see the table below). The gain should carry over to other Arm platforms. All tests pass, but I cannot check whether everything works on Windows or Android, so I haven't opened a pull request.
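The issue does not include the patch itself. As a hedged illustration only, here is a minimal C sketch of the kind of per-architecture header selection described: pick `arm_neon.h` on arm64, `x86intrin.h` on x86_64 with GCC/Clang, and fall back to plain C otherwise. The names `EXAMPLE_INTRIN_HEADER`, `intrin_header_name`, and `add_u64x2` are hypothetical, not SEAL's code, and the two-lane 64-bit add is just a stand-in to show each header in use, not one of the operations SEAL actually accelerates.

```c
#include <stdint.h>

/* Hypothetical sketch: choose the SIMD header for the target architecture.
   Apple clang on M1 defines both __aarch64__ and __arm64__. */
#if defined(__aarch64__) || defined(__arm64__)
#include <arm_neon.h>
#define EXAMPLE_INTRIN_HEADER "arm_neon.h"
#elif defined(__x86_64__)
#include <x86intrin.h> /* GCC/Clang only; MSVC would need <intrin.h> */
#define EXAMPLE_INTRIN_HEADER "x86intrin.h"
#else
#define EXAMPLE_INTRIN_HEADER "none (scalar fallback)"
#endif

/* Reports which header was selected at compile time. */
const char *intrin_header_name(void) { return EXAMPLE_INTRIN_HEADER; }

/* Adds two pairs of 64-bit lanes (mod 2^64) with whichever header was chosen. */
void add_u64x2(const uint64_t a[2], const uint64_t b[2], uint64_t out[2]) {
#if defined(__aarch64__) || defined(__arm64__)
    /* NEON: load two 64-bit lanes from each input, add lane-wise, store. */
    vst1q_u64(out, vaddq_u64(vld1q_u64(a), vld1q_u64(b)));
#elif defined(__x86_64__)
    /* SSE2 (always available on x86_64): the same lane-wise 64-bit add. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb));
#else
    /* Portable fallback when no known intrinsics header applies. */
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```

All three branches compute the same result, so code built on top of such a helper behaves identically on every platform; only the header and instructions differ.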
CKKS performance test, polynomial modulus degree 8192, on a MacBook Air M1 (8 GB RAM); timings in microseconds.
| Operation | Native arm64 without intrinsics (µs) | Native arm64 with intrinsics (µs) |
|---|---:|---:|
| encode | 683 | 441 |
| decode | 1309 | 900 |
| encrypt | 4356 | 2551 |
| decrypt | 245 | 112 |
| add | 38 | 37 |
| multiply | 808 | 280 |
| multiply plain | 368 | 106 |
| square | 597 | 203 |
| relinearize | 4253 | 1931 |
| rescale | 1014 | 474 |
| rotate 1 step | 4279 | 1972 |
| rotate random | 17199 | 7821 |
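To put "greatly improved" into numbers, the speedup for each operation is the ratio of the two timings in the table above. A minimal C sketch that computes it from those values (the struct and function names are illustrative): for example, multiply goes from 808 µs to 280 µs, roughly 2.9x, multiply plain gains the most at about 3.5x, and add is essentially unchanged at about 1.03x.

```c
#include <stddef.h>

/* One row of the CKKS timing table (microseconds, degree 8192). */
typedef struct {
    const char *op;
    double without_intrin_us;
    double with_intrin_us;
} timing_row;

/* Values copied from the benchmark table in this issue. */
const timing_row timings[] = {
    {"encode", 683, 441},
    {"decode", 1309, 900},
    {"encrypt", 4356, 2551},
    {"decrypt", 245, 112},
    {"add", 38, 37},
    {"multiply", 808, 280},
    {"multiply plain", 368, 106},
    {"square", 597, 203},
    {"relinearize", 4253, 1931},
    {"rescale", 1014, 474},
    {"rotate 1 step", 4279, 1972},
    {"rotate random", 17199, 7821},
};
const size_t num_timings = sizeof timings / sizeof timings[0];

/* Speedup = time without intrinsics / time with intrinsics; > 1 means faster. */
double speedup(const timing_row *row) {
    return row->without_intrin_us / row->with_intrin_us;
}

/* Index of the operation that gained the most from intrinsics. */
size_t best_speedup_index(void) {
    size_t best = 0;
    for (size_t i = 1; i < num_timings; ++i) {
        if (speedup(&timings[i]) > speedup(&timings[best])) {
            best = i;
        }
    }
    return best;
}
```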