Adding ARM64 intrinsics support


Playing with the library on the new Apple MacBook Air (M1), I found that the native ARM64 build runs slower than the x86_64 build under emulation.

After a bit of investigation, the performance difference comes from the ARM build not using intrinsics. A small modification to the CMake header detection script and to the `clang.h`/`gcc.h` files, so that they include `arm_neon.h` instead of `x86intrin.h` when compiling for arm64, greatly improved performance (see the table below). The boost should carry over to other ARM platforms. All tests pass, but I can't check whether everything works on Windows or Android, so I didn't open a pull request.
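To make the idea concrete, here is a minimal sketch of that kind of change, not the actual patch. It picks the intrinsics header by target architecture and shows the sort of 64x64 to 128-bit multiply that benefits once the compiler-specific path is enabled. The names `DEMO_ARCH`, `demo_mul_u64` and `demo_mul_u64_portable` are made up for illustration and are not the library's identifiers. Linking the speedup to this particular operation is an assumption.

```cpp
#include <cstdint>

// Select the intrinsics header by target architecture (GCC/Clang macros).
#if defined(__aarch64__)
#include <arm_neon.h>   // ARM64
#define DEMO_ARCH "arm64"
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // x86 / x86_64
#define DEMO_ARCH "x86"
#else
#define DEMO_ARCH "generic"
#endif

// Portable fallback: 64x64 -> 128-bit product built from 32-bit halves.
// out[0] receives the low 64 bits, out[1] the high 64 bits.
inline void demo_mul_u64_portable(std::uint64_t a, std::uint64_t b,
                                  std::uint64_t out[2]) {
    const std::uint64_t mask = 0xFFFFFFFFULL;
    const std::uint64_t a0 = a & mask, a1 = a >> 32;
    const std::uint64_t b0 = b & mask, b1 = b >> 32;
    const std::uint64_t p00 = a0 * b0, p01 = a0 * b1;
    const std::uint64_t p10 = a1 * b0, p11 = a1 * b1;
    // Sum of three values below 2^32, so it cannot overflow 64 bits.
    const std::uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);
    out[0] = (mid << 32) | (p00 & mask);
    out[1] = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

// Fast path: use the compiler's 128-bit integer type when it exists
// (GCC and Clang provide it on both x86_64 and arm64).
inline void demo_mul_u64(std::uint64_t a, std::uint64_t b,
                         std::uint64_t out[2]) {
#if defined(__SIZEOF_INT128__)
    const unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    out[0] = static_cast<std::uint64_t>(p);
    out[1] = static_cast<std::uint64_t>(p >> 64);
#else
    demo_mul_u64_portable(a, b, out);
#endif
}
```

Both functions return the same result; only the generated code differs. The header detection in CMake would need the matching change, i.e. checking for `arm_neon.h` on arm64 targets instead of the x86 header.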

CKKS performance test, degree 8192, on a MacBook Air M1 with 8 GB of RAM; timings in microseconds.

| operation | native arm w/o intrinsics | native arm with intrinsics |
|---|---|---|
| encode  | 683  | 441  |
| decode | 1309  | 900  |
| encrypt  | 4356  | 2551  |
| decrypt  | 245  | 112  |
| add  | 38  | 37  |
| multiply  | 808  | 280  |
| multiply plain  | 368  | 106  |
| square  | 597  | 203  |
| relinearize  | 4253  | 1931  |
| rescale  | 1014  | 474  |
| rotate 1 step  | 4279  | 1972  |
| rotate rd  | 17199  | 7821  |




Adding ARM64 intrinsics support #268
