Commit af0ea95
Optimize 3-bit packing (#1029)
Summary:
Optimizes 3-bit packing as outlined here: T199311618
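For readers unfamiliar with the operation being optimized, here is a minimal scalar sketch of 3-bit packing: eight values in the range 0..7 occupy 24 bits, so they fit in three bytes. The function names `pack_8_uint3` and `unpack_8_uint3` and the little-endian bit order are assumptions made for illustration. This is not the NEON kernel or the bit layout used by this commit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs eight 3-bit values into three bytes.
// Assumed layout: value i occupies bits [3*i, 3*i + 2] of a 24-bit word,
// and the word is stored least-significant byte first.
inline std::array<std::uint8_t, 3> pack_8_uint3(
    const std::array<std::uint8_t, 8>& values) {
  std::uint32_t bits = 0;
  for (int i = 0; i < 8; ++i) {
    assert(values[i] < 8);  // each input must fit in 3 bits
    bits |= static_cast<std::uint32_t>(values[i] & 0x7u) << (3 * i);
  }
  return {static_cast<std::uint8_t>(bits & 0xFFu),
          static_cast<std::uint8_t>((bits >> 8) & 0xFFu),
          static_cast<std::uint8_t>((bits >> 16) & 0xFFu)};
}

// Hypothetical helper: inverse of pack_8_uint3 under the same assumed layout.
inline std::array<std::uint8_t, 8> unpack_8_uint3(
    const std::array<std::uint8_t, 3>& packed) {
  const std::uint32_t bits = static_cast<std::uint32_t>(packed[0]) |
                             (static_cast<std::uint32_t>(packed[1]) << 8) |
                             (static_cast<std::uint32_t>(packed[2]) << 16);
  std::array<std::uint8_t, 8> values{};
  for (int i = 0; i < 8; ++i) {
    values[i] = static_cast<std::uint8_t>((bits >> (3 * i)) & 0x7u);
  }
  return values;
}
```

Under this assumed layout, packing `{0, 1, 2, 3, 4, 5, 6, 7}` yields the bytes `0x88, 0xC6, 0xFA`, and unpacking returns the original values. A vectorized kernel would typically process many such groups per instruction and may choose a different layout to make that cheaper.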
Before change:
----------------------------------------------------------------------------------
Benchmark                                        Time        CPU    Iterations
----------------------------------------------------------------------------------
benchmark_pack_uint_values<3>/128/8           47.0 ns     46.4 ns     15106555
benchmark_pack_uint_values<3>/128/64          6.94 ns     6.90 ns    101226284
benchmark_pack_uint_values<3>/128/128         3.27 ns     3.24 ns    215022716
benchmark_unpack_uint_values<3>/128/8         22.0 ns     21.9 ns     32585572
benchmark_unpack_uint_values<3>/128/64        6.02 ns     5.98 ns    116910230
benchmark_unpack_uint_values<3>/128/128       2.74 ns     2.73 ns    257088291
After change:
----------------------------------------------------------------------------------
Benchmark                                        Time        CPU    Iterations
----------------------------------------------------------------------------------
benchmark_pack_uint_values<3>/128/8           19.5 ns     19.5 ns     36050883
benchmark_pack_uint_values<3>/128/64          3.90 ns     3.87 ns    181151919
benchmark_pack_uint_values<3>/128/128         1.57 ns     1.57 ns    447247194
benchmark_unpack_uint_values<3>/128/8         20.5 ns     20.4 ns     34490914
benchmark_unpack_uint_values<3>/128/64        3.19 ns     3.11 ns    228019714
benchmark_unpack_uint_values<3>/128/128       1.71 ns     1.70 ns    408587338
Unpacking perf for 128 values is 1.60x faster (2.74 ns / 1.71 ns), and packing perf for 128 values is 2.08x faster (3.27 ns / 1.57 ns).
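The speedup figure is the ratio of the before and after times for the same benchmark row. The sketch below reproduces that arithmetic for the two /128/128 rows; the helper name `speedup` and the constant names are hypothetical, and the numbers are copied from the first timing column of the tables in this message.

```cpp
#include <cassert>

// Hypothetical helper: ratio of per-iteration times.
// A result above 1.0 means the "after" build is faster.
inline double speedup(double before_ns, double after_ns) {
  assert(before_ns > 0.0 && after_ns > 0.0);
  return before_ns / after_ns;
}

// First timing column (ns) of the /128/128 rows quoted in this commit message.
constexpr double kPackBeforeNs = 3.27;
constexpr double kPackAfterNs = 1.57;
constexpr double kUnpackBeforeNs = 2.74;
constexpr double kUnpackAfterNs = 1.71;
```

`speedup(kUnpackBeforeNs, kUnpackAfterNs)` evaluates to roughly 1.60 and `speedup(kPackBeforeNs, kPackAfterNs)` to roughly 2.08. The same ratio can be taken for the other rows.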
Reviewed By: digantdesai
Differential Revision: D64010666
1 parent dec0313 commit af0ea95
File tree
4 files changed (+143, -211)
- torchao/experimental/kernels/cpu/aarch64
  - benchmarks
  - bitpacking
  - tests
Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ (one line added, at new line 17)
Lines changed: 2 additions & 0 deletions
@@ -109,6 +109,7 @@ (one line added, at new line 112)
@@ -185,6 +186,7 @@ (one line added, at new line 189)