[Performance] Use Lua scripts for scattered index operations #1571
Open
vmoens wants to merge 1 commit into gh/vmoens/57/base from
vmoens added a commit that referenced this pull request on Feb 14, 2026
Replace per-row GETRANGE/SETRANGE pipeline commands with server-side Lua scripts for tensor/list/bool indices. Each key now emits exactly one EVAL command regardless of the number of indexed positions.

Hybrid strategy:
- int / slice (any step): single GETRANGE + local stride (unchanged)
- tensor / list / bool: Lua GETRANGES script (new)
- step>1 writes: covering-range RMW (unchanged)
- scattered writes: Lua SETRANGES script (new)

This gives deterministic O(K) commands for K keys with O(N*row_size) bandwidth, eliminating the covering-range waste for sparse indices like td[tensor([0, 1000])]. Fancy/bool writes improved ~2.5x (5.9ms -> 2.4ms).

ghstack-source-id: 59fb136
Pull-Request: #1571
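To make the bandwidth argument in the commit message concrete, here is a minimal sketch of what a GETRANGES-style script and its arguments could look like. The PR's actual script is not shown on this page, so everything below is an assumption for illustration: the names `GETRANGES_LUA`, `getranges_args`, `covering_range_bytes` and `scattered_bytes` are hypothetical, and the layout is assumed to be fixed-size rows stored back to back in a single Redis string.

```python
from typing import List, Sequence

# Hypothetical Lua script (not the PR's actual code): performs one GETRANGE
# per requested row on the server, all inside a single EVAL round trip.
# ARGV[1] is the row size in bytes; ARGV[2..] are the row indices.
GETRANGES_LUA = """
local row_size = tonumber(ARGV[1])
local out = {}
for i = 2, #ARGV do
    local start = tonumber(ARGV[i]) * row_size
    -- GETRANGE's end offset is inclusive
    out[#out + 1] = redis.call('GETRANGE', KEYS[1], start, start + row_size - 1)
end
return out
"""


def getranges_args(indices: Sequence[int], row_size: int) -> List[int]:
    """Build the ARGV list for the script: row size first, then each row index."""
    return [row_size, *indices]


def covering_range_bytes(indices: Sequence[int], row_size: int) -> int:
    """Bytes transferred when one GETRANGE spans min(indices)..max(indices)."""
    return (max(indices) - min(indices) + 1) * row_size


def scattered_bytes(indices: Sequence[int], row_size: int) -> int:
    """Bytes transferred when only the N requested rows are fetched."""
    return len(indices) * row_size


# Sparse index from the commit message, with an assumed 16-byte row.
idx = [0, 1000]
print(covering_range_bytes(idx, 16), scattered_bytes(idx, 16))
```

With a client such as redis-py, the script could be sent as `r.eval(GETRANGES_LUA, 1, key, *getranges_args(idx, row_size))`, which issues one command per key however many rows are indexed. Under the assumed 16-byte row, the sparse index `[0, 1000]` moves 32 bytes instead of the 16016 bytes a covering range would read.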
This was referenced Feb 14, 2026
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 35.9310μs | 14.5620μs | 68.6718 KOps/s | 69.3438 KOps/s | |
| test_plain_set_stack_nested | 36.5010μs | 14.7715μs | 67.6979 KOps/s | 69.0744 KOps/s | |
| test_plain_set_nested_inplace | 40.6010μs | 16.1382μs | 61.9648 KOps/s | 61.8277 KOps/s | |
| test_plain_set_stack_nested_inplace | 46.5110μs | 15.9019μs | 62.8857 KOps/s | 62.5348 KOps/s | |
| test_items | 34.0610μs | 5.5538μs | 180.0572 KOps/s | 181.3411 KOps/s | |
| test_items_nested | 0.5763ms | 0.5225ms | 1.9140 KOps/s | 1.9554 KOps/s | |
| test_items_nested_locked | 0.5717ms | 0.5213ms | 1.9184 KOps/s | 1.9419 KOps/s | |
| test_items_nested_leaf | 0.1180ms | 91.9692μs | 10.8732 KOps/s | 10.8536 KOps/s | |
| test_items_stack_nested | 0.6002ms | 0.5216ms | 1.9173 KOps/s | 1.9492 KOps/s | |
| test_items_stack_nested_leaf | 0.1223ms | 91.7531μs | 10.8988 KOps/s | 10.8081 KOps/s | |
| test_items_stack_nested_locked | 0.5772ms | 0.5225ms | 1.9138 KOps/s | 1.9367 KOps/s | |
| test_keys | 30.2800μs | 4.1159μs | 242.9602 KOps/s | 241.8597 KOps/s | |
| test_keys_nested | 0.1470ms | 0.1179ms | 8.4794 KOps/s | 8.5366 KOps/s | |
| test_keys_nested_locked | 88.8760ms | 0.1398ms | 7.1518 KOps/s | 7.9307 KOps/s | |
| test_keys_nested_leaf | 0.1384ms | 0.1090ms | 9.1707 KOps/s | 9.2673 KOps/s | |
| test_keys_stack_nested | 0.1493ms | 0.1192ms | 8.3893 KOps/s | 8.5260 KOps/s | |
| test_keys_stack_nested_leaf | 0.1353ms | 0.1085ms | 9.2190 KOps/s | 9.2959 KOps/s | |
| test_keys_stack_nested_locked | 0.1610ms | 0.1285ms | 7.7795 KOps/s | 7.9669 KOps/s | |
| test_values | 6.1380μs | 0.9961μs | 1.0039 MOps/s | 1.0025 MOps/s | |
| test_values_nested | 73.8410μs | 46.7094μs | 21.4090 KOps/s | 21.3443 KOps/s | |
| test_values_nested_locked | 83.6020μs | 49.5439μs | 20.1841 KOps/s | 20.2728 KOps/s | |
| test_values_nested_leaf | 82.5720μs | 52.8102μs | 18.9357 KOps/s | 18.9705 KOps/s | |
| test_values_stack_nested | 0.1224ms | 46.1585μs | 21.6645 KOps/s | 21.4608 KOps/s | |
| test_values_stack_nested_leaf | 80.6520μs | 52.2008μs | 19.1568 KOps/s | 18.9401 KOps/s | |
| test_values_stack_nested_locked | 85.5520μs | 49.4662μs | 20.2158 KOps/s | 20.1835 KOps/s | |
| test_membership | 4.1852μs | 0.8129μs | 1.2301 MOps/s | 1.2377 MOps/s | |
| test_membership_nested | 24.2410μs | 2.9749μs | 336.1479 KOps/s | 335.5553 KOps/s | |
| test_membership_nested_leaf | 26.1900μs | 3.0172μs | 331.4340 KOps/s | 335.7182 KOps/s | |
| test_membership_stacked_nested | 33.8410μs | 3.0119μs | 332.0129 KOps/s | 334.0291 KOps/s | |
| test_membership_stacked_nested_leaf | 27.6210μs | 3.0132μs | 331.8706 KOps/s | 336.2902 KOps/s | |
| test_membership_nested_last | 33.7900μs | 4.3708μs | 228.7928 KOps/s | 230.4465 KOps/s | |
| test_membership_nested_leaf_last | 29.6510μs | 4.4110μs | 226.7035 KOps/s | 229.5998 KOps/s | |
| test_membership_stacked_nested_last | 34.0710μs | 4.3407μs | 230.3781 KOps/s | 228.7252 KOps/s | |
| test_membership_stacked_nested_leaf_last | 25.1300μs | 4.3407μs | 230.3787 KOps/s | 230.0922 KOps/s | |
| test_nested_getleaf | 48.1110μs | 20.6752μs | 48.3672 KOps/s | 46.7511 KOps/s | |
| test_nested_get | 58.1810μs | 19.1893μs | 52.1123 KOps/s | 49.8121 KOps/s | |
| test_stacked_getleaf | 57.6110μs | 20.2224μs | 49.4500 KOps/s | 46.7939 KOps/s | |
| test_stacked_get | 45.6710μs | 19.4251μs | 51.4798 KOps/s | 49.2546 KOps/s | |
| test_nested_getitemleaf | 46.8310μs | 21.2409μs | 47.0791 KOps/s | 46.6567 KOps/s | |
| test_nested_getitem | 45.9710μs | 20.1906μs | 49.5280 KOps/s | 49.2473 KOps/s | |
| test_stacked_getitemleaf | 45.4710μs | 21.3594μs | 46.8178 KOps/s | 46.6820 KOps/s | |
| test_stacked_getitem | 52.5710μs | 20.3599μs | 49.1161 KOps/s | 48.8905 KOps/s | |
| test_lock_nested | 7.8776ms | 0.4591ms | 2.1780 KOps/s | 2.1835 KOps/s | |
| test_lock_stack_nested | 0.5002ms | 0.4586ms | 2.1807 KOps/s | 2.1493 KOps/s | |
| test_unlock_nested | 0.4685ms | 0.3659ms | 2.7328 KOps/s | 2.7084 KOps/s | |
| test_unlock_stack_nested | 0.4215ms | 0.3682ms | 2.7156 KOps/s | 2.6606 KOps/s | |
| test_flatten_speed | 0.1655ms | 0.1178ms | 8.4881 KOps/s | 8.5872 KOps/s | |
| test_unflatten_speed | 0.6387ms | 0.5727ms | 1.7460 KOps/s | 1.7513 KOps/s | |
| test_common_ops | 0.8418ms | 0.6848ms | 1.4603 KOps/s | 1.4680 KOps/s | |
| test_creation | 0.1226ms | 2.7499μs | 363.6447 KOps/s | 366.3530 KOps/s | |
| test_creation_empty | 29.5500μs | 5.7996μs | 172.4252 KOps/s | 175.2394 KOps/s | |
| test_creation_nested_1 | 31.8500μs | 10.2736μs | 97.3371 KOps/s | 97.4462 KOps/s | |
| test_creation_nested_2 | 39.7210μs | 11.3006μs | 88.4907 KOps/s | 90.4903 KOps/s | |
| test_creation_many_keys[10] | 51.6210μs | 17.1622μs | 58.2675 KOps/s | 59.5464 KOps/s | |
| test_creation_many_keys[50] | 95.8520μs | 72.0606μs | 13.8772 KOps/s | 13.9863 KOps/s | |
| test_creation_many_keys[100] | 0.2025ms | 0.1429ms | 6.9960 KOps/s | 7.0912 KOps/s | |
| test_creation_nested_many_keys[10] | 63.3110μs | 36.9392μs | 27.0715 KOps/s | 27.2077 KOps/s | |
| test_creation_nested_many_keys[50] | 0.1865ms | 0.1494ms | 6.6947 KOps/s | 6.7845 KOps/s | |
| test_clone | 44.7710μs | 12.9296μs | 77.3420 KOps/s | 75.3930 KOps/s | |
| test_getitem[int] | 1.6189ms | 14.0510μs | 71.1693 KOps/s | 57.6310 KOps/s | |
| test_getitem[slice_int] | 0.1346ms | 24.0831μs | 41.5229 KOps/s | 40.9517 KOps/s | |
| test_getitem[range] | 0.1627ms | 60.2635μs | 16.5938 KOps/s | 16.4883 KOps/s | |
| test_getitem[tuple] | 0.1426ms | 23.4613μs | 42.6234 KOps/s | 42.3250 KOps/s | |
| test_getitem[list] | 0.1778ms | 57.9105μs | 17.2680 KOps/s | 17.7068 KOps/s | |
| test_setitem_dim[int] | 49.2310μs | 25.5992μs | 39.0638 KOps/s | 39.1756 KOps/s | |
| test_setitem_dim[slice_int] | 87.9920μs | 43.3572μs | 23.0642 KOps/s | 23.1140 KOps/s | |
| test_setitem_dim[range] | 0.1286ms | 91.4503μs | 10.9349 KOps/s | 10.9478 KOps/s | |
| test_setitem_dim[tuple] | 77.9910μs | 39.8109μs | 25.1188 KOps/s | 24.7635 KOps/s | |
| test_setitem | 53.5810μs | 17.4864μs | 57.1872 KOps/s | 55.5233 KOps/s | |
| test_set | 53.3910μs | 16.6399μs | 60.0964 KOps/s | 58.8948 KOps/s | |
| test_set_shared | 0.5056ms | 0.2029ms | 4.9294 KOps/s | 4.8883 KOps/s | |
| test_update | 0.3462ms | 21.3863μs | 46.7589 KOps/s | 45.6155 KOps/s | |
| test_update_nested | 78.6410μs | 33.6984μs | 29.6750 KOps/s | 29.4573 KOps/s | |
| test_update__nested | 0.4699ms | 33.0917μs | 30.2191 KOps/s | 29.6685 KOps/s | |
| test_set_nested | 50.3810μs | 18.6610μs | 53.5878 KOps/s | 52.0623 KOps/s | |
| test_set_nested_new | 61.8910μs | 23.4307μs | 42.6790 KOps/s | 41.5905 KOps/s | |
| test_select | 80.4820μs | 40.7945μs | 24.5131 KOps/s | 24.0638 KOps/s | |
| test_select_nested | 97.9020μs | 71.1593μs | 14.0530 KOps/s | 14.2497 KOps/s | |
| test_exclude_nested | 0.1237ms | 93.0099μs | 10.7515 KOps/s | 11.0332 KOps/s | |
| test_empty[True] | 0.4792ms | 0.4233ms | 2.3626 KOps/s | 2.4148 KOps/s | |
| test_empty[False] | 8.4250μs | 1.2655μs | 790.2289 KOps/s | 799.3109 KOps/s | |
| test_to | 0.1007ms | 73.0300μs | 13.6930 KOps/s | 13.9880 KOps/s | |
| test_to_nonblocking | 0.1172ms | 63.6546μs | 15.7098 KOps/s | 15.7518 KOps/s | |
| test_unbind_speed | 0.3490ms | 0.3139ms | 3.1861 KOps/s | 3.1830 KOps/s | |
| test_unbind_speed_stack0 | 0.3772ms | 0.3131ms | 3.1938 KOps/s | 3.2299 KOps/s | |
| test_unbind_speed_stack1 | 0.1032s | 0.8844ms | 1.1307 KOps/s | 1.1191 KOps/s | |
| test_split | 1.1557ms | 1.0930ms | 914.9265 Ops/s | 915.3271 Ops/s | |
| test_chunk | 0.1029s | 1.1638ms | 859.2684 Ops/s | 958.8233 Ops/s | |
| test_to_cpu_blocking | 19.1052ms | 19.0012ms | 52.6283 Ops/s | 40.1713 Ops/s | |
| test_to_cpu_global_sync | 11.0030ms | 10.9206ms | 91.5701 Ops/s | 89.1188 Ops/s | |
| test_to_cpu_event_sync | 0.1151s | 13.1330ms | 76.1440 Ops/s | 81.9111 Ops/s | |
| test_to_cpu_default | 12.2201ms | 11.9364ms | 83.7771 Ops/s | 82.0550 Ops/s | |
| test_consolidate[False-None] | 4.0531ms | 3.9457ms | 253.4393 Ops/s | 223.5667 Ops/s | |
| test_consolidate[default-None] | 2.0082ms | 1.9179ms | 521.3978 Ops/s | 494.8778 Ops/s | |
| test_consolidate[reduce-overhead-None] | 1.9546ms | 1.8544ms | 539.2481 Ops/s | 515.8135 Ops/s | |
| test_consolidate_njt[False-None] | 8.3073ms | 8.1317ms | 122.9748 Ops/s | 120.7718 Ops/s | |
| test_to[False-False-None] | 2.0990ms | 1.9854ms | 503.6722 Ops/s | 490.1707 Ops/s | |
| test_to[True-False-None] | 2.1327ms | 1.8684ms | 535.2163 Ops/s | 535.3325 Ops/s | |
| test_to[within-False-None] | 6.1437ms | 5.8949ms | 169.6383 Ops/s | 167.4598 Ops/s | |
| test_to[True-default-None] | 7.4509ms | 7.3428ms | 136.1878 Ops/s | 125.6146 Ops/s | |
| test_to_njt[False-False-None] | 8.3378ms | 8.2460ms | 121.2715 Ops/s | 116.8855 Ops/s | |
| test_to_njt[True-False-None] | 6.8284ms | 6.7145ms | 148.9306 Ops/s | 142.0950 Ops/s | |
| test_to_njt[within-False-None] | 15.1388ms | 15.0029ms | 66.6539 Ops/s | 64.3186 Ops/s | |
| test_creation[device0] | 0.4683ms | 0.1135ms | 8.8113 KOps/s | 8.3675 KOps/s | |
| test_creation_from_tensor | 0.4633ms | 0.1117ms | 8.9510 KOps/s | 8.6310 KOps/s | |
| test_add_one[memmap_tensor0] | 0.2264ms | 6.2435μs | 160.1659 KOps/s | 152.0520 KOps/s | |
| test_contiguous[memmap_tensor0] | 28.2000μs | 0.6149μs | 1.6264 MOps/s | 2.2730 MOps/s | |
| test_stack[memmap_tensor0] | 24.7300μs | 4.3388μs | 230.4759 KOps/s | 216.5658 KOps/s | |
| test_memmaptd_index | 1.0700ms | 0.2560ms | 3.9066 KOps/s | 3.9381 KOps/s | |
| test_memmaptd_index_astensor | 0.5001ms | 0.3429ms | 2.9166 KOps/s | 2.8997 KOps/s | |
| test_memmaptd_index_op | 0.8295ms | 0.5830ms | 1.7154 KOps/s | 1.6490 KOps/s | |
| test_serialize_model | 0.1379s | 0.1365s | 7.3252 Ops/s | 7.2578 Ops/s | |
| test_serialize_model_pickle | 1.6670s | 1.2574s | 0.7953 Ops/s | 0.8259 Ops/s | |
| test_serialize_weights | 0.1366s | 0.1340s | 7.4610 Ops/s | 7.3250 Ops/s | |
| test_serialize_weights_returnearly | 0.4074s | 88.2321ms | 11.3337 Ops/s | 11.3279 Ops/s | |
| test_serialize_weights_pickle | 1.3759s | 1.2185s | 0.8207 Ops/s | 0.8170 Ops/s | |
| test_reshape_pytree | 0.2008ms | 31.8796μs | 31.3680 KOps/s | 30.4521 KOps/s | |
| test_reshape_td | 74.7620μs | 42.3039μs | 23.6385 KOps/s | 23.1217 KOps/s | |
| test_view_pytree | 0.2006ms | 31.4213μs | 31.8255 KOps/s | 30.7457 KOps/s | |
| test_view_td | 89.6810μs | 49.5621μs | 20.1767 KOps/s | 19.5401 KOps/s | |
| test_unbind_pytree | 0.2357ms | 35.3877μs | 28.2584 KOps/s | 27.9350 KOps/s | |
| test_unbind_td | 0.1068ms | 46.5485μs | 21.4830 KOps/s | 21.4871 KOps/s | |
| test_split_pytree | 0.2457ms | 40.5302μs | 24.6730 KOps/s | 24.1129 KOps/s | |
| test_split_td | 0.1147ms | 62.1331μs | 16.0945 KOps/s | 15.6781 KOps/s | |
| test_add_pytree | 0.1898ms | 41.2913μs | 24.2182 KOps/s | 24.2468 KOps/s | |
| test_add_td | 0.1056ms | 51.3955μs | 19.4570 KOps/s | 19.0094 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2032ms | 0.1358ms | 7.3653 KOps/s | 6.9480 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.4104ms | 0.1884ms | 5.3082 KOps/s | 5.3728 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1739ms | 0.1110ms | 9.0107 KOps/s | 9.2231 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4291ms | 0.1759ms | 5.6854 KOps/s | 5.6691 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 60.3220μs | 28.8608μs | 34.6491 KOps/s | 30.5084 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 79.0320μs | 49.7057μs | 20.1184 KOps/s | 20.1272 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1482ms | 9.5114μs | 105.1369 KOps/s | 105.7495 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4493ms | 66.6374μs | 15.0066 KOps/s | 15.0087 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2220ms | 0.1717ms | 5.8226 KOps/s | 5.5139 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3087ms | 0.2477ms | 4.0376 KOps/s | 4.0288 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.1553ms | 0.1127ms | 8.8732 KOps/s | 8.5601 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1128ms | 66.7636μs | 14.9782 KOps/s | 14.9741 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2240ms | 0.1549ms | 6.4553 KOps/s | 6.2238 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8910ms | 0.5187ms | 1.9278 KOps/s | 1.9040 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.3368ms | 0.3013ms | 3.3186 KOps/s | 3.2952 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2309ms | 0.1818ms | 5.5015 KOps/s | 5.5024 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1433ms | 84.1232μs | 11.8873 KOps/s | 12.1298 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.1939ms | 0.1157ms | 8.6428 KOps/s | 8.4293 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6498ms | 0.4284ms | 2.3343 KOps/s | 2.3152 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.2459ms | 0.1628ms | 6.1426 KOps/s | 6.3447 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 56.2410μs | 25.3189μs | 39.4961 KOps/s | 40.8239 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 71.9610μs | 39.8086μs | 25.1202 KOps/s | 25.0221 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.1272ms | 10.3785μs | 96.3527 KOps/s | 95.3937 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4170ms | 51.1698μs | 19.5428 KOps/s | 19.7130 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 1.9882ms | 0.1698ms | 5.8883 KOps/s | 5.6537 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.5138ms | 3.2575ms | 306.9813 Ops/s | 303.3602 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.8803ms | 0.1577ms | 6.3424 KOps/s | 6.2618 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.8960ms | 2.7696ms | 361.0595 Ops/s | 360.9127 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.2256ms | 0.1054ms | 9.4840 KOps/s | 9.3731 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3504ms | 70.5159μs | 14.1812 KOps/s | 14.0349 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1385ms | 93.5312μs | 10.6916 KOps/s | 10.5406 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2459ms | 45.2819μs | 22.0839 KOps/s | 22.2790 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1456ms | 95.7553μs | 10.4433 KOps/s | 10.4425 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2679ms | 47.7385μs | 20.9475 KOps/s | 22.2248 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.2165ms | 57.8222μs | 17.2944 KOps/s | 17.9740 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2290ms | 28.9019μs | 34.5999 KOps/s | 36.9335 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1495ms | 46.9594μs | 21.2950 KOps/s | 22.2643 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2711ms | 23.1290μs | 43.2357 KOps/s | 45.2093 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 0.1416ms | 44.6762μs | 22.3833 KOps/s | 21.7462 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2643ms | 21.6736μs | 46.1391 KOps/s | 45.0502 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.2340ms | 55.2173μs | 18.1103 KOps/s | 17.4530 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2277ms | 26.9743μs | 37.0724 KOps/s | 37.0184 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 80.8420μs | 44.2830μs | 22.5820 KOps/s | 21.6281 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2671ms | 22.0402μs | 45.3715 KOps/s | 44.9964 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 80.1920μs | 44.6408μs | 22.4010 KOps/s | 21.7069 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2611ms | 21.4755μs | 46.5646 KOps/s | 45.1241 KOps/s | |
| test_mod_add[eager] | 0.1045ms | 49.0265μs | 20.3971 KOps/s | 19.8954 KOps/s | |
| test_mod_add[compile] | 0.4700ms | 0.1067ms | 9.3703 KOps/s | 9.2756 KOps/s | |
| test_mod_add[compile-overhead] | 0.3824ms | 0.1440ms | 6.9452 KOps/s | 6.8098 KOps/s | |
| test_mod_wrap[eager] | 0.3629ms | 0.2986ms | 3.3491 KOps/s | 3.4976 KOps/s | |
| test_mod_wrap[compile] | 0.4414ms | 0.3546ms | 2.8204 KOps/s | 2.9048 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.3767ms | 4.0461ms | 247.1489 Ops/s | 246.2275 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6020ms | 1.4859ms | 672.9949 Ops/s | 668.6457 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.5071ms | 1.4112ms | 708.6105 Ops/s | 699.7653 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2261ms | 0.8627ms | 1.1591 KOps/s | 1.1317 KOps/s | |
| test_seq_add[eager] | 0.2049ms | 0.1526ms | 6.5542 KOps/s | 6.5149 KOps/s | |
| test_seq_add[compile] | 0.2053ms | 0.1117ms | 8.9490 KOps/s | 8.5914 KOps/s | |
| test_seq_add[compile-overhead] | 0.1838ms | 0.1494ms | 6.6952 KOps/s | 6.4165 KOps/s | |
| test_seq_wrap[eager] | 0.8940ms | 0.5137ms | 1.9468 KOps/s | 1.9382 KOps/s | |
| test_seq_wrap[compile] | 0.4032ms | 0.3568ms | 2.8026 KOps/s | 2.7371 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3205ms | 0.2555ms | 3.9138 KOps/s | 3.7810 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9288ms | 0.8245ms | 1.2129 KOps/s | 1.2048 KOps/s | |
| test_func_call_runtime[False-compile] | 0.9466ms | 0.8850ms | 1.1299 KOps/s | 1.1147 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.4914ms | 0.4435ms | 2.2550 KOps/s | 2.2185 KOps/s | |
| test_func_call_runtime[True-eager] | 1.1454ms | 1.0619ms | 941.7006 Ops/s | 939.6734 Ops/s | |
| test_func_call_runtime[True-compile] | 1.0000ms | 0.8926ms | 1.1203 KOps/s | 1.1010 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5024ms | 0.4525ms | 2.2102 KOps/s | 2.1575 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.8838ms | 0.8199ms | 1.2196 KOps/s | 1.2076 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 0.9637ms | 0.8868ms | 1.1277 KOps/s | 1.0879 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.4911ms | 0.4462ms | 2.2409 KOps/s | 2.2156 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.2903ms | 1.1912ms | 839.4815 Ops/s | 824.0648 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 0.9786ms | 0.9288ms | 1.0767 KOps/s | 1.0613 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5429ms | 0.4865ms | 2.0555 KOps/s | 2.0206 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8163ms | 2.3217ms | 430.7264 Ops/s | 424.7075 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0654ms | 0.9471ms | 1.0558 KOps/s | 1.0034 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5430ms | 0.4929ms | 2.0288 KOps/s | 1.9930 KOps/s | |
| test_distributed | 2.6297ms | 0.1632ms | 6.1260 KOps/s | 6.5214 KOps/s | |
| test_tdmodule | 75.5420μs | 27.8906μs | 35.8544 KOps/s | 34.5936 KOps/s | |
| test_tdmodule_dispatch | 76.1120μs | 46.1871μs | 21.6510 KOps/s | 21.7286 KOps/s | |
| test_tdseq | 45.8010μs | 26.9731μs | 37.0739 KOps/s | 36.7498 KOps/s | |
| test_tdseq_dispatch | 69.2720μs | 47.5839μs | 21.0155 KOps/s | 20.7978 KOps/s | |
| test_instantiation_functorch | 2.1670ms | 1.9647ms | 508.9750 Ops/s | 502.5926 Ops/s | |
| test_exec_functorch | 0.2140ms | 0.1741ms | 5.7443 KOps/s | 5.6974 KOps/s | |
| test_exec_functional_call | 0.1988ms | 0.1585ms | 6.3103 KOps/s | 6.2686 KOps/s | |
| test_exec_td_decorator | 0.4466ms | 0.2301ms | 4.3461 KOps/s | 4.3200 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0177ms | 0.8107ms | 1.2336 KOps/s | 1.2273 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 0.9836ms | 0.8111ms | 1.2329 KOps/s | 1.2267 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.8491ms | 0.7027ms | 1.4231 KOps/s | 1.4076 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8906ms | 0.7016ms | 1.4252 KOps/s | 1.4119 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 20.3016ms | 20.1978ms | 49.5103 Ops/s | 48.4598 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 20.3492ms | 20.2443ms | 49.3966 Ops/s | 48.5519 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.1639ms | 20.0555ms | 49.8616 Ops/s | 48.9241 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.2047ms | 20.0841ms | 49.7907 Ops/s | 48.9483 Ops/s | |
| test_to_module_speed[True] | 1.5025ms | 1.4206ms | 703.9347 Ops/s | 704.6668 Ops/s | |
| test_to_module_speed[False] | 1.4868ms | 1.4027ms | 712.9128 Ops/s | 710.5752 Ops/s | |
| test_tc_init | 65.7920μs | 44.8455μs | 22.2988 KOps/s | 22.2967 KOps/s | |
| test_tc_init_tensor_only | 32.7300μs | 9.4073μs | 106.3004 KOps/s | 107.4280 KOps/s | |
| test_tc_init_nested | 0.1187ms | 89.9750μs | 11.1142 KOps/s | 11.0648 KOps/s | |
| test_tc_init_many_fields | 37.7510μs | 15.7732μs | 63.3988 KOps/s | 63.9272 KOps/s | |
| test_tc_first_layer_tensor | 19.2500μs | 1.7441μs | 573.3553 KOps/s | 586.8186 KOps/s | |
| test_tc_first_layer_tensor_only | 5.5687μs | 0.7110μs | 1.4065 MOps/s | 1.4271 MOps/s | |
| test_tc_first_layer_tensor_set | 31.5300μs | 4.0038μs | 249.7650 KOps/s | 253.9699 KOps/s | |
| test_tc_first_layer_tensor_only_set | 28.3610μs | 3.0020μs | 333.1152 KOps/s | 332.5503 KOps/s | |
| test_tc_first_layer_nontensor | 2.3987ms | 5.8798μs | 170.0735 KOps/s | 172.7744 KOps/s | |
| test_tc_second_layer_tensor | 36.8600μs | 4.1609μs | 240.3347 KOps/s | 244.2335 KOps/s | |
| test_tc_second_layer_nontensor | 30.6800μs | 8.2575μs | 121.1016 KOps/s | 126.8796 KOps/s | |
| test_unbind | 0.2411s | 13.2413ms | 75.5214 Ops/s | 69.3823 Ops/s | |
| test_full_like | 5.4671ms | 4.3308ms | 230.9068 Ops/s | 227.3441 Ops/s | |
| test_zeros_like | 5.0319ms | 4.3547ms | 229.6389 Ops/s | 228.6198 Ops/s | |
| test_ones_like | 4.5191ms | 4.3574ms | 229.4938 Ops/s | 228.2051 Ops/s | |
| test_clone | 6.7262ms | 6.4512ms | 155.0088 Ops/s | 153.6532 Ops/s | |
| test_squeeze | 0.1786ms | 13.5712μs | 73.6852 KOps/s | 66.4650 KOps/s | |
| test_unsqueeze | 0.1536ms | 0.1102ms | 9.0741 KOps/s | 8.8629 KOps/s | |
| test_split | 0.3475ms | 0.1760ms | 5.6814 KOps/s | 5.3556 KOps/s | |
| test_permute | 0.2670ms | 0.2015ms | 4.9639 KOps/s | 4.7897 KOps/s | |
| test_stack | 52.2511ms | 51.4144ms | 19.4498 Ops/s | 19.4800 Ops/s | |
| test_cat | 51.6251ms | 51.3769ms | 19.4640 Ops/s | 19.4537 Ops/s |
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 57.7130μs | 14.8731μs | 67.2353 KOps/s | 67.0241 KOps/s | |
| test_plain_set_stack_nested | 40.2320μs | 15.2653μs | 65.5080 KOps/s | 66.3270 KOps/s | |
| test_plain_set_nested_inplace | 43.4020μs | 16.5050μs | 60.5877 KOps/s | 59.9730 KOps/s | |
| test_plain_set_stack_nested_inplace | 59.0140μs | 16.4465μs | 60.8033 KOps/s | 61.0029 KOps/s | |
| test_items | 35.4630μs | 5.8655μs | 170.4884 KOps/s | 172.7090 KOps/s | |
| test_items_nested | 0.5975ms | 0.5322ms | 1.8790 KOps/s | 1.8670 KOps/s | |
| test_items_nested_locked | 0.6395ms | 0.5261ms | 1.9008 KOps/s | 1.8457 KOps/s | |
| test_items_nested_leaf | 0.1346ms | 95.5464μs | 10.4661 KOps/s | 10.4033 KOps/s | |
| test_items_stack_nested | 0.6039ms | 0.5420ms | 1.8449 KOps/s | 1.8717 KOps/s | |
| test_items_stack_nested_leaf | 0.1372ms | 97.7904μs | 10.2260 KOps/s | 10.4305 KOps/s | |
| test_items_stack_nested_locked | 0.6230ms | 0.5373ms | 1.8612 KOps/s | 1.8475 KOps/s | |
| test_keys | 30.0520μs | 4.1966μs | 238.2897 KOps/s | 223.5863 KOps/s | |
| test_keys_nested | 0.1629ms | 0.1195ms | 8.3695 KOps/s | 8.2741 KOps/s | |
| test_keys_nested_locked | 88.4651ms | 0.1406ms | 7.1117 KOps/s | 7.7009 KOps/s | |
| test_keys_nested_leaf | 0.1612ms | 0.1108ms | 9.0223 KOps/s | 9.0666 KOps/s | |
| test_keys_stack_nested | 0.1647ms | 0.1207ms | 8.2849 KOps/s | 8.3159 KOps/s | |
| test_keys_stack_nested_leaf | 0.1478ms | 0.1105ms | 9.0487 KOps/s | 9.0245 KOps/s | |
| test_keys_stack_nested_locked | 0.1672ms | 0.1282ms | 7.7990 KOps/s | 7.7077 KOps/s | |
| test_values | 6.0184μs | 1.0173μs | 983.0414 KOps/s | 976.3358 KOps/s | |
| test_values_nested | 92.0660μs | 47.7094μs | 20.9602 KOps/s | 20.6621 KOps/s | |
| test_values_nested_locked | 85.7150μs | 50.7432μs | 19.7071 KOps/s | 19.6926 KOps/s | |
| test_values_nested_leaf | 85.6250μs | 54.2007μs | 18.4499 KOps/s | 18.2171 KOps/s | |
| test_values_stack_nested | 84.9150μs | 47.8319μs | 20.9065 KOps/s | 20.6023 KOps/s | |
| test_values_stack_nested_leaf | 93.0550μs | 53.8305μs | 18.5768 KOps/s | 18.3135 KOps/s | |
| test_values_stack_nested_locked | 0.7847ms | 50.4946μs | 19.8041 KOps/s | 19.4765 KOps/s | |
| test_membership | 4.1587μs | 0.8595μs | 1.1635 MOps/s | 1.1748 MOps/s | |
| test_membership_nested | 26.5420μs | 3.2050μs | 312.0151 KOps/s | 312.2958 KOps/s | |
| test_membership_nested_leaf | 33.9320μs | 3.1655μs | 315.9061 KOps/s | 310.6203 KOps/s | |
| test_membership_stacked_nested | 27.1020μs | 3.2246μs | 310.1147 KOps/s | 312.4235 KOps/s | |
| test_membership_stacked_nested_leaf | 36.9020μs | 3.1729μs | 315.1667 KOps/s | 313.0597 KOps/s | |
| test_membership_nested_last | 34.4620μs | 4.6629μs | 214.4610 KOps/s | 217.1786 KOps/s | |
| test_membership_nested_leaf_last | 32.3120μs | 4.6911μs | 213.1694 KOps/s | 215.3491 KOps/s | |
| test_membership_stacked_nested_last | 25.5720μs | 4.6472μs | 215.1837 KOps/s | 213.6375 KOps/s | |
| test_membership_stacked_nested_leaf_last | 34.8520μs | 4.6454μs | 215.2661 KOps/s | 215.1544 KOps/s | |
| test_nested_getleaf | 46.7630μs | 22.0064μs | 45.4413 KOps/s | 45.6060 KOps/s | |
| test_nested_get | 51.8030μs | 20.5686μs | 48.6177 KOps/s | 49.1273 KOps/s | |
| test_stacked_getleaf | 47.8220μs | 22.1499μs | 45.1470 KOps/s | 46.2246 KOps/s | |
| test_stacked_get | 49.8630μs | 20.8434μs | 47.9767 KOps/s | 48.1779 KOps/s | |
| test_nested_getitemleaf | 51.4730μs | 22.3251μs | 44.7925 KOps/s | 44.2700 KOps/s | |
| test_nested_getitem | 47.1030μs | 21.1587μs | 47.2620 KOps/s | 47.0771 KOps/s | |
| test_stacked_getitemleaf | 52.8030μs | 22.6638μs | 44.1232 KOps/s | 44.6083 KOps/s | |
| test_stacked_getitem | 47.0930μs | 21.3417μs | 46.8567 KOps/s | 46.3611 KOps/s | |
| test_lock_nested | 7.5813ms | 0.4749ms | 2.1059 KOps/s | 2.0760 KOps/s | |
| test_lock_stack_nested | 0.5365ms | 0.4751ms | 2.1049 KOps/s | 2.0483 KOps/s | |
| test_unlock_nested | 0.5114ms | 0.3784ms | 2.6428 KOps/s | 2.5966 KOps/s | |
| test_unlock_stack_nested | 0.4307ms | 0.3815ms | 2.6210 KOps/s | 2.5684 KOps/s | |
| test_flatten_speed | 0.1557ms | 0.1212ms | 8.2510 KOps/s | 8.0311 KOps/s | |
| test_unflatten_speed | 0.6597ms | 0.5912ms | 1.6916 KOps/s | 1.6663 KOps/s | |
| test_common_ops | 0.8116ms | 0.6858ms | 1.4582 KOps/s | 1.4275 KOps/s | |
| test_creation | 69.7640μs | 2.8121μs | 355.6042 KOps/s | 340.2768 KOps/s | |
| test_creation_empty | 31.0220μs | 6.1402μs | 162.8616 KOps/s | 161.4765 KOps/s | |
| test_creation_nested_1 | 31.5010μs | 10.9426μs | 91.3863 KOps/s | 91.1254 KOps/s | |
| test_creation_nested_2 | 34.1630μs | 12.0034μs | 83.3097 KOps/s | 82.4825 KOps/s | |
| test_creation_many_keys[10] | 55.2740μs | 18.2064μs | 54.9258 KOps/s | 53.8795 KOps/s | |
| test_creation_many_keys[50] | 0.1062ms | 77.9965μs | 12.8211 KOps/s | 12.6029 KOps/s | |
| test_creation_many_keys[100] | 0.1907ms | 0.1521ms | 6.5744 KOps/s | 6.5242 KOps/s | |
| test_creation_nested_many_keys[10] | 63.5340μs | 39.3007μs | 25.4449 KOps/s | 25.0940 KOps/s | |
| test_creation_nested_many_keys[50] | 0.1907ms | 0.1585ms | 6.3106 KOps/s | 6.2140 KOps/s | |
| test_clone | 43.4130μs | 13.4554μs | 74.3196 KOps/s | 73.8232 KOps/s | |
| test_getitem[int] | 1.6938ms | 14.6725μs | 68.1549 KOps/s | 56.5448 KOps/s | |
| test_getitem[slice_int] | 0.1403ms | 25.2186μs | 39.6533 KOps/s | 39.5271 KOps/s | |
| test_getitem[range] | 0.1789ms | 61.9277μs | 16.1479 KOps/s | 15.9466 KOps/s | |
| test_getitem[tuple] | 0.1525ms | 24.4048μs | 40.9755 KOps/s | 41.0721 KOps/s | |
| test_getitem[list] | 0.1868ms | 56.7096μs | 17.6337 KOps/s | 17.2111 KOps/s | |
| test_setitem_dim[int] | 47.5530μs | 26.1626μs | 38.2225 KOps/s | 38.1474 KOps/s | |
| test_setitem_dim[slice_int] | 67.2650μs | 44.9331μs | 22.2553 KOps/s | 22.3278 KOps/s | |
| test_setitem_dim[range] | 0.1185ms | 92.9428μs | 10.7593 KOps/s | 10.6128 KOps/s | |
| test_setitem_dim[tuple] | 64.0340μs | 40.9599μs | 24.4141 KOps/s | 24.0456 KOps/s | |
| test_setitem | 56.4730μs | 18.6125μs | 53.7272 KOps/s | 54.2373 KOps/s | |
| test_set | 44.8130μs | 17.5337μs | 57.0330 KOps/s | 56.9990 KOps/s | |
| test_set_shared | 0.5886ms | 0.2072ms | 4.8263 KOps/s | 4.8242 KOps/s | |
| test_update | 0.1620ms | 22.6801μs | 44.0915 KOps/s | 44.4513 KOps/s | |
| test_update_nested | 75.1550μs | 35.0669μs | 28.5169 KOps/s | 28.7394 KOps/s | |
| test_update__nested | 0.4492ms | 34.4823μs | 29.0003 KOps/s | 28.7670 KOps/s | |
| test_set_nested | 63.9340μs | 19.5732μs | 51.0904 KOps/s | 51.4158 KOps/s | |
| test_set_nested_new | 62.4040μs | 24.4314μs | 40.9310 KOps/s | 40.1596 KOps/s | |
| test_select | 80.7650μs | 42.6247μs | 23.4606 KOps/s | 23.2035 KOps/s | |
| test_select_nested | 0.1071ms | 75.3083μs | 13.2788 KOps/s | 13.2925 KOps/s | |
| test_exclude_nested | 0.1345ms | 98.0142μs | 10.2026 KOps/s | 10.2168 KOps/s | |
| test_empty[True] | 0.5031ms | 0.4415ms | 2.2648 KOps/s | 2.2652 KOps/s | |
| test_empty[False] | 7.4430μs | 1.3264μs | 753.8969 KOps/s | 759.1225 KOps/s | |
| test_to | 0.1031ms | 73.6931μs | 13.5698 KOps/s | 13.6123 KOps/s | |
| test_to_nonblocking | 0.1004ms | 65.6954μs | 15.2218 KOps/s | 15.1455 KOps/s | |
| test_unbind_speed | 0.3704ms | 0.3261ms | 3.0666 KOps/s | 3.0538 KOps/s | |
| test_unbind_speed_stack0 | 0.3758ms | 0.3236ms | 3.0899 KOps/s | 3.0464 KOps/s | |
| test_unbind_speed_stack1 | 0.1026s | 0.9122ms | 1.0963 KOps/s | 1.1870 KOps/s | |
| test_split | 1.2214ms | 1.1458ms | 872.7158 Ops/s | 784.4757 Ops/s | |
| test_chunk | 0.1018s | 1.2085ms | 827.4524 Ops/s | 916.4067 Ops/s | |
| test_to_cpu_blocking | 28.6797ms | 28.5949ms | 34.9713 Ops/s | 46.8310 Ops/s | |
| test_to_cpu_global_sync | 11.3268ms | 11.2254ms | 89.0834 Ops/s | 89.2523 Ops/s | |
| test_to_cpu_event_sync | 12.3964ms | 12.2001ms | 81.9663 Ops/s | 81.8864 Ops/s | |
| test_to_cpu_default | 0.1141s | 13.4967ms | 74.0924 Ops/s | 81.9372 Ops/s | |
| test_consolidate[False-None] | 4.1992ms | 4.1425ms | 241.4012 Ops/s | 219.7046 Ops/s | |
| test_consolidate[default-None] | 2.4503ms | 2.0288ms | 492.8990 Ops/s | 489.2368 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.3563ms | 1.9410ms | 515.2033 Ops/s | 513.1994 Ops/s | |
| test_consolidate_njt[False-None] | 8.9696ms | 8.5466ms | 117.0054 Ops/s | 117.0215 Ops/s | |
| test_to[False-False-None] | 2.4840ms | 2.0686ms | 483.4101 Ops/s | 478.3350 Ops/s | |
| test_to[True-False-None] | 2.3379ms | 1.9444ms | 514.3039 Ops/s | 511.3351 Ops/s | |
| test_to[within-False-None] | 6.6408ms | 6.2160ms | 160.8741 Ops/s | 161.4492 Ops/s | |
| test_to[True-default-None] | 8.0069ms | 7.6223ms | 131.1943 Ops/s | 129.0495 Ops/s | |
| test_to_njt[False-False-None] | 8.9258ms | 8.5347ms | 117.1685 Ops/s | 115.4941 Ops/s | |
| test_to_njt[True-False-None] | 7.4462ms | 7.0399ms | 142.0474 Ops/s | 137.3797 Ops/s | |
| test_to_njt[within-False-None] | 16.9708ms | 16.0244ms | 62.4048 Ops/s | 63.6655 Ops/s | |
| test_creation[device0] | 0.5119ms | 0.1215ms | 8.2306 KOps/s | 8.4433 KOps/s | |
| test_creation_from_tensor | 0.5345ms | 0.1156ms | 8.6474 KOps/s | 8.6200 KOps/s | |
| test_add_one[memmap_tensor0] | 0.4496ms | 6.4104μs | 155.9960 KOps/s | 151.0374 KOps/s | |
| test_contiguous[memmap_tensor0] | 14.4910μs | 0.7123μs | 1.4038 MOps/s | 1.9360 MOps/s | |
| test_stack[memmap_tensor0] | 27.5920μs | 4.6022μs | 217.2870 KOps/s | 222.4626 KOps/s | |
| test_memmaptd_index | 1.1114ms | 0.2660ms | 3.7599 KOps/s | 3.7686 KOps/s | |
| test_memmaptd_index_astensor | 0.7878ms | 0.3628ms | 2.7566 KOps/s | 2.7754 KOps/s | |
| test_memmaptd_index_op | 1.0371ms | 0.6061ms | 1.6500 KOps/s | 1.6322 KOps/s | |
| test_serialize_model | 0.1383s | 0.1363s | 7.3374 Ops/s | 7.3509 Ops/s | |
| test_serialize_model_pickle | 1.3654s | 1.2167s | 0.8219 Ops/s | 0.8264 Ops/s | |
| test_serialize_weights | 0.1391s | 0.1351s | 7.4044 Ops/s | 7.3257 Ops/s | |
| test_serialize_weights_returnearly | 0.4290s | 92.8567ms | 10.7693 Ops/s | 5.9994 Ops/s | |
| test_serialize_weights_pickle | 1.3659s | 1.1890s | 0.8410 Ops/s | 0.8218 Ops/s | |
| test_reshape_pytree | 0.1996ms | 33.6639μs | 29.7054 KOps/s | 29.8638 KOps/s | |
| test_reshape_td | 72.1040μs | 44.1659μs | 22.6419 KOps/s | 21.4426 KOps/s | |
| test_view_pytree | 0.2195ms | 33.4321μs | 29.9114 KOps/s | 30.0761 KOps/s | |
| test_view_td | 85.4450μs | 51.5663μs | 19.3925 KOps/s | 18.9994 KOps/s | |
| test_unbind_pytree | 0.2375ms | 36.9467μs | 27.0660 KOps/s | 26.5668 KOps/s | |
| test_unbind_td | 0.1993ms | 48.7119μs | 20.5289 KOps/s | 20.2239 KOps/s | |
| test_split_pytree | 0.2305ms | 43.5573μs | 22.9583 KOps/s | 23.4280 KOps/s | |
| test_split_td | 0.1852ms | 65.6773μs | 15.2260 KOps/s | 15.3263 KOps/s | |
| test_add_pytree | 0.2371ms | 42.6719μs | 23.4346 KOps/s | 23.7134 KOps/s | |
| test_add_td | 85.3950μs | 53.7773μs | 18.5952 KOps/s | 18.9010 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.1912ms | 0.1390ms | 7.1933 KOps/s | 6.7170 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.4088ms | 0.1925ms | 5.1955 KOps/s | 5.1568 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1550ms | 0.1083ms | 9.2362 KOps/s | 9.1199 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4393ms | 0.1825ms | 5.4793 KOps/s | 5.5181 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.1505ms | 29.9605μs | 33.3773 KOps/s | 30.2496 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 90.9460μs | 52.6651μs | 18.9879 KOps/s | 18.4666 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 45.0430μs | 10.0935μs | 99.0736 KOps/s | 102.4606 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4365ms | 70.7620μs | 14.1319 KOps/s | 14.2688 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.3040ms | 0.1790ms | 5.5852 KOps/s | 5.3138 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3052ms | 0.2530ms | 3.9520 KOps/s | 3.8473 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.1777ms | 0.1183ms | 8.4561 KOps/s | 8.2495 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1038ms | 70.2497μs | 14.2349 KOps/s | 14.2666 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2189ms | 0.1594ms | 6.2747 KOps/s | 6.1735 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8351ms | 0.5285ms | 1.8920 KOps/s | 1.8701 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.4601ms | 0.3082ms | 3.2442 KOps/s | 3.1722 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2312ms | 0.1795ms | 5.5706 KOps/s | 5.2660 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1263ms | 86.1686μs | 11.6052 KOps/s | 11.6273 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.2020ms | 0.1202ms | 8.3224 KOps/s | 8.0717 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6595ms | 0.4360ms | 2.2938 KOps/s | 2.2500 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.1948ms | 0.1595ms | 6.2700 KOps/s | 6.1377 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 75.4440μs | 24.2088μs | 41.3073 KOps/s | 39.6893 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 97.9260μs | 41.5771μs | 24.0517 KOps/s | 23.8854 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 36.7020μs | 11.0939μs | 90.1396 KOps/s | 91.2262 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.1777s | 62.5607μs | 15.9845 KOps/s | 18.9695 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0217ms | 0.1742ms | 5.7416 KOps/s | 5.3944 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.4254ms | 3.3108ms | 302.0375 Ops/s | 298.5079 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.9667ms | 0.1620ms | 6.1711 KOps/s | 6.0638 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.9084ms | 2.7747ms | 360.3953 Ops/s | 357.4238 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1443ms | 0.1091ms | 9.1664 KOps/s | 8.8923 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3139ms | 73.1973μs | 13.6617 KOps/s | 13.5795 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1441ms | 96.7933μs | 10.3313 KOps/s | 10.1571 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2568ms | 44.8076μs | 22.3176 KOps/s | 21.8954 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1607ms | 97.9076μs | 10.2137 KOps/s | 10.1779 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2588ms | 44.4854μs | 22.4793 KOps/s | 21.9225 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1013ms | 56.0597μs | 17.8381 KOps/s | 17.4101 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2277ms | 27.7994μs | 35.9720 KOps/s | 34.2563 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1050ms | 45.9639μs | 21.7562 KOps/s | 21.7703 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2682ms | 22.5207μs | 44.4035 KOps/s | 42.8107 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 90.5760μs | 46.4560μs | 21.5258 KOps/s | 21.5461 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2694ms | 22.6455μs | 44.1589 KOps/s | 43.3897 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 95.1760μs | 56.5719μs | 17.6766 KOps/s | 17.2876 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2197ms | 27.6473μs | 36.1699 KOps/s | 34.6056 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 86.4950μs | 46.0830μs | 21.7000 KOps/s | 21.5035 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2594ms | 22.5338μs | 44.3777 KOps/s | 43.3487 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 92.5550μs | 46.3123μs | 21.5926 KOps/s | 21.5913 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2784ms | 22.5845μs | 44.2782 KOps/s | 43.4472 KOps/s | |
| test_mod_add[eager] | 0.1031ms | 51.9254μs | 19.2584 KOps/s | 19.5087 KOps/s | |
| test_mod_add[compile] | 0.1818ms | 0.1056ms | 9.4732 KOps/s | 9.3938 KOps/s | |
| test_mod_add[compile-overhead] | 0.2339ms | 0.1493ms | 6.6990 KOps/s | 6.5770 KOps/s | |
| test_mod_wrap[eager] | 0.3867ms | 0.2942ms | 3.3986 KOps/s | 3.4054 KOps/s | |
| test_mod_wrap[compile] | 0.4236ms | 0.3497ms | 2.8600 KOps/s | 2.8368 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.4151ms | 4.0725ms | 245.5487 Ops/s | 249.4051 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6404ms | 1.4990ms | 667.1143 Ops/s | 641.6356 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.9471ms | 1.4896ms | 671.3062 Ops/s | 680.1041 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2648ms | 0.8990ms | 1.1123 KOps/s | 1.1067 KOps/s | |
| test_seq_add[eager] | 0.5942ms | 0.1681ms | 5.9474 KOps/s | 6.3232 KOps/s | |
| test_seq_add[compile] | 0.5637ms | 0.1217ms | 8.2194 KOps/s | 8.4139 KOps/s | |
| test_seq_add[compile-overhead] | 0.6055ms | 0.1641ms | 6.0952 KOps/s | 6.2743 KOps/s | |
| test_seq_wrap[eager] | 1.0336ms | 0.5575ms | 1.7937 KOps/s | 1.8452 KOps/s | |
| test_seq_wrap[compile] | 0.4700ms | 0.3682ms | 2.7162 KOps/s | 2.5907 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3730ms | 0.2645ms | 3.7810 KOps/s | 3.7111 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9143ms | 0.8432ms | 1.1860 KOps/s | 1.1039 KOps/s | |
| test_func_call_runtime[False-compile] | 0.9995ms | 0.9144ms | 1.0936 KOps/s | 1.0425 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5232ms | 0.4628ms | 2.1607 KOps/s | 2.1482 KOps/s | |
| test_func_call_runtime[True-eager] | 1.1588ms | 1.0958ms | 912.5944 Ops/s | 913.0356 Ops/s | |
| test_func_call_runtime[True-compile] | 0.9918ms | 0.9228ms | 1.0836 KOps/s | 1.0758 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5375ms | 0.4726ms | 2.1159 KOps/s | 2.0724 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9085ms | 0.8414ms | 1.1884 KOps/s | 1.1740 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.1157ms | 0.9185ms | 1.0887 KOps/s | 1.0839 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5525ms | 0.4633ms | 2.1584 KOps/s | 2.1344 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3616ms | 1.2323ms | 811.5107 Ops/s | 797.2967 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.1708ms | 0.9541ms | 1.0481 KOps/s | 1.0329 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5887ms | 0.5088ms | 1.9652 KOps/s | 1.9475 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8421ms | 2.3551ms | 424.6035 Ops/s | 419.1372 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0543ms | 0.9801ms | 1.0203 KOps/s | 1.0132 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5594ms | 0.5125ms | 1.9511 KOps/s | 1.9274 KOps/s | |
| test_distributed | 0.7950ms | 0.1533ms | 6.5217 KOps/s | 6.4605 KOps/s | |
| test_tdmodule | 0.3039ms | 29.3107μs | 34.1172 KOps/s | 33.3891 KOps/s | |
| test_tdmodule_dispatch | 78.2150μs | 47.5921μs | 21.0119 KOps/s | 20.6943 KOps/s | |
| test_tdseq | 48.6130μs | 27.9095μs | 35.8301 KOps/s | 35.3509 KOps/s | |
| test_tdseq_dispatch | 71.4640μs | 49.1078μs | 20.3634 KOps/s | 19.6114 KOps/s | |
| test_instantiation_functorch | 2.1494ms | 2.0778ms | 481.2786 Ops/s | 480.6254 Ops/s | |
| test_exec_functorch | 0.2228ms | 0.1812ms | 5.5196 KOps/s | 5.5931 KOps/s | |
| test_exec_functional_call | 0.2319ms | 0.1618ms | 6.1802 KOps/s | 6.2185 KOps/s | |
| test_exec_td_decorator | 0.4500ms | 0.2373ms | 4.2142 KOps/s | 4.1639 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0394ms | 0.8224ms | 1.2160 KOps/s | 1.1970 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 0.9949ms | 0.8218ms | 1.2169 KOps/s | 1.2004 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.8870ms | 0.7108ms | 1.4069 KOps/s | 1.3834 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8879ms | 0.7111ms | 1.4063 KOps/s | 1.3873 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.3341ms | 20.5888ms | 48.5702 Ops/s | 48.4520 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 20.9925ms | 20.5484ms | 48.6656 Ops/s | 48.4422 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 21.2605ms | 20.3721ms | 49.0868 Ops/s | 48.8741 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.5323ms | 20.3666ms | 49.1000 Ops/s | 48.8326 Ops/s | |
| test_to_module_speed[True] | 1.5587ms | 1.4730ms | 678.8871 Ops/s | 669.3019 Ops/s | |
| test_to_module_speed[False] | 1.5779ms | 1.4379ms | 695.4353 Ops/s | 684.2293 Ops/s | |
| test_tc_init | 78.5750μs | 46.2440μs | 21.6244 KOps/s | 20.9390 KOps/s | |
| test_tc_init_tensor_only | 29.0510μs | 10.0325μs | 99.6762 KOps/s | 99.9967 KOps/s | |
| test_tc_init_nested | 0.1283ms | 93.3694μs | 10.7101 KOps/s | 10.4802 KOps/s | |
| test_tc_init_many_fields | 46.2130μs | 16.6994μs | 59.8823 KOps/s | 59.8179 KOps/s | |
| test_tc_first_layer_tensor | 19.9310μs | 1.8698μs | 534.8207 KOps/s | 542.5069 KOps/s | |
| test_tc_first_layer_tensor_only | 5.1631μs | 0.7544μs | 1.3256 MOps/s | 1.3106 MOps/s | |
| test_tc_first_layer_tensor_set | 33.6220μs | 4.2429μs | 235.6856 KOps/s | 235.6754 KOps/s | |
| test_tc_first_layer_tensor_only_set | 20.7510μs | 3.1855μs | 313.9203 KOps/s | 309.2807 KOps/s | |
| test_tc_first_layer_nontensor | 55.0040μs | 6.2494μs | 160.0162 KOps/s | 160.7073 KOps/s | |
| test_tc_second_layer_tensor | 35.7630μs | 4.4877μs | 222.8313 KOps/s | 225.1473 KOps/s | |
| test_tc_second_layer_nontensor | 67.3940μs | 8.8011μs | 113.6223 KOps/s | 112.6837 KOps/s | |
| test_unbind | 0.2429s | 14.1877ms | 70.4837 Ops/s | 57.3203 Ops/s | |
| test_full_like | 4.6989ms | 4.3685ms | 228.9135 Ops/s | 227.8356 Ops/s | |
| test_zeros_like | 11.0574ms | 10.5606ms | 94.6920 Ops/s | 228.9871 Ops/s | |
| test_ones_like | 10.6587ms | 10.5408ms | 94.8692 Ops/s | 229.1657 Ops/s | |
| test_clone | 15.1738ms | 15.0345ms | 66.5135 Ops/s | 154.1283 Ops/s | |
| test_squeeze | 0.1521ms | 14.2765μs | 70.0453 KOps/s | 69.3117 KOps/s | |
| test_unsqueeze | 0.2615ms | 0.1111ms | 9.0018 KOps/s | 9.0721 KOps/s | |
| test_split | 0.2464ms | 0.1834ms | 5.4523 KOps/s | 5.3162 KOps/s | |
| test_permute | 0.2802ms | 0.2046ms | 4.8867 KOps/s | 4.8910 KOps/s | |
| test_stack | 53.1967ms | 51.6606ms | 19.3571 Ops/s | 19.4468 Ops/s | |
| test_cat | 51.7298ms | 51.5539ms | 19.3972 Ops/s | 23.2310 Ops/s |
Stack from ghstack (oldest at bottom):
Replace per-row GETRANGE/SETRANGE pipeline commands with server-side
Lua scripts for tensor/list/bool indices. Each key now emits exactly
one EVAL command regardless of the number of indexed positions.
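As a rough illustration of the idea above (not the script shipped in this PR), a gather script could take a flat list of inclusive byte ranges in `ARGV` and concatenate the corresponding `GETRANGE` results server-side. The names `GETRANGES_LUA`, `row_ranges` and `emulate_getranges` are hypothetical, and the sketch assumes fixed-size rows stored contiguously in one Redis string per key.

```python
from typing import List, Sequence

# Hypothetical Lua script (an assumption, not the PR's actual script).
# ARGV holds (start, end) byte offsets; end is inclusive, as in GETRANGE.
GETRANGES_LUA = """
local out = {}
for i = 1, #ARGV, 2 do
    out[#out + 1] = redis.call('GETRANGE', KEYS[1], ARGV[i], ARGV[i + 1])
end
return table.concat(out)
"""


def row_ranges(indices: Sequence[int], row_size: int) -> List[int]:
    """Flatten row indices into [start0, end0, start1, end1, ...] byte offsets."""
    args: List[int] = []
    for i in indices:
        start = i * row_size
        args += [start, start + row_size - 1]  # inclusive end offset
    return args


def emulate_getranges(buf: bytes, args: Sequence[int]) -> bytes:
    """Pure-Python stand-in for the script, to check offsets without a server."""
    return b"".join(buf[args[j] : args[j + 1] + 1] for j in range(0, len(args), 2))


row_size = 4                      # assumed bytes per row
buf = bytes(range(40))            # 10 rows of 4 bytes each
args = row_ranges([0, 7], row_size)
gathered = emulate_getranges(buf, args)
```

With redis-py, the same arguments could be sent as a single command per key via `client.eval(GETRANGES_LUA, 1, key, *args)`, assuming a connected `client` and an existing `key`.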
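Hybrid strategy:
- int / slice (any step): single GETRANGE + local stride (unchanged)
- tensor / list / bool: Lua GETRANGES script (new)
- step>1 writes: covering-range RMW (unchanged)
- scattered writes: Lua SETRANGES script (new)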
This yields a deterministic O(K) command count for K keys and
O(N*row_size) bandwidth for N indexed rows, eliminating the
covering-range waste for sparse indices like td[tensor([0, 1000])].
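To make the bandwidth claim concrete, the sketch below compares the bytes transferred by a covering-range read with those of a scattered read for the sparse index `[0, 1000]`. The 64-byte row size is an assumption chosen for illustration, and both helper functions are hypothetical.

```python
from typing import Sequence


def covering_range_bytes(indices: Sequence[int], row_size: int) -> int:
    """Bytes moved when one contiguous range spanning min..max is fetched."""
    return (max(indices) - min(indices) + 1) * row_size


def scattered_bytes(indices: Sequence[int], row_size: int) -> int:
    """Bytes moved when only the indexed rows are fetched."""
    return len(indices) * row_size


indices = [0, 1000]
row_size = 64  # assumed bytes per row

covering = covering_range_bytes(indices, row_size)   # 1001 rows * 64 bytes
scattered = scattered_bytes(indices, row_size)       # 2 rows * 64 bytes
print(f"covering range: {covering} B, scattered: {scattered} B")
```

Under these assumptions the covering range moves 64064 bytes to deliver 128 bytes of useful data, which is the waste the scattered path avoids.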
Writes with fancy or bool indices are ~2.5x faster (5.9 ms -> 2.4 ms).