[Performance] Use Lua scripts for scattered index operations #1571

Open
vmoens wants to merge 1 commit into gh/vmoens/57/base from gh/vmoens/57/head

@vmoens
Collaborator

@vmoens vmoens commented Feb 14, 2026

Stack from ghstack (oldest at bottom):

Replace per-row GETRANGE/SETRANGE pipeline commands with server-side
Lua scripts for tensor/list/bool indices. Each key now emits exactly
one EVAL command regardless of the number of indexed positions.
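
A minimal sketch of the idea, not the script shipped in this PR: the Lua source, the names `GETRANGES_LUA`, `row_offsets` and `emulate_getranges`, and the argument layout (row size first, then byte offsets) are all assumptions made for illustration. The pure-Python emulation only mirrors what such a script would return, so the logic can be checked without a Redis server.

```python
# Hypothetical Lua source: a single EVAL reads every requested row of one key.
GETRANGES_LUA = """
local row = tonumber(ARGV[1])
local out = {}
for i = 2, #ARGV do
  local start = tonumber(ARGV[i])
  -- GETRANGE takes an inclusive end offset
  out[#out + 1] = redis.call('GETRANGE', KEYS[1], start, start + row - 1)
end
return table.concat(out)
"""


def row_offsets(indices, row_size):
    """Byte offset of each indexed row in the flat value stored under a key."""
    return [int(i) * row_size for i in indices]


def emulate_getranges(value, indices, row_size):
    """Pure-Python stand-in for the script: concatenate the selected rows."""
    offsets = row_offsets(indices, row_size)
    return b"".join(value[o:o + row_size] for o in offsets)


# Four rows of 2 bytes each, stored contiguously under a single key.
stored = b"aabbccdd"
picked = emulate_getranges(stored, [0, 3], row_size=2)
```

With redis-py, a call along the lines of `client.eval(GETRANGES_LUA, 1, key, row_size, *offsets)` would issue the one EVAL per key; the key name and client object are left out here because they depend on the store's layout.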

Hybrid strategy:

  • int / slice (any step): single GETRANGE + local stride (unchanged)
  • tensor / list / bool: Lua GETRANGES script (new)
  • step>1 writes: covering-range RMW (unchanged)
  • scattered writes: Lua SETRANGES script (new)
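
The dispatch described in the list can be sketched roughly as follows. The function names and returned labels are hypothetical, and detecting tensors through a `tolist` attribute is an assumption made to keep the example free of a torch dependency; the real code may classify indices differently.

```python
def read_strategy(index):
    """Return the read path named in the list above for a given index."""
    # bool is a subclass of int in Python; a bare True/False is not a row index.
    if isinstance(index, (int, slice)) and not isinstance(index, bool):
        return "single GETRANGE + local stride"
    # Lists and tensor-like objects (anything exposing tolist) are scattered.
    if isinstance(index, list) or hasattr(index, "tolist"):
        return "Lua GETRANGES script"
    raise TypeError(f"unsupported index type: {type(index).__name__}")


def write_strategy(index):
    """Return the write path named in the list above, or None if not listed."""
    if isinstance(index, slice) and index.step is not None and index.step > 1:
        return "covering-range RMW"
    if isinstance(index, list) or hasattr(index, "tolist"):
        return "Lua SETRANGES script"
    return None  # int and contiguous-slice writes are not described in the list


reads = [read_strategy(i) for i in (3, slice(0, 10, 2), [0, 1000])]
writes = [write_strategy(i) for i in (slice(0, 10, 2), [True, False], 5)]
```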

This gives a deterministic O(K) commands for K keys and O(N*row_size)
bandwidth for N indexed positions, eliminating the covering-range waste
for sparse indices like td[tensor([0, 1000])].
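
To make the covering-range waste concrete, here is a small back-of-the-envelope calculation. The helper names and the 16-byte row size are assumptions chosen for illustration; only the index `[0, 1000]` comes from the description above.

```python
def covering_range_bytes(indices, row_size):
    """Bytes fetched when one GETRANGE spans min(index)..max(index)."""
    return (max(indices) - min(indices) + 1) * row_size


def scattered_bytes(indices, row_size):
    """Bytes fetched when only the indexed rows are read."""
    return len(indices) * row_size


indices = [0, 1000]
row_size = 16  # assumed bytes per row

covering = covering_range_bytes(indices, row_size)  # 1001 rows * 16 bytes
scattered = scattered_bytes(indices, row_size)      # 2 rows * 16 bytes
```

Under these assumptions the covering range moves 16016 bytes to return 32 useful ones, which is the gap the scattered path is meant to close.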

Fancy/bool writes improved ~2.5x (5.9ms -> 2.4ms).

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 14, 2026
ghstack-source-id: 59fb136
Pull-Request: #1571
@meta-cla meta-cla bot added the CLA Signed label Feb 14, 2026
@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 243. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 35.9310μs 14.5620μs 68.6718 KOps/s 69.3438 KOps/s $\color{#d91a1a}-0.97\%$
test_plain_set_stack_nested 36.5010μs 14.7715μs 67.6979 KOps/s 69.0744 KOps/s $\color{#d91a1a}-1.99\%$
test_plain_set_nested_inplace 40.6010μs 16.1382μs 61.9648 KOps/s 61.8277 KOps/s $\color{#35bf28}+0.22\%$
test_plain_set_stack_nested_inplace 46.5110μs 15.9019μs 62.8857 KOps/s 62.5348 KOps/s $\color{#35bf28}+0.56\%$
test_items 34.0610μs 5.5538μs 180.0572 KOps/s 181.3411 KOps/s $\color{#d91a1a}-0.71\%$
test_items_nested 0.5763ms 0.5225ms 1.9140 KOps/s 1.9554 KOps/s $\color{#d91a1a}-2.11\%$
test_items_nested_locked 0.5717ms 0.5213ms 1.9184 KOps/s 1.9419 KOps/s $\color{#d91a1a}-1.21\%$
test_items_nested_leaf 0.1180ms 91.9692μs 10.8732 KOps/s 10.8536 KOps/s $\color{#35bf28}+0.18\%$
test_items_stack_nested 0.6002ms 0.5216ms 1.9173 KOps/s 1.9492 KOps/s $\color{#d91a1a}-1.63\%$
test_items_stack_nested_leaf 0.1223ms 91.7531μs 10.8988 KOps/s 10.8081 KOps/s $\color{#35bf28}+0.84\%$
test_items_stack_nested_locked 0.5772ms 0.5225ms 1.9138 KOps/s 1.9367 KOps/s $\color{#d91a1a}-1.19\%$
test_keys 30.2800μs 4.1159μs 242.9602 KOps/s 241.8597 KOps/s $\color{#35bf28}+0.46\%$
test_keys_nested 0.1470ms 0.1179ms 8.4794 KOps/s 8.5366 KOps/s $\color{#d91a1a}-0.67\%$
test_keys_nested_locked 88.8760ms 0.1398ms 7.1518 KOps/s 7.9307 KOps/s $\textbf{\color{#d91a1a}-9.82\%}$
test_keys_nested_leaf 0.1384ms 0.1090ms 9.1707 KOps/s 9.2673 KOps/s $\color{#d91a1a}-1.04\%$
test_keys_stack_nested 0.1493ms 0.1192ms 8.3893 KOps/s 8.5260 KOps/s $\color{#d91a1a}-1.60\%$
test_keys_stack_nested_leaf 0.1353ms 0.1085ms 9.2190 KOps/s 9.2959 KOps/s $\color{#d91a1a}-0.83\%$
test_keys_stack_nested_locked 0.1610ms 0.1285ms 7.7795 KOps/s 7.9669 KOps/s $\color{#d91a1a}-2.35\%$
test_values 6.1380μs 0.9961μs 1.0039 MOps/s 1.0025 MOps/s $\color{#35bf28}+0.14\%$
test_values_nested 73.8410μs 46.7094μs 21.4090 KOps/s 21.3443 KOps/s $\color{#35bf28}+0.30\%$
test_values_nested_locked 83.6020μs 49.5439μs 20.1841 KOps/s 20.2728 KOps/s $\color{#d91a1a}-0.44\%$
test_values_nested_leaf 82.5720μs 52.8102μs 18.9357 KOps/s 18.9705 KOps/s $\color{#d91a1a}-0.18\%$
test_values_stack_nested 0.1224ms 46.1585μs 21.6645 KOps/s 21.4608 KOps/s $\color{#35bf28}+0.95\%$
test_values_stack_nested_leaf 80.6520μs 52.2008μs 19.1568 KOps/s 18.9401 KOps/s $\color{#35bf28}+1.14\%$
test_values_stack_nested_locked 85.5520μs 49.4662μs 20.2158 KOps/s 20.1835 KOps/s $\color{#35bf28}+0.16\%$
test_membership 4.1852μs 0.8129μs 1.2301 MOps/s 1.2377 MOps/s $\color{#d91a1a}-0.61\%$
test_membership_nested 24.2410μs 2.9749μs 336.1479 KOps/s 335.5553 KOps/s $\color{#35bf28}+0.18\%$
test_membership_nested_leaf 26.1900μs 3.0172μs 331.4340 KOps/s 335.7182 KOps/s $\color{#d91a1a}-1.28\%$
test_membership_stacked_nested 33.8410μs 3.0119μs 332.0129 KOps/s 334.0291 KOps/s $\color{#d91a1a}-0.60\%$
test_membership_stacked_nested_leaf 27.6210μs 3.0132μs 331.8706 KOps/s 336.2902 KOps/s $\color{#d91a1a}-1.31\%$
test_membership_nested_last 33.7900μs 4.3708μs 228.7928 KOps/s 230.4465 KOps/s $\color{#d91a1a}-0.72\%$
test_membership_nested_leaf_last 29.6510μs 4.4110μs 226.7035 KOps/s 229.5998 KOps/s $\color{#d91a1a}-1.26\%$
test_membership_stacked_nested_last 34.0710μs 4.3407μs 230.3781 KOps/s 228.7252 KOps/s $\color{#35bf28}+0.72\%$
test_membership_stacked_nested_leaf_last 25.1300μs 4.3407μs 230.3787 KOps/s 230.0922 KOps/s $\color{#35bf28}+0.12\%$
test_nested_getleaf 48.1110μs 20.6752μs 48.3672 KOps/s 46.7511 KOps/s $\color{#35bf28}+3.46\%$
test_nested_get 58.1810μs 19.1893μs 52.1123 KOps/s 49.8121 KOps/s $\color{#35bf28}+4.62\%$
test_stacked_getleaf 57.6110μs 20.2224μs 49.4500 KOps/s 46.7939 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_stacked_get 45.6710μs 19.4251μs 51.4798 KOps/s 49.2546 KOps/s $\color{#35bf28}+4.52\%$
test_nested_getitemleaf 46.8310μs 21.2409μs 47.0791 KOps/s 46.6567 KOps/s $\color{#35bf28}+0.91\%$
test_nested_getitem 45.9710μs 20.1906μs 49.5280 KOps/s 49.2473 KOps/s $\color{#35bf28}+0.57\%$
test_stacked_getitemleaf 45.4710μs 21.3594μs 46.8178 KOps/s 46.6820 KOps/s $\color{#35bf28}+0.29\%$
test_stacked_getitem 52.5710μs 20.3599μs 49.1161 KOps/s 48.8905 KOps/s $\color{#35bf28}+0.46\%$
test_lock_nested 7.8776ms 0.4591ms 2.1780 KOps/s 2.1835 KOps/s $\color{#d91a1a}-0.25\%$
test_lock_stack_nested 0.5002ms 0.4586ms 2.1807 KOps/s 2.1493 KOps/s $\color{#35bf28}+1.46\%$
test_unlock_nested 0.4685ms 0.3659ms 2.7328 KOps/s 2.7084 KOps/s $\color{#35bf28}+0.90\%$
test_unlock_stack_nested 0.4215ms 0.3682ms 2.7156 KOps/s 2.6606 KOps/s $\color{#35bf28}+2.07\%$
test_flatten_speed 0.1655ms 0.1178ms 8.4881 KOps/s 8.5872 KOps/s $\color{#d91a1a}-1.15\%$
test_unflatten_speed 0.6387ms 0.5727ms 1.7460 KOps/s 1.7513 KOps/s $\color{#d91a1a}-0.30\%$
test_common_ops 0.8418ms 0.6848ms 1.4603 KOps/s 1.4680 KOps/s $\color{#d91a1a}-0.53\%$
test_creation 0.1226ms 2.7499μs 363.6447 KOps/s 366.3530 KOps/s $\color{#d91a1a}-0.74\%$
test_creation_empty 29.5500μs 5.7996μs 172.4252 KOps/s 175.2394 KOps/s $\color{#d91a1a}-1.61\%$
test_creation_nested_1 31.8500μs 10.2736μs 97.3371 KOps/s 97.4462 KOps/s $\color{#d91a1a}-0.11\%$
test_creation_nested_2 39.7210μs 11.3006μs 88.4907 KOps/s 90.4903 KOps/s $\color{#d91a1a}-2.21\%$
test_creation_many_keys[10] 51.6210μs 17.1622μs 58.2675 KOps/s 59.5464 KOps/s $\color{#d91a1a}-2.15\%$
test_creation_many_keys[50] 95.8520μs 72.0606μs 13.8772 KOps/s 13.9863 KOps/s $\color{#d91a1a}-0.78\%$
test_creation_many_keys[100] 0.2025ms 0.1429ms 6.9960 KOps/s 7.0912 KOps/s $\color{#d91a1a}-1.34\%$
test_creation_nested_many_keys[10] 63.3110μs 36.9392μs 27.0715 KOps/s 27.2077 KOps/s $\color{#d91a1a}-0.50\%$
test_creation_nested_many_keys[50] 0.1865ms 0.1494ms 6.6947 KOps/s 6.7845 KOps/s $\color{#d91a1a}-1.32\%$
test_clone 44.7710μs 12.9296μs 77.3420 KOps/s 75.3930 KOps/s $\color{#35bf28}+2.59\%$
test_getitem[int] 1.6189ms 14.0510μs 71.1693 KOps/s 57.6310 KOps/s $\textbf{\color{#35bf28}+23.49\%}$
test_getitem[slice_int] 0.1346ms 24.0831μs 41.5229 KOps/s 40.9517 KOps/s $\color{#35bf28}+1.39\%$
test_getitem[range] 0.1627ms 60.2635μs 16.5938 KOps/s 16.4883 KOps/s $\color{#35bf28}+0.64\%$
test_getitem[tuple] 0.1426ms 23.4613μs 42.6234 KOps/s 42.3250 KOps/s $\color{#35bf28}+0.70\%$
test_getitem[list] 0.1778ms 57.9105μs 17.2680 KOps/s 17.7068 KOps/s $\color{#d91a1a}-2.48\%$
test_setitem_dim[int] 49.2310μs 25.5992μs 39.0638 KOps/s 39.1756 KOps/s $\color{#d91a1a}-0.29\%$
test_setitem_dim[slice_int] 87.9920μs 43.3572μs 23.0642 KOps/s 23.1140 KOps/s $\color{#d91a1a}-0.22\%$
test_setitem_dim[range] 0.1286ms 91.4503μs 10.9349 KOps/s 10.9478 KOps/s $\color{#d91a1a}-0.12\%$
test_setitem_dim[tuple] 77.9910μs 39.8109μs 25.1188 KOps/s 24.7635 KOps/s $\color{#35bf28}+1.43\%$
test_setitem 53.5810μs 17.4864μs 57.1872 KOps/s 55.5233 KOps/s $\color{#35bf28}+3.00\%$
test_set 53.3910μs 16.6399μs 60.0964 KOps/s 58.8948 KOps/s $\color{#35bf28}+2.04\%$
test_set_shared 0.5056ms 0.2029ms 4.9294 KOps/s 4.8883 KOps/s $\color{#35bf28}+0.84\%$
test_update 0.3462ms 21.3863μs 46.7589 KOps/s 45.6155 KOps/s $\color{#35bf28}+2.51\%$
test_update_nested 78.6410μs 33.6984μs 29.6750 KOps/s 29.4573 KOps/s $\color{#35bf28}+0.74\%$
test_update__nested 0.4699ms 33.0917μs 30.2191 KOps/s 29.6685 KOps/s $\color{#35bf28}+1.86\%$
test_set_nested 50.3810μs 18.6610μs 53.5878 KOps/s 52.0623 KOps/s $\color{#35bf28}+2.93\%$
test_set_nested_new 61.8910μs 23.4307μs 42.6790 KOps/s 41.5905 KOps/s $\color{#35bf28}+2.62\%$
test_select 80.4820μs 40.7945μs 24.5131 KOps/s 24.0638 KOps/s $\color{#35bf28}+1.87\%$
test_select_nested 97.9020μs 71.1593μs 14.0530 KOps/s 14.2497 KOps/s $\color{#d91a1a}-1.38\%$
test_exclude_nested 0.1237ms 93.0099μs 10.7515 KOps/s 11.0332 KOps/s $\color{#d91a1a}-2.55\%$
test_empty[True] 0.4792ms 0.4233ms 2.3626 KOps/s 2.4148 KOps/s $\color{#d91a1a}-2.16\%$
test_empty[False] 8.4250μs 1.2655μs 790.2289 KOps/s 799.3109 KOps/s $\color{#d91a1a}-1.14\%$
test_to 0.1007ms 73.0300μs 13.6930 KOps/s 13.9880 KOps/s $\color{#d91a1a}-2.11\%$
test_to_nonblocking 0.1172ms 63.6546μs 15.7098 KOps/s 15.7518 KOps/s $\color{#d91a1a}-0.27\%$
test_unbind_speed 0.3490ms 0.3139ms 3.1861 KOps/s 3.1830 KOps/s $\color{#35bf28}+0.10\%$
test_unbind_speed_stack0 0.3772ms 0.3131ms 3.1938 KOps/s 3.2299 KOps/s $\color{#d91a1a}-1.12\%$
test_unbind_speed_stack1 0.1032s 0.8844ms 1.1307 KOps/s 1.1191 KOps/s $\color{#35bf28}+1.04\%$
test_split 1.1557ms 1.0930ms 914.9265 Ops/s 915.3271 Ops/s $\color{#d91a1a}-0.04\%$
test_chunk 0.1029s 1.1638ms 859.2684 Ops/s 958.8233 Ops/s $\textbf{\color{#d91a1a}-10.38\%}$
test_to_cpu_blocking 19.1052ms 19.0012ms 52.6283 Ops/s 40.1713 Ops/s $\textbf{\color{#35bf28}+31.01\%}$
test_to_cpu_global_sync 11.0030ms 10.9206ms 91.5701 Ops/s 89.1188 Ops/s $\color{#35bf28}+2.75\%$
test_to_cpu_event_sync 0.1151s 13.1330ms 76.1440 Ops/s 81.9111 Ops/s $\textbf{\color{#d91a1a}-7.04\%}$
test_to_cpu_default 12.2201ms 11.9364ms 83.7771 Ops/s 82.0550 Ops/s $\color{#35bf28}+2.10\%$
test_consolidate[False-None] 4.0531ms 3.9457ms 253.4393 Ops/s 223.5667 Ops/s $\textbf{\color{#35bf28}+13.36\%}$
test_consolidate[default-None] 2.0082ms 1.9179ms 521.3978 Ops/s 494.8778 Ops/s $\textbf{\color{#35bf28}+5.36\%}$
test_consolidate[reduce-overhead-None] 1.9546ms 1.8544ms 539.2481 Ops/s 515.8135 Ops/s $\color{#35bf28}+4.54\%$
test_consolidate_njt[False-None] 8.3073ms 8.1317ms 122.9748 Ops/s 120.7718 Ops/s $\color{#35bf28}+1.82\%$
test_to[False-False-None] 2.0990ms 1.9854ms 503.6722 Ops/s 490.1707 Ops/s $\color{#35bf28}+2.75\%$
test_to[True-False-None] 2.1327ms 1.8684ms 535.2163 Ops/s 535.3325 Ops/s $\color{#d91a1a}-0.02\%$
test_to[within-False-None] 6.1437ms 5.8949ms 169.6383 Ops/s 167.4598 Ops/s $\color{#35bf28}+1.30\%$
test_to[True-default-None] 7.4509ms 7.3428ms 136.1878 Ops/s 125.6146 Ops/s $\textbf{\color{#35bf28}+8.42\%}$
test_to_njt[False-False-None] 8.3378ms 8.2460ms 121.2715 Ops/s 116.8855 Ops/s $\color{#35bf28}+3.75\%$
test_to_njt[True-False-None] 6.8284ms 6.7145ms 148.9306 Ops/s 142.0950 Ops/s $\color{#35bf28}+4.81\%$
test_to_njt[within-False-None] 15.1388ms 15.0029ms 66.6539 Ops/s 64.3186 Ops/s $\color{#35bf28}+3.63\%$
test_creation[device0] 0.4683ms 0.1135ms 8.8113 KOps/s 8.3675 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_creation_from_tensor 0.4633ms 0.1117ms 8.9510 KOps/s 8.6310 KOps/s $\color{#35bf28}+3.71\%$
test_add_one[memmap_tensor0] 0.2264ms 6.2435μs 160.1659 KOps/s 152.0520 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_contiguous[memmap_tensor0] 28.2000μs 0.6149μs 1.6264 MOps/s 2.2730 MOps/s $\textbf{\color{#d91a1a}-28.45\%}$
test_stack[memmap_tensor0] 24.7300μs 4.3388μs 230.4759 KOps/s 216.5658 KOps/s $\textbf{\color{#35bf28}+6.42\%}$
test_memmaptd_index 1.0700ms 0.2560ms 3.9066 KOps/s 3.9381 KOps/s $\color{#d91a1a}-0.80\%$
test_memmaptd_index_astensor 0.5001ms 0.3429ms 2.9166 KOps/s 2.8997 KOps/s $\color{#35bf28}+0.58\%$
test_memmaptd_index_op 0.8295ms 0.5830ms 1.7154 KOps/s 1.6490 KOps/s $\color{#35bf28}+4.03\%$
test_serialize_model 0.1379s 0.1365s 7.3252 Ops/s 7.2578 Ops/s $\color{#35bf28}+0.93\%$
test_serialize_model_pickle 1.6670s 1.2574s 0.7953 Ops/s 0.8259 Ops/s $\color{#d91a1a}-3.71\%$
test_serialize_weights 0.1366s 0.1340s 7.4610 Ops/s 7.3250 Ops/s $\color{#35bf28}+1.86\%$
test_serialize_weights_returnearly 0.4074s 88.2321ms 11.3337 Ops/s 11.3279 Ops/s $\color{#35bf28}+0.05\%$
test_serialize_weights_pickle 1.3759s 1.2185s 0.8207 Ops/s 0.8170 Ops/s $\color{#35bf28}+0.45\%$
test_reshape_pytree 0.2008ms 31.8796μs 31.3680 KOps/s 30.4521 KOps/s $\color{#35bf28}+3.01\%$
test_reshape_td 74.7620μs 42.3039μs 23.6385 KOps/s 23.1217 KOps/s $\color{#35bf28}+2.24\%$
test_view_pytree 0.2006ms 31.4213μs 31.8255 KOps/s 30.7457 KOps/s $\color{#35bf28}+3.51\%$
test_view_td 89.6810μs 49.5621μs 20.1767 KOps/s 19.5401 KOps/s $\color{#35bf28}+3.26\%$
test_unbind_pytree 0.2357ms 35.3877μs 28.2584 KOps/s 27.9350 KOps/s $\color{#35bf28}+1.16\%$
test_unbind_td 0.1068ms 46.5485μs 21.4830 KOps/s 21.4871 KOps/s $\color{#d91a1a}-0.02\%$
test_split_pytree 0.2457ms 40.5302μs 24.6730 KOps/s 24.1129 KOps/s $\color{#35bf28}+2.32\%$
test_split_td 0.1147ms 62.1331μs 16.0945 KOps/s 15.6781 KOps/s $\color{#35bf28}+2.66\%$
test_add_pytree 0.1898ms 41.2913μs 24.2182 KOps/s 24.2468 KOps/s $\color{#d91a1a}-0.12\%$
test_add_td 0.1056ms 51.3955μs 19.4570 KOps/s 19.0094 KOps/s $\color{#35bf28}+2.35\%$
test_compile_add_one_nested[tensordict-compile] 0.2032ms 0.1358ms 7.3653 KOps/s 6.9480 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_compile_add_one_nested[tensordict-eager] 0.4104ms 0.1884ms 5.3082 KOps/s 5.3728 KOps/s $\color{#d91a1a}-1.20\%$
test_compile_add_one_nested[pytree-compile] 0.1739ms 0.1110ms 9.0107 KOps/s 9.2231 KOps/s $\color{#d91a1a}-2.30\%$
test_compile_add_one_nested[pytree-eager] 0.4291ms 0.1759ms 5.6854 KOps/s 5.6691 KOps/s $\color{#35bf28}+0.29\%$
test_compile_copy_nested[tensordict-compile] 60.3220μs 28.8608μs 34.6491 KOps/s 30.5084 KOps/s $\textbf{\color{#35bf28}+13.57\%}$
test_compile_copy_nested[tensordict-eager] 79.0320μs 49.7057μs 20.1184 KOps/s 20.1272 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_copy_nested[pytree-compile] 0.1482ms 9.5114μs 105.1369 KOps/s 105.7495 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_copy_nested[pytree-eager] 0.4493ms 66.6374μs 15.0066 KOps/s 15.0087 KOps/s $\color{#d91a1a}-0.01\%$
test_compile_add_one_flat[tensordict-compile] 0.2220ms 0.1717ms 5.8226 KOps/s 5.5139 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_compile_add_one_flat[tensordict-eager] 0.3087ms 0.2477ms 4.0376 KOps/s 4.0288 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_one_flat[tensorclass-compile] 0.1553ms 0.1127ms 8.8732 KOps/s 8.5601 KOps/s $\color{#35bf28}+3.66\%$
test_compile_add_one_flat[tensorclass-eager] 0.1128ms 66.7636μs 14.9782 KOps/s 14.9741 KOps/s $\color{#35bf28}+0.03\%$
test_compile_add_one_flat[pytree-compile] 0.2240ms 0.1549ms 6.4553 KOps/s 6.2238 KOps/s $\color{#35bf28}+3.72\%$
test_compile_add_one_flat[pytree-eager] 0.8910ms 0.5187ms 1.9278 KOps/s 1.9040 KOps/s $\color{#35bf28}+1.25\%$
test_compile_add_self_flat[tensordict-eager] 0.3368ms 0.3013ms 3.3186 KOps/s 3.2952 KOps/s $\color{#35bf28}+0.71\%$
test_compile_add_self_flat[tensordict-compile] 0.2309ms 0.1818ms 5.5015 KOps/s 5.5024 KOps/s $\color{#d91a1a}-0.02\%$
test_compile_add_self_flat[tensorclass-eager] 0.1433ms 84.1232μs 11.8873 KOps/s 12.1298 KOps/s $\color{#d91a1a}-2.00\%$
test_compile_add_self_flat[tensorclass-compile] 0.1939ms 0.1157ms 8.6428 KOps/s 8.4293 KOps/s $\color{#35bf28}+2.53\%$
test_compile_add_self_flat[pytree-eager] 0.6498ms 0.4284ms 2.3343 KOps/s 2.3152 KOps/s $\color{#35bf28}+0.82\%$
test_compile_add_self_flat[pytree-compile] 0.2459ms 0.1628ms 6.1426 KOps/s 6.3447 KOps/s $\color{#d91a1a}-3.18\%$
test_compile_copy_flat[tensordict-compile] 56.2410μs 25.3189μs 39.4961 KOps/s 40.8239 KOps/s $\color{#d91a1a}-3.25\%$
test_compile_copy_flat[tensordict-eager] 71.9610μs 39.8086μs 25.1202 KOps/s 25.0221 KOps/s $\color{#35bf28}+0.39\%$
test_compile_copy_flat[pytree-compile] 0.1272ms 10.3785μs 96.3527 KOps/s 95.3937 KOps/s $\color{#35bf28}+1.01\%$
test_compile_copy_flat[pytree-eager] 0.4170ms 51.1698μs 19.5428 KOps/s 19.7130 KOps/s $\color{#d91a1a}-0.86\%$
test_compile_assign_and_add[tensordict-compile] 1.9882ms 0.1698ms 5.8883 KOps/s 5.6537 KOps/s $\color{#35bf28}+4.15\%$
test_compile_assign_and_add[tensordict-eager] 3.5138ms 3.2575ms 306.9813 Ops/s 303.3602 Ops/s $\color{#35bf28}+1.19\%$
test_compile_assign_and_add[pytree-compile] 1.8803ms 0.1577ms 6.3424 KOps/s 6.2618 KOps/s $\color{#35bf28}+1.29\%$
test_compile_assign_and_add[pytree-eager] 2.8960ms 2.7696ms 361.0595 Ops/s 360.9127 Ops/s $\color{#35bf28}+0.04\%$
test_compile_indexing[tensor-tensordict-compile] 0.2256ms 0.1054ms 9.4840 KOps/s 9.3731 KOps/s $\color{#35bf28}+1.18\%$
test_compile_indexing[tensor-tensordict-eager] 0.3504ms 70.5159μs 14.1812 KOps/s 14.0349 KOps/s $\color{#35bf28}+1.04\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1385ms 93.5312μs 10.6916 KOps/s 10.5406 KOps/s $\color{#35bf28}+1.43\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2459ms 45.2819μs 22.0839 KOps/s 22.2790 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_indexing[tensor-pytree-compile] 0.1456ms 95.7553μs 10.4433 KOps/s 10.4425 KOps/s $+0.01\%$
test_compile_indexing[tensor-pytree-eager] 0.2679ms 47.7385μs 20.9475 KOps/s 22.2248 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_compile_indexing[slice-tensordict-compile] 0.2165ms 57.8222μs 17.2944 KOps/s 17.9740 KOps/s $\color{#d91a1a}-3.78\%$
test_compile_indexing[slice-tensordict-eager] 0.2290ms 28.9019μs 34.5999 KOps/s 36.9335 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1495ms 46.9594μs 21.2950 KOps/s 22.2643 KOps/s $\color{#d91a1a}-4.35\%$
test_compile_indexing[slice-tensorclass-eager] 0.2711ms 23.1290μs 43.2357 KOps/s 45.2093 KOps/s $\color{#d91a1a}-4.37\%$
test_compile_indexing[slice-pytree-compile] 0.1416ms 44.6762μs 22.3833 KOps/s 21.7462 KOps/s $\color{#35bf28}+2.93\%$
test_compile_indexing[slice-pytree-eager] 0.2643ms 21.6736μs 46.1391 KOps/s 45.0502 KOps/s $\color{#35bf28}+2.42\%$
test_compile_indexing[int-tensordict-compile] 0.2340ms 55.2173μs 18.1103 KOps/s 17.4530 KOps/s $\color{#35bf28}+3.77\%$
test_compile_indexing[int-tensordict-eager] 0.2277ms 26.9743μs 37.0724 KOps/s 37.0184 KOps/s $\color{#35bf28}+0.15\%$
test_compile_indexing[int-tensorclass-compile] 80.8420μs 44.2830μs 22.5820 KOps/s 21.6281 KOps/s $\color{#35bf28}+4.41\%$
test_compile_indexing[int-tensorclass-eager] 0.2671ms 22.0402μs 45.3715 KOps/s 44.9964 KOps/s $\color{#35bf28}+0.83\%$
test_compile_indexing[int-pytree-compile] 80.1920μs 44.6408μs 22.4010 KOps/s 21.7069 KOps/s $\color{#35bf28}+3.20\%$
test_compile_indexing[int-pytree-eager] 0.2611ms 21.4755μs 46.5646 KOps/s 45.1241 KOps/s $\color{#35bf28}+3.19\%$
test_mod_add[eager] 0.1045ms 49.0265μs 20.3971 KOps/s 19.8954 KOps/s $\color{#35bf28}+2.52\%$
test_mod_add[compile] 0.4700ms 0.1067ms 9.3703 KOps/s 9.2756 KOps/s $\color{#35bf28}+1.02\%$
test_mod_add[compile-overhead] 0.3824ms 0.1440ms 6.9452 KOps/s 6.8098 KOps/s $\color{#35bf28}+1.99\%$
test_mod_wrap[eager] 0.3629ms 0.2986ms 3.3491 KOps/s 3.4976 KOps/s $\color{#d91a1a}-4.25\%$
test_mod_wrap[compile] 0.4414ms 0.3546ms 2.8204 KOps/s 2.9048 KOps/s $\color{#d91a1a}-2.91\%$
test_mod_wrap[compile-overhead] 7.3767ms 4.0461ms 247.1489 Ops/s 246.2275 Ops/s $\color{#35bf28}+0.37\%$
test_mod_wrap_and_backward[eager] 1.6020ms 1.4859ms 672.9949 Ops/s 668.6457 Ops/s $\color{#35bf28}+0.65\%$
test_mod_wrap_and_backward[compile] 1.5071ms 1.4112ms 708.6105 Ops/s 699.7653 Ops/s $\color{#35bf28}+1.26\%$
test_mod_wrap_and_backward[compile-overhead] 1.2261ms 0.8627ms 1.1591 KOps/s 1.1317 KOps/s $\color{#35bf28}+2.42\%$
test_seq_add[eager] 0.2049ms 0.1526ms 6.5542 KOps/s 6.5149 KOps/s $\color{#35bf28}+0.60\%$
test_seq_add[compile] 0.2053ms 0.1117ms 8.9490 KOps/s 8.5914 KOps/s $\color{#35bf28}+4.16\%$
test_seq_add[compile-overhead] 0.1838ms 0.1494ms 6.6952 KOps/s 6.4165 KOps/s $\color{#35bf28}+4.34\%$
test_seq_wrap[eager] 0.8940ms 0.5137ms 1.9468 KOps/s 1.9382 KOps/s $\color{#35bf28}+0.45\%$
test_seq_wrap[compile] 0.4032ms 0.3568ms 2.8026 KOps/s 2.7371 KOps/s $\color{#35bf28}+2.40\%$
test_seq_wrap[compile-overhead] 0.3205ms 0.2555ms 3.9138 KOps/s 3.7810 KOps/s $\color{#35bf28}+3.51\%$
test_func_call_runtime[False-eager] 0.9288ms 0.8245ms 1.2129 KOps/s 1.2048 KOps/s $\color{#35bf28}+0.67\%$
test_func_call_runtime[False-compile] 0.9466ms 0.8850ms 1.1299 KOps/s 1.1147 KOps/s $\color{#35bf28}+1.36\%$
test_func_call_runtime[False-compile-overhead] 0.4914ms 0.4435ms 2.2550 KOps/s 2.2185 KOps/s $\color{#35bf28}+1.64\%$
test_func_call_runtime[True-eager] 1.1454ms 1.0619ms 941.7006 Ops/s 939.6734 Ops/s $\color{#35bf28}+0.22\%$
test_func_call_runtime[True-compile] 1.0000ms 0.8926ms 1.1203 KOps/s 1.1010 KOps/s $\color{#35bf28}+1.76\%$
test_func_call_runtime[True-compile-overhead] 0.5024ms 0.4525ms 2.2102 KOps/s 2.1575 KOps/s $\color{#35bf28}+2.44\%$
test_func_call_cm_runtime[False-eager] 0.8838ms 0.8199ms 1.2196 KOps/s 1.2076 KOps/s $\color{#35bf28}+1.00\%$
test_func_call_cm_runtime[False-compile] 0.9637ms 0.8868ms 1.1277 KOps/s 1.0879 KOps/s $\color{#35bf28}+3.66\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4911ms 0.4462ms 2.2409 KOps/s 2.2156 KOps/s $\color{#35bf28}+1.14\%$
test_func_call_cm_runtime[True-eager] 1.2903ms 1.1912ms 839.4815 Ops/s 824.0648 Ops/s $\color{#35bf28}+1.87\%$
test_func_call_cm_runtime[True-compile] 0.9786ms 0.9288ms 1.0767 KOps/s 1.0613 KOps/s $\color{#35bf28}+1.45\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5429ms 0.4865ms 2.0555 KOps/s 2.0206 KOps/s $\color{#35bf28}+1.73\%$
test_vmap_func_call_cm_runtime[eager] 2.8163ms 2.3217ms 430.7264 Ops/s 424.7075 Ops/s $\color{#35bf28}+1.42\%$
test_vmap_func_call_cm_runtime[compile] 1.0654ms 0.9471ms 1.0558 KOps/s 1.0034 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5430ms 0.4929ms 2.0288 KOps/s 1.9930 KOps/s $\color{#35bf28}+1.80\%$
test_distributed 2.6297ms 0.1632ms 6.1260 KOps/s 6.5214 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_tdmodule 75.5420μs 27.8906μs 35.8544 KOps/s 34.5936 KOps/s $\color{#35bf28}+3.64\%$
test_tdmodule_dispatch 76.1120μs 46.1871μs 21.6510 KOps/s 21.7286 KOps/s $\color{#d91a1a}-0.36\%$
test_tdseq 45.8010μs 26.9731μs 37.0739 KOps/s 36.7498 KOps/s $\color{#35bf28}+0.88\%$
test_tdseq_dispatch 69.2720μs 47.5839μs 21.0155 KOps/s 20.7978 KOps/s $\color{#35bf28}+1.05\%$
test_instantiation_functorch 2.1670ms 1.9647ms 508.9750 Ops/s 502.5926 Ops/s $\color{#35bf28}+1.27\%$
test_exec_functorch 0.2140ms 0.1741ms 5.7443 KOps/s 5.6974 KOps/s $\color{#35bf28}+0.82\%$
test_exec_functional_call 0.1988ms 0.1585ms 6.3103 KOps/s 6.2686 KOps/s $\color{#35bf28}+0.67\%$
test_exec_td_decorator 0.4466ms 0.2301ms 4.3461 KOps/s 4.3200 KOps/s $\color{#35bf28}+0.60\%$
test_vmap_mlp_speed_decorator[True-True] 1.0177ms 0.8107ms 1.2336 KOps/s 1.2273 KOps/s $\color{#35bf28}+0.51\%$
test_vmap_mlp_speed_decorator[True-False] 0.9836ms 0.8111ms 1.2329 KOps/s 1.2267 KOps/s $\color{#35bf28}+0.51\%$
test_vmap_mlp_speed_decorator[False-True] 0.8491ms 0.7027ms 1.4231 KOps/s 1.4076 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed_decorator[False-False] 0.8906ms 0.7016ms 1.4252 KOps/s 1.4119 KOps/s $\color{#35bf28}+0.94\%$
test_vmap_transformer_speed_decorator[True-True] 20.3016ms 20.1978ms 49.5103 Ops/s 48.4598 Ops/s $\color{#35bf28}+2.17\%$
test_vmap_transformer_speed_decorator[True-False] 20.3492ms 20.2443ms 49.3966 Ops/s 48.5519 Ops/s $\color{#35bf28}+1.74\%$
test_vmap_transformer_speed_decorator[False-True] 20.1639ms 20.0555ms 49.8616 Ops/s 48.9241 Ops/s $\color{#35bf28}+1.92\%$
test_vmap_transformer_speed_decorator[False-False] 20.2047ms 20.0841ms 49.7907 Ops/s 48.9483 Ops/s $\color{#35bf28}+1.72\%$
test_to_module_speed[True] 1.5025ms 1.4206ms 703.9347 Ops/s 704.6668 Ops/s $\color{#d91a1a}-0.10\%$
test_to_module_speed[False] 1.4868ms 1.4027ms 712.9128 Ops/s 710.5752 Ops/s $\color{#35bf28}+0.33\%$
test_tc_init 65.7920μs 44.8455μs 22.2988 KOps/s 22.2967 KOps/s $+0.01\%$
test_tc_init_tensor_only 32.7300μs 9.4073μs 106.3004 KOps/s 107.4280 KOps/s $\color{#d91a1a}-1.05\%$
test_tc_init_nested 0.1187ms 89.9750μs 11.1142 KOps/s 11.0648 KOps/s $\color{#35bf28}+0.45\%$
test_tc_init_many_fields 37.7510μs 15.7732μs 63.3988 KOps/s 63.9272 KOps/s $\color{#d91a1a}-0.83\%$
test_tc_first_layer_tensor 19.2500μs 1.7441μs 573.3553 KOps/s 586.8186 KOps/s $\color{#d91a1a}-2.29\%$
test_tc_first_layer_tensor_only 5.5687μs 0.7110μs 1.4065 MOps/s 1.4271 MOps/s $\color{#d91a1a}-1.44\%$
test_tc_first_layer_tensor_set 31.5300μs 4.0038μs 249.7650 KOps/s 253.9699 KOps/s $\color{#d91a1a}-1.66\%$
test_tc_first_layer_tensor_only_set 28.3610μs 3.0020μs 333.1152 KOps/s 332.5503 KOps/s $\color{#35bf28}+0.17\%$
test_tc_first_layer_nontensor 2.3987ms 5.8798μs 170.0735 KOps/s 172.7744 KOps/s $\color{#d91a1a}-1.56\%$
test_tc_second_layer_tensor 36.8600μs 4.1609μs 240.3347 KOps/s 244.2335 KOps/s $\color{#d91a1a}-1.60\%$
test_tc_second_layer_nontensor 30.6800μs 8.2575μs 121.1016 KOps/s 126.8796 KOps/s $\color{#d91a1a}-4.55\%$
test_unbind 0.2411s 13.2413ms 75.5214 Ops/s 69.3823 Ops/s $\textbf{\color{#35bf28}+8.85\%}$
test_full_like 5.4671ms 4.3308ms 230.9068 Ops/s 227.3441 Ops/s $\color{#35bf28}+1.57\%$
test_zeros_like 5.0319ms 4.3547ms 229.6389 Ops/s 228.6198 Ops/s $\color{#35bf28}+0.45\%$
test_ones_like 4.5191ms 4.3574ms 229.4938 Ops/s 228.2051 Ops/s $\color{#35bf28}+0.56\%$
test_clone 6.7262ms 6.4512ms 155.0088 Ops/s 153.6532 Ops/s $\color{#35bf28}+0.88\%$
test_squeeze 0.1786ms 13.5712μs 73.6852 KOps/s 66.4650 KOps/s $\textbf{\color{#35bf28}+10.86\%}$
test_unsqueeze 0.1536ms 0.1102ms 9.0741 KOps/s 8.8629 KOps/s $\color{#35bf28}+2.38\%$
test_split 0.3475ms 0.1760ms 5.6814 KOps/s 5.3556 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_permute 0.2670ms 0.2015ms 4.9639 KOps/s 4.7897 KOps/s $\color{#35bf28}+3.64\%$
test_stack 52.2511ms 51.4144ms 19.4498 Ops/s 19.4800 Ops/s $\color{#d91a1a}-0.16\%$
test_cat 51.6251ms 51.3769ms 19.4640 Ops/s 19.4537 Ops/s $\color{#35bf28}+0.05\%$
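
The Change column appears to be the relative difference between Ops and Ops on Repo HEAD. This is inferred from the rows above rather than stated by the bot, and the helper name is hypothetical. Checking it against the `test_getitem[int]` row:

```python
def relative_change(ops, ops_head):
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


# test_getitem[int]: 71.1693 KOps/s on this PR vs 57.6310 KOps/s on HEAD.
change = relative_change(71.1693, 57.6310)
```

Rounded to two decimals this gives 23.49, matching the reported +23.49%.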

@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 243. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 57.7130μs 14.8731μs 67.2353 KOps/s 67.0241 KOps/s $\color{#35bf28}+0.32\%$
test_plain_set_stack_nested 40.2320μs 15.2653μs 65.5080 KOps/s 66.3270 KOps/s $\color{#d91a1a}-1.23\%$
test_plain_set_nested_inplace 43.4020μs 16.5050μs 60.5877 KOps/s 59.9730 KOps/s $\color{#35bf28}+1.02\%$
test_plain_set_stack_nested_inplace 59.0140μs 16.4465μs 60.8033 KOps/s 61.0029 KOps/s $\color{#d91a1a}-0.33\%$
test_items 35.4630μs 5.8655μs 170.4884 KOps/s 172.7090 KOps/s $\color{#d91a1a}-1.29\%$
test_items_nested 0.5975ms 0.5322ms 1.8790 KOps/s 1.8670 KOps/s $\color{#35bf28}+0.64\%$
test_items_nested_locked 0.6395ms 0.5261ms 1.9008 KOps/s 1.8457 KOps/s $\color{#35bf28}+2.99\%$
test_items_nested_leaf 0.1346ms 95.5464μs 10.4661 KOps/s 10.4033 KOps/s $\color{#35bf28}+0.60\%$
test_items_stack_nested 0.6039ms 0.5420ms 1.8449 KOps/s 1.8717 KOps/s $\color{#d91a1a}-1.43\%$
test_items_stack_nested_leaf 0.1372ms 97.7904μs 10.2260 KOps/s 10.4305 KOps/s $\color{#d91a1a}-1.96\%$
test_items_stack_nested_locked 0.6230ms 0.5373ms 1.8612 KOps/s 1.8475 KOps/s $\color{#35bf28}+0.74\%$
test_keys 30.0520μs 4.1966μs 238.2897 KOps/s 223.5863 KOps/s $\textbf{\color{#35bf28}+6.58\%}$
test_keys_nested 0.1629ms 0.1195ms 8.3695 KOps/s 8.2741 KOps/s $\color{#35bf28}+1.15\%$
test_keys_nested_locked 88.4651ms 0.1406ms 7.1117 KOps/s 7.7009 KOps/s $\textbf{\color{#d91a1a}-7.65\%}$
test_keys_nested_leaf 0.1612ms 0.1108ms 9.0223 KOps/s 9.0666 KOps/s $\color{#d91a1a}-0.49\%$
test_keys_stack_nested 0.1647ms 0.1207ms 8.2849 KOps/s 8.3159 KOps/s $\color{#d91a1a}-0.37\%$
test_keys_stack_nested_leaf 0.1478ms 0.1105ms 9.0487 KOps/s 9.0245 KOps/s $\color{#35bf28}+0.27\%$
test_keys_stack_nested_locked 0.1672ms 0.1282ms 7.7990 KOps/s 7.7077 KOps/s $\color{#35bf28}+1.18\%$
test_values 6.0184μs 1.0173μs 983.0414 KOps/s 976.3358 KOps/s $\color{#35bf28}+0.69\%$
test_values_nested 92.0660μs 47.7094μs 20.9602 KOps/s 20.6621 KOps/s $\color{#35bf28}+1.44\%$
test_values_nested_locked 85.7150μs 50.7432μs 19.7071 KOps/s 19.6926 KOps/s $\color{#35bf28}+0.07\%$
test_values_nested_leaf 85.6250μs 54.2007μs 18.4499 KOps/s 18.2171 KOps/s $\color{#35bf28}+1.28\%$
test_values_stack_nested 84.9150μs 47.8319μs 20.9065 KOps/s 20.6023 KOps/s $\color{#35bf28}+1.48\%$
test_values_stack_nested_leaf 93.0550μs 53.8305μs 18.5768 KOps/s 18.3135 KOps/s $\color{#35bf28}+1.44\%$
test_values_stack_nested_locked 0.7847ms 50.4946μs 19.8041 KOps/s 19.4765 KOps/s $\color{#35bf28}+1.68\%$
test_membership 4.1587μs 0.8595μs 1.1635 MOps/s 1.1748 MOps/s $\color{#d91a1a}-0.96\%$
test_membership_nested 26.5420μs 3.2050μs 312.0151 KOps/s 312.2958 KOps/s $\color{#d91a1a}-0.09\%$
test_membership_nested_leaf 33.9320μs 3.1655μs 315.9061 KOps/s 310.6203 KOps/s $\color{#35bf28}+1.70\%$
test_membership_stacked_nested 27.1020μs 3.2246μs 310.1147 KOps/s 312.4235 KOps/s $\color{#d91a1a}-0.74\%$
test_membership_stacked_nested_leaf 36.9020μs 3.1729μs 315.1667 KOps/s 313.0597 KOps/s $\color{#35bf28}+0.67\%$
test_membership_nested_last 34.4620μs 4.6629μs 214.4610 KOps/s 217.1786 KOps/s $\color{#d91a1a}-1.25\%$
test_membership_nested_leaf_last 32.3120μs 4.6911μs 213.1694 KOps/s 215.3491 KOps/s $\color{#d91a1a}-1.01\%$
test_membership_stacked_nested_last 25.5720μs 4.6472μs 215.1837 KOps/s 213.6375 KOps/s $\color{#35bf28}+0.72\%$
test_membership_stacked_nested_leaf_last 34.8520μs 4.6454μs 215.2661 KOps/s 215.1544 KOps/s $\color{#35bf28}+0.05\%$
test_nested_getleaf 46.7630μs 22.0064μs 45.4413 KOps/s 45.6060 KOps/s $\color{#d91a1a}-0.36\%$
test_nested_get 51.8030μs 20.5686μs 48.6177 KOps/s 49.1273 KOps/s $\color{#d91a1a}-1.04\%$
test_stacked_getleaf 47.8220μs 22.1499μs 45.1470 KOps/s 46.2246 KOps/s $\color{#d91a1a}-2.33\%$
test_stacked_get 49.8630μs 20.8434μs 47.9767 KOps/s 48.1779 KOps/s $\color{#d91a1a}-0.42\%$
test_nested_getitemleaf 51.4730μs 22.3251μs 44.7925 KOps/s 44.2700 KOps/s $\color{#35bf28}+1.18\%$
test_nested_getitem 47.1030μs 21.1587μs 47.2620 KOps/s 47.0771 KOps/s $\color{#35bf28}+0.39\%$
test_stacked_getitemleaf 52.8030μs 22.6638μs 44.1232 KOps/s 44.6083 KOps/s $\color{#d91a1a}-1.09\%$
test_stacked_getitem 47.0930μs 21.3417μs 46.8567 KOps/s 46.3611 KOps/s $\color{#35bf28}+1.07\%$
test_lock_nested 7.5813ms 0.4749ms 2.1059 KOps/s 2.0760 KOps/s $\color{#35bf28}+1.44\%$
test_lock_stack_nested 0.5365ms 0.4751ms 2.1049 KOps/s 2.0483 KOps/s $\color{#35bf28}+2.76\%$
test_unlock_nested 0.5114ms 0.3784ms 2.6428 KOps/s 2.5966 KOps/s $\color{#35bf28}+1.78\%$
test_unlock_stack_nested 0.4307ms 0.3815ms 2.6210 KOps/s 2.5684 KOps/s $\color{#35bf28}+2.05\%$
test_flatten_speed 0.1557ms 0.1212ms 8.2510 KOps/s 8.0311 KOps/s $\color{#35bf28}+2.74\%$
test_unflatten_speed 0.6597ms 0.5912ms 1.6916 KOps/s 1.6663 KOps/s $\color{#35bf28}+1.52\%$
test_common_ops 0.8116ms 0.6858ms 1.4582 KOps/s 1.4275 KOps/s $\color{#35bf28}+2.15\%$
test_creation 69.7640μs 2.8121μs 355.6042 KOps/s 340.2768 KOps/s $\color{#35bf28}+4.50\%$
test_creation_empty 31.0220μs 6.1402μs 162.8616 KOps/s 161.4765 KOps/s $\color{#35bf28}+0.86\%$
test_creation_nested_1 31.5010μs 10.9426μs 91.3863 KOps/s 91.1254 KOps/s $\color{#35bf28}+0.29\%$
test_creation_nested_2 34.1630μs 12.0034μs 83.3097 KOps/s 82.4825 KOps/s $\color{#35bf28}+1.00\%$
test_creation_many_keys[10] 55.2740μs 18.2064μs 54.9258 KOps/s 53.8795 KOps/s $\color{#35bf28}+1.94\%$
test_creation_many_keys[50] 0.1062ms 77.9965μs 12.8211 KOps/s 12.6029 KOps/s $\color{#35bf28}+1.73\%$
test_creation_many_keys[100] 0.1907ms 0.1521ms 6.5744 KOps/s 6.5242 KOps/s $\color{#35bf28}+0.77\%$
test_creation_nested_many_keys[10] 63.5340μs 39.3007μs 25.4449 KOps/s 25.0940 KOps/s $\color{#35bf28}+1.40\%$
test_creation_nested_many_keys[50] 0.1907ms 0.1585ms 6.3106 KOps/s 6.2140 KOps/s $\color{#35bf28}+1.55\%$
test_clone 43.4130μs 13.4554μs 74.3196 KOps/s 73.8232 KOps/s $\color{#35bf28}+0.67\%$
test_getitem[int] 1.6938ms 14.6725μs 68.1549 KOps/s 56.5448 KOps/s $\textbf{\color{#35bf28}+20.53\%}$
test_getitem[slice_int] 0.1403ms 25.2186μs 39.6533 KOps/s 39.5271 KOps/s $\color{#35bf28}+0.32\%$
test_getitem[range] 0.1789ms 61.9277μs 16.1479 KOps/s 15.9466 KOps/s $\color{#35bf28}+1.26\%$
test_getitem[tuple] 0.1525ms 24.4048μs 40.9755 KOps/s 41.0721 KOps/s $\color{#d91a1a}-0.24\%$
test_getitem[list] 0.1868ms 56.7096μs 17.6337 KOps/s 17.2111 KOps/s $\color{#35bf28}+2.46\%$
test_setitem_dim[int] 47.5530μs 26.1626μs 38.2225 KOps/s 38.1474 KOps/s $\color{#35bf28}+0.20\%$
test_setitem_dim[slice_int] 67.2650μs 44.9331μs 22.2553 KOps/s 22.3278 KOps/s $\color{#d91a1a}-0.32\%$
test_setitem_dim[range] 0.1185ms 92.9428μs 10.7593 KOps/s 10.6128 KOps/s $\color{#35bf28}+1.38\%$
test_setitem_dim[tuple] 64.0340μs 40.9599μs 24.4141 KOps/s 24.0456 KOps/s $\color{#35bf28}+1.53\%$
test_setitem 56.4730μs 18.6125μs 53.7272 KOps/s 54.2373 KOps/s $\color{#d91a1a}-0.94\%$
test_set 44.8130μs 17.5337μs 57.0330 KOps/s 56.9990 KOps/s $\color{#35bf28}+0.06\%$
test_set_shared 0.5886ms 0.2072ms 4.8263 KOps/s 4.8242 KOps/s $\color{#35bf28}+0.04\%$
test_update 0.1620ms 22.6801μs 44.0915 KOps/s 44.4513 KOps/s $\color{#d91a1a}-0.81\%$
test_update_nested 75.1550μs 35.0669μs 28.5169 KOps/s 28.7394 KOps/s $\color{#d91a1a}-0.77\%$
test_update__nested 0.4492ms 34.4823μs 29.0003 KOps/s 28.7670 KOps/s $\color{#35bf28}+0.81\%$
test_set_nested 63.9340μs 19.5732μs 51.0904 KOps/s 51.4158 KOps/s $\color{#d91a1a}-0.63\%$
test_set_nested_new 62.4040μs 24.4314μs 40.9310 KOps/s 40.1596 KOps/s $\color{#35bf28}+1.92\%$
test_select 80.7650μs 42.6247μs 23.4606 KOps/s 23.2035 KOps/s $\color{#35bf28}+1.11\%$
test_select_nested 0.1071ms 75.3083μs 13.2788 KOps/s 13.2925 KOps/s $\color{#d91a1a}-0.10\%$
test_exclude_nested 0.1345ms 98.0142μs 10.2026 KOps/s 10.2168 KOps/s $\color{#d91a1a}-0.14\%$
test_empty[True] 0.5031ms 0.4415ms 2.2648 KOps/s 2.2652 KOps/s $\color{#d91a1a}-0.02\%$
test_empty[False] 7.4430μs 1.3264μs 753.8969 KOps/s 759.1225 KOps/s $\color{#d91a1a}-0.69\%$
test_to 0.1031ms 73.6931μs 13.5698 KOps/s 13.6123 KOps/s $\color{#d91a1a}-0.31\%$
test_to_nonblocking 0.1004ms 65.6954μs 15.2218 KOps/s 15.1455 KOps/s $\color{#35bf28}+0.50\%$
test_unbind_speed 0.3704ms 0.3261ms 3.0666 KOps/s 3.0538 KOps/s $\color{#35bf28}+0.42\%$
test_unbind_speed_stack0 0.3758ms 0.3236ms 3.0899 KOps/s 3.0464 KOps/s $\color{#35bf28}+1.43\%$
test_unbind_speed_stack1 0.1026s 0.9122ms 1.0963 KOps/s 1.1870 KOps/s $\textbf{\color{#d91a1a}-7.64\%}$
test_split 1.2214ms 1.1458ms 872.7158 Ops/s 784.4757 Ops/s $\textbf{\color{#35bf28}+11.25\%}$
test_chunk 0.1018s 1.2085ms 827.4524 Ops/s 916.4067 Ops/s $\textbf{\color{#d91a1a}-9.71\%}$
test_to_cpu_blocking 28.6797ms 28.5949ms 34.9713 Ops/s 46.8310 Ops/s $\textbf{\color{#d91a1a}-25.32\%}$
test_to_cpu_global_sync 11.3268ms 11.2254ms 89.0834 Ops/s 89.2523 Ops/s $\color{#d91a1a}-0.19\%$
test_to_cpu_event_sync 12.3964ms 12.2001ms 81.9663 Ops/s 81.8864 Ops/s $\color{#35bf28}+0.10\%$
test_to_cpu_default 0.1141s 13.4967ms 74.0924 Ops/s 81.9372 Ops/s $\textbf{\color{#d91a1a}-9.57\%}$
test_consolidate[False-None] 4.1992ms 4.1425ms 241.4012 Ops/s 219.7046 Ops/s $\textbf{\color{#35bf28}+9.88\%}$
test_consolidate[default-None] 2.4503ms 2.0288ms 492.8990 Ops/s 489.2368 Ops/s $\color{#35bf28}+0.75\%$
test_consolidate[reduce-overhead-None] 2.3563ms 1.9410ms 515.2033 Ops/s 513.1994 Ops/s $\color{#35bf28}+0.39\%$
test_consolidate_njt[False-None] 8.9696ms 8.5466ms 117.0054 Ops/s 117.0215 Ops/s $\color{#d91a1a}-0.01\%$
test_to[False-False-None] 2.4840ms 2.0686ms 483.4101 Ops/s 478.3350 Ops/s $\color{#35bf28}+1.06\%$
test_to[True-False-None] 2.3379ms 1.9444ms 514.3039 Ops/s 511.3351 Ops/s $\color{#35bf28}+0.58\%$
test_to[within-False-None] 6.6408ms 6.2160ms 160.8741 Ops/s 161.4492 Ops/s $\color{#d91a1a}-0.36\%$
test_to[True-default-None] 8.0069ms 7.6223ms 131.1943 Ops/s 129.0495 Ops/s $\color{#35bf28}+1.66\%$
test_to_njt[False-False-None] 8.9258ms 8.5347ms 117.1685 Ops/s 115.4941 Ops/s $\color{#35bf28}+1.45\%$
test_to_njt[True-False-None] 7.4462ms 7.0399ms 142.0474 Ops/s 137.3797 Ops/s $\color{#35bf28}+3.40\%$
test_to_njt[within-False-None] 16.9708ms 16.0244ms 62.4048 Ops/s 63.6655 Ops/s $\color{#d91a1a}-1.98\%$
test_creation[device0] 0.5119ms 0.1215ms 8.2306 KOps/s 8.4433 KOps/s $\color{#d91a1a}-2.52\%$
test_creation_from_tensor 0.5345ms 0.1156ms 8.6474 KOps/s 8.6200 KOps/s $\color{#35bf28}+0.32\%$
test_add_one[memmap_tensor0] 0.4496ms 6.4104μs 155.9960 KOps/s 151.0374 KOps/s $\color{#35bf28}+3.28\%$
test_contiguous[memmap_tensor0] 14.4910μs 0.7123μs 1.4038 MOps/s 1.9360 MOps/s $\textbf{\color{#d91a1a}-27.49\%}$
test_stack[memmap_tensor0] 27.5920μs 4.6022μs 217.2870 KOps/s 222.4626 KOps/s $\color{#d91a1a}-2.33\%$
test_memmaptd_index 1.1114ms 0.2660ms 3.7599 KOps/s 3.7686 KOps/s $\color{#d91a1a}-0.23\%$
test_memmaptd_index_astensor 0.7878ms 0.3628ms 2.7566 KOps/s 2.7754 KOps/s $\color{#d91a1a}-0.68\%$
test_memmaptd_index_op 1.0371ms 0.6061ms 1.6500 KOps/s 1.6322 KOps/s $\color{#35bf28}+1.09\%$
test_serialize_model 0.1383s 0.1363s 7.3374 Ops/s 7.3509 Ops/s $\color{#d91a1a}-0.18\%$
test_serialize_model_pickle 1.3654s 1.2167s 0.8219 Ops/s 0.8264 Ops/s $\color{#d91a1a}-0.55\%$
test_serialize_weights 0.1391s 0.1351s 7.4044 Ops/s 7.3257 Ops/s $\color{#35bf28}+1.07\%$
test_serialize_weights_returnearly 0.4290s 92.8567ms 10.7693 Ops/s 5.9994 Ops/s $\textbf{\color{#35bf28}+79.51\%}$
test_serialize_weights_pickle 1.3659s 1.1890s 0.8410 Ops/s 0.8218 Ops/s $\color{#35bf28}+2.34\%$
test_reshape_pytree 0.1996ms 33.6639μs 29.7054 KOps/s 29.8638 KOps/s $\color{#d91a1a}-0.53\%$
test_reshape_td 72.1040μs 44.1659μs 22.6419 KOps/s 21.4426 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_view_pytree 0.2195ms 33.4321μs 29.9114 KOps/s 30.0761 KOps/s $\color{#d91a1a}-0.55\%$
test_view_td 85.4450μs 51.5663μs 19.3925 KOps/s 18.9994 KOps/s $\color{#35bf28}+2.07\%$
test_unbind_pytree 0.2375ms 36.9467μs 27.0660 KOps/s 26.5668 KOps/s $\color{#35bf28}+1.88\%$
test_unbind_td 0.1993ms 48.7119μs 20.5289 KOps/s 20.2239 KOps/s $\color{#35bf28}+1.51\%$
test_split_pytree 0.2305ms 43.5573μs 22.9583 KOps/s 23.4280 KOps/s $\color{#d91a1a}-2.01\%$
test_split_td 0.1852ms 65.6773μs 15.2260 KOps/s 15.3263 KOps/s $\color{#d91a1a}-0.65\%$
test_add_pytree 0.2371ms 42.6719μs 23.4346 KOps/s 23.7134 KOps/s $\color{#d91a1a}-1.18\%$
test_add_td 85.3950μs 53.7773μs 18.5952 KOps/s 18.9010 KOps/s $\color{#d91a1a}-1.62\%$
test_compile_add_one_nested[tensordict-compile] 0.1912ms 0.1390ms 7.1933 KOps/s 6.7170 KOps/s $\textbf{\color{#35bf28}+7.09\%}$
test_compile_add_one_nested[tensordict-eager] 0.4088ms 0.1925ms 5.1955 KOps/s 5.1568 KOps/s $\color{#35bf28}+0.75\%$
test_compile_add_one_nested[pytree-compile] 0.1550ms 0.1083ms 9.2362 KOps/s 9.1199 KOps/s $\color{#35bf28}+1.28\%$
test_compile_add_one_nested[pytree-eager] 0.4393ms 0.1825ms 5.4793 KOps/s 5.5181 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_copy_nested[tensordict-compile] 0.1505ms 29.9605μs 33.3773 KOps/s 30.2496 KOps/s $\textbf{\color{#35bf28}+10.34\%}$
test_compile_copy_nested[tensordict-eager] 90.9460μs 52.6651μs 18.9879 KOps/s 18.4666 KOps/s $\color{#35bf28}+2.82\%$
test_compile_copy_nested[pytree-compile] 45.0430μs 10.0935μs 99.0736 KOps/s 102.4606 KOps/s $\color{#d91a1a}-3.31\%$
test_compile_copy_nested[pytree-eager] 0.4365ms 70.7620μs 14.1319 KOps/s 14.2688 KOps/s $\color{#d91a1a}-0.96\%$
test_compile_add_one_flat[tensordict-compile] 0.3040ms 0.1790ms 5.5852 KOps/s 5.3138 KOps/s $\textbf{\color{#35bf28}+5.11\%}$
test_compile_add_one_flat[tensordict-eager] 0.3052ms 0.2530ms 3.9520 KOps/s 3.8473 KOps/s $\color{#35bf28}+2.72\%$
test_compile_add_one_flat[tensorclass-compile] 0.1777ms 0.1183ms 8.4561 KOps/s 8.2495 KOps/s $\color{#35bf28}+2.50\%$
test_compile_add_one_flat[tensorclass-eager] 0.1038ms 70.2497μs 14.2349 KOps/s 14.2666 KOps/s $\color{#d91a1a}-0.22\%$
test_compile_add_one_flat[pytree-compile] 0.2189ms 0.1594ms 6.2747 KOps/s 6.1735 KOps/s $\color{#35bf28}+1.64\%$
test_compile_add_one_flat[pytree-eager] 0.8351ms 0.5285ms 1.8920 KOps/s 1.8701 KOps/s $\color{#35bf28}+1.17\%$
test_compile_add_self_flat[tensordict-eager] 0.4601ms 0.3082ms 3.2442 KOps/s 3.1722 KOps/s $\color{#35bf28}+2.27\%$
test_compile_add_self_flat[tensordict-compile] 0.2312ms 0.1795ms 5.5706 KOps/s 5.2660 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1263ms 86.1686μs 11.6052 KOps/s 11.6273 KOps/s $\color{#d91a1a}-0.19\%$
test_compile_add_self_flat[tensorclass-compile] 0.2020ms 0.1202ms 8.3224 KOps/s 8.0717 KOps/s $\color{#35bf28}+3.11\%$
test_compile_add_self_flat[pytree-eager] 0.6595ms 0.4360ms 2.2938 KOps/s 2.2500 KOps/s $\color{#35bf28}+1.94\%$
test_compile_add_self_flat[pytree-compile] 0.1948ms 0.1595ms 6.2700 KOps/s 6.1377 KOps/s $\color{#35bf28}+2.16\%$
test_compile_copy_flat[tensordict-compile] 75.4440μs 24.2088μs 41.3073 KOps/s 39.6893 KOps/s $\color{#35bf28}+4.08\%$
test_compile_copy_flat[tensordict-eager] 97.9260μs 41.5771μs 24.0517 KOps/s 23.8854 KOps/s $\color{#35bf28}+0.70\%$
test_compile_copy_flat[pytree-compile] 36.7020μs 11.0939μs 90.1396 KOps/s 91.2262 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_copy_flat[pytree-eager] 0.1777s 62.5607μs 15.9845 KOps/s 18.9695 KOps/s $\textbf{\color{#d91a1a}-15.74\%}$
test_compile_assign_and_add[tensordict-compile] 2.0217ms 0.1742ms 5.7416 KOps/s 5.3944 KOps/s $\textbf{\color{#35bf28}+6.44\%}$
test_compile_assign_and_add[tensordict-eager] 3.4254ms 3.3108ms 302.0375 Ops/s 298.5079 Ops/s $\color{#35bf28}+1.18\%$
test_compile_assign_and_add[pytree-compile] 1.9667ms 0.1620ms 6.1711 KOps/s 6.0638 KOps/s $\color{#35bf28}+1.77\%$
test_compile_assign_and_add[pytree-eager] 2.9084ms 2.7747ms 360.3953 Ops/s 357.4238 Ops/s $\color{#35bf28}+0.83\%$
test_compile_indexing[tensor-tensordict-compile] 0.1443ms 0.1091ms 9.1664 KOps/s 8.8923 KOps/s $\color{#35bf28}+3.08\%$
test_compile_indexing[tensor-tensordict-eager] 0.3139ms 73.1973μs 13.6617 KOps/s 13.5795 KOps/s $\color{#35bf28}+0.61\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1441ms 96.7933μs 10.3313 KOps/s 10.1571 KOps/s $\color{#35bf28}+1.71\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2568ms 44.8076μs 22.3176 KOps/s 21.8954 KOps/s $\color{#35bf28}+1.93\%$
test_compile_indexing[tensor-pytree-compile] 0.1607ms 97.9076μs 10.2137 KOps/s 10.1779 KOps/s $\color{#35bf28}+0.35\%$
test_compile_indexing[tensor-pytree-eager] 0.2588ms 44.4854μs 22.4793 KOps/s 21.9225 KOps/s $\color{#35bf28}+2.54\%$
test_compile_indexing[slice-tensordict-compile] 0.1013ms 56.0597μs 17.8381 KOps/s 17.4101 KOps/s $\color{#35bf28}+2.46\%$
test_compile_indexing[slice-tensordict-eager] 0.2277ms 27.7994μs 35.9720 KOps/s 34.2563 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1050ms 45.9639μs 21.7562 KOps/s 21.7703 KOps/s $\color{#d91a1a}-0.06\%$
test_compile_indexing[slice-tensorclass-eager] 0.2682ms 22.5207μs 44.4035 KOps/s 42.8107 KOps/s $\color{#35bf28}+3.72\%$
test_compile_indexing[slice-pytree-compile] 90.5760μs 46.4560μs 21.5258 KOps/s 21.5461 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_indexing[slice-pytree-eager] 0.2694ms 22.6455μs 44.1589 KOps/s 43.3897 KOps/s $\color{#35bf28}+1.77\%$
test_compile_indexing[int-tensordict-compile] 95.1760μs 56.5719μs 17.6766 KOps/s 17.2876 KOps/s $\color{#35bf28}+2.25\%$
test_compile_indexing[int-tensordict-eager] 0.2197ms 27.6473μs 36.1699 KOps/s 34.6056 KOps/s $\color{#35bf28}+4.52\%$
test_compile_indexing[int-tensorclass-compile] 86.4950μs 46.0830μs 21.7000 KOps/s 21.5035 KOps/s $\color{#35bf28}+0.91\%$
test_compile_indexing[int-tensorclass-eager] 0.2594ms 22.5338μs 44.3777 KOps/s 43.3487 KOps/s $\color{#35bf28}+2.37\%$
test_compile_indexing[int-pytree-compile] 92.5550μs 46.3123μs 21.5926 KOps/s 21.5913 KOps/s $+0.01\%$
test_compile_indexing[int-pytree-eager] 0.2784ms 22.5845μs 44.2782 KOps/s 43.4472 KOps/s $\color{#35bf28}+1.91\%$
test_mod_add[eager] 0.1031ms 51.9254μs 19.2584 KOps/s 19.5087 KOps/s $\color{#d91a1a}-1.28\%$
test_mod_add[compile] 0.1818ms 0.1056ms 9.4732 KOps/s 9.3938 KOps/s $\color{#35bf28}+0.85\%$
test_mod_add[compile-overhead] 0.2339ms 0.1493ms 6.6990 KOps/s 6.5770 KOps/s $\color{#35bf28}+1.86\%$
test_mod_wrap[eager] 0.3867ms 0.2942ms 3.3986 KOps/s 3.4054 KOps/s $\color{#d91a1a}-0.20\%$
test_mod_wrap[compile] 0.4236ms 0.3497ms 2.8600 KOps/s 2.8368 KOps/s $\color{#35bf28}+0.82\%$
test_mod_wrap[compile-overhead] 7.4151ms 4.0725ms 245.5487 Ops/s 249.4051 Ops/s $\color{#d91a1a}-1.55\%$
test_mod_wrap_and_backward[eager] 1.6404ms 1.4990ms 667.1143 Ops/s 641.6356 Ops/s $\color{#35bf28}+3.97\%$
test_mod_wrap_and_backward[compile] 1.9471ms 1.4896ms 671.3062 Ops/s 680.1041 Ops/s $\color{#d91a1a}-1.29\%$
test_mod_wrap_and_backward[compile-overhead] 1.2648ms 0.8990ms 1.1123 KOps/s 1.1067 KOps/s $\color{#35bf28}+0.50\%$
test_seq_add[eager] 0.5942ms 0.1681ms 5.9474 KOps/s 6.3232 KOps/s $\textbf{\color{#d91a1a}-5.94\%}$
test_seq_add[compile] 0.5637ms 0.1217ms 8.2194 KOps/s 8.4139 KOps/s $\color{#d91a1a}-2.31\%$
test_seq_add[compile-overhead] 0.6055ms 0.1641ms 6.0952 KOps/s 6.2743 KOps/s $\color{#d91a1a}-2.85\%$
test_seq_wrap[eager] 1.0336ms 0.5575ms 1.7937 KOps/s 1.8452 KOps/s $\color{#d91a1a}-2.79\%$
test_seq_wrap[compile] 0.4700ms 0.3682ms 2.7162 KOps/s 2.5907 KOps/s $\color{#35bf28}+4.84\%$
test_seq_wrap[compile-overhead] 0.3730ms 0.2645ms 3.7810 KOps/s 3.7111 KOps/s $\color{#35bf28}+1.88\%$
test_func_call_runtime[False-eager] 0.9143ms 0.8432ms 1.1860 KOps/s 1.1039 KOps/s $\textbf{\color{#35bf28}+7.44\%}$
test_func_call_runtime[False-compile] 0.9995ms 0.9144ms 1.0936 KOps/s 1.0425 KOps/s $\color{#35bf28}+4.90\%$
test_func_call_runtime[False-compile-overhead] 0.5232ms 0.4628ms 2.1607 KOps/s 2.1482 KOps/s $\color{#35bf28}+0.58\%$
test_func_call_runtime[True-eager] 1.1588ms 1.0958ms 912.5944 Ops/s 913.0356 Ops/s $\color{#d91a1a}-0.05\%$
test_func_call_runtime[True-compile] 0.9918ms 0.9228ms 1.0836 KOps/s 1.0758 KOps/s $\color{#35bf28}+0.73\%$
test_func_call_runtime[True-compile-overhead] 0.5375ms 0.4726ms 2.1159 KOps/s 2.0724 KOps/s $\color{#35bf28}+2.10\%$
test_func_call_cm_runtime[False-eager] 0.9085ms 0.8414ms 1.1884 KOps/s 1.1740 KOps/s $\color{#35bf28}+1.23\%$
test_func_call_cm_runtime[False-compile] 1.1157ms 0.9185ms 1.0887 KOps/s 1.0839 KOps/s $\color{#35bf28}+0.45\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5525ms 0.4633ms 2.1584 KOps/s 2.1344 KOps/s $\color{#35bf28}+1.13\%$
test_func_call_cm_runtime[True-eager] 1.3616ms 1.2323ms 811.5107 Ops/s 797.2967 Ops/s $\color{#35bf28}+1.78\%$
test_func_call_cm_runtime[True-compile] 1.1708ms 0.9541ms 1.0481 KOps/s 1.0329 KOps/s $\color{#35bf28}+1.47\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5887ms 0.5088ms 1.9652 KOps/s 1.9475 KOps/s $\color{#35bf28}+0.91\%$
test_vmap_func_call_cm_runtime[eager] 2.8421ms 2.3551ms 424.6035 Ops/s 419.1372 Ops/s $\color{#35bf28}+1.30\%$
test_vmap_func_call_cm_runtime[compile] 1.0543ms 0.9801ms 1.0203 KOps/s 1.0132 KOps/s $\color{#35bf28}+0.71\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5594ms 0.5125ms 1.9511 KOps/s 1.9274 KOps/s $\color{#35bf28}+1.23\%$
test_distributed 0.7950ms 0.1533ms 6.5217 KOps/s 6.4605 KOps/s $\color{#35bf28}+0.95\%$
test_tdmodule 0.3039ms 29.3107μs 34.1172 KOps/s 33.3891 KOps/s $\color{#35bf28}+2.18\%$
test_tdmodule_dispatch 78.2150μs 47.5921μs 21.0119 KOps/s 20.6943 KOps/s $\color{#35bf28}+1.53\%$
test_tdseq 48.6130μs 27.9095μs 35.8301 KOps/s 35.3509 KOps/s $\color{#35bf28}+1.36\%$
test_tdseq_dispatch 71.4640μs 49.1078μs 20.3634 KOps/s 19.6114 KOps/s $\color{#35bf28}+3.83\%$
test_instantiation_functorch 2.1494ms 2.0778ms 481.2786 Ops/s 480.6254 Ops/s $\color{#35bf28}+0.14\%$
test_exec_functorch 0.2228ms 0.1812ms 5.5196 KOps/s 5.5931 KOps/s $\color{#d91a1a}-1.31\%$
test_exec_functional_call 0.2319ms 0.1618ms 6.1802 KOps/s 6.2185 KOps/s $\color{#d91a1a}-0.62\%$
test_exec_td_decorator 0.4500ms 0.2373ms 4.2142 KOps/s 4.1639 KOps/s $\color{#35bf28}+1.21\%$
test_vmap_mlp_speed_decorator[True-True] 1.0394ms 0.8224ms 1.2160 KOps/s 1.1970 KOps/s $\color{#35bf28}+1.59\%$
test_vmap_mlp_speed_decorator[True-False] 0.9949ms 0.8218ms 1.2169 KOps/s 1.2004 KOps/s $\color{#35bf28}+1.37\%$
test_vmap_mlp_speed_decorator[False-True] 0.8870ms 0.7108ms 1.4069 KOps/s 1.3834 KOps/s $\color{#35bf28}+1.70\%$
test_vmap_mlp_speed_decorator[False-False] 0.8879ms 0.7111ms 1.4063 KOps/s 1.3873 KOps/s $\color{#35bf28}+1.37\%$
test_vmap_transformer_speed_decorator[True-True] 21.3341ms 20.5888ms 48.5702 Ops/s 48.4520 Ops/s $\color{#35bf28}+0.24\%$
test_vmap_transformer_speed_decorator[True-False] 20.9925ms 20.5484ms 48.6656 Ops/s 48.4422 Ops/s $\color{#35bf28}+0.46\%$
test_vmap_transformer_speed_decorator[False-True] 21.2605ms 20.3721ms 49.0868 Ops/s 48.8741 Ops/s $\color{#35bf28}+0.44\%$
test_vmap_transformer_speed_decorator[False-False] 20.5323ms 20.3666ms 49.1000 Ops/s 48.8326 Ops/s $\color{#35bf28}+0.55\%$
test_to_module_speed[True] 1.5587ms 1.4730ms 678.8871 Ops/s 669.3019 Ops/s $\color{#35bf28}+1.43\%$
test_to_module_speed[False] 1.5779ms 1.4379ms 695.4353 Ops/s 684.2293 Ops/s $\color{#35bf28}+1.64\%$
test_tc_init 78.5750μs 46.2440μs 21.6244 KOps/s 20.9390 KOps/s $\color{#35bf28}+3.27\%$
test_tc_init_tensor_only 29.0510μs 10.0325μs 99.6762 KOps/s 99.9967 KOps/s $\color{#d91a1a}-0.32\%$
test_tc_init_nested 0.1283ms 93.3694μs 10.7101 KOps/s 10.4802 KOps/s $\color{#35bf28}+2.19\%$
test_tc_init_many_fields 46.2130μs 16.6994μs 59.8823 KOps/s 59.8179 KOps/s $\color{#35bf28}+0.11\%$
test_tc_first_layer_tensor 19.9310μs 1.8698μs 534.8207 KOps/s 542.5069 KOps/s $\color{#d91a1a}-1.42\%$
test_tc_first_layer_tensor_only 5.1631μs 0.7544μs 1.3256 MOps/s 1.3106 MOps/s $\color{#35bf28}+1.14\%$
test_tc_first_layer_tensor_set 33.6220μs 4.2429μs 235.6856 KOps/s 235.6754 KOps/s $+0.00\%$
test_tc_first_layer_tensor_only_set 20.7510μs 3.1855μs 313.9203 KOps/s 309.2807 KOps/s $\color{#35bf28}+1.50\%$
test_tc_first_layer_nontensor 55.0040μs 6.2494μs 160.0162 KOps/s 160.7073 KOps/s $\color{#d91a1a}-0.43\%$
test_tc_second_layer_tensor 35.7630μs 4.4877μs 222.8313 KOps/s 225.1473 KOps/s $\color{#d91a1a}-1.03\%$
test_tc_second_layer_nontensor 67.3940μs 8.8011μs 113.6223 KOps/s 112.6837 KOps/s $\color{#35bf28}+0.83\%$
test_unbind 0.2429s 14.1877ms 70.4837 Ops/s 57.3203 Ops/s $\textbf{\color{#35bf28}+22.96\%}$
test_full_like 4.6989ms 4.3685ms 228.9135 Ops/s 227.8356 Ops/s $\color{#35bf28}+0.47\%$
test_zeros_like 11.0574ms 10.5606ms 94.6920 Ops/s 228.9871 Ops/s $\textbf{\color{#d91a1a}-58.65\%}$
test_ones_like 10.6587ms 10.5408ms 94.8692 Ops/s 229.1657 Ops/s $\textbf{\color{#d91a1a}-58.60\%}$
test_clone 15.1738ms 15.0345ms 66.5135 Ops/s 154.1283 Ops/s $\textbf{\color{#d91a1a}-56.85\%}$
test_squeeze 0.1521ms 14.2765μs 70.0453 KOps/s 69.3117 KOps/s $\color{#35bf28}+1.06\%$
test_unsqueeze 0.2615ms 0.1111ms 9.0018 KOps/s 9.0721 KOps/s $\color{#d91a1a}-0.78\%$
test_split 0.2464ms 0.1834ms 5.4523 KOps/s 5.3162 KOps/s $\color{#35bf28}+2.56\%$
test_permute 0.2802ms 0.2046ms 4.8867 KOps/s 4.8910 KOps/s $\color{#d91a1a}-0.09\%$
test_stack 53.1967ms 51.6606ms 19.3571 Ops/s 19.4468 Ops/s $\color{#d91a1a}-0.46\%$
test_cat 51.7298ms 51.5539ms 19.3972 Ops/s 23.2310 Ops/s $\textbf{\color{#d91a1a}-16.50\%}$

Labels: CLA Signed, Performance
