julia> @code_native broadcast(+, UInt8[], UInt8[]) # (**Edit:** for 1-D arrays) at least it is SIMD-vectorized.
...snip...
; ││││┌ @ simdloop.jl:77 within `macro expansion' @ broadcast.jl:932
; │││││┌ @ broadcast.jl:575 within `getindex'
; ││││││┌ @ broadcast.jl:620 within `_broadcast_getindex'
; │││││││┌ @ broadcast.jl:644 within `_getindex' @ broadcast.jl:645
; ││││││││┌ @ broadcast.jl:614 within `_broadcast_getindex'
; │││││││││┌ @ array.jl:809 within `getindex'
L1216:
vmovdqu (%rcx,%rbx), %ymm0
vmovdqu 32(%rcx,%rbx), %ymm1
vmovdqu 64(%rcx,%rbx), %ymm2
vmovdqu 96(%rcx,%rbx), %ymm3
; ││││││└└└└
; ││││││┌ @ broadcast.jl:621 within `_broadcast_getindex'
; │││││││┌ @ broadcast.jl:648 within `_broadcast_getindex_evalf'
; ││││││││┌ @ int.jl:87 within `+'
vpaddb (%r9,%rbx), %ymm0, %ymm0
vpaddb 32(%r9,%rbx), %ymm1, %ymm1
vpaddb 64(%r9,%rbx), %ymm2, %ymm2
vpaddb 96(%r9,%rbx), %ymm3, %ymm3
; │││││└└└└
; │││││┌ @ array.jl:847 within `setindex!'
vmovdqu %ymm0, (%rdx,%rbx)
vmovdqu %ymm1, 32(%rdx,%rbx)
vmovdqu %ymm2, 64(%rdx,%rbx)
vmovdqu %ymm3, 96(%rdx,%rbx)
; ││││└└
; ││││┌ @ simdloop.jl:78 within `macro expansion'
; │││││┌ @ int.jl:87 within `+'
subq $-128, %rbx
cmpq %rbx, %rdi
jne L1216
; ││││└└
...snip...
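Since the listing above only covers 1-D inputs, a quick way to compare other shapes is to capture the native code as a string and search it for the packed byte add. This is only a rough sketch: `has_packed_add` is a hypothetical helper name, the `paddb` check is x86-specific, and a match only shows that a vectorized path exists somewhere in the generated code, not that it is the path taken at run time.

```julia
using InteractiveUtils  # provides code_native; loaded automatically in the REPL

# Hypothetical helper: does the native code for broadcast(+, A, B)
# contain a packed byte add (x86 `paddb` / `vpaddb`)?
function has_packed_add(A, B)
    io = IOBuffer()
    code_native(io, broadcast, Tuple{typeof(+), typeof(A), typeof(B)};
                debuginfo = :none)
    asm = String(take!(io))
    return occursin("paddb", asm)  # also matches `vpaddb`
end

v = UInt8[]                      # 1-D, as in the listing above
m = Matrix{UInt8}(undef, 0, 0)   # 2-D, for comparison

println("1-D: ", has_packed_add(v, v))
println("2-D: ", has_packed_add(m, m))
```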
Although I haven't identified the cause, I've noticed a roughly 10x slowdown in simple broadcasting operations on nightly.
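For anyone who wants to try reproducing the timing difference, something along these lines could be run on both builds. It is a minimal sketch, not the exact benchmark behind the number above: the array sizes and the helper names `min_time` and `add` are assumptions made for illustration, and only Base's `@elapsed` is used so that no extra package is required.

```julia
# Hypothetical helper: minimum wall time over several runs, after a warm-up call.
function min_time(f, args...; runs = 20)
    f(args...)  # warm-up so that compilation is not timed
    return minimum(@elapsed(f(args...)) for _ in 1:runs)
end

add(a, b) = a .+ b  # the simple broadcast under test

n = 1_000  # illustrative size, not taken from the report
a1, b1 = rand(UInt8, n * n), rand(UInt8, n * n)  # 1-D inputs
a2, b2 = rand(UInt8, n, n), rand(UInt8, n, n)    # 2-D inputs, same element count

println("1-D: ", min_time(add, a1, b1), " s")
println("2-D: ", min_time(add, a2, b2), " s")
```

Comparing the two lines across Julia versions should show whether the slowdown depends on the array dimensionality.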
`@code_native` result for 1-D arrays (this is somewhat misleading; see the comments below):
The most noticeable difference is the LLVM version (10 vs. 11), but at the moment I have no evidence that LLVM 11 is the cause.
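To confirm which LLVM version a given build links against, the following can be run on each Julia binary. This is just a sketch for collecting the version information; it does not by itself say anything about the cause.

```julia
using InteractiveUtils  # provides versioninfo; loaded automatically in the REPL

println("Julia: ", VERSION)
println("LLVM:  ", Base.libllvm_version)  # VersionNumber of the linked libLLVM

versioninfo()  # full platform summary, which also reports the LLVM version
```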