effects: add new @consistent_overlay macro#54322
Conversation
|
could we also add a weaker version of this effect that only requires the overlay to return an egal answer when the regular function returns an answer? (e.g. for things like NaNMath) |
|
That sounds exactly like the requirements for |
|
If I'm reading this right, it wouldn't be valid to use this extension for |
|
Ah, did you mean |
146a43a to
c675f21
Compare
|
I'm ok with this, but it seems a bit weird to put this on |
c675f21 to
6b294ec
Compare
:consistent_overlay override@consistent_overlay macro
I agree. I've created a new |
|
Sorry for the slow response, upgrading the GPU stack to LLVM 17 took longer than expected. Using using CUDA
cudacall(f, types::Type, args...; kwargs...) = nothing
function outer(f)
@inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
return
end
using InteractiveUtils
InteractiveUtils.code_llvm(outer, Tuple{Nothing})
# vs
CUDA.code_llvm(outer, Tuple{Nothing}); Function Signature: outer(Nothing)
; @ REPL[4]:1 within `outer`
define void @julia_outer_6994() #0 {
top:
; @ REPL[4] within `outer`
ret void
}vs ; @ REPL[4]:1 within `outer`
define void @julia_outer_9973() local_unnamed_addr {
top:
%jlcallframe1 = alloca [4 x ptr], align 8
; @ REPL[4]:2 within `outer`
; ┌ @ REPL[3]:1 within `cudacall`
; │┌ @ iterators.jl:276 within `pairs`
; ││┌ @ essentials.jl:473 within `Pairs`
; │││┌ @ namedtuple.jl:234 within `eltype`
; ││││┌ @ namedtuple.jl:236 within `nteltype`
; │││││┌ @ tuple.jl:272 within `eltype`
; ││││││┌ @ tuple.jl:292 within `_compute_eltype`
; │││││││┌ @ promotion.jl:175 within `promote_typejoin`
%0 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 256), align 8
%1 = call fastcc nonnull ptr @julia_typejoin_9980(ptr readonly %0, ptr readonly inttoptr (i64 128551383252656 to ptr))
; ││││││││ @ promotion.jl:176 within `promote_typejoin`
%2 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 64), align 8
store ptr %2, ptr %jlcallframe1, align 8
%3 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
store ptr %0, ptr %3, align 8
%4 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 2
store ptr inttoptr (i64 128551383252656 to ptr), ptr %4, align 8
%5 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 3
store ptr %1, ptr %5, align 8
%6 = call nonnull ptr @jl_f_apply_type(ptr null, ptr nonnull %jlcallframe1, i32 4)
; └└└└└└└└
; @ REPL[4]:3 within `outer`
ret void
}Do you want me to reduce this to something that doesn't require CUDA.jl? I'm also wondering if Of course, the above is speculation without a good understanding of the optimizer / our effects system, so feel free to poke holes. |
To fix this exact case, we actually need is #54323, and don't need this one. We can
If overlay options other than |
What about the |
|
|
|
I guess I'm mixing up effects analysis with concrete evaluation. In any case, I was suggesting a more relaxed version of the In any case, for the GPU use case I only care about |
|
For me
E.g. we need |
|
Thank you for clarifying. That definition of |
6b294ec to
a406fcd
Compare
Backported PRs: - [x] #53665 <!-- use afoldl instead of tail recursion for tuples --> - [x] #53976 <!-- LinearAlgebra: LazyString in interpolated error messages --> - [x] #54005 <!-- make `view(::Memory, ::Colon)` produce a Vector --> - [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` --> - [x] #54069 <!-- Allow PrecompileTools to see MI's inferred by foreign abstract interpreters --> - [x] #53750 <!-- inference correctness: fields and globals can revert to undef --> - [x] #53984 <!-- Profile: fix heap snapshot is valid char check --> - [x] #54102 <!-- Explicitly compute stride in unaliascopy for SubArray --> - [x] #54070 <!-- Fix integer overflow in `skip(s::IOBuffer, typemax(Int64))` --> - [x] #54013 <!-- Support case-changes to Annotated{String,Char}s --> - [x] #53941 <!-- Fix writing of AnnotatedChars to AnnotatedIOBuffer --> - [x] #54137 <!-- Fix typo in docs for `partialsortperm` --> - [x] #54129 <!-- use correct size when creating output data from an IOBuffer --> - [x] #54153 <!-- Fixup IdSet docstring --> - [x] #54143 <!-- Fix `make install` from tarballs --> - [x] #54151 <!-- LinearAlgebra: Correct zero element in `_generic_matvecmul!` for block adj/trans --> - [x] #54213 <!-- Add `public` statement to `Base.GC` --> - [x] #54222 <!-- Utilize correct tbaa when emitting stores of unions. --> - [x] #54233 <!-- set MAX_OS_WRITE on unix --> - [x] #54255 <!-- fix `_checked_mul_dims` in the presence of 0s and overflow. --> - [x] #54259 <!-- Fix typo in `readuntil` --> - [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large array --> - [x] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing imaginary part on diagonal --> - [x] #54248 <!-- ensure package callbacks are invoked when no valid precompile file exists for an "auto loaded" stdlib --> - [x] #54308 <!-- Implement eval-able AnnotatedString 2-arg show --> - [x] #54302 <!-- Specialised substring equality for annotated strs --> - [x] #54243 <!-- prevent `package_callbacks` to run multiple time for a single package --> - [x] #54350 <!-- add a precompile signature to Artifacts code that is used by JLLs --> - [x] #54331 <!-- correctly track freed bytes in jl_genericmemory_to_string --> - [x] #53509 <!-- revert moving "creating packages" from Pkg.jl --> - [x] #54335 <!-- When accessing the data pointer for an array, first decay it to a Derived Pointer --> - [x] #54239 <!-- Make sure `fieldcount` constant-folds for `Tuple{...}` --> - [x] #54288 - [x] #54067 - [x] #53715 <!-- Add read/write specialisation for IOContext{AnnIO} --> - [x] #54289 <!-- Rework annotation ordering/optimisations --> - [x] #53815 <!-- create phantom task for GC threads --> - [x] #54130 <!-- inference: handle `LimitedAccuracy` in `handle_global_assignment!` --> - [x] #54428 <!-- Move ConsoleLogging.jl into Base --> - [x] #54332 <!-- Revert "add unsetindex support to more copyto methods (#51760)" --> - [x] #53826 <!-- Make all command-line options documented in all related files --> - [x] #54465 <!-- typeintersect: conservative typevar subtitution during `finish_unionall` --> - [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path of type instantiation --> - [x] #54499 <!-- make `@doc x` work without REPL loaded --> - [x] #54210 <!-- attach finalizer in `mmap` to the correct object --> - [x] #54359 <!-- Pkg REPL: cache `pkg_mode` lookup --> Non-merged PRs with backport label: - [ ] #54471 <!-- Actually setup jit targets when compiling packageimages instead of targeting only one --> - [ ] #54457 <!-- Make `String(::Memory)` copy --> - [ ] #54323 <!-- inference: fix too conservative effects for recursive cycles --> - [ ] #54322 <!-- effects: add new `@consistent_overlay` macro --> - [ ] #54191 <!-- make `AbstractPipe` public --> - [ ] #53957 <!-- tweak how filtering is done for what packages should be precompiled --> - [ ] #53882 <!-- Warn about cycles in extension precompilation --> - [ ] #53707 <!-- Make ScopedValue public --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #53286 <!-- Raise an error when using `include_dependency` with non-existent file or directory --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
a406fcd to
34a94d8
Compare
ee313e5 to
f3ff647
Compare
f3ff647 to
dfca4c2
Compare
32c6c99 to
f005c05
Compare
f005c05 to
b1e031e
Compare
|
CI failures are unrelated to this PR. Going to merge. |
This PR serves to replace #51080 and close #52940. It extends the `:nonoverlayed` to `UInt8` and introduces the `CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. This newly added `:nonoverlayed`-bit is enabled through the newly added `Base.Experimental.@consistent_overlay mt def` macro. `@consistent_overlay` is similar to `@overlay`, but it sets the `:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method definition, allowing the method to be concrete-evaluated. To use this feature safely, I have also added quite precise documentation to `@consistent_overlay`.
This PR serves to replace #51080 and close #52940. It extends the `:nonoverlayed` to `UInt8` and introduces the `CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. This newly added `:nonoverlayed`-bit is enabled through the newly added `Base.Experimental.@consistent_overlay mt def` macro. `@consistent_overlay` is similar to `@overlay`, but it sets the `:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method definition, allowing the method to be concrete-evaluated. To use this feature safely, I have also added quite precise documentation to `@consistent_overlay`.
This PR serves to replace #51080 and close #52940. It extends the `:nonoverlayed` to `UInt8` and introduces the `CONSISTENT_OVERLAY` effect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. This newly added `:nonoverlayed`-bit is enabled through the newly added `Base.Experimental.@consistent_overlay mt def` macro. `@consistent_overlay` is similar to `@overlay`, but it sets the `:nonoverlayed`-bit to `CONSISTENT_OVERLAY` for the target method definition, allowing the method to be concrete-evaluated. To use this feature safely, I have also added quite precise documentation to `@consistent_overlay`.
Backported PRs: - [x] #54361 <!-- [LBT] Upgrade to v5.9.0 --> - [x] #54474 <!-- Unalias source from dest in copytrito --> - [x] #54548 <!-- Fixes for bitcast bugs with LLVM 17 / opaque pointers --> - [x] #54191 <!-- make `AbstractPipe` public --> - [x] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [x] #53356 <!-- Rename at-scriptdir project argument to at-script and search upwards for Project.toml --> - [x] #54545 <!-- typeintersect: fix incorrect innervar handling under circular env --> - [x] #54586 <!-- Set storage class of julia globals to dllimport on windows to avoid auto-import weirdness. Forward port of #54572 --> - [x] #54587 <!-- Accomodate for rectangular matrices in `copytrito!` --> - [x] #54617 <!-- CLI: Use `GetModuleHandleExW` to locate libjulia.dll --> - [x] #54605 <!-- Allow libquadmath to also fail as it is not available on all systems --> - [x] #54634 <!-- Fix trampoline assembly for build on clang 18 on apple silicon --> - [x] #54635 <!-- Aggressive constprop in trevc! to stabilize triangular eigvec --> - [x] #54645 <!-- ensure we set the right value to gc_first_tid --> - [x] #54554 <!-- make elsize public --> - [x] #54648 <!-- Construct LazyString in error paths for tridiag --> - [x] #54658 <!-- fix missing uuid check on extension when finding the location of an extension --> - [x] #54594 <!-- Switch to Pkg mode prompt immediately and load Pkg in the background --> - [x] #54669 <!-- Improve error message in inplace transpose --> - [x] #54671 <!-- Add boundscheck in bindingkey_eq to avoid OOB access due to data race --> - [x] #54672 <!-- make: Fix `sed` command for LLVM libraries with no symbol versioning --> - [x] #54624 <!-- more precise aliasing checks for SubArray --> - [x] #54679 <!-- 🤖 [master] Bump the Distributed stdlib from 6a07d98 to 6c7cdb5 --> - [x] #54604 <!-- Fix tbaa annotation on union selector bytes inside of structs --> - [x] #54690 <!-- Fix assertion/crash when optimizing function with dead basic block --> - [x] #54704 <!-- LazyString in reinterpretarray error messages --> - [x] #54718 <!-- fix prepend StackOverflow issue --> - [x] #54674 <!-- Reimplement dummy pkg prompt as standard prompt --> - [x] #54737 <!-- LazyString in interpolated error messages involving types --> - [x] #54642 <!-- Document GenericMemory and AtomicMemory --> - [x] #54713 <!-- make: use `readelf` for LLVM symbol version detection --> - [x] #54760 <!-- REPL: improve prompt! async function handler --> - [x] #54606 <!-- fix double-counting and non-deterministic results in `summarysize` --> - [x] #54759 <!-- REPL: Fully populate the dummy Pkg prompt --> - [x] #54702 <!-- lowering: Recognize argument destructuring inside macro hygiene --> - [x] #54678 <!-- Don't let setglobal! implicitly create bindings --> - [x] #54730 <!-- Fix uuidkey of exts in fast path of `require_stdlib` --> - [x] #54765 <!-- Handle no-postdominator case in finalizer pass --> - [x] #54591 <!-- Don't expose guard pages to malloc_stack API consumers --> - [x] #54755 <!-- [TOML] remove Dates hack, replace with explicit usage --> - [x] #54721 <!-- add try/catch around scheduler to reset sleep state --> - [x] #54631 <!-- Avoid concatenating LazyString in setindex! for triangular matrices --> - [x] #54322 <!-- effects: add new `@consistent_overlay` macro --> - [x] #54785 - [x] #54865 - [x] #54815 - [x] #54795 - [x] #54779 - [x] #54837 Contains multiple commits, manual intervention needed: - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #54649 <!-- Less restrictive copyto! signature for triangular matrices --> Non-merged PRs with backport label: - [ ] #54779 <!-- make recommendation clearer on manifest version mismatch --> - [ ] #54739 <!-- finish implementation of upgradable stdlibs --> - [ ] #54738 <!-- serialization: fix relocatability bug --> - [ ] #54574 <!-- Make ScopedValues public --> - [ ] #54457 <!-- Make `String(::Memory)` copy --> - [ ] #53957 <!-- tweak how filtering is done for what packages should be precompiled --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53286 <!-- Raise an error when using `include_dependency` with non-existent file or directory --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
This PR serves to replace #51080 and close #52940.
It extends the
:nonoverlayedtoUInt8and introduces theCONSISTENT_OVERLAYeffect bit, allowing for concrete evaluation of overlay methods using the original non-overlayed counterparts when applied. Additionally, this PR adds a new effect override called:consistent_overlay.I've also included a relatively accurate description of
:consistent_overlay, as pointed out in #51080. Quoting from the newly added docstrings:Still, the explanation might not be comprehensive enough. I welcome any feedback.
EDIT: Now this feature is implemented with a new macro
@consistent_overlayinstead of extending the override set of@assume_effects.