Initial GPU support for P4estMesh 2D and 3D using simple KernelAbstractions kernels #2590
Merged
189 commits
26759a8
Use Adapt.jl to change storage and element type
vchuravy fc610f9
add docs and CUDAExt
vchuravy 7b5d81b
Aqua set unbound_args
vchuravy f730ef4
lower bound CUDA to 5.2
vchuravy 13b7f59
add initial CUDA pipeline
vchuravy 02de7d2
add storage_type, real_type to semidiscretize
vchuravy 671f5b1
add GPU construction test
vchuravy ecd09a5
don't adapt Array{MArray}
vchuravy 312009a
add some more cuda adapt tests
vchuravy 690efd1
use sources for dev branch
vchuravy 15a898b
fixup! use sources for dev branch
vchuravy 45d344b
use released version of CUDA
vchuravy 7e72eff
Update .buildkite/pipeline.yml
vchuravy 3450ddd
Use Adapt.jl to change storage and element type
vchuravy cf2f590
add docs and CUDAExt
vchuravy de96f85
Aqua set unbound_args
vchuravy 1a7cff2
lower bound CUDA to 5.2
vchuravy 68edf29
add initial CUDA pipeline
vchuravy 11ff63a
add storage_type, real_type to semidiscretize
vchuravy 4d8a31f
add GPU construction test
vchuravy 6ca8c3d
don't adapt Array{MArray}
vchuravy 4ef2d98
add some more cuda adapt tests
vchuravy 77395f5
use sources for dev branch
vchuravy 1d78f07
fixup! use sources for dev branch
vchuravy 39535ee
use released version of CUDA
vchuravy b973758
Update .buildkite/pipeline.yml
vchuravy 7105da7
fix test_p4est_2d
vchuravy 1fd6fe6
fix first GPU test
vchuravy d8a4bc8
Merge branch 'vc/adapt' into feature-gpu-offloading
benegee 6ceef3a
address review comments
vchuravy 7a53362
offload compute_coefficients
benegee 68eb905
fmt
benegee 3d00bdf
fixup! address review comments
vchuravy 4b32fa0
add review comments
vchuravy 10f7593
convert fstar_* cache entries to VecOfArrays
benegee c83bdbd
restore elixir
benegee 8c6c57d
Merge branch 'vc/adapt' into feature-gpu-offloading
benegee d3b94fc
test native version as well
benegee 97e13ec
adapt 1D and 3D version
benegee 44f7134
Downgrade compat with Adapt
benegee abbcc56
Use Adapt.jl to change storage and element type
vchuravy a18e5d2
restore elixir
benegee 5c942fe
offload compute_coefficients
benegee 47a55f2
fmt
benegee 36b0e4a
test native version as well
benegee 153d828
adapt 1D and 3D version
benegee 819ba75
Downgrade compat with Adapt
benegee e75cac7
update requires to 1.3
vchuravy 4b6f63e
Merge branch 'vc/adapt' into feature-gpu-offloading
benegee 61b4da1
Merge branch 'main' into feature-gpu-offloading
benegee e7cde27
missed during merge
benegee b174d6d
mistakes during merge
benegee 489bb24
cleanup
benegee b4d1535
Basis kernels for 3D P4est
benegee 2443cf8
port stepsize computation
benegee fc13ea5
CPU workaround for analysis callback
benegee 2ff2f52
tests
benegee bc4ad17
add benchmark
benegee de06c61
fix max_dt
benegee 29298a5
profiler output
benegee 281a540
Merge branch 'main' into feature-gpu-offloading
benegee 962a383
fmt
benegee a60e27d
missed max_dt calls
benegee ce742a3
Merge branch 'main' into feature-gpu-offloading
benegee 2073d7c
some fixes
benegee 9a2f130
after merge fixes
benegee 9a47f29
some more fixes
benegee 94f5d90
Merge branch 'main' into feature-gpu-offloading
benegee 6ffb69f
post merge fixes
benegee fb25fa2
Merge branch 'main' into feature-gpu-offloading
benegee 307c3eb
more
benegee c39b4de
more
benegee a38cc03
Squashed commit of the following:
benegee 013244d
Apply suggestions from code review
benegee 5b2c0bf
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee 8d5a55b
Merge branch 'main' into feature-gpu-offloading
benegee 8a98d27
!fixup
benegee 7de1e57
fmt
benegee 31a65cb
pass backend through
benegee 4064e79
fixes
benegee af50cda
backends here and there
benegee 5893d4d
almost everywhere
benegee a1caa12
some more
benegee a5cded3
next round
benegee 7c6ab4a
could this be...
benegee 719c2d1
adapts until 2d prolong2interfaces!
vivimie 6bbc069
adds explicit mesh type in signature
vivimie e58c298
adapts the rest for the 2d basic advection gpu elixir
vivimie a570beb
Merge branch 'main' into feature-gpu-offloading
benegee b59239b
enable 2D CUDA tests
benegee c0dd4b5
fmt
benegee f90f5a8
fixes bugs in the CPU implementation
vivimie 0291d14
Merge branch 'main' into feature-gpu-offloading
benegee 68ad089
Merge branch 'main' into feature-gpu-offloading
benegee 4ce90ab
fix
benegee ae9719d
fixes
benegee a13dd61
fix
benegee 8ecb6c4
fix
benegee 3d69311
no nextfloat per element
benegee a2f0488
fmt
benegee ff6dfd5
Merge branch 'main' into feature-gpu-offloading
benegee 31490d3
another RealT_for_test_tolerances
benegee 8e802ee
readd Project.toml
benegee 0824647
Merge branch 'main' into feature-gpu-offloading
benegee 77c1569
Merge branch 'main' into feature-gpu-offloading
benegee 71d837b
fmt
benegee ae3e415
fixes
benegee a801ebe
more
benegee 8829787
Merge branch 'main' into feature-gpu-offloading
benegee 2831c9c
add @inline for inner functions
benegee 34c4684
more fixes
benegee da4652d
Merge branch 'main' into feature-gpu-offloading
benegee e320bc5
define unsafe_wrap_or_alloc fuer CUDA.KernelAdaptor
vchuravy f72fcc1
fixup! define unsafe_wrap_or_alloc fuer CUDA.KernelAdaptor
vchuravy 4805a70
fixup! define unsafe_wrap_or_alloc fuer CUDA.KernelAdaptor
vchuravy 0fa07c4
apply bandaid
vchuravy 287a113
final fix?
benegee f8a1696
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee 1c2ea9b
Merge branch 'main' into feature-gpu-offloading
benegee dc6455d
add method to filter the cache
benegee 6af1201
final^2
benegee 2ecdf14
setup kernelabstraction harness
vchuravy c83404a
add advection_basic to KA tests
benegee c470dc9
no allocation tests
benegee 518e348
Merge branch 'main' into feature-gpu-offloading
benegee 6a3567a
missed
benegee 96cdec4
Update Project.toml
benegee 5f123ee
add sources section to benchmark Project.toml
benegee cec6865
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee 5974c2a
fix meshT
benegee 0a3448f
add backend argument for coupled semis
benegee 70ea410
fmt
benegee 95f0f03
fix
benegee 0727ec4
fix mesh type
benegee 39d4957
fix
benegee dc7dbb6
move get_backend to within rhs!
benegee 476b54f
remove backend from max_dt
benegee fbe2171
here as well
benegee f344f65
fix
benegee a3eb8c8
add old method signatures to stay compatible with TrixiAtmo.jl
benegee 04e0e2b
fix
benegee a1cdae1
Merge branch 'main' into feature-gpu-offloading
benegee 331d704
Merge branch 'main' into feature-gpu-offloading
benegee 6a95f55
meshT -> MeshT
benegee d7910c7
Apply suggestions from code review
benegee 32d41ef
module TestCUDA2D
benegee cc3c78b
use log_base and enable flux differencing
benegee 418c944
add a short note to the benchmark problem
benegee 2962ed7
add device_override for Trixi.log
vchuravy e116f7b
fixup! add device_override for Trixi.log
vchuravy a1d4481
fixup! add device_override for Trixi.log
vchuravy 5daf2c6
typo?
benegee b93bba9
fixup! add device_override for Trixi.log
vchuravy fc1cdf5
unify naming of inner methods
benegee 2f52234
fmt
benegee 8b8aa01
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee 917b3a6
add ndims(MeshT)
benegee 8c88cfe
Merge branch 'main' into feature-gpu-offloading
ranocha 0b751ac
add NEWS
benegee 97fdccb
add comment on how to use GPU
benegee 44cb1ba
comment
benegee 7bc64e1
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee 484f587
activate test_allocations for GPU tests
vchuravy e9e318f
fixup! activate test_allocations for GPU tests
vchuravy e63e887
fixup! activate test_allocations for GPU tests
vchuravy 7fc5dfe
fixup! fixup! activate test_allocations for GPU tests
vchuravy 984c402
fixup! fixup! fixup! activate test_allocations for GPU tests
vchuravy e279fc1
fixup!
vchuravy d0301fb
fix spell check
vchuravy 63565dc
add @invokelatest
benegee 90d19e6
add compat bounds
benegee f30dd8d
remove finalize
benegee b000ffb
remove kernels for backward compatibilty
benegee ae43324
no source terms
benegee 575e0a2
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee 34773f2
Merge branch 'main' into feature-gpu-offloading
benegee 56d9392
comment on MeshT
benegee eee2168
fixes
benegee 70b62d5
fmt
benegee 7a27196
try different formatting
benegee fb7d10b
fix
benegee ef19445
missed ndims
benegee c8431d2
Add GPU parallel set_zero!
vchuravy 4dc17be
set version to v0.16.0-DEV
ranocha cb3f7f9
Apply suggestions from code review
vchuravy e98fcf7
use dispatch for indices2direction
vchuravy 958cd57
Merge branch 'main' into feature-gpu-offloading
ranocha 52a05ad
Merge branch 'main' into feature-gpu-offloading
ranocha cb56c8a
Apply suggestion from @ranocha
ranocha
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| [deps] | ||
| CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" | ||
| JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" | ||
| OrdinaryDiffEqLowStorageRK = "b0944070-b475-4768-8dec-fb6eb410534d" | ||
| TimerOutputs = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f" | ||
| Trixi = "a7f1ee26-1774-49b1-8366-f1abc58fbfcb" | ||
|
||
|
|
||
| [sources] | ||
| Trixi = {path = "../.."} | ||
|
|
||
| [compat] | ||
| CUDA = "5.8.2" | ||
| JSON = "1.4.0" | ||
| OrdinaryDiffEqLowStorageRK = "1.12.0" | ||
| TimerOutputs = "0.5.25" | ||
| Trixi = "0.16" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| using OrdinaryDiffEqLowStorageRK | ||
| using Trixi | ||
|
|
||
| ############################################################################### | ||
| # semidiscretization of the compressible Euler equations | ||
|
|
||
| equations = CompressibleEulerEquations3D(1.4) | ||
|
|
||
| function initial_condition_taylor_green_vortex(x, t, | ||
| equations::CompressibleEulerEquations3D) | ||
| A = 1.0 # magnitude of speed | ||
| Ms = 0.1 # maximum Mach number | ||
|
|
||
| rho = 1.0 | ||
| v1 = A * sin(x[1]) * cos(x[2]) * cos(x[3]) | ||
| v2 = -A * cos(x[1]) * sin(x[2]) * cos(x[3]) | ||
| v3 = 0.0 | ||
| p = (A / Ms)^2 * rho / equations.gamma # scaling to get Ms | ||
| p = p + | ||
| 1.0 / 16.0 * A^2 * rho * | ||
| (cos(2 * x[1]) * cos(2 * x[3]) + | ||
| 2 * cos(2 * x[2]) + 2 * cos(2 * x[1]) + cos(2 * x[2]) * cos(2 * x[3])) | ||
|
|
||
| return prim2cons(SVector(rho, v1, v2, v3, p), equations) | ||
| end | ||
|
|
||
| initial_condition = initial_condition_taylor_green_vortex | ||
|
|
||
| volume_flux = flux_ranocha | ||
| surface_flux = flux_lax_friedrichs | ||
| volume_integral = VolumeIntegralFluxDifferencing(volume_flux) | ||
| solver = DGSEM(polydeg = 5, surface_flux = surface_flux, volume_integral = volume_integral) | ||
|
|
||
| coordinates_min = (-1.0, -1.0, -1.0) .* pi | ||
| coordinates_max = (1.0, 1.0, 1.0) .* pi | ||
|
|
||
| initial_refinement_level = 1 | ||
| trees_per_dimension = (4, 4, 4) | ||
|
|
||
| mesh = P4estMesh(trees_per_dimension, polydeg = 1, | ||
| coordinates_min = coordinates_min, coordinates_max = coordinates_max, | ||
| periodicity = true, initial_refinement_level = initial_refinement_level) | ||
|
|
||
| semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver; | ||
| boundary_conditions = boundary_condition_periodic) | ||
|
|
||
| ############################################################################### | ||
| # ODE solvers, callbacks etc. | ||
|
|
||
| tspan = (0.0, 100.0) | ||
| ode = semidiscretize(semi, tspan; storage_type = nothing, real_type = nothing) | ||
|
|
||
| summary_callback = SummaryCallback() | ||
|
|
||
| stepsize_callback = StepsizeCallback(cfl = 0.1) | ||
|
|
||
| callbacks = CallbackSet(summary_callback, | ||
| stepsize_callback) | ||
|
|
||
| ############################################################################### | ||
| # run the simulation | ||
|
|
||
| maxiters = 200 | ||
| run_profiler = false | ||
|
|
||
| # disable warnings when maxiters is reached | ||
| integrator = init(ode, CarpenterKennedy2N54(williamson_condition = false), | ||
| dt = 1.0, | ||
| save_everystep = false, callback = callbacks, | ||
| maxiters = maxiters, verbose = false) | ||
| if run_profiler | ||
| prof_result = CUDA.@profile solve!(integrator) | ||
| else | ||
| solve!(integrator) | ||
| prof_result = nothing | ||
| end |
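The `# scaling to get Ms` comment in the initial condition above can be checked in isolation. The following is a minimal sketch in plain Julia (no Trixi required) that assumes the ideal-gas speed of sound `c = sqrt(gamma * p / rho)`. The names `p0`, `c0`, and `mach` are illustrative and do not appear in the elixir.

```julia
# Background state of the Taylor-Green vortex setup (values copied from the elixir).
gamma = 1.4   # ratio of specific heats passed to CompressibleEulerEquations3D
A = 1.0       # magnitude of speed
Ms = 0.1      # target maximum Mach number
rho = 1.0

# Background pressure before the spatially varying perturbation is added.
p0 = (A / Ms)^2 * rho / gamma

# Ideal-gas speed of sound (assumption of this sketch) and resulting Mach number.
c0 = sqrt(gamma * p0 / rho)
mach = A / c0

println("background pressure = ", p0)
println("Mach number at speed A = ", mach)
```

With this scaling the speed of sound of the background state is `A / Ms`, so a velocity of magnitude `A` corresponds to a Mach number of `Ms`, up to floating-point rounding.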
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| using Trixi | ||
| using CUDA | ||
| using TimerOutputs | ||
| using JSON | ||
|
|
||
| function main(elixir_path) | ||
|
|
||
| # setup | ||
| maxiters = 50 | ||
| initial_refinement_level = 3 | ||
| storage_type = CuArray | ||
| real_type = Float64 | ||
|
|
||
| println("Warming up...") | ||
|
|
||
| # start simulation with tiny final time to trigger compilation | ||
| duration_compile = @elapsed begin | ||
| trixi_include(elixir_path, | ||
| tspan = (0.0, 1e-14), | ||
| storage_type = storage_type, | ||
| real_type = real_type) | ||
| trixi_include(elixir_path, | ||
| tspan = (0.0, 1e-14), | ||
| storage_type = storage_type, | ||
| real_type = Float32) | ||
| end | ||
|
|
||
| println("Finished warm-up in $duration_compile seconds\n") | ||
| println("Starting simulation...") | ||
|
|
||
| # start the real simulation | ||
| duration_elixir = @elapsed trixi_include(elixir_path, | ||
| maxiters = maxiters, | ||
| initial_refinement_level = initial_refinement_level, | ||
| storage_type = storage_type, | ||
| real_type = real_type) | ||
|
|
||
| # store metrics (on every rank!) | ||
| metrics = Dict{String, Float64}("elapsed time" => duration_elixir) | ||
|
|
||
| # read TimerOutputs timings | ||
| timer = Trixi.timer() | ||
| metrics["total time"] = 1.0e-9 * TimerOutputs.tottime(timer) | ||
| metrics["rhs! time"] = 1.0e-9 * TimerOutputs.time(timer["rhs!"]) | ||
|
|
||
| # compute performance index | ||
| latest_semi = @invokelatest (@__MODULE__).semi | ||
| nrhscalls = Trixi.ncalls(latest_semi.performance_counter) | ||
| walltime = 1.0e-9 * take!(latest_semi.performance_counter) | ||
| metrics["PID"] = walltime * Trixi.mpi_nranks() / | ||
| (Trixi.ndofsglobal(latest_semi) * nrhscalls) | ||
|
|
||
| # write json file | ||
| open("metrics.out", "w") do f | ||
| indent = 2 | ||
| JSON.print(f, metrics, indent) | ||
| end | ||
|
|
||
| # run profiler | ||
| maxiters = 5 | ||
| initial_refinement_level = 1 | ||
|
|
||
| println("Running profiler (Float64)...") | ||
| trixi_include(elixir_path, | ||
| maxiters = maxiters, | ||
| initial_refinement_level = initial_refinement_level, | ||
| storage_type = storage_type, | ||
| real_type = Float64, | ||
| run_profiler = true) | ||
|
|
||
| open("profile_float64.txt", "w") do io | ||
| show(io, @invokelatest (@__MODULE__).prof_result) | ||
| end | ||
|
|
||
| println("Running profiler (Float32)...") | ||
| trixi_include(elixir_path, | ||
| maxiters = maxiters, | ||
| initial_refinement_level = initial_refinement_level, | ||
| storage_type = storage_type, | ||
| real_type = Float32, | ||
| run_profiler = true) | ||
|
|
||
| open("profile_float32.txt", "w") do io | ||
| show(io, @invokelatest (@__MODULE__).prof_result) | ||
| end | ||
| end | ||
|
|
||
| # hardcoded elixir | ||
| elixir_path = joinpath(@__DIR__(), "elixir_euler_taylor_green_vortex.jl") | ||
|
|
||
| main(elixir_path) |
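The performance index (`PID`) written to `metrics.out` by the script above is the wall time per degree of freedom and per `rhs!` call, scaled by the number of MPI ranks. The sketch below restates that formula as a standalone function. `performance_index` and its argument names are hypothetical, and it assumes the raw counter value is in nanoseconds, as the `1.0e-9` factor in the script suggests.

```julia
# Hypothetical helper that mirrors the PID formula of the benchmark script.
# walltime_ns  : accumulated rhs! wall time in nanoseconds (assumed unit)
# nranks       : number of MPI ranks
# ndofs_global : global number of degrees of freedom
# nrhs_calls   : number of rhs! evaluations
function performance_index(walltime_ns, nranks, ndofs_global, nrhs_calls)
    walltime_s = 1.0e-9 * walltime_ns
    return walltime_s * nranks / (ndofs_global * nrhs_calls)
end

# Made-up numbers purely for illustration: 2 s of rhs! time on one rank,
# one million DOFs, 250 rhs! calls.
pid = performance_index(2.0e9, 1, 1_000_000, 250)
println("PID = ", pid, " s per DOF and rhs! call")
```

A smaller value means less time spent per degree of freedom and right-hand-side evaluation, which makes runs with different mesh sizes or rank counts roughly comparable.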
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| # The same setup as tree_3d_dgsem/elixir_advection_basic.jl | ||
| # to verify GPU support and Adapt.jl support. | ||
|
|
||
| using OrdinaryDiffEqLowStorageRK | ||
| using Trixi | ||
|
|
||
| ############################################################################### | ||
| # semidiscretization of the linear advection equation | ||
|
|
||
| advection_velocity = (0.2, -0.7, 0.5) | ||
| equations = LinearScalarAdvectionEquation3D(advection_velocity) | ||
|
|
||
| # Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux | ||
| solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs) | ||
|
|
||
| coordinates_min = (-1.0, -1.0, -1.0) # minimum coordinates (min(x), min(y), min(z)) | ||
| coordinates_max = (1.0, 1.0, 1.0) # maximum coordinates (max(x), max(y), max(z)) | ||
|
|
||
| # Create P4estMesh with 8 x 8 x 8 elements (4 x 4 x 4 trees, `initial_refinement_level = 1`) | ||
| trees_per_dimension = (4, 4, 4) | ||
| mesh = P4estMesh(trees_per_dimension, polydeg = 3, | ||
| coordinates_min = coordinates_min, coordinates_max = coordinates_max, | ||
| initial_refinement_level = 1, | ||
| periodicity = true) | ||
|
|
||
| # A semidiscretization collects data structures and functions for the spatial discretization | ||
| semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition_convergence_test, | ||
| solver; | ||
| boundary_conditions = boundary_condition_periodic) | ||
|
|
||
| ############################################################################### | ||
| # ODE solvers, callbacks etc. | ||
|
|
||
| # Create ODE problem with time span from 0.0 to 1.0 | ||
| # Change `storage_type` to, e.g., `CuArray` to actually run on GPU | ||
| tspan = (0.0, 1.0) | ||
| ode = semidiscretize(semi, tspan; real_type = nothing, storage_type = nothing) | ||
|
|
||
| # At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup | ||
| # and resets the timers | ||
| summary_callback = SummaryCallback() | ||
|
|
||
| # The AnalysisCallback allows to analyse the solution in regular intervals and prints the results | ||
| analysis_callback = AnalysisCallback(semi, interval = 100) | ||
|
|
||
| # The SaveSolutionCallback allows to save the solution to a file in regular intervals | ||
| save_solution = SaveSolutionCallback(interval = 100, | ||
| solution_variables = cons2prim) | ||
|
|
||
| # The StepsizeCallback handles the re-calculation of the maximum Δt after each time step | ||
| stepsize_callback = StepsizeCallback(cfl = 1.2) | ||
|
|
||
| # Create a CallbackSet to collect all callbacks such that they can be passed to the ODE solver | ||
| callbacks = CallbackSet(summary_callback, analysis_callback, | ||
| save_solution, stepsize_callback) | ||
|
|
||
| ############################################################################### | ||
| # run the simulation | ||
|
|
||
| # OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks | ||
| sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false); | ||
| dt = 0.05, # solve needs some value here but it will be overwritten by the stepsize_callback | ||
| ode_default_options()..., callback = callbacks); |
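The "8 x 8 x 8 elements" stated in the mesh comment above follows from the tree count and the refinement level. The sketch below reproduces that arithmetic in plain Julia, assuming uniform refinement in which every level splits each tree in two along each direction. The variable names `elements_per_dimension`, `n_elements`, `nodes_per_element`, and `n_dofs` are illustrative.

```julia
# Mesh and solver parameters copied from the elixir above.
trees_per_dimension = (4, 4, 4)
initial_refinement_level = 1
polydeg = 3

# Each uniform refinement level doubles the number of elements per direction.
elements_per_dimension = trees_per_dimension .* 2^initial_refinement_level
n_elements = prod(elements_per_dimension)

# A DGSEM element of degree `polydeg` carries (polydeg + 1) nodes per direction.
nodes_per_element = (polydeg + 1)^length(trees_per_dimension)

# Scalar advection has one variable, so DOFs equal the number of nodes.
n_dofs = n_elements * nodes_per_element

println("elements per dimension: ", elements_per_dimension)
println("total elements: ", n_elements)
println("degrees of freedom: ", n_dofs)
```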
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,11 +1,27 @@ | ||
| # Package extension for adding CUDA-based features to Trixi.jl | ||
| module TrixiCUDAExt | ||
|
|
||
| import CUDA: CuArray | ||
| using CUDA: CUDA, CuArray, CuDeviceArray, KernelAdaptor, @device_override | ||
| import Trixi | ||
|
|
||
| function Trixi.storage_type(::Type{<:CuArray}) | ||
| return CuArray | ||
| end | ||
|
|
||
| function Trixi.unsafe_wrap_or_alloc(::KernelAdaptor, vec, size) | ||
| return Trixi.unsafe_wrap_or_alloc(CuDeviceArray, vec, size) | ||
| end | ||
|
|
||
| function Trixi.unsafe_wrap_or_alloc(::Type{<:CuDeviceArray}, vec::CuDeviceArray, size) | ||
| return reshape(vec, size) | ||
| end | ||
|
|
||
| @static if Trixi._PREFERENCE_LOG == "log_Trixi_NaN" | ||
| @device_override Trixi.log(x::Float64) = ccall("extern __nv_log", llvmcall, Cdouble, | ||
| (Cdouble,), x) | ||
| @device_override Trixi.log(x::Float32) = ccall("extern __nv_logf", llvmcall, Cfloat, | ||
| (Cfloat,), x) | ||
| # TODO: Trixi.log(x::Float16) | ||
| end | ||
|
|
||
| end |