Merged
189 commits
26759a8
Use Adapt.jl to change storage and element type
vchuravy Dec 17, 2024
fc610f9
add docs and CUDAExt
vchuravy Apr 21, 2025
7b5d81b
Aqua set unbound_args
vchuravy Apr 21, 2025
f730ef4
lower bound CUDA to 5.2
vchuravy Apr 22, 2025
13b7f59
add initial CUDA pipeline
vchuravy Apr 21, 2025
02de7d2
add storage_type, real_type to semidiscretize
vchuravy Apr 22, 2025
671f5b1
add GPU construction test
vchuravy Apr 22, 2025
ecd09a5
don't adapt Array{MArray}
vchuravy Apr 22, 2025
312009a
add some more cuda adapt tests
vchuravy Apr 22, 2025
690efd1
use sources for dev branch
vchuravy Apr 28, 2025
15a898b
fixup! use sources for dev branch
vchuravy May 8, 2025
45d344b
use released version of CUDA
vchuravy May 14, 2025
7e72eff
Update .buildkite/pipeline.yml
vchuravy May 14, 2025
3450ddd
Use Adapt.jl to change storage and element type
vchuravy Dec 17, 2024
cf2f590
add docs and CUDAExt
vchuravy Apr 21, 2025
de96f85
Aqua set unbound_args
vchuravy Apr 21, 2025
1a7cff2
lower bound CUDA to 5.2
vchuravy Apr 22, 2025
68edf29
add initial CUDA pipeline
vchuravy Apr 21, 2025
11ff63a
add storage_type, real_type to semidiscretize
vchuravy Apr 22, 2025
4d8a31f
add GPU construction test
vchuravy Apr 22, 2025
6ca8c3d
don't adapt Array{MArray}
vchuravy Apr 22, 2025
4ef2d98
add some more cuda adapt tests
vchuravy Apr 22, 2025
77395f5
use sources for dev branch
vchuravy Apr 28, 2025
1d78f07
fixup! use sources for dev branch
vchuravy May 8, 2025
39535ee
use released version of CUDA
vchuravy May 14, 2025
b973758
Update .buildkite/pipeline.yml
vchuravy May 14, 2025
7105da7
fix test_p4est_2d
vchuravy Jun 30, 2025
1fd6fe6
fix first GPU test
vchuravy Jun 30, 2025
d8a4bc8
Merge branch 'vc/adapt' into feature-gpu-offloading
benegee Jul 1, 2025
6ceef3a
address review comments
vchuravy Jul 1, 2025
7a53362
offload compute_coefficients
benegee Jul 1, 2025
68eb905
fmt
benegee Jul 1, 2025
3d00bdf
fixup! address review comments
vchuravy Jul 1, 2025
4b32fa0
add review comments
vchuravy Jul 1, 2025
10f7593
convert fstar_* cache entries to VecOfArrays
benegee Jul 1, 2025
c83bdbd
restore elixir
benegee Jul 1, 2025
8c6c57d
Merge branch 'vc/adapt' into feature-gpu-offloading
benegee Jul 2, 2025
d3b94fc
test native version as well
benegee Jul 2, 2025
97e13ec
adapt 1D and 3D version
benegee Jul 2, 2025
44f7134
Downgrade compat with Adapt
benegee Jul 2, 2025
abbcc56
Use Adapt.jl to change storage and element type
vchuravy Dec 17, 2024
a18e5d2
restore elixir
benegee Jul 1, 2025
5c942fe
offload compute_coefficients
benegee Jul 1, 2025
47a55f2
fmt
benegee Jul 1, 2025
36b0e4a
test native version as well
benegee Jul 2, 2025
153d828
adapt 1D and 3D version
benegee Jul 2, 2025
819ba75
Downgrade compat with Adapt
benegee Jul 2, 2025
e75cac7
update requires to 1.3
vchuravy Jul 2, 2025
4b6f63e
Merge branch 'vc/adapt' into feature-gpu-offloading
benegee Jul 2, 2025
61b4da1
Merge branch 'main' into feature-gpu-offloading
benegee Sep 16, 2025
e7cde27
missed during merge
benegee Sep 16, 2025
b174d6d
mistakes during merge
benegee Sep 16, 2025
489bb24
cleanup
benegee Sep 18, 2025
b4d1535
Basis kernels for 3D P4est
benegee Sep 18, 2025
2443cf8
port stepsize computation
benegee Sep 18, 2025
fc13ea5
CPU workaround for analysis callback
benegee Sep 18, 2025
2ff2f52
tests
benegee Sep 18, 2025
bc4ad17
add benchmark
benegee Sep 19, 2025
de06c61
fix max_dt
benegee Sep 19, 2025
29298a5
profiler output
benegee Sep 25, 2025
281a540
Merge branch 'main' into feature-gpu-offloading
benegee Sep 29, 2025
962a383
fmt
benegee Sep 29, 2025
a60e27d
missed max_dt calls
benegee Sep 29, 2025
ce742a3
Merge branch 'main' into feature-gpu-offloading
benegee Sep 30, 2025
2073d7c
some fixes
benegee Sep 30, 2025
9a2f130
after merge fixes
benegee Sep 30, 2025
9a47f29
some more fixes
benegee Sep 30, 2025
94f5d90
Merge branch 'main' into feature-gpu-offloading
benegee Oct 1, 2025
6ffb69f
post merge fixes
benegee Oct 1, 2025
fb25fa2
Merge branch 'main' into feature-gpu-offloading
benegee Oct 1, 2025
307c3eb
more
benegee Oct 1, 2025
c39b4de
more
benegee Oct 1, 2025
a38cc03
Squashed commit of the following:
benegee Oct 7, 2025
013244d
Apply suggestions from code review
benegee Oct 8, 2025
5b2c0bf
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee Oct 8, 2025
8d5a55b
Merge branch 'main' into feature-gpu-offloading
benegee Oct 8, 2025
8a98d27
!fixup
benegee Oct 8, 2025
7de1e57
fmt
benegee Oct 8, 2025
31a65cb
pass backend through
benegee Oct 8, 2025
4064e79
fixes
benegee Oct 8, 2025
af50cda
backends here and there
benegee Oct 8, 2025
5893d4d
almost everywhere
benegee Oct 8, 2025
a1caa12
some more
benegee Oct 8, 2025
a5cded3
next round
benegee Oct 8, 2025
7c6ab4a
could this be...
benegee Oct 9, 2025
719c2d1
adapts until 2d prolong2interfaces!
vivimie Nov 6, 2025
6bbc069
adds explicit mesh type in signature
vivimie Nov 6, 2025
e58c298
adapts the rest for the 2d basic advection gpu elixir
vivimie Nov 7, 2025
a570beb
Merge branch 'main' into feature-gpu-offloading
benegee Nov 27, 2025
b59239b
enable 2D CUDA tests
benegee Nov 27, 2025
c0dd4b5
fmt
benegee Nov 27, 2025
f90f5a8
fixes bugs in the CPU implementation
vivimie Dec 3, 2025
0291d14
Merge branch 'main' into feature-gpu-offloading
benegee Jan 19, 2026
68ad089
Merge branch 'main' into feature-gpu-offloading
benegee Jan 19, 2026
4ce90ab
fix
benegee Jan 19, 2026
ae9719d
fixes
benegee Jan 19, 2026
a13dd61
fix
benegee Jan 20, 2026
8ecb6c4
fix
benegee Jan 20, 2026
3d69311
no nextfloat per element
benegee Jan 20, 2026
a2f0488
fmt
benegee Jan 20, 2026
ff6dfd5
Merge branch 'main' into feature-gpu-offloading
benegee Jan 20, 2026
31490d3
another RealT_for_test_tolerances
benegee Jan 20, 2026
8e802ee
readd Project.toml
benegee Feb 10, 2026
0824647
Merge branch 'main' into feature-gpu-offloading
benegee Feb 23, 2026
77c1569
Merge branch 'main' into feature-gpu-offloading
benegee Feb 23, 2026
71d837b
fmt
benegee Feb 23, 2026
ae3e415
fixes
benegee Feb 23, 2026
a801ebe
more
benegee Feb 23, 2026
8829787
Merge branch 'main' into feature-gpu-offloading
benegee Feb 24, 2026
2831c9c
add @inline for inner functions
benegee Feb 24, 2026
34c4684
more fixes
benegee Feb 24, 2026
da4652d
Merge branch 'main' into feature-gpu-offloading
benegee Feb 24, 2026
e320bc5
define unsafe_wrap_or_alloc for CUDA.KernelAdaptor
vchuravy Feb 24, 2026
f72fcc1
fixup! define unsafe_wrap_or_alloc for CUDA.KernelAdaptor
vchuravy Feb 24, 2026
4805a70
fixup! define unsafe_wrap_or_alloc for CUDA.KernelAdaptor
vchuravy Feb 24, 2026
0fa07c4
apply bandaid
vchuravy Feb 24, 2026
287a113
final fix?
benegee Feb 24, 2026
f8a1696
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee Feb 24, 2026
1c2ea9b
Merge branch 'main' into feature-gpu-offloading
benegee Feb 24, 2026
dc6455d
add method to filter the cache
benegee Feb 24, 2026
6af1201
final^2
benegee Feb 24, 2026
2ecdf14
setup kernelabstraction harness
vchuravy Feb 25, 2026
c83404a
add advection_basic to KA tests
benegee Feb 25, 2026
c470dc9
no allocation tests
benegee Feb 26, 2026
518e348
Merge branch 'main' into feature-gpu-offloading
benegee Mar 16, 2026
6a3567a
missed
benegee Mar 16, 2026
96cdec4
Update Project.toml
benegee Mar 16, 2026
5f123ee
add sources section to benchmark Project.toml
benegee Mar 16, 2026
cec6865
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee Mar 16, 2026
5974c2a
fix meshT
benegee Mar 18, 2026
0a3448f
add backend argument for coupled semis
benegee Mar 18, 2026
70ea410
fmt
benegee Mar 18, 2026
95f0f03
fix
benegee Mar 18, 2026
0727ec4
fix mesh type
benegee Mar 18, 2026
39d4957
fix
benegee Mar 18, 2026
dc7dbb6
move get_backend to within rhs!
benegee Mar 18, 2026
476b54f
remove backend from max_dt
benegee Mar 18, 2026
fbe2171
here as well
benegee Mar 18, 2026
f344f65
fix
benegee Mar 18, 2026
a3eb8c8
add old method signatures to stay compatible with TrixiAtmo.jl
benegee Mar 18, 2026
04e0e2b
fix
benegee Mar 19, 2026
a1cdae1
Merge branch 'main' into feature-gpu-offloading
benegee Mar 19, 2026
331d704
Merge branch 'main' into feature-gpu-offloading
benegee Mar 24, 2026
6a95f55
meshT -> MeshT
benegee Mar 24, 2026
d7910c7
Apply suggestions from code review
benegee Mar 24, 2026
32d41ef
module TestCUDA2D
benegee Mar 24, 2026
cc3c78b
use log_base and enable flux differencing
benegee Mar 24, 2026
418c944
add a short note to the benchmark problem
benegee Mar 24, 2026
2962ed7
add device_override for Trixi.log
vchuravy Mar 24, 2026
e116f7b
fixup! add device_override for Trixi.log
vchuravy Mar 24, 2026
a1d4481
fixup! add device_override for Trixi.log
vchuravy Mar 24, 2026
5daf2c6
typo?
benegee Mar 24, 2026
b93bba9
fixup! add device_override for Trixi.log
vchuravy Mar 24, 2026
fc1cdf5
unify naming of inner methods
benegee Mar 24, 2026
2f52234
fmt
benegee Mar 24, 2026
8b8aa01
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee Mar 24, 2026
917b3a6
add ndims(MeshT)
benegee Mar 24, 2026
8c88cfe
Merge branch 'main' into feature-gpu-offloading
ranocha Mar 24, 2026
0b751ac
add NEWS
benegee Mar 25, 2026
97fdccb
add comment on how to use GPU
benegee Mar 25, 2026
44cb1ba
comment
benegee Mar 25, 2026
7bc64e1
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee Mar 25, 2026
484f587
activate test_allocations for GPU tests
vchuravy Mar 25, 2026
e9e318f
fixup! activate test_allocations for GPU tests
vchuravy Mar 25, 2026
e63e887
fixup! activate test_allocations for GPU tests
vchuravy Mar 25, 2026
7fc5dfe
fixup! fixup! activate test_allocations for GPU tests
vchuravy Mar 25, 2026
984c402
fixup! fixup! fixup! activate test_allocations for GPU tests
vchuravy Mar 25, 2026
e279fc1
fixup!
vchuravy Mar 25, 2026
d0301fb
fix spell check
vchuravy Mar 25, 2026
63565dc
add @invokelatest
benegee Mar 25, 2026
90d19e6
add compat bounds
benegee Mar 25, 2026
f30dd8d
remove finalize
benegee Mar 25, 2026
b000ffb
remove kernels for backward compatibility
benegee Mar 25, 2026
ae43324
no source terms
benegee Mar 25, 2026
575e0a2
Merge branch 'feature-gpu-offloading' of github.com:trixi-framework/T…
benegee Mar 25, 2026
34773f2
Merge branch 'main' into feature-gpu-offloading
benegee Mar 25, 2026
56d9392
comment on MeshT
benegee Mar 25, 2026
eee2168
fixes
benegee Mar 25, 2026
70b62d5
fmt
benegee Mar 25, 2026
7a27196
try different formatting
benegee Mar 25, 2026
fb7d10b
fix
benegee Mar 25, 2026
ef19445
missed ndims
benegee Mar 26, 2026
c8431d2
Add GPU parallel set_zero!
vchuravy Mar 26, 2026
4dc17be
set version to v0.16.0-DEV
ranocha Mar 26, 2026
cb3f7f9
Apply suggestions from code review
vchuravy Mar 26, 2026
e98fcf7
use dispatch for indices2direction
vchuravy Mar 26, 2026
958cd57
Merge branch 'main' into feature-gpu-offloading
ranocha Mar 26, 2026
52a05ad
Merge branch 'main' into feature-gpu-offloading
ranocha Mar 26, 2026
cb56c8a
Apply suggestion from @ranocha
ranocha Mar 26, 2026
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -85,6 +85,7 @@ jobs:
- performance_specializations
- mpi
- threaded
- kernelabstractions
include:
- version: '1.11'
os: ubuntu-latest
17 changes: 17 additions & 0 deletions NEWS.md
@@ -7,6 +7,23 @@ for human readability.

## Changes when updating to v0.16 from v0.15.x

#### Added

- Introducing GPU support: Based on work by Jan Kraus and Lars Christmann, Trixi.jl can
now be partly executed on GPUs. This includes simulations with flux differencing on
`P4estMesh` in 2D and 3D. Adaptive mesh refinement, multi-GPU setups, source terms, and
callbacks are not yet available. Offloading is achieved via KernelAbstractions.jl kernels,
which, at the moment, execute the same code as is usually run on CPUs. A backend is selected
by passing an appropriate data type as the keyword argument `storage_type` to
`semidiscretize`. See the
[heterogeneous](https://trixi-framework.github.io/TrixiDocumentation/dev/heterogeneous/)
section for instructions on how to port kernels. This is, however, still preliminary
and subject to change.
GPU kernels are currently CI-tested on NVIDIA GPUs in a Buildkite workflow using
`TRIXI_TEST=CUDA` ([#2590]).
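
  The backend selection described in the entry above can be sketched as follows (an
  illustrative fragment, not a complete elixir; it assumes CUDA.jl is installed and that
  `semi` is an already constructed semidiscretization):

  ```julia
  using Trixi
  using CUDA  # provides `CuArray` and loads the CUDA package extension

  # Passing a device array type as `storage_type` moves the solution data to the GPU;
  # `real_type` optionally changes the floating-point element type at the same time.
  tspan = (0.0, 1.0)
  ode = semidiscretize(semi, tspan;
                       storage_type = CuArray,
                       real_type = Float32)
  ```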

#### Changed

- The implementation of the local DG (`ViscousFormulationLocalDG`) `solver_parabolic` has been changed for the `P4estMesh`.
In particular, instead of computing the `ldg_switch` as the dot product of the normal direction with ones,
i.e., summing up the normal components, the `ldg_switch` is now selected as
2 changes: 1 addition & 1 deletion Project.toml
@@ -86,7 +86,7 @@ EllipsisNotation = "1.0"
FillArrays = "1.13"
ForwardDiff = "0.10.38, 1"
HDF5 = "0.17"
KernelAbstractions = "0.9.36"
KernelAbstractions = "0.9.38"
LinearAlgebra = "1"
LinearMaps = "2.7, 3.0"
LoopVectorization = "0.12.171"
16 changes: 16 additions & 0 deletions benchmark/CUDA/Project.toml
@@ -0,0 +1,16 @@
[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
OrdinaryDiffEqLowStorageRK = "b0944070-b475-4768-8dec-fb6eb410534d"
TimerOutputs = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
Trixi = "a7f1ee26-1774-49b1-8366-f1abc58fbfcb"

[sources]
Trixi = {path = "../.."}

[compat]
CUDA = "5.8.2"
JSON = "1.4.0"
OrdinaryDiffEqLowStorageRK = "1.12.0"
TimerOutputs = "0.5.25"
Trixi = "0.16"
76 changes: 76 additions & 0 deletions benchmark/CUDA/elixir_euler_taylor_green_vortex.jl
@@ -0,0 +1,76 @@
using OrdinaryDiffEqLowStorageRK
using Trixi

###############################################################################
# semidiscretization of the compressible Euler equations

equations = CompressibleEulerEquations3D(1.4)

function initial_condition_taylor_green_vortex(x, t,
equations::CompressibleEulerEquations3D)
A = 1.0 # magnitude of speed
Ms = 0.1 # maximum Mach number

rho = 1.0
v1 = A * sin(x[1]) * cos(x[2]) * cos(x[3])
v2 = -A * cos(x[1]) * sin(x[2]) * cos(x[3])
v3 = 0.0
p = (A / Ms)^2 * rho / equations.gamma # scaling to get Ms
p = p +
1.0 / 16.0 * A^2 * rho *
(cos(2 * x[1]) * cos(2 * x[3]) +
2 * cos(2 * x[2]) + 2 * cos(2 * x[1]) + cos(2 * x[2]) * cos(2 * x[3]))

return prim2cons(SVector(rho, v1, v2, v3, p), equations)
end

initial_condition = initial_condition_taylor_green_vortex

volume_flux = flux_ranocha
surface_flux = flux_lax_friedrichs
volume_integral = VolumeIntegralFluxDifferencing(volume_flux)
solver = DGSEM(polydeg = 5, surface_flux = surface_flux, volume_integral = volume_integral)

coordinates_min = (-1.0, -1.0, -1.0) .* pi
coordinates_max = (1.0, 1.0, 1.0) .* pi

initial_refinement_level = 1
trees_per_dimension = (4, 4, 4)

mesh = P4estMesh(trees_per_dimension, polydeg = 1,
coordinates_min = coordinates_min, coordinates_max = coordinates_max,
periodicity = true, initial_refinement_level = initial_refinement_level)

semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver;
boundary_conditions = boundary_condition_periodic)

###############################################################################
# ODE solvers, callbacks etc.

tspan = (0.0, 100.0)
ode = semidiscretize(semi, tspan; storage_type = nothing, real_type = nothing)

summary_callback = SummaryCallback()

stepsize_callback = StepsizeCallback(cfl = 0.1)

callbacks = CallbackSet(summary_callback,
stepsize_callback)

###############################################################################
# run the simulation

maxiters = 200
run_profiler = false

# disable warnings when maxiters is reached
integrator = init(ode, CarpenterKennedy2N54(williamson_condition = false),
dt = 1.0,
save_everystep = false, callback = callbacks,
maxiters = maxiters, verbose = false)
if run_profiler
prof_result = CUDA.@profile solve!(integrator)
else
solve!(integrator)
prof_result = nothing
end
91 changes: 91 additions & 0 deletions benchmark/CUDA/run.jl
@@ -0,0 +1,91 @@
using Trixi
using CUDA
using TimerOutputs
using JSON

function main(elixir_path)

# setup
maxiters = 50
initial_refinement_level = 3
storage_type = CuArray
real_type = Float64

println("Warming up...")

# start simulation with tiny final time to trigger compilation
duration_compile = @elapsed begin
trixi_include(elixir_path,
tspan = (0.0, 1e-14),
storage_type = storage_type,
real_type = real_type)
trixi_include(elixir_path,
tspan = (0.0, 1e-14),
storage_type = storage_type,
real_type = Float32)
end

println("Finished warm-up in $duration_compile seconds\n")
println("Starting simulation...")

# start the real simulation
duration_elixir = @elapsed trixi_include(elixir_path,
maxiters = maxiters,
initial_refinement_level = initial_refinement_level,
storage_type = storage_type,
real_type = real_type)

# store metrics (on every rank!)
metrics = Dict{String, Float64}("elapsed time" => duration_elixir)

# read TimerOutputs timings
timer = Trixi.timer()
metrics["total time"] = 1.0e-9 * TimerOutputs.tottime(timer)
metrics["rhs! time"] = 1.0e-9 * TimerOutputs.time(timer["rhs!"])

# compute performance index
latest_semi = @invokelatest (@__MODULE__).semi
nrhscalls = Trixi.ncalls(latest_semi.performance_counter)
walltime = 1.0e-9 * take!(latest_semi.performance_counter)
metrics["PID"] = walltime * Trixi.mpi_nranks() /
(Trixi.ndofsglobal(latest_semi) * nrhscalls)

# write json file
open("metrics.out", "w") do f
indent = 2
JSON.print(f, metrics, indent)
end

# run profiler
maxiters = 5
initial_refinement_level = 1

println("Running profiler (Float64)...")
trixi_include(elixir_path,
maxiters = maxiters,
initial_refinement_level = initial_refinement_level,
storage_type = storage_type,
real_type = Float64,
run_profiler = true)

open("profile_float64.txt", "w") do io
show(io, @invokelatest (@__MODULE__).prof_result)
end

println("Running profiler (Float32)...")
trixi_include(elixir_path,
maxiters = maxiters,
initial_refinement_level = initial_refinement_level,
storage_type = storage_type,
real_type = Float32,
run_profiler = true)

open("profile_float32.txt", "w") do io
show(io, @invokelatest (@__MODULE__).prof_result)
end
end

# hardcoded elixir
elixir_path = joinpath(@__DIR__(), "elixir_euler_taylor_green_vortex.jl")

main(elixir_path)
11 changes: 8 additions & 3 deletions docs/src/heterogeneous.md
@@ -120,9 +120,14 @@ function trixi_rhs_fct(mesh, equations, solver, cache, args)
end
```

1. Put the inner code in a new function `rhs_fct_per_element`. Besides the index
`element`, pass all required fields as arguments, but make sure to `@unpack` them from
their structs in advance.
1. Move the inner code into a new inlined function `rhs_fct_per_element`.
```julia
@inline function rhs_fct_per_element(..., element, ...)
...
end
```
Besides the index `element`, pass all required fields as arguments, but make sure to
`@unpack` them from their structs in advance.
2. Where `trixi_rhs_fct` is called, get the backend, i.e., the hardware we are currently
running on via `trixi_backend(x)`.
This will, e.g., work with `u_ode`. Internally, KernelAbstractions.jl's `get_backend`
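The two steps above follow the generic KernelAbstractions.jl pattern, which can be
sketched standalone as follows (all function names here are placeholders, not actual
Trixi.jl API; Trixi.jl's `trixi_backend` wraps `get_backend` as described):

```julia
using KernelAbstractions

# Step 1: per-element function, marked @inline, receiving all fields explicitly.
@inline function rhs_fct_per_element!(du, u, element)
    du[element] = 2 * u[element]
end

# Kernel wrapping the per-element function; one work item per element.
@kernel function rhs_fct_kernel!(du, u)
    element = @index(Global)
    rhs_fct_per_element!(du, u, element)
end

# Step 2: obtain the backend from the data, then launch the kernel on it.
function trixi_rhs_fct!(du, u)
    backend = get_backend(u)
    kernel! = rhs_fct_kernel!(backend)
    kernel!(du, u; ndrange = length(u))
    KernelAbstractions.synchronize(backend)
    return nothing
end

# On plain `Array`s this runs on the CPU backend; with `CuArray`s the very same
# kernel code would be compiled for and launched on the GPU.
du = zeros(4)
trixi_rhs_fct!(du, [1.0, 2.0, 3.0, 4.0])
```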
4 changes: 4 additions & 0 deletions docs/src/styleguide.md
@@ -22,6 +22,10 @@ conventions, we apply and enforce automated source code formatting
and its siblings, put the `cache` first.
* Some internal functions take a "computational backend" argument, this should always be passed as the first argument.
* Otherwise, use the order `mesh, equations, solver, cache`.
* In the course of GPU offloading, we sometimes pass `MeshT = typeof(mesh)` instead of
`mesh` when the called method needs the mesh type for dispatch only. This part
of the code is under active development and is not considered stable API at the
moment.
* If something needs to be specified in more detail for dispatch, put the additional argument before the general one
that is specified in more detail. For example, we use `have_nonconservative_terms(equations), equations`
and `dg.mortar, dg`.
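
The `MeshT` convention mentioned above can be illustrated with mock types (a
hypothetical sketch; the type and function names below are stand-ins, not actual
Trixi.jl code):

```julia
# Mock types illustrating the MeshT-for-dispatch pattern described above.
abstract type AbstractMesh{NDIMS} end
struct P4estMesh{NDIMS} <: AbstractMesh{NDIMS} end

# The callee needs only the mesh *type*, so the type itself is passed and
# queried, e.g. for the spatial dimension:
Base.ndims(::Type{<:AbstractMesh{NDIMS}}) where {NDIMS} = NDIMS

uses_quadrilaterals(::Type{MeshT}) where {MeshT <: P4estMesh} = ndims(MeshT) == 2

mesh = P4estMesh{2}()
MeshT = typeof(mesh)
uses_quadrilaterals(MeshT)  # dispatch needs no mesh instance
```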
6 changes: 3 additions & 3 deletions examples/p4est_2d_dgsem/elixir_advection_basic_gpu.jl
@@ -33,6 +33,7 @@ semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition_convergen
# ODE solvers, callbacks etc.

# Create ODE problem with time span from 0.0 to 1.0
# Change `storage_type` to, e.g., `CuArray` to actually run on GPU
ode = semidiscretize(semi, (0.0, 1.0); real_type = nothing, storage_type = nothing)

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
@@ -50,9 +50,8 @@ save_solution = SaveSolutionCallback(interval = 100,
stepsize_callback = StepsizeCallback(cfl = 1.6)

# Create a CallbackSet to collect all callbacks such that they can be passed to the ODE solver
callbacks = CallbackSet(summary_callback, stepsize_callback)
# TODO: GPU. The `analysis_callback` needs to be updated for GPU support
# analysis_callback, save_solution, stepsize_callback)
callbacks = CallbackSet(summary_callback, analysis_callback,
save_solution, stepsize_callback)

###############################################################################
# run the simulation
63 changes: 63 additions & 0 deletions examples/p4est_3d_dgsem/elixir_advection_basic_gpu.jl
@@ -0,0 +1,63 @@
# The same setup as tree_3d_dgsem/elixir_advection_basic.jl
# to verify GPU support and Adapt.jl support.

using OrdinaryDiffEqLowStorageRK
using Trixi

###############################################################################
# semidiscretization of the linear advection equation

advection_velocity = (0.2, -0.7, 0.5)
equations = LinearScalarAdvectionEquation3D(advection_velocity)

# Create DG solver with polynomial degree = 3 and (local) Lax-Friedrichs/Rusanov flux as surface flux
solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs)

coordinates_min = (-1.0, -1.0, -1.0) # minimum coordinates (min(x), min(y), min(z))
coordinates_max = (1.0, 1.0, 1.0) # maximum coordinates (max(x), max(y), max(z))

# Create P4estMesh with 8 x 8 x 8 elements (note `refinement_level=1`)
trees_per_dimension = (4, 4, 4)
mesh = P4estMesh(trees_per_dimension, polydeg = 3,
coordinates_min = coordinates_min, coordinates_max = coordinates_max,
initial_refinement_level = 1,
periodicity = true)

# A semidiscretization collects data structures and functions for the spatial discretization
semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition_convergence_test,
solver;
boundary_conditions = boundary_condition_periodic)

###############################################################################
# ODE solvers, callbacks etc.

# Create ODE problem with time span from 0.0 to 1.0
# Change `storage_type` to, e.g., `CuArray` to actually run on GPU
tspan = (0.0, 1.0)
ode = semidiscretize(semi, tspan; real_type = nothing, storage_type = nothing)

# At the beginning of the main loop, the SummaryCallback prints a summary of the simulation setup
# and resets the timers
summary_callback = SummaryCallback()

# The AnalysisCallback allows to analyse the solution in regular intervals and prints the results
analysis_callback = AnalysisCallback(semi, interval = 100)

# The SaveSolutionCallback allows to save the solution to a file in regular intervals
save_solution = SaveSolutionCallback(interval = 100,
solution_variables = cons2prim)

# The StepsizeCallback handles the re-calculation of the maximum Δt after each time step
stepsize_callback = StepsizeCallback(cfl = 1.2)

# Create a CallbackSet to collect all callbacks such that they can be passed to the ODE solver
callbacks = CallbackSet(summary_callback, analysis_callback,
save_solution, stepsize_callback)

###############################################################################
# run the simulation

# OrdinaryDiffEq's `solve` method evolves the solution in time and executes the passed callbacks
sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false);
dt = 0.05, # solve needs some value here but it will be overwritten by the stepsize_callback
ode_default_options()..., callback = callbacks);
18 changes: 17 additions & 1 deletion ext/TrixiCUDAExt.jl
@@ -1,11 +1,27 @@
# Package extension for adding CUDA-based features to Trixi.jl
module TrixiCUDAExt

import CUDA: CuArray
using CUDA: CUDA, CuArray, CuDeviceArray, KernelAdaptor, @device_override
import Trixi

function Trixi.storage_type(::Type{<:CuArray})
return CuArray
end

function Trixi.unsafe_wrap_or_alloc(::KernelAdaptor, vec, size)
return Trixi.unsafe_wrap_or_alloc(CuDeviceArray, vec, size)
end

function Trixi.unsafe_wrap_or_alloc(::Type{<:CuDeviceArray}, vec::CuDeviceArray, size)
return reshape(vec, size)
end

@static if Trixi._PREFERENCE_LOG == "log_Trixi_NaN"
@device_override Trixi.log(x::Float64) = ccall("extern __nv_log", llvmcall, Cdouble,
(Cdouble,), x)
@device_override Trixi.log(x::Float32) = ccall("extern __nv_logf", llvmcall, Cfloat,
(Cfloat,), x)
# TODO: Trixi.log(x::Float16)
end

end
3 changes: 2 additions & 1 deletion src/Trixi.jl
@@ -60,7 +60,8 @@ using DiffEqCallbacks: PeriodicCallback, PeriodicCallbackAffect
using FillArrays: Ones, Zeros
using ForwardDiff: ForwardDiff
using HDF5: HDF5, h5open, attributes, create_dataset, datatype, dataspace
using KernelAbstractions: KernelAbstractions, @index, @kernel, get_backend, Backend
using KernelAbstractions: KernelAbstractions, @index, @kernel, get_backend, Backend,
allocate
using LinearMaps: LinearMap
if _PREFERENCE_LOOPVECTORIZATION
using LoopVectorization: LoopVectorization, @turbo, indices