Add compiler stress test for broadcasts #2471

Draft
petebachant wants to merge 27 commits into main from pb/stress-tests

petebachant (Member) commented Mar 23, 2026

This adds a script that runs a bunch of different broadcast types and complexities with CUDA to see where inlining or overall compilation fails.
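
As a rough illustration of how a test case of growing complexity can be generated, here is a minimal sketch that rebuilds the `arithmetic_depth_*` expressions shown in the sample result. The function name `arithmetic_expr` is hypothetical, and the operator cycle and constants are read off the sample rows rather than taken from the PR's script. It runs on a plain CPU array, not on ClimaCore fields or CUDA.

```julia
# Hypothetical generator; the operator cycle (*, /, -, +) and the constants
# 1.0, 2.0, ... are inferred from the arithmetic_depth_* rows in the sample result.
function arithmetic_expr(depth::Integer)
    ops = ("*", "/", "-", "+")
    ex = "x"
    for i in 1:depth
        # Wrap the previous expression in one more binary operation.
        ex = "($ex $(ops[mod1(i, 4)]) $(i).0)"
    end
    return ex
end

src = "op(x) = " * arithmetic_expr(3)
println(src)

# Turn the generated string into an anonymous function and broadcast it
# over a plain CPU array (a stand-in for a field).
op = eval(Meta.parse("x -> " * arithmetic_expr(3)))
result = Base.invokelatest(broadcast, op, [1.0, 2.0])
println(result)
```

Generating the source as a string keeps each case printable in the results table exactly as it was compiled.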

TODO

  • Add nested expressions?
  • Ensure we test something similar to Haakon's recent ClimaAtmos failure.
  • Refine results table output and automatically incorporate in docs?

Sample result

Test | Type | Time (μs) | Baseline (μs) | Δ Time | Primary kernel | Regs | Base Regs | Local B | Base Local B | Shared B | Local memory | Local-memory kernels | Soft fail | Soft-fail signals | Expression
arithmetic_depth_1 arithmetic 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = (x * 1.0)
op.(f)
arithmetic_depth_24 arithmetic 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = ((((((((((((((((((((((((x * 1.0) / 2.0) - 3.0) + 4.0) * 5.0) / 6.0) - 7.0) + 8.0) * 9.0) / 10.0) - 11.0) + 12.0) * 13.0) / 14.0) - 15.0) + 16.0) * 17.0) / 18.0) - 19.0) + 20.0) * 21.0) / 22.0) - 23.0) + 24.0)
op.(f)
multiarg_2_args multiarg 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(f1, f2) = (f1) / (f2 + 1.0)
op.(f1, f2)
multiarg_16_args multiarg 22.000 - - #knl_copyto! 48 - 32 - 0 local_memory_used 4/6 no -
op(f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16) = (f1 + f2 + f3 + f4 + f5 + f6 + f7 + f8 + f9 + f10 + f11 + f12 + f13 + f14 + f15) / (f16 + 1.0)
op.(f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16)
functions_log_depth_1 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = log(abs(x + 0.5) + 1.5)
op.(f)
functions_log_depth_6 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = log(abs(log(abs(log(abs(log(abs(log(abs(log(abs(x + 0.5) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)
op.(f)
functions_sqrt_depth_1 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = sqrt(abs(x + 0.5) + 1.5)
op.(f)
functions_sqrt_depth_6 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = sqrt(abs(sqrt(abs(sqrt(abs(sqrt(abs(sqrt(abs(sqrt(abs(x + 0.5) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)) + 1.5)
op.(f)
functions_mixed_depth_1 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = log(abs(x + 0.5) + 1.5)
op.(f)
functions_mixed_depth_4 functions 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
op(x) = log(abs(sqrt(abs(abs(log(abs(x + 0.5) + 1.5))) + 1.5)) + 1.5)
op.(f)
nested_calls_depth_1 nested_calls 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
helper_1(x) = (x + 1.0)
op(x) = helper_1(x)
op.(f)
nested_calls_depth_24 nested_calls 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
helper_1(x) = (x + 1.0)
helper_2(x) = (helper_1(x) * 3.0)
helper_3(x) = (helper_2(x) / 4.0)
helper_4(x) = (helper_3(x) - 4.0)
helper_5(x) = (helper_4(x) + 5.0)
helper_6(x) = (helper_5(x) * 7.0)
helper_7(x) = (helper_6(x) / 8.0)
helper_8(x) = (helper_7(x) - 8.0)
helper_9(x) = (helper_8(x) + 9.0)
helper_10(x) = (helper_9(x) * 11.0)
helper_11(x) = (helper_10(x) / 12.0)
helper_12(x) = (helper_11(x) - 12.0)
helper_13(x) = (helper_12(x) + 13.0)
helper_14(x) = (helper_13(x) * 15.0)
helper_15(x) = (helper_14(x) / 16.0)
helper_16(x) = (helper_15(x) - 16.0)
helper_17(x) = (helper_16(x) + 17.0)
helper_18(x) = (helper_17(x) * 19.0)
helper_19(x) = (helper_18(x) / 20.0)
helper_20(x) = (helper_19(x) - 20.0)
helper_21(x) = (helper_20(x) + 21.0)
helper_22(x) = (helper_21(x) * 23.0)
helper_23(x) = (helper_22(x) / 24.0)
helper_24(x) = (helper_23(x) - 24.0)
op(x) = helper_24(x)
op.(f)
subexpression_args_bare_namedtuple subexpression_args expected failure - - - - - - - - - - - -
@. loglambda = my_get_distribution_loglambda(scheme, max(zero(rhoq_ice), rhoq_ice), max(zero(rhon_ice), rhon_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhoq_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhob_rim))
subexpression_args_closure_wrapped subexpression_args 72.000 - - #knl_copyto! 40 - 32 - 0 local_memory_used 4/6 no -
fn_with_scheme = let s = scheme
    (q, n, rqi, rqb) -> log(abs(s.c1 * q + s.c2 * n) + 1) + s.c3 * (rqi - rqb)
end
@. loglambda = fn_with_scheme(max(zero(rhoq_ice), rhoq_ice), max(zero(rhon_ice), rhon_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhoq_ice), ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhob_rim))
subexpression_args_precomputed subexpression_args 36.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 6/8 no -
@. rhoq_ice_pos = max(zero(rhoq_ice), rhoq_ice)
@. rhon_ice_pos = max(zero(rhon_ice), rhon_ice)
@. rim_over_ice = ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhoq_ice)
@. rim_over_bulk = ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / rhob_rim)
fn_with_scheme = let s = scheme
    (q, n, rqi, rqb) -> log(abs(s.c1 * q + s.c2 * n) + 1) + s.c3 * (rqi - rqb)
end
@. loglambda = fn_with_scheme(rhoq_ice_pos, rhon_ice_pos, rim_over_ice, rim_over_bulk)
projection_1x_chained projection 10.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
@. Geometry.project(Geometry.Covariant12Axis(), v)
projection_8x_chained projection 20.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
@. Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v)
projection_12x_chained projection 26.000 - - dss__FILE_ClimaCore_jl_src_Topologies_dss_jl_L702 32 - 32 - 0 local_memory_used 4/6 no -
@. Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v) .+ Geometry.project(Geometry.Covariant12Axis(), v)
div_1_ops divergence 9.000 - - materialize_FILE_ClimaCore_jl_src_Operators_common_jl_L54 32 - 32 - 4096 local_memory_used 4/6 no -
div_op.(v .* 1.0)
div_8_ops divergence 11.000 - - #copyto_spectral_kernel! 54 - 32 - 32768 local_memory_used 4/6 no -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0)
div_12_ops divergence 15.000 - - #copyto_spectral_kernel! 54 - 32 - 49152 local_memory_used 4/6 no -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0) .+ div_op.(v .* 9.0) .+ div_op.(v .* 10.0) .+ div_op.(v .* 11.0) .+ div_op.(v .* 12.0)
div_14_ops divergence FAILED - - - - - - - - - - - -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0) .+ div_op.(v .* 9.0) .+ div_op.(v .* 10.0) .+ div_op.(v .* 11.0) .+ div_op.(v .* 12.0) .+ div_op.(v .* 13.0) .+ div_op.(v .* 14.0)
div_16_ops divergence FAILED - - - - - - - - - - - -
div_op.(v .* 1.0) .+ div_op.(v .* 2.0) .+ div_op.(v .* 3.0) .+ div_op.(v .* 4.0) .+ div_op.(v .* 5.0) .+ div_op.(v .* 6.0) .+ div_op.(v .* 7.0) .+ div_op.(v .* 8.0) .+ div_op.(v .* 9.0) .+ div_op.(v .* 10.0) .+ div_op.(v .* 11.0) .+ div_op.(v .* 12.0) .+ div_op.(v .* 13.0) .+ div_op.(v .* 14.0) .+ div_op.(v .* 15.0) .+ div_op.(v .* 16.0)
curl_1_ops curl 9.000 - - materialize_FILE_ClimaCore_jl_src_Operators_common_jl_L54 32 - 32 - 4096 local_memory_used 4/6 no -
curl_op.(v .* 1.0)
curl_8_ops curl 12.000 - - #copyto_spectral_kernel! 62 - 32 - 32768 local_memory_used 4/6 no -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0)
curl_12_ops curl 13.000 - - #copyto_spectral_kernel! 72 - 32 - 49152 local_memory_used 4/6 no -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0) .+ curl_op.(v .* 9.0) .+ curl_op.(v .* 10.0) .+ curl_op.(v .* 11.0) .+ curl_op.(v .* 12.0)
curl_14_ops curl FAILED - - - - - - - - - - - -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0) .+ curl_op.(v .* 9.0) .+ curl_op.(v .* 10.0) .+ curl_op.(v .* 11.0) .+ curl_op.(v .* 12.0) .+ curl_op.(v .* 13.0) .+ curl_op.(v .* 14.0)
curl_16_ops curl FAILED - - - - - - - - - - - -
curl_op.(v .* 1.0) .+ curl_op.(v .* 2.0) .+ curl_op.(v .* 3.0) .+ curl_op.(v .* 4.0) .+ curl_op.(v .* 5.0) .+ curl_op.(v .* 6.0) .+ curl_op.(v .* 7.0) .+ curl_op.(v .* 8.0) .+ curl_op.(v .* 9.0) .+ curl_op.(v .* 10.0) .+ curl_op.(v .* 11.0) .+ curl_op.(v .* 12.0) .+ curl_op.(v .* 13.0) .+ curl_op.(v .* 14.0) .+ curl_op.(v .* 15.0) .+ curl_op.(v .* 16.0)
interp_c2f_1_ops interpolate 9.000 - - fill__FILE_ClimaCore_jl_src_DataLayouts_fill_jl_L2 32 - 32 - 0 local_memory_used 1/2 no -
interp.(ᶜf .* 1.0)
interp_c2f_8_ops interpolate 14.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 40 - 0 - 0 no_local_memory 1/2 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0)
interp_c2f_12_ops interpolate 17.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 48 - 0 - 0 no_local_memory 1/2 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0) .+ interp.(ᶜf .* 9.0) .+ interp.(ᶜf .* 10.0) .+ interp.(ᶜf .* 11.0) .+ interp.(ᶜf .* 12.0)
interp_c2f_14_ops interpolate 19.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 56 - 0 - 0 no_local_memory 1/2 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0) .+ interp.(ᶜf .* 9.0) .+ interp.(ᶜf .* 10.0) .+ interp.(ᶜf .* 11.0) .+ interp.(ᶜf .* 12.0) .+ interp.(ᶜf .* 13.0) .+ interp.(ᶜf .* 14.0)
interp_c2f_16_ops interpolate 19.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 64 - 0 - 0 no_local_memory 1/2 no -
interp.(ᶜf .* 1.0) .+ interp.(ᶜf .* 2.0) .+ interp.(ᶜf .* 3.0) .+ interp.(ᶜf .* 4.0) .+ interp.(ᶜf .* 5.0) .+ interp.(ᶜf .* 6.0) .+ interp.(ᶜf .* 7.0) .+ interp.(ᶜf .* 8.0) .+ interp.(ᶜf .* 9.0) .+ interp.(ᶜf .* 10.0) .+ interp.(ᶜf .* 11.0) .+ interp.(ᶜf .* 12.0) .+ interp.(ᶜf .* 13.0) .+ interp.(ᶜf .* 14.0) .+ interp.(ᶜf .* 15.0) .+ interp.(ᶜf .* 16.0)
weighted_interp_c2f_1_ops weighted_interpolate 9.000 - - copy_FILE_ClimaCore_jl_src_Operators_common_jl_L49 38 - 0 - 0 no_local_memory 1/2 no -
winterp.(ᶜw, ᶜf .* 1.0)
weighted_interp_c2f_8_ops weighted_interpolate 17.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 64 - 0 - 0 no_local_memory 1/2 no -
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0)
weighted_interp_c2f_12_ops weighted_interpolate 22.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 80 - 0 - 0 no_local_memory 1/2 yes register_cliff(prev=64, cur=80, jump=16, ratio=1.25)
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0) .+ winterp.(ᶜw, ᶜf .* 9.0) .+ winterp.(ᶜw, ᶜf .* 10.0) .+ winterp.(ᶜw, ᶜf .* 11.0) .+ winterp.(ᶜw, ᶜf .* 12.0)
weighted_interp_c2f_14_ops weighted_interpolate 24.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 90 - 0 - 0 no_local_memory 1/2 no -
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0) .+ winterp.(ᶜw, ᶜf .* 9.0) .+ winterp.(ᶜw, ᶜf .* 10.0) .+ winterp.(ᶜw, ᶜf .* 11.0) .+ winterp.(ᶜw, ᶜf .* 12.0) .+ winterp.(ᶜw, ᶜf .* 13.0) .+ winterp.(ᶜw, ᶜf .* 14.0)
weighted_interp_c2f_16_ops weighted_interpolate 26.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 96 - 0 - 0 no_local_memory 1/2 no -
winterp.(ᶜw, ᶜf .* 1.0) .+ winterp.(ᶜw, ᶜf .* 2.0) .+ winterp.(ᶜw, ᶜf .* 3.0) .+ winterp.(ᶜw, ᶜf .* 4.0) .+ winterp.(ᶜw, ᶜf .* 5.0) .+ winterp.(ᶜw, ᶜf .* 6.0) .+ winterp.(ᶜw, ᶜf .* 7.0) .+ winterp.(ᶜw, ᶜf .* 8.0) .+ winterp.(ᶜw, ᶜf .* 9.0) .+ winterp.(ᶜw, ᶜf .* 10.0) .+ winterp.(ᶜw, ᶜf .* 11.0) .+ winterp.(ᶜw, ᶜf .* 12.0) .+ winterp.(ᶜw, ᶜf .* 13.0) .+ winterp.(ᶜw, ᶜf .* 14.0) .+ winterp.(ᶜw, ᶜf .* 15.0) .+ winterp.(ᶜw, ᶜf .* 16.0)
upwinding_3rdorder_1_ops upwinding 9.000 - - copy_FILE_ClimaCore_jl_src_Operators_common_jl_L49 42 - 0 - 0 no_local_memory 2/3 no -
upwind.(ᶠv, ᶜf .* 1.0)
upwinding_3rdorder_8_ops upwinding 17.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 96 - 0 - 0 no_local_memory 2/3 yes register_cliff(prev=42, cur=96, jump=54, ratio=2.29)
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0)
upwinding_3rdorder_12_ops upwinding 22.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 128 - 0 - 0 no_local_memory 2/3 yes register_cliff(prev=96, cur=128, jump=32, ratio=1.33)
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0) .+ upwind.(ᶠv, ᶜf .* 9.0) .+ upwind.(ᶠv, ᶜf .* 10.0) .+ upwind.(ᶠv, ᶜf .* 11.0) .+ upwind.(ᶠv, ᶜf .* 12.0)
upwinding_3rdorder_14_ops upwinding 24.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 146 - 0 - 0 no_local_memory 2/3 no -
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0) .+ upwind.(ᶠv, ᶜf .* 9.0) .+ upwind.(ᶠv, ᶜf .* 10.0) .+ upwind.(ᶠv, ᶜf .* 11.0) .+ upwind.(ᶠv, ᶜf .* 12.0) .+ upwind.(ᶠv, ᶜf .* 13.0) .+ upwind.(ᶠv, ᶜf .* 14.0)
upwinding_3rdorder_16_ops upwinding 26.000 - - copy_FILE_ClimaCore_jl_src_Fields_broadcast_jl_L119 162 - 0 - 0 no_local_memory 2/3 no -
upwind.(ᶠv, ᶜf .* 1.0) .+ upwind.(ᶠv, ᶜf .* 2.0) .+ upwind.(ᶠv, ᶜf .* 3.0) .+ upwind.(ᶠv, ᶜf .* 4.0) .+ upwind.(ᶠv, ᶜf .* 5.0) .+ upwind.(ᶠv, ᶜf .* 6.0) .+ upwind.(ᶠv, ᶜf .* 7.0) .+ upwind.(ᶠv, ᶜf .* 8.0) .+ upwind.(ᶠv, ᶜf .* 9.0) .+ upwind.(ᶠv, ᶜf .* 10.0) .+ upwind.(ᶠv, ᶜf .* 11.0) .+ upwind.(ᶠv, ᶜf .* 12.0) .+ upwind.(ᶠv, ᶜf .* 13.0) .+ upwind.(ᶠv, ᶜf .* 14.0) .+ upwind.(ᶠv, ᶜf .* 15.0) .+ upwind.(ᶠv, ᶜf .* 16.0)
climaatmos_column_1x climaatmos 15.000 - - kernel_call__FILE__tmp_jl_ZKVch8fQxS_jl_L80 40 - 0 - 0 no_local_memory 3/4 no -
@. tendency = fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 1.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 1.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 1.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 1.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 60))))
climaatmos_column_6x climaatmos 66.000 - - kernel_call__FILE__tmp_jl_XkzAXO3Sjs_jl_L80 56 - 0 - 0 no_local_memory 3/4 no -
@. tendency = fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 1.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 1.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 1.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 1.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 1.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 2.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 2.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 2.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 2.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 2.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 2.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 3.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 3.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 3.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 3.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 3.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 3.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 4.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 4.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 4.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 4.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 4.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 4.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 5.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 5.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 5.0 / 50))), upwind(ᶠv, 
ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 5.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 5.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 5.0 / 60)))) + fn_with_scheme(winterp(ᶜw, max(zero(rhoq_ice), rhoq_ice + 6.0 / 10)), winterp(ᶜn, max(zero(rhon_ice), rhon_ice + 6.0 / 20)), upwind(ᶠv, ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 6.0 / 50))), upwind(ᶠv, ifelse(iszero(rhon_ice), zero(rhon_ice), rhob_rim / (rhon_ice + 6.0 / 40))), interp(max(zero(rhoq_ice), rhoq_ice + 6.0 / 30)), interp(ifelse(iszero(rhoq_ice), zero(rhoq_ice), rhoq_rim / (rhoq_ice + 6.0 / 60))))
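
The `register_cliff(prev=..., cur=..., jump=..., ratio=...)` signals in the table compare the register count of one case with the previous case of the same type. The sketch below shows one way such a check could work. The thresholds (`min_jump = 16`, `min_ratio = 1.25`, `min_regs = 64`) are assumptions chosen only because they happen to reproduce the flags in the sample table; they are not taken from the PR's script, and the function is a hypothetical stand-in.

```julia
# Hypothetical soft-fail check. All three thresholds are assumptions that
# happen to match the sample table; the PR's actual rule may differ.
function register_cliff(prev::Integer, cur::Integer;
                        min_jump = 16, min_ratio = 1.25, min_regs = 64)
    jump = cur - prev
    ratio = cur / prev
    flagged = jump >= min_jump && ratio >= min_ratio && cur > min_regs
    return (; flagged, prev, cur, jump, ratio = round(ratio; digits = 2))
end

# Register counts from the upwinding_3rdorder_* rows (1, 8, 12, 14, 16 ops).
regs = [42, 96, 128, 146, 162]
signals = [register_cliff(regs[i - 1], regs[i]) for i in 2:length(regs)]

for s in signals
    println(s)
end
```

Under these assumed thresholds, the 8-op and 12-op upwinding cases are flagged and the 14-op and 16-op cases are not, which agrees with the Soft fail column above.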

petebachant marked this pull request as draft on March 23, 2026, 21:34
petebachant (Member, Author) commented

@dennisYatunin @imreddyTeja if you guys get a chance can you check if this is headed in the right direction?

imreddyTeja (Member) commented

> @dennisYatunin @imreddyTeja if you guys get a chance can you check if this is headed in the right direction?

I don't know exactly what the goals of these tests are, but the test cases do seem good to me. I don't think any similar tests exist at the moment. Another case that might be good to test is everything you have done, but with more complex types. For example, many of the operations can work with nested named tuples.
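
To make the nested named tuple suggestion concrete, here is a minimal sketch of broadcasting over elements that are nested `NamedTuple`s. It uses a plain CPU `Vector` as a stand-in for a ClimaCore field, and the names `make_state`, `kinetic`, `scale`, `rho`, and `uh` are all hypothetical. A real test would presumably apply the same idea to fields on the GPU.

```julia
# Hypothetical element type: each point holds a nested NamedTuple.
make_state(i) = (; rho = 1.0 + i, uh = (; u = 2.0 * i, v = 3.0 * i))
states = [make_state(i) for i in 1:4]

# Broadcasted function that reads through both nesting levels.
kinetic(s) = 0.5 * s.rho * (s.uh.u^2 + s.uh.v^2)
ke = kinetic.(states)

# Broadcasted function that also returns a nested NamedTuple.
scale(s, a) = (; rho = a * s.rho, uh = (; u = a * s.uh.u, v = a * s.uh.v))
scaled = scale.(states, 2.0)

println(ke)
println(scaled[1])
```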

Helper function to create a vertical column (finite difference) space setup code string.
Produces both center and face spaces needed for C2F/F2C operators.
"""
function create_column_space()
Member

Would you be able to use any of the CommonSpaces functionality instead here?

warm = join(["curl_op.(v .* $(i).0)" for i in 1:n], " .+ ")
bench = join(["\$curl_op.(\$v .* $(i).0)" for i in 1:n], " .+ ")

test_impl = create_spectral_space() * """
Member

Does it make sense to skip/remove the time spent creating the space and filling the input field(s) in the benchmarks? In the sample result, knl_fill! is the primary kernel for a good chunk of the tests

Member Author

I believe that time is already skipped, and knl_fill! still ends up as the primary kernel for the simpler operations. I need to dig in further to understand how that works though.
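
For readers following the `warm` and `bench` lines in the excerpt under review, this small sketch shows what the two `join` calls produce for `n = 2`. The only difference is the escaped `\$`, which leaves a literal `$` in the generated code. Presumably that is for interpolating values into a benchmarking macro, but the harness is not shown in this thread, so treat that as an assumption.

```julia
n = 2  # number of chained terms; the PR sweeps larger values

# `$(i)` is interpolated now, while the string is being built.
warm = join(["curl_op.(v .* $(i).0)" for i in 1:n], " .+ ")

# `\$` is an escaped dollar sign, so a literal `$` survives into the generated code.
bench = join(["\$curl_op.(\$v .* $(i).0)" for i in 1:n], " .+ ")

println(warm)
println(bench)

# The warm-up string is a syntactically valid Julia expression.
warm_ex = Meta.parse(warm)
```

Keeping the warm-up and benchmark expressions as two strings built from the same template means the compiled kernel and the timed kernel come from the same source text.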
