
Releases: ggml-org/llama.cpp

b9084

09 May 03:27
6600172


hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837)

Implement the Gated Delta Net recurrence on HVX with:

  • 4-row fused kernels for PP (prompt processing) path
  • 8-row fused kernels for TG (token generation) path, reducing
    K/Q/gate vector reload overhead by 2x
  • Separate PP/TG thread functions for I-cache isolation
  • VTCM state scratchpad with DMA in/out for TG single-cycle access
  • Vectorized gate exp via hvx_exp_f32
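
For orientation, the recurrence these kernels accelerate can be sketched in scalar form. This is a minimal reference sketch of the gated delta rule as it is commonly formulated, not the HVX kernel from this commit. The function name `gated_delta_step`, the argument layout, and the assumption that the gate arrives in log space (so it is exponentiated before use) are illustrative and not taken from the llama.cpp source.

```python
import math

def gated_delta_step(state, q, k, v, gate, beta):
    """Process one token; `state` is updated in place (hypothetical layout).

    state: list of rows, one row per value channel, each row as long as k
    gate:  scalar log-space decay (assumption: exp is applied before use)
    beta:  scalar write strength
    """
    decay = math.exp(gate)
    out = []
    for i, row in enumerate(state):
        # 1. Decay the old memory for this row.
        for j in range(len(row)):
            row[j] *= decay
        # 2. Predict v[i] from the decayed row and the key.
        pred = sum(r * kj for r, kj in zip(row, k))
        # 3. Delta rule: write only the prediction error along k.
        delta = beta * (v[i] - pred)
        for j in range(len(row)):
            row[j] += delta * k[j]
        # 4. Read out with the query.
        out.append(sum(r * qj for r, qj in zip(row, q)))
    return out
```

In this form every state row reuses the same `k`, `q`, and `decay` values, which is presumably why processing several rows per pass, as the 4-row and 8-row kernels above do, reduces how often those vectors are reloaded.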


b9082

08 May 22:21
b46812d


b9081

08 May 22:18
4995604


b9080

08 May 21:05
9f5f0e6


b9079

08 May 20:23
f9cd456


b9077

08 May 19:29
29debb3


b9076

08 May 18:53
9dcf835


b9075

08 May 17:37
58e68df


cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

  • cuda: fuse snake activation (mul, sin, sqr, mul, add)

Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The
matcher recognizes the naive 5-op decomposition (mul, sin, sqr, mul,
add) that audio decoders (BigVGAN, Vocos) emit for the snake
activation y = x + sin(a*x)^2 * inv_b and rewrites it into a single
elementwise kernel.

Add test_snake_fuse comparing CPU naive vs CUDA fused across
F32 / F16 / BF16.

  • cuda: address review feedback from @am17an

Use ggml_cuda_cast for F32/F16/BF16 conversions and rename
kernel_snake to snake_kernel to match upstream conventions.

  • cuda: snake fusion fastdiv on T_len, Suggested-by: @am17an

  • Update tests/test-backend-ops.cpp

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

  • cuda: snake fusion check add->type matches x->type

Address review feedback from @am17an

  • cuda: snake fusion check add->type matches x->type

Moved for readability (equivalent)
Address review feedback from @am17an


Co-authored-by: Aman Gupta <amangupta052@gmail.com>
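
To illustrate what the snake fusion in this release replaces, here is a small scalar sketch that computes y = x + sin(a*x)^2 * inv_b both as the naive 5-op chain and as a single pass, in the spirit of the CPU-naive versus fused comparison that test_snake_fuse performs. It is plain Python, not the CUDA kernel. The function names `snake_naive` and `snake_fused` are hypothetical, and `a` and `inv_b` are simplified to scalars here, whereas in a real decoder they would typically be per-channel tensors.

```python
import math

def snake_naive(x, a, inv_b):
    """Five separate elementwise passes, one temporary per op."""
    t = [a * xi for xi in x]                   # mul
    t = [math.sin(ti) for ti in t]             # sin
    t = [ti * ti for ti in t]                  # sqr
    t = [ti * inv_b for ti in t]               # mul
    return [xi + ti for xi, ti in zip(x, t)]   # add

def snake_fused(x, a, inv_b):
    """Single pass: each element is read once and written once."""
    out = []
    for xi in x:
        s = math.sin(a * xi)
        out.append(xi + s * s * inv_b)
    return out

# Compare the two paths on a few sample values.
xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
naive = snake_naive(xs, a=1.5, inv_b=0.25)
fused = snake_fused(xs, a=1.5, inv_b=0.25)
max_diff = max(abs(n - f) for n, f in zip(naive, fused))
```

The fused form does the same arithmetic per element but avoids the four intermediate arrays, which on a GPU would mean fewer kernel launches and fewer round trips to memory.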


b9073

08 May 16:04
a8fd165


b9072

08 May 13:26
6d57a49
