Releases · ggml-org/llama.cpp

30 Nov 22:11

2ba7195

b7210

model: LFM2-VL fixes (#17577)

* Adjust to PyTorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* CUDA implementation of antialias kernel

* Change placement in ops.cpp

* Consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fall back to the CPU implementation for antialiased upscale
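The bullets above mention an antialiasing upscale adjusted to match PyTorch, but not the kernel itself. As an assumption, here is a minimal 1-D sketch of the triangle-filter scheme PyTorch uses for `interpolate(..., mode="bilinear", antialias=True)`. `resample_aa` and `triangle` are hypothetical names, not ggml functions, and the real op works on 2-D images. When shrinking, the filter is stretched by the scale factor so every input sample contributes to the output; when enlarging, it behaves like ordinary bilinear interpolation with `align_corners=False`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Triangle (bilinear) filter kernel: 1 at distance 0, falling to 0 at distance 1.
static double triangle(double x) {
    x = std::fabs(x);
    return x < 1.0 ? 1.0 - x : 0.0;
}

// Hypothetical sketch: resample a 1-D signal to out_size samples.
// When downsampling (scale > 1) the filter support is widened by the scale
// factor (antialiasing). Weights are renormalized so they sum to 1 even when
// the window is clipped at the borders.
std::vector<double> resample_aa(const std::vector<double>& in, int out_size) {
    assert(out_size > 0 && !in.empty());
    const int in_size = static_cast<int>(in.size());
    const double scale = static_cast<double>(in_size) / out_size;
    const double support = scale >= 1.0 ? scale : 1.0;
    const double invscale = scale >= 1.0 ? 1.0 / scale : 1.0;

    std::vector<double> out(out_size, 0.0);
    for (int i = 0; i < out_size; ++i) {
        // Center of output sample i, in input coordinates.
        const double center = scale * (i + 0.5);
        const int xmin = std::max(static_cast<int>(center - support + 0.5), 0);
        const int xmax = std::min(static_cast<int>(center + support + 0.5), in_size);
        double acc = 0.0;
        double total = 0.0;
        for (int x = xmin; x < xmax; ++x) {
            const double w = triangle((x - center + 0.5) * invscale);
            acc += w * in[x];
            total += w;
        }
        out[i] = total > 0.0 ? acc / total : 0.0;
    }
    return out;
}
```

For example, `resample_aa({0, 1, 2, 3}, 2)` gives roughly `{0.714, 2.286}`, whereas plain bilinear sampling would give 0.5 and 2.5, reading only the two nearest samples for each output.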

30 Nov 17:12

github-actions

7f8ef50

b7209

clip: fix nb calculation for qwen3-vl (#17594)
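In ggml, a tensor's `ne[]` holds the number of elements per dimension and `nb[]` the stride in bytes per dimension. The release note does not say which computation was wrong for Qwen3-VL, so this sketch only shows the stride layout of a freshly allocated contiguous tensor, the usual reference point when strides are computed by hand (for instance for `ggml_view_2d`). `contiguous_nb` is a hypothetical helper; ggml performs the equivalent computation internally when it creates a tensor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte strides of a contiguous 4-D tensor.
// type_size = bytes per block, blck_size = elements per block
// (1 for F32/F16; 32 for Q4_0, whose block is 18 bytes).
std::array<size_t, 4> contiguous_nb(const std::array<int64_t, 4>& ne,
                                    size_t type_size, int64_t blck_size) {
    assert(blck_size > 0 && ne[0] % blck_size == 0);
    std::array<size_t, 4> nb{};
    nb[0] = type_size;                                        // one block
    nb[1] = nb[0] * static_cast<size_t>(ne[0] / blck_size);   // one row, counted in blocks
    for (int i = 2; i < 4; ++i) {
        nb[i] = nb[i - 1] * static_cast<size_t>(ne[i - 1]);   // outer dims stack rows
    }
    return nb;
}
```

For a 64×32×4×1 F32 tensor this yields strides of 4, 256, 8192 and 32768 bytes. For a block type such as Q4_0, `nb[1]` counts blocks rather than elements, which is why multiplying the type size by `ne[0]` directly is only correct for non-block types.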

30 Nov 16:35

github-actions

3c136b2

b7208

cli: add migration warning (#17620)

30 Nov 14:04

github-actions

beb1f0c

b7207

common : throttle download progress output to reduce IO flushes (#17427)

This change limits progress updates to approximately every 0.1% of the
file size to minimize stdio overhead.

Also fixes compiler warnings regarding __func__ in lambdas.

Signed-off-by: Adrien Gallouët
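The throttling rule above (an update roughly every 0.1% of the file) can be sketched as a small gate in front of the print-and-flush call. This is a minimal illustration assuming the total size is known up front; `ProgressThrottle` and `should_report` are hypothetical names, not the code in `common`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: decide when a download progress line is worth
// printing (and flushing).
class ProgressThrottle {
public:
    explicit ProgressThrottle(uint64_t total_bytes)
        : total_(total_bytes), step_(total_bytes / 1000 > 0 ? total_bytes / 1000 : 1) {
        assert(total_bytes > 0 && "total size must be known up front");
    }

    // True if at least ~0.1% of the file arrived since the last update,
    // or the download just completed (so the final line is never skipped).
    bool should_report(uint64_t downloaded) {
        const bool finished = downloaded >= total_ && !finished_reported_;
        if (finished || downloaded >= last_reported_ + step_) {
            last_reported_ = downloaded;
            if (downloaded >= total_) finished_reported_ = true;
            return true;
        }
        return false;
    }

private:
    uint64_t total_;
    uint64_t step_;                  // ~0.1% of the file, at least 1 byte
    uint64_t last_reported_ = 0;
    bool finished_reported_ = false;
};
```

Fed 100-byte chunks of a 1,000,000-byte file, the gate lets through 1,000 of the 10,000 updates: one per 1,000 bytes received, the last one at completion. Always letting the completion update through keeps the final 100% line from being swallowed when the last chunk is smaller than the step.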

30 Nov 12:44

github-actions

def5404

b7206

common: add LLAMA_LOG_FILE env var (#17609)

Signed-off-by: Aaron Teo
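`LLAMA_LOG_FILE` presumably backs the existing `--log-file` option; the release note does not say. llama.cpp documents that when both a command-line argument and its environment variable are set, the argument takes precedence. Under those assumptions, the resolution order looks like this; `resolve_log_file` is a hypothetical helper, not part of `common`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical sketch: `cli_value` stands in for a parsed --log-file argument.
// Assumed precedence: explicit argument, then LLAMA_LOG_FILE, then no log file.
std::optional<std::string> resolve_log_file(const std::optional<std::string>& cli_value) {
    if (cli_value.has_value()) {
        return cli_value;                 // explicit argument wins
    }
    if (const char* env = std::getenv("LLAMA_LOG_FILE")) {
        return std::string(env);          // fall back to the environment variable
    }
    return std::nullopt;                  // no file logging requested
}
```

Under these assumptions, `LLAMA_LOG_FILE=run.log llama-server -m model.gguf` would log to `run.log` unless `--log-file` is also given.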

30 Nov 02:53

github-actions

fa04659

b7205

ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (#17581)

30 Nov 02:48

github-actions

5a6241f

b7204

common: update env var name (#17588)

30 Nov 02:28

github-actions

c7af376

b7203

CUDA: add stream-based concurrency (#16991)

* CUDA: add stream-based concurrency

* HIP: fix hipStreamWaitEvent define and nodiscard warnings

* ggml-cuda: fix fusion inside stream

* ggml-cuda: fix bug w.r.t. the first stream launch

* ggml-cuda: format

* ggml-cuda: improve assert message

* ggml-cuda: use lambda instead of duplicating code

* ggml-cuda: add some more comments

* ggml-cuda: add more detailed comments about concurrency

* ggml-cuda: rename + remove unused var

* ggml-cuda: fix condition for stream launch

* ggml-cuda: address review comments, add destructor

* common.cuh: add is_valid for concurrent events

* common.cuh: make comment better

* update comment

Co-authored-by: Johannes Gäßler

* update comment

Co-authored-by: Johannes Gäßler

* common.cuh: fix lower_bound condition + remove join_node data from write_ranges

* ggml-cuda: fix overlap condition + shadowing parameter

---------

Co-authored-by: Carl Philipp Klemm
Co-authored-by: Johannes Gäßler

30 Nov 01:37

github-actions

00425e2

b7202

cuda : add error checking for cudaMemcpyAsync in argsort (#17599)

* cuda : add error checking for cudaMemcpyAsync in argsort (#12836)

* fix indentation

30 Nov 01:25

github-actions

385c3da

b7201

vulkan : fix FA mask load with bounds check (coopmat2) (#17606)
