Releases: ggml-org/llama.cpp
b7210
model: LFM2-VL fixes (#17577)
* Adjust to pytorch
* Add antialiasing upscale
* Increase number of patches to 1024
* Handle default marker insertion for LFM2
* Switch to flag
* Reformat
* Cuda implementation of antialias kernel
* Change placement in ops.cpp
* consistent float literals
* Pad only for LFM2
* Address PR feedback
* Rollback default marker placement changes
* Fallback to CPU implementation for antialias implementation of upscale
b7209
clip: fix nb calculation for qwen3-vl (#17594)
b7208
cli: add migration warning (#17620)
b7207
common : throttle download progress output to reduce IO flush (#17427)
This change limits progress updates to approximately every 0.1% of the file size to minimize stdio overhead. Also fixes compiler warnings regarding __func__ in lambdas.
Signed-off-by: Adrien Gallouët
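The throttling described for b7207 can be illustrated with a minimal sketch. This is not the code from the PR: `ProgressThrottle`, `should_report`, and `count_reports` are hypothetical names, and the only detail taken from the release note is the "approximately every 0.1% of the file size" interval. The idea is to remember the byte count at the last printed update and skip output until at least one step (total / 1000) has been transferred.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not the llama.cpp implementation.
class ProgressThrottle {
public:
    explicit ProgressThrottle(std::uint64_t total_bytes)
        : total_(total_bytes),
          // One step is ~0.1% of the file; clamp to 1 so tiny files still work.
          step_(std::max<std::uint64_t>(total_bytes / 1000, 1)) {}

    // Returns true when the caller should emit a progress line.
    // Assumes downloaded_bytes never decreases between calls.
    bool should_report(std::uint64_t downloaded_bytes) {
        const bool first    = !reported_;
        const bool finished = downloaded_bytes >= total_;
        const bool advanced = downloaded_bytes >= last_ &&
                              downloaded_bytes - last_ >= step_;
        if (first || finished || advanced) {
            last_     = downloaded_bytes;
            reported_ = true;
            return true;
        }
        return false;
    }

private:
    std::uint64_t total_;
    std::uint64_t step_;
    std::uint64_t last_     = 0;
    bool          reported_ = false;
};

// Simulates a transfer delivered in fixed-size chunks and counts how many
// progress updates would actually be printed.
std::uint64_t count_reports(std::uint64_t total_bytes, std::uint64_t chunk_bytes) {
    assert(chunk_bytes > 0);  // a zero-sized chunk would never finish
    ProgressThrottle throttle(total_bytes);
    std::uint64_t reports = 0;
    for (std::uint64_t done = 0; done < total_bytes;) {
        done = std::min(done + chunk_bytes, total_bytes);
        if (throttle.should_report(done)) {
            ++reports;
        }
    }
    return reports;
}
```

With a 1,000,000-byte file delivered in 100-byte chunks, the transfer callback fires 10,000 times, but this sketch prints only about 1,000 updates, which is the kind of reduction in stdio flushes the change is aiming for.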
b7206
common: add LLAMA_LOG_FILE env var (#17609)
Signed-off-by: Aaron Teo
b7205
ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (#17581)
b7204
common: update env var name (#17588)
b7203
CUDA: add stream-based concurrency (#16991)
* CUDA: add stream-based concurrency
* HIP: fix hipStreamWaitEvent define and nodiscard warnings
* ggml-cuda: fix fusion inside stream
* ggml-cuda: fix bug w.r.t first stream launch
* ggml-cuda: format
* ggml-cuda: improve assert message
* ggml-cuda: use lambda instead of duplicating code
* ggml-cuda: add some more comments
* ggml-cuda: add more detailed comments about concurrency
* ggml-cuda: rename + remove unused var
* ggml-cuda: fix condition for stream launch
* ggml-cuda: address review comments, add destructor
* common.cuh: add is_valid for concurrent events
* common.cuh: make comment better
* update comment
  Co-authored-by: Johannes Gäßler
* update comment
  Co-authored-by: Johannes Gäßler
* common.cuh: fix lower_bound condition + remove join_node data from write_ranges
* ggml-cuda: fix overlap condition + shadowing parameter

Co-authored-by: Carl Philipp Klemm
Co-authored-by: Johannes Gäßler
b7202
cuda : add error checking for cudaMemcpyAsync in argsort (#17599)
* cuda : add error checking for cudaMemcpyAsync in argsort (#12836)
* fix indentation
b7201
vulkan : fix FA mask load with bounds check (coopmat2) (#17606)