# Docker image update needed to support Kernel 6.18 #13334

## Description
After upgrading the host kernel to Linux 6.18, Ollama-based inference using the ipex-llm-inference-cpp-xpu Docker image (oneAPI / SYCL backend) on Intel Xe (Arc) GPUs fails to function correctly.
The same container and model configuration works as expected on an older kernel (6.17.12).
The issue appears to be related to GPU memory detection/allocation via Level Zero after the kernel upgrade.
## Additional

1. The issue does not occur on older kernels with the same container image.
2. No changes were made to the Docker image, the Ollama version, or the model.
3. This suggests a regression or behavior change in kernel 6.18 affecting GPU memory detection/allocation via Level Zero.
Here are the error logs from Ollama inside the container:

```
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
time=2025-12-14T00:04:07.339+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
[GIN] 2025/12/14 - 00:04:06 | 200 | 23.915µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/12/14 - 00:04:06 | 200 | 125.603561ms | 127.0.0.1 | POST "/api/show"
Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:405, func:operator()
SYCL error: CHECK_TRY_ERROR(ctx->stream->memset( (char *)tensor->data + original_size, 0, padded_size - original_size).wait()): Exception caught in this line of code.
in function ggml_backend_sycl_buffer_init_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:405
/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:115: SYCL error
```
It appears that GPU memory cannot be correctly queried after upgrading to Linux kernel 6.18, while the same setup works normally on kernel 6.17. Upgrading to the latest Level Zero runtime and using newer Ollama versions does not resolve the issue, suggesting a possible regression or behavior change in Intel Xe GPU memory management introduced in kernel 6.18.
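For reference, the `get_memory_info` warning in the log suggests exporting `ZES_ENABLE_SYSMAN=1` so the SYCL runtime can query free GPU memory through the Level Zero Sysman API instead of assuming total memory is free. A minimal sketch of passing it when launching the container (the image tag, device path, and port are assumptions based on this report, not a verified fix for the kernel 6.18 regression):

```shell
# Hypothetical invocation; adjust image tag, device path, and port to your setup.
# ZES_ENABLE_SYSMAN=1 enables Level Zero Sysman so ext_intel_free_memory works,
# letting the SYCL backend report actual free GPU memory.
docker run -d \
  --device=/dev/dri \
  -e ZES_ENABLE_SYSMAN=1 \
  -e OLLAMA_HOST=0.0.0.0 \
  -p 11434:11434 \
  intelanalytics/ipex-llm-inference-cpp-xpu:latest
```

Since the failure reportedly persists with newer runtimes, this may only remove the warning rather than resolve the `UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY` error itself.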