This happens to me as well with both llama.cpp and clpeak.

clpeak output:
$ NEOReadDebugKeys=1 PrintDebugMessages=1 LogWaitingForCompletion=1 EventsDebugEnable=1 PrintKmdTimes=1 LogZEInfo=1 clpeak
WARNING: Failed to request OCL Turbo Boost
computeUnitsUsedForScratch: 4096
hwInfo: {512, 4096}: (16, 1, 32)
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Arc(TM) A770 Graphics
Driver version : 24.09.28717.17 (Linux x64)
Compute units : 512
Clock frequency : 2400 MHz
DeviceBinaryFormat::zebin : Unhandled SHT_NOTE section : .note.intelgt.metrics currently supports only : .note.intelgt.compat.
DeviceBinaryFormat::zebin::.ze_info : Minor version : 40 is newer than available in decoder : 39 - some features may be skipped
Global memory bandwidth (GBPS)
Waiting for task count 0 at location 0x7fb1fbd65000 with timeout 0. Current value: 0
Waiting completed. Current value: 0
Waiting for task count 1 at location 0x7fb1fbd5f000 with timeout 0. Current value: 0
Waiting completed. Current value: 1
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
computeUnits for each thread: 4096
perHwThreadPrivateMemorySize: 0 totalPrivateMemorySize: 0
perHwThreadScratchSize: 0 totalScratchSize: 0
perHwThreadPrivateScratchSize: 0 totalPrivateScratchSize: 0
float : DIM:1 GWS:(33550336, 1, 1) ELWS:(256, 1, 1) Offset:(0, 0, 0) AGWS:(33550336, 1, 1) LWS:(256, 1, 1) TWGS:(131056, 1, 1) NWGS:(131056, 1, 1) SWGS:(0, 0, 0)
devMode = 3, taskMode = 3.
devMode = 3, taskMode = 3.
preemption = 3.
DIM:1 GWS:(33550336, 1, 1) ELWS:(256, 1, 1) Offset:(0, 0, 0) AGWS:(33550336, 1, 1) LWS:(256, 1, 1) TWGS:(131056, 1, 1) NWGS:(131056, 1, 1) SWGS:(0, 0, 0)
devMode = 3, taskMode = 3.
devMode = 3, taskMode = 3.
preemption = 3.
Waiting for task count 2 at location 0x7fb1fbd65000 with timeout 96. Current value: 0
Waiting completed. Current value: 0
Waiting for task count 2 at location 0x7fb1fbd65000 with timeout 0. Current value: 0
Then it gets stuck here, and the clpeak process consumes one CPU core (100% usage).

perf record -a output captured while it is stuck:

Samples: 537K of event 'cycles:P', Event count (approx.): 456413846417
Overhead Samples Command Shared Object Symbol
9.08% 36420 clpeak [kernel.kallsyms] [k] clear_bhb_loop
5.43% 21754 clpeak [kernel.kallsyms] [k] __schedule
4.73% 18984 clpeak libc.so.6 [.] __sched_yield
4.65% 18636 clpeak [kernel.kallsyms] [k] _raw_spin_lock
4.64% 18584 clpeak [vdso] [.] __vdso_clock_gettime
4.61% 18482 clpeak [kernel.kallsyms] [k] native_sched_clock
4.48% 17981 clpeak [kernel.kallsyms] [k] psi_account_irqtime
4.01% 16069 clpeak [kernel.kallsyms] [k] update_curr
3.71% 14894 clpeak [kernel.kallsyms] [k] syscall_exit_to_user_mode
3.69% 14808 clpeak [kernel.kallsyms] [k] __calc_delta.constprop.0
3.66% 14687 clpeak [kernel.kallsyms] [k] entry_SYSRETQ_unsafe_stack
3.50% 14043 clpeak [kernel.kallsyms] [k] pick_next_task_fair
3.23% 12965 clpeak libigdrcl.so [.] 0x000000000005f724
2.76% 11064 clpeak [kernel.kallsyms] [k] pick_eevdf
2.61% 10453 clpeak [kernel.kallsyms] [k] do_syscall_64
2.40% 9623 clpeak [kernel.kallsyms] [k] entry_SYSCALL_64
2.25% 9021 clpeak [kernel.kallsyms] [k] update_min_vruntime
1.83% 7352 clpeak [kernel.kallsyms] [k] update_curr_se
1.78% 7126 clpeak [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
1.56% 6269 clpeak [kernel.kallsyms] [k] __cgroup_account_cputime
1.52% 6079 clpeak [kernel.kallsyms] [k] record_times
1.51% 6064 clpeak [kernel.kallsyms] [k] update_rq_clock
1.49% 5981 clpeak [kernel.kallsyms] [k] do_sched_yield
1.11% 4453 clpeak [kernel.kallsyms] [k] rcu_note_context_switch
1.01% 4037 clpeak libigdrcl.so [.] 0x000000000005f726
0.91% 3651 clpeak [kernel.kallsyms] [k] yield_task_fair
0.83% 3346 clpeak [kernel.kallsyms] [k] entry_SYSCALL_64_safe_stack
0.82% 3285 clpeak [kernel.kallsyms] [k] raw_spin_rq_lock_nested
0.78% 3113 clpeak [kernel.kallsyms] [k] syscall_return_via_sysret
0.75% 3008 clpeak [kernel.kallsyms] [k] schedule
0.72% 64724 swapper [kernel.kallsyms] [k] intel_idle
0.70% 2792 clpeak libc.so.6 [.] clock_gettime@@GLIBC_2.17
0.58% 2338 clpeak [kernel.kallsyms] [k] cpuacct_charge
0.53% 2107 clpeak [kernel.kallsyms] [k] sched_clock
0.51% 2060 clpeak [kernel.kallsyms] [k] sched_clock_cpu
0.45% 1805 clpeak libstdc++.so.6.0.33 [.] std::chrono::_V2::system_clock::now()
0.42% 1690 clpeak [kernel.kallsyms] [k] _raw_spin_unlock
0.39% 1567 clpeak libigdrcl.so [.] 0x0000000000534467
0.39% 1563 clpeak [kernel.kallsyms] [k] x64_sys_call
0.37% 1469 clpeak [kernel.kallsyms] [k] __list_add_valid_or_report
0.35% 1418 clpeak [kernel.kallsyms] [k] check_cfs_rq_runtime
0.35% 1402 clpeak [kernel.kallsyms] [k] pick_next_entity
0.34% 1368 clpeak [kernel.kallsyms] [k] cgroup_rstat_updated
0.34% 1364 clpeak [kernel.kallsyms] [k] __list_del_entry_valid_or_report
0.27% 1098 clpeak [kernel.kallsyms] [k] syscall_exit_to_user_mode_prepare
0.23% 1856 pgrep libc.so.6 [.] __strncpy_evex
0.22% 882 clpeak libigdrcl.so [.] 0x000000000056e33e
0.19% 775 clpeak [kernel.kallsyms] [k] __x64_sys_sched_yield

System information:
Kernel: 6.8.5-301.fc40.x86_64
OS: Fedora Kinoite 40.20240419.n.0
intel-compute-runtime: 24.09.28717.17-1.fc40.x86_64 (this also happens with the latest 24.13.29138.7, installed in an Ubuntu 22.04 container)
CPU: Intel Core i9-10940X
GPU: Intel Arc A770 16GB

(Quoted from @Disty0's report in #710.)