-
Notifications
You must be signed in to change notification settings - Fork 4k
hardirqs: fix duplicated count for shared IRQ #3733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hardirqs: fix duplicated count for shared IRQ #3733
Conversation
Currently, hardirqs will count interrupt event simply while tracepoint
irq:irq_handler_entry triggered, it's fine for system without shared IRQ
event, but it will cause wrong interrupt count result for system with
shared IRQ, because kernel will interate all irq handlers belong to this
IRQ descriptor.
Take an example for system with shared IRQ below.
root@localhost:/# cat /proc/interrupts
CPU0
13: 385248 GICv3 39 Level DDOMAIN ISR, gdma
...
23: 61532 GICv3 38 Level VGIP ISR, OnlineMeasure ISR
DDOMAIN IRQ and gdma shared the IRQ 13, VGIP ISR and OnlineMeasure
shared the IRQ 23, and use 'hardirqs -C' to measure the count of these
interrupt event.
root@localhost:/# hardirqs -C 10 1
Tracing hard irq events... Hit Ctrl-C to end.
HARDIRQ TOTAL_count
OnlineMeasure ISR 300
VGIP ISR 300
gdma 2103
DDOMAIN ISR 2103
eth0 6677
hardirqs reported the same interrupt count for shared IRQ
'OnlineMeasure ISR/VGIP ISR' and 'gdma/DDOMAIN ISR'.
We should check the ret field of tracepoint irq:irq_hanlder_exit is
IRQ_HANDLED or IRQ_WAKE_THREAD to make sure the current event is belong
to this interrupt handler. For simplifying, just check `args->ret !=
IRQ_NONE`.
In the meantimes, the same changes should be applied to interrupt time
measurement.
The fixed hardirqs will show below output.
(bcc)root@localhost:/# ./hardirqs -C 10 1
Tracing hard irq events... Hit Ctrl-C to end.
HARDIRQ TOTAL_count
OnlineMeasure ISR 1
VGIP ISR 294
gdma 1168
DDOMAIN ISR 1476
eth0 5210
9bb2f53 to
ee6d955
Compare
|
I am not familiar with this shared IRQ thing, but I test this change, looks correct to me. I have a small question: |
|
Thanks for detailed commit message/explanation. The change looks good me. |
In IRQ context, tid is not so useful, zero tid(swapper) is because the system is going to sleep or idle or some scheduler matters. The only function I think is you know if your measure process is interrupted. For example: You use funclatency to measure function-A and get 300 us, but sometimes the part of latency belongs to ISR. |
Hi @chenhengqi This should cause similar issue reported here #2804. For ISR context, maybe we should use cpu id by calling bpf helper In another way, we can also use Hi @yonghong-song |
|
@ismhong The only issue we have is tid = 0. For any non-zero tid, since the bpf is running in irq context, the underlying tid won't change and won't conflict with other tid's, so we can use it as the key since the key will be removed after irq_exit. To handle tid = 0, as you mentioned, either cpu_id or percpu_hash is fine. cpu_id is simpler. Please submit a followup patch to fix this issue. It would be good to add a comment to mention that the change is to fix tid = 0 issue and cpu_id is okay since there is no migration for irq handlers. |
Currently, hardirqs use tid as key to store information while tracepoint irq_handler_entry. It works fine if irq is triggered while normal task running, but there is a chance causing overwrite issue while irq is triggered while idle task (a.k.a swapper/x, tid=0) running on multi-core system. Let's say there are two irq event trigger simultaneously on both CPU core, irq A @ core #0, irq B @ core #1, and system load is pretty light, so BPF program will get tid=0 since current task is swapper/x for both cpu core. In this case, the information of first irq event stored in map could be overwritten by incoming second irq event. Use tid and cpu_id together to make sure the key is unique for each event in this corner case. Please check more detail at merge request iovisor#2804, iovisor#3733.
Currently, hardirqs use tid as key to store information while tracepoint irq_handler_entry. It works fine if irq is triggered while normal task running, but there is a chance causing overwrite issue while irq is triggered while idle task (a.k.a swapper/x, tid=0) running on multi-core system. Let's say there are two irq event trigger simultaneously on both CPU core, irq A @ core #0, irq B @ core #1, and system load is pretty light, so BPF program will get tid=0 since current task is swapper/x for both cpu core. In this case, the information of first irq event stored in map could be overwritten by incoming second irq event. Use tid and cpu_id together to make sure the key is unique for each event in this corner case. Please check more detail at merge request #2804, #3733.
Currently, hardirqs use tid as key to store information while tracepoint irq_handler_entry. It works fine if irq is triggered while normal task running, but there is a chance causing overwrite issue while irq is triggered while idle task (a.k.a swapper/x, tid=0) running on multi-core system. Let's say there are two irq event trigger simultaneously on both CPU core, irq A @ core #0, irq B @ core iovisor#1, and system load is pretty light, so BPF program will get tid=0 since current task is swapper/x for both cpu core. In this case, the information of first irq event stored in map could be overwritten by incoming second irq event. Use tid and cpu_id together to make sure the key is unique for each event in this corner case. Please check more detail at merge request iovisor#2804, iovisor#3733.
Currently, hardirqs will count interrupt event simply while tracepoint
irq:irq_handler_entry triggered, it's fine for system without shared IRQ
event, but it will cause wrong interrupt count result for system with
shared IRQ, because kernel will interate all irq handlers belong to this
IRQ descriptor.
Take an example for system with shared IRQ below.
DDOMAIN IRQ and gdma shared the IRQ 13, VGIP ISR and OnlineMeasure
shared the IRQ 23, and use 'hardirqs -C' to measure the count of these
interrupt event.
hardirqs reported the same interrupt count for shared IRQ
'OnlineMeasure ISR/VGIP ISR' and 'gdma/DDOMAIN ISR'.
We should check the ret field of tracepoint irq:irq_hanlder_exit is
IRQ_HANDLED or IRQ_WAKE_THREAD to make sure the current event is belong to this
interrupt handler. For simplifying, just check
args->ret != IRQ_NONE.In the meantimes, the same changes should be applied to interrupt time
measurement.
The fixed hardirqs will show below output.