
Conversation

@ismhong
Contributor

@ismhong ismhong commented Dec 6, 2021

Currently, hardirqs simply counts an interrupt event whenever the tracepoint
irq:irq_handler_entry fires. That is fine on systems without shared IRQs, but
it produces wrong interrupt counts on systems with shared IRQs, because the
kernel iterates over all irq handlers belonging to the IRQ descriptor.

Take the following system with shared IRQs as an example.

root@localhost:/# cat /proc/interrupts
           CPU0
 13:     385248     GICv3  39 Level     DDOMAIN ISR, gdma
 ...
 23:      61532     GICv3  38 Level     VGIP ISR, OnlineMeasure ISR

DDOMAIN ISR and gdma share IRQ 13, while VGIP ISR and OnlineMeasure ISR
share IRQ 23. Use 'hardirqs -C' to measure the counts of these
interrupt events.

root@localhost:/# hardirqs -C 10 1
Tracing hard irq events... Hit Ctrl-C to end.

HARDIRQ                    TOTAL_count
OnlineMeasure ISR                  300
VGIP ISR                           300
gdma                              2103
DDOMAIN ISR                       2103
eth0                              6677

hardirqs reports the same interrupt count for the shared IRQ pairs
'OnlineMeasure ISR/VGIP ISR' and 'gdma/DDOMAIN ISR'.

We should check whether the ret field of the tracepoint irq:irq_handler_exit is
IRQ_HANDLED or IRQ_WAKE_THREAD to make sure the current event belongs to this
interrupt handler. For simplicity, just check `args->ret != IRQ_NONE`.

At the same time, the same change should be applied to the interrupt time
measurement.
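
A minimal bcc-style sketch of this idea (not the exact hardirqs.py code; the map and struct names here are illustrative): record the handler name at irq_handler_entry keyed by tid, and only count it at irq_handler_exit when the handler actually claimed the interrupt.

```c
#include <linux/irqreturn.h>   // IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD

typedef struct irq_key {
    char name[32];
} irq_key_t;

BPF_HASH(irqnames, u32, irq_key_t);   // tid -> handler name seen at entry
BPF_HASH(counts, irq_key_t, u64);     // handler name -> claimed-event count

TRACEPOINT_PROBE(irq, irq_handler_entry)
{
    u32 tid = bpf_get_current_pid_tgid();
    irq_key_t key = {};
    TP_DATA_LOC_READ_CONST(&key.name, name, sizeof(key.name));
    irqnames.update(&tid, &key);
    return 0;
}

TRACEPOINT_PROBE(irq, irq_handler_exit)
{
    u32 tid = bpf_get_current_pid_tgid();
    irq_key_t *keyp = irqnames.lookup(&tid);
    if (keyp == 0)
        return 0;               // missed the matching entry event
    // Only count handlers that claimed the interrupt; on a shared IRQ
    // the other handlers return IRQ_NONE.
    if (args->ret != IRQ_NONE)
        counts.increment(*keyp);
    irqnames.delete(&tid);
    return 0;
}
```

The same entry/exit pairing applies to the time-measurement path: record the timestamp at entry and only accumulate the elapsed time at exit when `args->ret != IRQ_NONE`.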

The fixed hardirqs shows the output below.

(bcc)root@localhost:/# ./hardirqs -C 10 1
Tracing hard irq events... Hit Ctrl-C to end.

HARDIRQ                    TOTAL_count
OnlineMeasure ISR                    1
VGIP ISR                           294
gdma                              1168
DDOMAIN ISR                       1476
eth0                              5210

@ismhong force-pushed the fix-hardirqs-shared-irq-number branch from 9bb2f53 to ee6d955 on December 6, 2021 07:16
@chenhengqi
Collaborator

I am not familiar with this shared IRQ thing, but I tested this change and it looks correct to me.

I have a small question:
In IRQ context, is the tid useful? I think they should all be zero.
I tried printing the tid; most of them are 0, with some exceptions:

          <idle>-0       [000] d.h41 2763892.666859: bpf_trace_printk: tid = 0
          <idle>-0       [001] d.h41 2763892.668261: bpf_trace_printk: tid = 0
 systemd-resolve-905     [000] d.h41 2763894.418696: bpf_trace_printk: tid = 905
          <idle>-0       [000] d.h41 2763894.419803: bpf_trace_printk: tid = 0
          <idle>-0       [000] d.h41 2763894.420129: bpf_trace_printk: tid = 0

@yonghong-song
Collaborator

Thanks for the detailed commit message/explanation. The change looks good to me.

@yonghong-song yonghong-song merged commit 814c264 into iovisor:master Dec 6, 2021
@netedwardwu
Contributor

I am not familiar with this shared IRQ thing, but I tested this change and it looks correct to me.

I have a small question: In IRQ context, is the tid useful? I think they should all be zero. I tried printing the tid; most of them are 0, with some exceptions:

          <idle>-0       [000] d.h41 2763892.666859: bpf_trace_printk: tid = 0
          <idle>-0       [001] d.h41 2763892.668261: bpf_trace_printk: tid = 0
 systemd-resolve-905     [000] d.h41 2763894.418696: bpf_trace_printk: tid = 905
          <idle>-0       [000] d.h41 2763894.419803: bpf_trace_printk: tid = 0
          <idle>-0       [000] d.h41 2763894.420129: bpf_trace_printk: tid = 0

In IRQ context, the tid is not very useful. A zero tid (swapper) means the system was going to sleep, idle, or handling some scheduler matter; the IRQ interrupts those jobs, so tid = 0.

The only use I can think of is knowing whether the process you are measuring was interrupted.

For example: you use funclatency to measure function-A and get 300 us, but sometimes part of that latency belongs to an ISR, so the accurate latency should be 300 us minus the ISR latency.

@ismhong
Contributor Author

ismhong commented Dec 7, 2021

I have a small question:
In IRQ context, is the tid useful? I think they should all be zero.
I tried printing the tid; most of them are 0, with some exceptions:

          <idle>-0       [000] d.h41 2763892.666859: bpf_trace_printk: tid = 0
          <idle>-0       [001] d.h41 2763892.668261: bpf_trace_printk: tid = 0
 systemd-resolve-905     [000] d.h41 2763894.418696: bpf_trace_printk: tid = 905
          <idle>-0       [000] d.h41 2763894.419803: bpf_trace_printk: tid = 0
          <idle>-0       [000] d.h41 2763894.420129: bpf_trace_printk: tid = 0

Hi @chenhengqi
Hmmm... nice question. :)
I simply referenced the original implementation of the interrupt time measurement, bpf_text_time, but what you mentioned makes sense: tid should not be useful in ISR context.

This could cause an issue similar to the one reported in #2804.

(bcc)root@localhost:/# bpftrace -e 'tracepoint:irq:irq_handler_entry { @[comm, pid, tid, cpu]=count(); }'
@[hdmi_task, 590, 590, 1]: 45
@[hdmi_task, 590, 590, 0]: 89
@[swapper/3, 0, 0, 3]: 1718
@[swapper/2, 0, 0, 2]: 2319
@[swapper/0, 0, 0, 0]: 7819
@[swapper/1, 0, 0, 1]: 8715

For ISR context, maybe we should use the cpu id, obtained from the bpf helper bpf_get_smp_processor_id, as the key to store the ISR name at tracepoint:irq:irq_handler_entry and read it back at tracepoint:irq:irq_handler_exit.

Alternatively, we could use a PERCPU_HASH to prevent this issue. In my opinion, using cpu_id as the key is simpler, since it keeps a plain hash map instead of requiring a PERCPU_HASH.
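
A rough, hypothetical sketch of the cpu_id-keyed idea (the map and struct names are made up for illustration; this is not the code that was merged):

```c
typedef struct irq_name {
    char name[32];
} irq_name_t;

// Key the entry/exit correlation map by CPU id instead of tid, so two
// idle-task (tid = 0) interrupts on different CPUs cannot overwrite
// each other's entry record.
BPF_HASH(name_by_cpu, u32, irq_name_t);

TRACEPOINT_PROBE(irq, irq_handler_entry)
{
    u32 cpu = bpf_get_smp_processor_id();
    irq_name_t entry = {};
    TP_DATA_LOC_READ_CONST(&entry.name, name, sizeof(entry.name));
    name_by_cpu.update(&cpu, &entry);
    return 0;
}

TRACEPOINT_PROBE(irq, irq_handler_exit)
{
    u32 cpu = bpf_get_smp_processor_id();
    irq_name_t *entryp = name_by_cpu.lookup(&cpu);
    if (entryp == 0)
        return 0;
    // ... update the count or accumulated time for entryp->name here,
    // guarded by args->ret != IRQ_NONE ...
    name_by_cpu.delete(&cpu);
    return 0;
}
```

This relies on a hard irq handler running to completion on the CPU where it started, so the CPU id read at exit matches the one read at entry.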

Hi @yonghong-song
Do you have any suggestion for this? Thank you. :)

@yonghong-song
Collaborator

@ismhong The only issue we have is tid = 0. For any non-zero tid, since the bpf program runs in irq context, the underlying tid won't change and won't conflict with other tids, so we can use it as the key, since the key is removed at irq_exit.

To handle tid = 0, as you mentioned, either cpu_id or a percpu hash is fine; cpu_id is simpler. Please submit a follow-up patch to fix this issue. It would be good to add a comment mentioning that the change fixes the tid = 0 issue and that cpu_id is okay since irq handlers do not migrate between CPUs.

ismhong added a commit to ismhong/bcc that referenced this pull request Dec 8, 2021
Currently, hardirqs uses tid as the key to store information at the tracepoint
irq_handler_entry. This works fine if the irq is triggered while a normal task
is running, but it can cause an overwrite issue when the irq is triggered while
the idle task (a.k.a. swapper/x, tid=0) is running on a multi-core system.

Let's say two irq events trigger simultaneously on two CPU cores, irq A on
core #0 and irq B on core #1, and the system load is pretty light, so the BPF
program gets tid=0 on both cores since the current task is swapper/x. In this
case, the information of the first irq event stored in the map can be
overwritten by the incoming second irq event.

Use tid and cpu_id together to make sure the key is unique for each
event in this corner case.

For more detail, see iovisor#2804 and iovisor#3733.
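
A minimal sketch of how such a combined key could be built in the BPF program (illustrative only; the merged patch may differ in detail):

```c
// Illustrative only: pack the CPU id and the tid into one 64-bit map key,
// so concurrent idle-task (tid = 0) interrupts on different CPUs get
// distinct keys. Hard irq handlers do not migrate between CPUs, so the
// same key can be recomputed at irq_handler_exit.
static inline u64 irq_event_key(void)
{
    u32 tid = bpf_get_current_pid_tgid();   // lower 32 bits are the tid
    u32 cpu = bpf_get_smp_processor_id();
    return ((u64)cpu << 32) | tid;
}
```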
yonghong-song pushed a commit that referenced this pull request Dec 9, 2021
CrackerCat pushed a commit to CrackerCat/bcc that referenced this pull request Jul 31, 2024