Skip to content

Conversation

@JamesMBartlett
Copy link
Member

PR copied from PR to iovisor/bcc
I noticed that in a single call to perf_reader_event_read, the perf_reader can end up reading more than a full ring buffer's worth of data. As long as the consumer side (perf_reader_raw_cb) is slower than the producer side (ebpf events from the kernel), then the ring buffer will fill up and the ring buffer's head position will always be behind its tail. Since the consumer is slower than the producer, as soon as the perf_reader updates the ring buffer tail position, the kernel will quickly write a new event and increment head. Because the perf_reader_event_read loop, reloads the ring buffer's head position on each iteration, its possible for this function to loop forever. This change just makes it so that the head position is read once at the start of the loop, and then the loop reads all the data up until this initial head position. The downside of this is that if data is written by the kernel during the perf_reader_event_read call it won't be read until the next call to perf_reader_event_read, but this seems acceptable to me compared to unbounded runtime of perf_reader_event_read.

I also added a test that demonstrates the behaviour above. The test has a thread that constantly writes to /dev/null, and then on the main thread it deploys a ebpf program that pushes 32kB of data to the ring buffer on every single write call. It also has a sleep in the callback function for the probe, that forces the consumer to be much slower than the producer. Then in a separate thread it calls perf_buffer->poll(0) and records how much data is received by the callback. Without the changes in this diff, this test would show that more than a full ring buffer's worth of data (times the number of perf_readers' the poll call touched) was received by the callback. I wasn't sure what kernel version perf buffer's are supported from, but based on test_perf_event.cc I assumed that 4.13 should be good. (Happy to remove the test from this diff if its not something you want, but thought it was useful to demonstrate the behaviour I'm trying to fix)

Looked for a style guide, but couldn't find one, so let me know if there are style issues.

cc. @oazizi000

one ring buffer worth of data.

Signed-off-by: James Bartlett <[email protected]>
@oazizi000 oazizi000 merged commit f55c4de into pixie-io:pixie Jan 25, 2022
zasgar pushed a commit that referenced this pull request May 31, 2022
Currently, hardirqs use tid as key to store information while tracepoint
irq_handler_entry. It works fine if irq is triggered while normal task
running, but there is a chance causing overwrite issue while irq is
triggered while idle task (a.k.a swapper/x, tid=0) running on multi-core
system.

Let's say there are two irq event trigger simultaneously on both CPU
core, irq A @ core #0, irq B @ core #1, and system load is pretty light,
so BPF program will get tid=0 since current task is swapper/x for both cpu
core. In this case, the information of first irq event stored in map could
be overwritten by incoming second irq event.

Use tid and cpu_id together to make sure the key is unique for each
event in this corner case.

Please check more detail at merge request iovisor#2804, iovisor#3733.
zasgar pushed a commit that referenced this pull request May 31, 2022
There are two pass managers in LLVM. Currently BCC uses the legacy one.
Switch to the new pass manager because the legacy one will be removed
in upcoming releases of LLVM.

Running the following script:
```

from bcc import BPF

bpf_text = '''
static int foobar()
{
    bpf_trace_printk("enter vfs_read");
    return 0;
}

KFUNC_PROBE(vfs_read)
{
    return foobar();
}
'''

BPF(text=bpf_text, debug=1)
```

The IR output is the same with or without this change using LLVM 15:

    ; ModuleID = 'sscanf'
    source_filename = "sscanf"
    ; ModuleID = '/virtual/main.c'
    source_filename = "/virtual/main.c"
    target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
    target triple = "bpf-pc-linux"

    @_version = dso_local global i32 332032, section "version", align 4, !dbg !0
    @_license = dso_local global [4 x i8] c"GPL\00", section "license", align 1, !dbg !5
    @__const.foobar._fmt = private unnamed_addr constant [15 x i8] c"enter vfs_read\00", align 1
    @llvm.compiler.used = appending global [2 x ptr] [ptr @_license, ptr @_version], section "llvm.metadata"

    ; Function Attrs: alwaysinline nounwind
    define dso_local i32 @kfunc__vfs_read(ptr nocapture noundef readnone %0) local_unnamed_addr #0 section ".bpf.fn.kfunc__vfs_read" !dbg !33 {
    %2 = alloca [15 x i8], align 1
    call void @llvm.dbg.value(metadata ptr %0, metadata !39, metadata !DIExpression()), !dbg !41
    call void @llvm.dbg.value(metadata ptr undef, metadata !42, metadata !DIExpression()) #4, !dbg !45
    call void @llvm.lifetime.start.p0(i64 15, ptr nonnull %2) #4, !dbg !47
    call void @llvm.dbg.declare(metadata ptr %2, metadata !53, metadata !DIExpression()) #4, !dbg !58
    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(15) %2, ptr noundef nonnull align 1 dereferenceable(15) @__const.foobar._fmt, i64 15, i1 false) #4, !dbg !58
    %3 = call i32 (ptr, i64, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull %2, i64 noundef 15) #4, !dbg !59
    call void @llvm.lifetime.end.p0(i64 15, ptr nonnull %2) #4, !dbg !60
    call void @llvm.dbg.value(metadata i32 0, metadata !40, metadata !DIExpression()), !dbg !41
    ret i32 0, !dbg !61
    }

    ; Function Attrs: alwaysinline mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
    declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

    ; Function Attrs: alwaysinline argmemonly mustprogress nocallback nofree nosync nounwind willreturn
    declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2

    ; Function Attrs: alwaysinline argmemonly mustprogress nocallback nofree nosync nounwind willreturn
    declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2

    ; Function Attrs: alwaysinline argmemonly mustprogress nofree nounwind willreturn
    declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #3

    ; Function Attrs: alwaysinline mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
    declare void @llvm.dbg.value(metadata, metadata, metadata) #1

    attributes #0 = { alwaysinline nounwind "frame-pointer"="none" "min-legal-vector-width"="0" "no-jump-tables"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
    attributes #1 = { alwaysinline mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn }
    attributes #2 = { alwaysinline argmemonly mustprogress nocallback nofree nosync nounwind willreturn }
    attributes #3 = { alwaysinline argmemonly mustprogress nofree nounwind willreturn }
    attributes #4 = { nounwind }

    !llvm.dbg.cu = !{!2}
    !llvm.module.flags = !{!27, !28, !29, !30, !31}
    !llvm.ident = !{!32}

    !0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
    !1 = distinct !DIGlobalVariable(name: "_version", scope: !2, file: !14, line: 526, type: !26, isLocal: false, isDefinition: true)
    !2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "Ubuntu clang version 15.0.0-++20220426083628+d738d4717f6d-1~exp1~20220426203725.435", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, nameTableKind: None)
    !3 = !DIFile(filename: "/virtual/main.c", directory: "/home/ubuntu/sources/bpf-next")
    !4 = !{!0, !5, !12}
    !5 = !DIGlobalVariableExpression(var: !6, expr: !DIExpression())
    !6 = distinct !DIGlobalVariable(name: "_license", scope: !2, file: !7, line: 26, type: !8, isLocal: false, isDefinition: true)
    !7 = !DIFile(filename: "/virtual/include/bcc/footer.h", directory: "")
    !8 = !DICompositeType(tag: DW_TAG_array_type, baseType: !9, size: 32, elements: !10)
    !9 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
    !10 = !{!11}
    !11 = !DISubrange(count: 4)
    !12 = !DIGlobalVariableExpression(var: !13, expr: !DIExpression())
    !13 = distinct !DIGlobalVariable(name: "bpf_trace_printk_", scope: !2, file: !14, line: 542, type: !15, isLocal: true, isDefinition: true)
    !14 = !DIFile(filename: "/virtual/include/bcc/helpers.h", directory: "")
    !15 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !16, size: 64)
    !16 = !DISubroutineType(types: !17)
    !17 = !{!18, !19, !21, null}
    !18 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
    !20 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !9)
    !21 = !DIDerivedType(tag: DW_TAG_typedef, name: "u64", file: !22, line: 23, baseType: !23)
    !22 = !DIFile(filename: "include/asm-generic/int-ll64.h", directory: "/home/ubuntu/sources/bpf-next")
    !23 = !DIDerivedType(tag: DW_TAG_typedef, name: "__u64", file: !24, line: 31, baseType: !25)
    !24 = !DIFile(filename: "include/uapi/asm-generic/int-ll64.h", directory: "/home/ubuntu/sources/bpf-next")
    !25 = !DIBasicType(name: "unsigned long long", size: 64, encoding: DW_ATE_unsigned)
    !26 = !DIBasicType(name: "unsigned int", size: 32, encoding: DW_ATE_unsigned)
    !27 = !{i32 7, !"Dwarf Version", i32 4}
    !28 = !{i32 2, !"Debug Info Version", i32 3}
    !29 = !{i32 1, !"wchar_size", i32 4}
    !30 = !{i32 7, !"PIC Level", i32 2}
    !31 = !{i32 7, !"PIE Level", i32 2}
    !32 = !{!"Ubuntu clang version 15.0.0-++20220426083628+d738d4717f6d-1~exp1~20220426203725.435"}
    !33 = distinct !DISubprogram(name: "kfunc__vfs_read", scope: !34, file: !34, line: 23, type: !35, scopeLine: 23, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !38)
    !34 = !DIFile(filename: "/virtual/main.c", directory: "")
    !35 = !DISubroutineType(types: !36)
    !36 = !{!18, !37}
    !37 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !25, size: 64)
    !38 = !{!39, !40}
    !39 = !DILocalVariable(name: "ctx", arg: 1, scope: !33, file: !34, line: 23, type: !37)
    !40 = !DILocalVariable(name: "__ret", scope: !33, file: !34, line: 23, type: !18)
    !41 = !DILocation(line: 0, scope: !33)
    !42 = !DILocalVariable(name: "ctx", arg: 1, scope: !43, file: !34, line: 23, type: !37)
    !43 = distinct !DISubprogram(name: "____kfunc__vfs_read", scope: !34, file: !34, line: 23, type: !35, scopeLine: 24, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !44)
    !44 = !{!42}
    !45 = !DILocation(line: 0, scope: !43, inlinedAt: !46)
    !46 = distinct !DILocation(line: 23, column: 1, scope: !33)
    !47 = !DILocation(line: 15, column: 5, scope: !48, inlinedAt: !57)
    !48 = distinct !DILexicalBlock(scope: !49, file: !34, line: 15, column: 3)
    !49 = distinct !DISubprogram(name: "foobar", scope: !34, file: !34, line: 13, type: !50, scopeLine: 14, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !52)
    !50 = !DISubroutineType(types: !51)
    !51 = !{!18}
    !52 = !{!53}
    !53 = !DILocalVariable(name: "_fmt", scope: !48, file: !34, line: 15, type: !54)
    !54 = !DICompositeType(tag: DW_TAG_array_type, baseType: !9, size: 120, elements: !55)
    !55 = !{!56}
    !56 = !DISubrange(count: 15)
    !57 = distinct !DILocation(line: 25, column: 9, scope: !43, inlinedAt: !46)
    !58 = !DILocation(line: 15, column: 10, scope: !48, inlinedAt: !57)
    !59 = !DILocation(line: 15, column: 37, scope: !48, inlinedAt: !57)
    !60 = !DILocation(line: 15, column: 76, scope: !49, inlinedAt: !57)
    !61 = !DILocation(line: 23, column: 1, scope: !33)

Closes iovisor#3947.

References:
  [0]: https://llvm.org/docs/NewPassManager.html
  [1]: https://blog.llvm.org/posts/2021-03-26-the-new-pass-manager/

Signed-off-by: Hengqi Chen <[email protected]>
oazizi000 pushed a commit that referenced this pull request Jul 2, 2022
With the latest clang and bpftool the build of tcpconnect fails as
follows (output patched a bit for readability):

    bpftool gen skeleton tcpconnect.bpf.o > tcpconnect.skel.h
    Error: Something is wrong for .rodata's variable #1: need offset 0, already at 4.

This happens because the filter_ports variable is declared using a ".rodata hack":

    SEC(".rodata") int filter_ports[MAX_PORTS]

This breaks with the recent clang, as the filter_ports variable is placed into
the '.rodata,aw' section. Older clang would put it into '.rodata,a' section
where all 'const volatile' variables are placed. The result is that the
filter_ports variable has a wrong offset of 0 in BTF_KIND_DATASEC.

To hack the hack we can declare the variable as

    SEC(".rodata") const int filter_ports[MAX_PORTS]

but, instead, we now can just declare it as

    const volatile int filter_ports[MAX_PORTS]

In fact, this was already done in a02663b ("libbpf-tools: update bpftool and
fix .rodata hack"), but a later commit f8ac3c6 ("libbpf-tools: fix tcpconnect
compile errors") reverted the change without any comments.

Signed-off-by: Anton Protopopov <[email protected]>
JamesMBartlett pushed a commit that referenced this pull request Dec 8, 2022
…for -v option

Add additional information and change format of backtrace
- add symbol base offset, dso name, dso base offset
- symbol and dso info is included if it's available in target binary
- changed format:
INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)

Print backtrace of ip if it failed to get syms.

Before:
  # offcputime -v
    psiginfo
    vscanf
    __snprintf_chk
    [unknown]
    [unknown]
    [unknown]
    [unknown]
    [unknown]
    sd_event_exit
    sd_event_dispatch
    sd_event_run
    [unknown]
    __libc_start_main
    [unknown]
    -                systemd-journal (204)
        1

    xas_load
    xas_find
    filemap_map_pages
    __handle_mm_fault
    handle_mm_fault
    do_page_fault
    do_translation_fault
    do_mem_abort
    do_el0_ia_bp_hardening
    el0_ia
    xas_load
    --
failed to get syms
      -                PmLogCtl (138757)
        1

After:
  # offcputime -v
    #0  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
    #1  0xffffffc01009a93c el0_svc_handler+0x34
    #2  0xffffffc010084a08 el0_svc+0x8
    #3  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
    --
    #4  0x0000007fa0bffd14 clock_nanosleep+0x94 (/usr/lib/libc-2.31.so+0x9ed14)
    #5  0x0000007fa0c0530c nanosleep+0x1c (/usr/lib/libc-2.31.so+0xa430c)
    #6  0x0000007fa0c051e4 sleep+0x34 (/usr/lib/libc-2.31.so+0xa41e4)
    #7  0x000000558a5a9608 flb_loop+0x28 (/usr/bin/fluent-bit+0x52608)
    #8  0x000000558a59f1c4 flb_main+0xa84 (/usr/bin/fluent-bit+0x481c4)
    #9  0x0000007fa0b85124 __libc_start_main+0xe4 (/usr/lib/libc-2.31.so+0x24124)
    iovisor#10 0x000000558a59d828 _start+0x34 (/usr/bin/fluent-bit+0x46828)
    -                fluent-bit (1238)
        1

    #0  0xffffffc01027daa4 generic_copy_file_checks+0x334
    #1  0xffffffc0102ba634 __handle_mm_fault+0x8dc
    #2  0xffffffc0102baa20 handle_mm_fault+0x168
    #3  0xffffffc010ad23c0 do_page_fault+0x148
    #4  0xffffffc010ad27c0 do_translation_fault+0xb0
    #5  0xffffffc0100816b0 do_mem_abort+0x50
    #6  0xffffffc0100843b0 el0_da+0x1c
    #7  0xffffffc01027daa4 generic_copy_file_checks+0x334
    --
    #8  0x0000007f8dc12648 [unknown]
    #9  0x0000007f8dc0aef8 [unknown]
    iovisor#10 0x0000007f8dc1c990 [unknown]
    iovisor#11 0x0000007f8dc08b0c [unknown]
    iovisor#12 0x0000007f8dc08e48 [unknown]
    iovisor#13 0x0000007f8dc081c8 [unknown]
    -                PmLogCtl (2412)
        1

Fixed: iovisor#3884
Signed-off-by: Eunseon Lee <[email protected]>
ddelnano pushed a commit that referenced this pull request Oct 9, 2024
…option

Add additional information and change format of backtrace
- add symbol base offset, dso name, dso base offset
- symbol and dso info is included if it's available in target binary
- changed format:
INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)

before:
  # ./capable -UK
    TIME     UID   PID     COMM             CAP     NAME                 AUDIT   VER DICT
    01:59:17 0     730     irqbalance       21      CAP_SYS_ADMIN        0       deny
        cap_vm_enough_memory
        security_vm_enough_memory_mm
        mmap_region
        do_mmap
        vm_mmap_pgoff
        do_syscall_64
        entry_SYSCALL_64_after_hwframe
        mmap64
        -                irqbalance (730)

After:
  # ./capable -UKv
    TIME     UID   PID     COMM             CAP     NAME                 AUDIT   VERDICT
    01:56:37 0     730     irqbalance       21      CAP_SYS_ADMIN        0       deny
        #0  0xffffffff81447dc6 cap_vm_enough_memory+0x26
        #1  0xffffffff8144a94f security_vm_enough_memory_mm+0x2f
        #2  0xffffffff812576e3 mmap_region+0x103
        #3  0xffffffff8125837e do_mmap+0x3de
        #4  0xffffffff8122c41c vm_mmap_pgoff+0xdc
        #5  0xffffffff81dc3be0 do_syscall_64+0x50
        #6  0xffffffff81e0011b entry_SYSCALL_64_after_hwframe+0x63
        #7  0x00007f3036e9e9ca mmap64+0xa (/lib/x86_64-linux-gnu/libc-2.19.so+0xf49ca)
        -                irqbalance (730)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants