forked from iovisor/bcc
-
Notifications
You must be signed in to change notification settings - Fork 5
perf_reader: perf_reader_event_read will consume more than a one ring buffer worth of data #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
oazizi000
merged 1 commit into
pixie-io:pixie
from
JamesMBartlett:james/perf_reader_fixes
Jan 25, 2022
Merged
perf_reader: perf_reader_event_read will consume more than a one ring buffer worth of data #1
oazizi000
merged 1 commit into
pixie-io:pixie
from
JamesMBartlett:james/perf_reader_fixes
Jan 25, 2022
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
one ring buffer worth of data. Signed-off-by: James Bartlett <[email protected]>
zasgar
pushed a commit
that referenced
this pull request
May 31, 2022
Currently, hardirqs use tid as key to store information while tracepoint irq_handler_entry. It works fine if irq is triggered while normal task running, but there is a chance causing overwrite issue while irq is triggered while idle task (a.k.a swapper/x, tid=0) running on multi-core system. Let's say there are two irq event trigger simultaneously on both CPU core, irq A @ core #0, irq B @ core #1, and system load is pretty light, so BPF program will get tid=0 since current task is swapper/x for both cpu core. In this case, the information of first irq event stored in map could be overwritten by incoming second irq event. Use tid and cpu_id together to make sure the key is unique for each event in this corner case. Please check more detail at merge request iovisor#2804, iovisor#3733.
zasgar
pushed a commit
that referenced
this pull request
May 31, 2022
There are two pass managers in LLVM. Currently BCC uses the legacy one.
Switch to the new pass manager because the legacy one will be removed
in upcoming releases of LLVM.
Running the following script:
```
from bcc import BPF
bpf_text = '''
static int foobar()
{
bpf_trace_printk("enter vfs_read");
return 0;
}
KFUNC_PROBE(vfs_read)
{
return foobar();
}
'''
BPF(text=bpf_text, debug=1)
```
The IR output is the same with or without this change using LLVM 15:
; ModuleID = 'sscanf'
source_filename = "sscanf"
; ModuleID = '/virtual/main.c'
source_filename = "/virtual/main.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "bpf-pc-linux"
@_version = dso_local global i32 332032, section "version", align 4, !dbg !0
@_license = dso_local global [4 x i8] c"GPL\00", section "license", align 1, !dbg !5
@__const.foobar._fmt = private unnamed_addr constant [15 x i8] c"enter vfs_read\00", align 1
@llvm.compiler.used = appending global [2 x ptr] [ptr @_license, ptr @_version], section "llvm.metadata"
; Function Attrs: alwaysinline nounwind
define dso_local i32 @kfunc__vfs_read(ptr nocapture noundef readnone %0) local_unnamed_addr #0 section ".bpf.fn.kfunc__vfs_read" !dbg !33 {
%2 = alloca [15 x i8], align 1
call void @llvm.dbg.value(metadata ptr %0, metadata !39, metadata !DIExpression()), !dbg !41
call void @llvm.dbg.value(metadata ptr undef, metadata !42, metadata !DIExpression()) #4, !dbg !45
call void @llvm.lifetime.start.p0(i64 15, ptr nonnull %2) #4, !dbg !47
call void @llvm.dbg.declare(metadata ptr %2, metadata !53, metadata !DIExpression()) #4, !dbg !58
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(15) %2, ptr noundef nonnull align 1 dereferenceable(15) @__const.foobar._fmt, i64 15, i1 false) #4, !dbg !58
%3 = call i32 (ptr, i64, ...) inttoptr (i64 6 to ptr)(ptr noundef nonnull %2, i64 noundef 15) #4, !dbg !59
call void @llvm.lifetime.end.p0(i64 15, ptr nonnull %2) #4, !dbg !60
call void @llvm.dbg.value(metadata i32 0, metadata !40, metadata !DIExpression()), !dbg !41
ret i32 0, !dbg !61
}
; Function Attrs: alwaysinline mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1
; Function Attrs: alwaysinline argmemonly mustprogress nocallback nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #2
; Function Attrs: alwaysinline argmemonly mustprogress nocallback nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0(i64 immarg, ptr nocapture) #2
; Function Attrs: alwaysinline argmemonly mustprogress nofree nounwind willreturn
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #3
; Function Attrs: alwaysinline mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #1
attributes #0 = { alwaysinline nounwind "frame-pointer"="none" "min-legal-vector-width"="0" "no-jump-tables"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { alwaysinline mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn }
attributes #2 = { alwaysinline argmemonly mustprogress nocallback nofree nosync nounwind willreturn }
attributes #3 = { alwaysinline argmemonly mustprogress nofree nounwind willreturn }
attributes #4 = { nounwind }
!llvm.dbg.cu = !{!2}
!llvm.module.flags = !{!27, !28, !29, !30, !31}
!llvm.ident = !{!32}
!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "_version", scope: !2, file: !14, line: 526, type: !26, isLocal: false, isDefinition: true)
!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "Ubuntu clang version 15.0.0-++20220426083628+d738d4717f6d-1~exp1~20220426203725.435", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, globals: !4, splitDebugInlining: false, nameTableKind: None)
!3 = !DIFile(filename: "/virtual/main.c", directory: "/home/ubuntu/sources/bpf-next")
!4 = !{!0, !5, !12}
!5 = !DIGlobalVariableExpression(var: !6, expr: !DIExpression())
!6 = distinct !DIGlobalVariable(name: "_license", scope: !2, file: !7, line: 26, type: !8, isLocal: false, isDefinition: true)
!7 = !DIFile(filename: "/virtual/include/bcc/footer.h", directory: "")
!8 = !DICompositeType(tag: DW_TAG_array_type, baseType: !9, size: 32, elements: !10)
!9 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!10 = !{!11}
!11 = !DISubrange(count: 4)
!12 = !DIGlobalVariableExpression(var: !13, expr: !DIExpression())
!13 = distinct !DIGlobalVariable(name: "bpf_trace_printk_", scope: !2, file: !14, line: 542, type: !15, isLocal: true, isDefinition: true)
!14 = !DIFile(filename: "/virtual/include/bcc/helpers.h", directory: "")
!15 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !16, size: 64)
!16 = !DISubroutineType(types: !17)
!17 = !{!18, !19, !21, null}
!18 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
!20 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !9)
!21 = !DIDerivedType(tag: DW_TAG_typedef, name: "u64", file: !22, line: 23, baseType: !23)
!22 = !DIFile(filename: "include/asm-generic/int-ll64.h", directory: "/home/ubuntu/sources/bpf-next")
!23 = !DIDerivedType(tag: DW_TAG_typedef, name: "__u64", file: !24, line: 31, baseType: !25)
!24 = !DIFile(filename: "include/uapi/asm-generic/int-ll64.h", directory: "/home/ubuntu/sources/bpf-next")
!25 = !DIBasicType(name: "unsigned long long", size: 64, encoding: DW_ATE_unsigned)
!26 = !DIBasicType(name: "unsigned int", size: 32, encoding: DW_ATE_unsigned)
!27 = !{i32 7, !"Dwarf Version", i32 4}
!28 = !{i32 2, !"Debug Info Version", i32 3}
!29 = !{i32 1, !"wchar_size", i32 4}
!30 = !{i32 7, !"PIC Level", i32 2}
!31 = !{i32 7, !"PIE Level", i32 2}
!32 = !{!"Ubuntu clang version 15.0.0-++20220426083628+d738d4717f6d-1~exp1~20220426203725.435"}
!33 = distinct !DISubprogram(name: "kfunc__vfs_read", scope: !34, file: !34, line: 23, type: !35, scopeLine: 23, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !38)
!34 = !DIFile(filename: "/virtual/main.c", directory: "")
!35 = !DISubroutineType(types: !36)
!36 = !{!18, !37}
!37 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !25, size: 64)
!38 = !{!39, !40}
!39 = !DILocalVariable(name: "ctx", arg: 1, scope: !33, file: !34, line: 23, type: !37)
!40 = !DILocalVariable(name: "__ret", scope: !33, file: !34, line: 23, type: !18)
!41 = !DILocation(line: 0, scope: !33)
!42 = !DILocalVariable(name: "ctx", arg: 1, scope: !43, file: !34, line: 23, type: !37)
!43 = distinct !DISubprogram(name: "____kfunc__vfs_read", scope: !34, file: !34, line: 23, type: !35, scopeLine: 24, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !44)
!44 = !{!42}
!45 = !DILocation(line: 0, scope: !43, inlinedAt: !46)
!46 = distinct !DILocation(line: 23, column: 1, scope: !33)
!47 = !DILocation(line: 15, column: 5, scope: !48, inlinedAt: !57)
!48 = distinct !DILexicalBlock(scope: !49, file: !34, line: 15, column: 3)
!49 = distinct !DISubprogram(name: "foobar", scope: !34, file: !34, line: 13, type: !50, scopeLine: 14, flags: DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !52)
!50 = !DISubroutineType(types: !51)
!51 = !{!18}
!52 = !{!53}
!53 = !DILocalVariable(name: "_fmt", scope: !48, file: !34, line: 15, type: !54)
!54 = !DICompositeType(tag: DW_TAG_array_type, baseType: !9, size: 120, elements: !55)
!55 = !{!56}
!56 = !DISubrange(count: 15)
!57 = distinct !DILocation(line: 25, column: 9, scope: !43, inlinedAt: !46)
!58 = !DILocation(line: 15, column: 10, scope: !48, inlinedAt: !57)
!59 = !DILocation(line: 15, column: 37, scope: !48, inlinedAt: !57)
!60 = !DILocation(line: 15, column: 76, scope: !49, inlinedAt: !57)
!61 = !DILocation(line: 23, column: 1, scope: !33)
Closes iovisor#3947.
References:
[0]: https://llvm.org/docs/NewPassManager.html
[1]: https://blog.llvm.org/posts/2021-03-26-the-new-pass-manager/
Signed-off-by: Hengqi Chen <[email protected]>
oazizi000
pushed a commit
that referenced
this pull request
Jul 2, 2022
With the latest clang and bpftool the build of tcpconnect fails as
follows (output patched a bit for readability):
bpftool gen skeleton tcpconnect.bpf.o > tcpconnect.skel.h
Error: Something is wrong for .rodata's variable #1: need offset 0, already at 4.
This happens because the filter_ports variable is declared using a ".rodata hack":
SEC(".rodata") int filter_ports[MAX_PORTS]
This breaks with the recent clang, as the filter_ports variable is placed into
the '.rodata,aw' section. Older clang would put it into '.rodata,a' section
where all 'const volatile' variables are placed. The result is that the
filter_ports variable has a wrong offset of 0 in BTF_KIND_DATASEC.
To hack the hack we can declare the variable as
SEC(".rodata") const int filter_ports[MAX_PORTS]
but, instead, we now can just declare it as
const volatile int filter_ports[MAX_PORTS]
In fact, this was already done in a02663b ("libbpf-tools: update bpftool and
fix .rodata hack"), but a later commit f8ac3c6 ("libbpf-tools: fix tcpconnect
compile errors") reverted the change without any comments.
Signed-off-by: Anton Protopopov <[email protected]>
JamesMBartlett
pushed a commit
that referenced
this pull request
Dec 8, 2022
…for -v option
Add additional information and change format of backtrace
- add symbol base offset, dso name, dso base offset
- symbol and dso info is included if it's available in target binary
- changed format:
INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)
Print backtrace of ip if it failed to get syms.
Before:
# offcputime -v
psiginfo
vscanf
__snprintf_chk
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
sd_event_exit
sd_event_dispatch
sd_event_run
[unknown]
__libc_start_main
[unknown]
- systemd-journal (204)
1
xas_load
xas_find
filemap_map_pages
__handle_mm_fault
handle_mm_fault
do_page_fault
do_translation_fault
do_mem_abort
do_el0_ia_bp_hardening
el0_ia
xas_load
--
failed to get syms
- PmLogCtl (138757)
1
After:
# offcputime -v
#0 0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
#1 0xffffffc01009a93c el0_svc_handler+0x34
#2 0xffffffc010084a08 el0_svc+0x8
#3 0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
--
#4 0x0000007fa0bffd14 clock_nanosleep+0x94 (/usr/lib/libc-2.31.so+0x9ed14)
#5 0x0000007fa0c0530c nanosleep+0x1c (/usr/lib/libc-2.31.so+0xa430c)
#6 0x0000007fa0c051e4 sleep+0x34 (/usr/lib/libc-2.31.so+0xa41e4)
#7 0x000000558a5a9608 flb_loop+0x28 (/usr/bin/fluent-bit+0x52608)
#8 0x000000558a59f1c4 flb_main+0xa84 (/usr/bin/fluent-bit+0x481c4)
#9 0x0000007fa0b85124 __libc_start_main+0xe4 (/usr/lib/libc-2.31.so+0x24124)
iovisor#10 0x000000558a59d828 _start+0x34 (/usr/bin/fluent-bit+0x46828)
- fluent-bit (1238)
1
#0 0xffffffc01027daa4 generic_copy_file_checks+0x334
#1 0xffffffc0102ba634 __handle_mm_fault+0x8dc
#2 0xffffffc0102baa20 handle_mm_fault+0x168
#3 0xffffffc010ad23c0 do_page_fault+0x148
#4 0xffffffc010ad27c0 do_translation_fault+0xb0
#5 0xffffffc0100816b0 do_mem_abort+0x50
#6 0xffffffc0100843b0 el0_da+0x1c
#7 0xffffffc01027daa4 generic_copy_file_checks+0x334
--
#8 0x0000007f8dc12648 [unknown]
#9 0x0000007f8dc0aef8 [unknown]
iovisor#10 0x0000007f8dc1c990 [unknown]
iovisor#11 0x0000007f8dc08b0c [unknown]
iovisor#12 0x0000007f8dc08e48 [unknown]
iovisor#13 0x0000007f8dc081c8 [unknown]
- PmLogCtl (2412)
1
Fixed: iovisor#3884
Signed-off-by: Eunseon Lee <[email protected]>
ddelnano
pushed a commit
that referenced
this pull request
Oct 9, 2024
…option
Add additional information and change format of backtrace
- add symbol base offset, dso name, dso base offset
- symbol and dso info is included if it's available in target binary
- changed format:
INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)
before:
# ./capable -UK
TIME UID PID COMM CAP NAME AUDIT VER DICT
01:59:17 0 730 irqbalance 21 CAP_SYS_ADMIN 0 deny
cap_vm_enough_memory
security_vm_enough_memory_mm
mmap_region
do_mmap
vm_mmap_pgoff
do_syscall_64
entry_SYSCALL_64_after_hwframe
mmap64
- irqbalance (730)
After:
# ./capable -UKv
TIME UID PID COMM CAP NAME AUDIT VERDICT
01:56:37 0 730 irqbalance 21 CAP_SYS_ADMIN 0 deny
#0 0xffffffff81447dc6 cap_vm_enough_memory+0x26
#1 0xffffffff8144a94f security_vm_enough_memory_mm+0x2f
#2 0xffffffff812576e3 mmap_region+0x103
#3 0xffffffff8125837e do_mmap+0x3de
#4 0xffffffff8122c41c vm_mmap_pgoff+0xdc
#5 0xffffffff81dc3be0 do_syscall_64+0x50
#6 0xffffffff81e0011b entry_SYSCALL_64_after_hwframe+0x63
#7 0x00007f3036e9e9ca mmap64+0xa (/lib/x86_64-linux-gnu/libc-2.19.so+0xf49ca)
- irqbalance (730)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR copied from PR to iovisor/bcc
I noticed that in a single call to
perf_reader_event_read, the perf_reader can end up reading more than a full ring buffer's worth of data. As long as the consumer side (perf_reader_raw_cb) is slower than the producer side (ebpf events from the kernel), then the ring buffer will fill up and the ring buffer's head position will always be behind its tail. Since the consumer is slower than the producer, as soon as the perf_reader updates the ring buffer tail position, the kernel will quickly write a new event and increment head. Because the perf_reader_event_read loop, reloads the ring buffer's head position on each iteration, its possible for this function to loop forever. This change just makes it so that the head position is read once at the start of the loop, and then the loop reads all the data up until this initial head position. The downside of this is that if data is written by the kernel during theperf_reader_event_readcall it won't be read until the next call toperf_reader_event_read, but this seems acceptable to me compared to unbounded runtime ofperf_reader_event_read.I also added a test that demonstrates the behaviour above. The test has a thread that constantly writes to
/dev/null, and then on the main thread it deploys a ebpf program that pushes 32kB of data to the ring buffer on every single write call. It also has a sleep in the callback function for the probe, that forces the consumer to be much slower than the producer. Then in a separate thread it callsperf_buffer->poll(0)and records how much data is received by the callback. Without the changes in this diff, this test would show that more than a full ring buffer's worth of data (times the number of perf_readers' the poll call touched) was received by the callback. I wasn't sure what kernel version perf buffer's are supported from, but based ontest_perf_event.ccI assumed that 4.13 should be good. (Happy to remove the test from this diff if its not something you want, but thought it was useful to demonstrate the behaviour I'm trying to fix)Looked for a style guide, but couldn't find one, so let me know if there are style issues.
cc. @oazizi000