Skip to content

Enable PC sampling with attach/detach: add code_object tracking and dedicated background thread for attach/detach to prevent deadlocks#3921

Draft
julijose-amd wants to merge 2 commits intodevelopfrom
users/julijose-amd/pcs-attach-bg
Draft

Enable PC sampling with attach/detach: add code_object tracking and dedicated background thread for attach/detach to prevent deadlocks#3921
julijose-amd wants to merge 2 commits intodevelopfrom
users/julijose-amd/pcs-attach-bg

Conversation

@julijose-amd
Copy link
Contributor

Summary

PC Sampling: Code Object Tracking in Attach/Detach Mode

  • PC sampling relies on CodeobjTableTranslatorSynchronized to map sampled PC addresses to code object IDs. In non-attach mode, this is populated via hsa_executable_freeze / hsa_executable_destroy hooks. In attach/detach mode, these hooks are not available as the profiler attaches after HSA initialization, so hsa_executable_freeze has already been called for all loaded executables. Without the attach table path, the translation table remains empty and PC samples cannot be correlated to code objects.
  • Added a new initialization path in pc_sampling/code_object that uses the attach dispatch table to enumerate already‑loaded code objects and register a chained notify callback for newly loaded ones.
  • Ensures CodeobjTableTranslatorSynchronized is correctly populated, enabling PC address → code object correlation when attaching mid‑execution.
  • Skip buffer flushing for pre‑existing code objects during attach initialization, since the PC sampling service may not yet be active.
  • Introduced pc_sampling::code_object::initialize(rocattach_api) and integrated it into registration.cpp so the PC sampling code object module initializes when rocattach registers.

Background Thread to avoid deadlocks during Attach/Detach

  • Previously, ptrace‑based code injection targeted the main application thread, which could deadlock if that thread held internal mutexes.
  • Added a dedicated idle background thread in librocprofiler-sdk-attach that:
    • Names itself "rocp-bg-attach" via pthread_setname_np
    • Blocks all signals
    • Waits on a condition variable indefinitely
    • Avoids taking any application‑owned locks
  • The attacher now locates this safe injection thread by scanning /proc/<pid>/task/*/comm for "rocp-bg-attach" and uses its TID for all ptrace operations.

Changes

  • pc_sampling/code_object.cpp

    • Added executable_freeze_internal() shared logic for HSA hook path and the attach path
    • Added chained_notify_new_code_object() for callback chaining
    • Added load_attach_code_objects() for attach‑time initialization
    • Added initialize(RocAttachDispatchTable*) overload
  • pc_sampling/code_object.hpp

    • Included "lib/rocprofiler-sdk-attach/table.h"
    • Declared initialize(RocAttachDispatchTable*) overload
  • registration.cpp

    • Initialize PC sampling code_object module when rocattach registers
      (guarded by ROCPROFILER_SDK_HSA_PC_SAMPLING)
  • rocprofiler-sdk-attach/attach.cpp

    • Added BackgroundThread with thread naming, signal blocking, and idle wait
  • rocprofiler-sdk-rocattach/rocattach.cpp

    • Added resolve_attach_tid() to scan /proc/<pid>/task/*/comm
    • Updated setup() to use the background thread’s TID

Testing

  • Verified PC sampling with attach/detach on the transpose application — no deadlocks across multiple runs.
  • Confirmed background thread TID discovery via /proc/<pid>/task/*/comm.
  • The generated PC sampling CSV output contains:
    • Instruction addresses
    • Dispatch IDs
    • Correlation IDs

Copy link
Contributor

@meserve-amd meserve-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, some minor comments on the draft
Please add tests or add-on to existing tests in the full PR

Comment on lines +35 to +39
#include <condition_variable>
#include <mutex>
#include <signal.h>
#include <thread>
#include <unistd.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <condition_variable>
#include <mutex>
#include <signal.h>
#include <thread>
#include <unistd.h>
#include <signal.h>
#include <unistd.h>
#include <condition_variable>
#include <mutex>
#include <thread>

Give system headers their own category separate from C++ library headers

{
rocprofiler::common::init_logging("ROCPROFILER_ATTACH");

static BackgroundThread bg_thread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to use the rocprofiler common::static_object and hide this in a separate "get" function
See here for an example.
Available in lib/common/static_object.hpp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants