[Bug] Core Dump in mirror_replay Test Suite During Execution #782

@edespino

Apache Cloudberry version

main branch

What happened

The mirror_replay test suite consistently generates a core dump during execution. This test is part of the greenplum_schedule and runs under the ic-good-opt-off test matrix configuration (make -C src/test/regress installcheck-good). According to the core dump's stack trace, the crash occurs in the startup process while it handles an append-only segment file.

Environment

Project: Apache Cloudberry
Test Suite: mirror_replay
Schedule: greenplum_schedule
Test Matrix Config: ic-good-opt-off
Build Type: Debug build with the following configuration:

--enable-debug
--enable-profiling
--enable-cassert
--enable-debug-extensions

Stack Trace
The core dump stack trace indicates the crash occurs during append-only segment file handling:

Thread 1 (Thread 0x7f9cf7a5ed00 (LWP 8442)):
#0  0x00007f9cf8f11a6c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007f9cf8ec4686 in raise () from /lib64/libc.so.6
#2  0x00007f9cf8eae833 in abort () from /lib64/libc.so.6
#3  0x00007f9cf9ca28bf in errfinish (filename=<optimized out>, filename@entry=0x7f9cfa27ef7a "xlogutils.c", lineno=lineno@entry=103, funcname=funcname@entry=0x7f9cfa27f060 <__func__.5> "log_invalid_page") at elog.c:819
#4  0x00007f9cf97272d6 in log_invalid_page (present=false, blkno=1, forkno=MAIN_FORKNUM, node=...) at xlogutils.c:103
#5  XLogAOSegmentFile (rnode=..., segmentFileNum=1) at xlogutils.c:567
#6  0x00007f9cf9d590a6 in ao_truncate_replay (record=<optimized out>, record=<optimized out>) at cdbappendonlyxlog.c:177
#7  0x00007f9cf971b7e5 in StartupXLOG () at xlog.c:7824
#8  0x00007f9cf9a6d124 in StartupProcessMain () at startup.c:267
#9  0x00007f9cf9767e52 in AuxiliaryProcessMain (argc=<optimized out>, argc@entry=2, argv=<optimized out>, argv@entry=0x7ffd0b1cc490) at bootstrap.c:483
#10 0x00007f9cf9a6cbd4 in StartChildProcess (type=StartupProcess) at postmaster.c:6139
#11 PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x137aa30) at postmaster.c:1668
#12 0x000000000040282f in main (argc=7, argv=0x137aa30) at main/main.c:270
$1 = {si_signo = 6, si_errno = 0, si_code = -6, _sifields = {_pad = {8442, 1000, 0 <repeats 26 times>}, _kill = {si_pid = 8442, si_uid = 1000}, _timer = {si_tid = 8442, si_overrun = 1000, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 8442, si_uid = 1000, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 8442, si_uid = 1000, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x3e8000020fa, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 4294967304442, si_fd = 0}, _sigsys = {_call_addr = 0x3e8000020fa, _syscall = 0, _arch = 0}}}
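
For anyone trying to reproduce this, a backtrace like the one above can probably be extracted from a core file with gdb in batch mode. The following is a minimal sketch, not the exact command used for this report: the helper name print_core_backtrace and both path arguments are hypothetical placeholders. It assumes gdb is installed and that the binary is the same debug build that produced the core.

```shell
# Hypothetical helper: the function name and both arguments are placeholders,
# not taken from the report.
print_core_backtrace() {
    binary="$1"   # path to the postgres executable from the debug build
    core="$2"     # path to the core file produced by the crash

    # Refuse to run gdb if either file is missing or unreadable.
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: print_core_backtrace <binary> <core-file>" >&2
        return 1
    fi

    # -batch exits after running the -ex commands.
    # 'thread apply all bt' prints a backtrace for every thread;
    # 'print $_siginfo' prints the signal details stored in the core.
    gdb -batch \
        -ex "thread apply all bt" \
        -ex 'print $_siginfo' \
        "$binary" "$core"
}
```

In the output above, si_signo = 6 is SIGABRT, which matches the abort() call in frame #2.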

Impact

  • Blocks successful execution of mirror_replay test suite
  • May indicate potential issues with append-only segment file handling during mirror synchronization

What you think should happen instead

Analysis

  1. The crash occurs in the startup process during XLOG replay (StartupXLOG)
  2. The abort is raised from log_invalid_page() at xlogutils.c:103, called from XLogAOSegmentFile() at xlogutils.c:567
  3. The caller is ao_truncate_replay(), so the failure is related to append-only segment file handling during mirror replay
  4. The immediate cause appears to be a reference to an AO segment file that replay treats as invalid: the stack shows segmentFileNum=1 reported with present=false

How to reproduce

Ensure your system is configured to generate core files, then run the following test command:

make -C src/test/regress installcheck-good

The issue reproduces consistently without additional steps.
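
Whether core files are actually written depends on the shell's resource limits and the kernel's core pattern. The following is a rough sketch of how that could be checked on a Linux system before running the tests; it is an assumption about a typical setup, not part of the original reproduction steps.

```shell
# Report the current soft limit on core file size for this shell.
core_limit="$(ulimit -c)"
echo "core file size limit: ${core_limit}"

# Try to raise the soft limit; this fails if the hard limit is lower.
if ulimit -c unlimited 2>/dev/null; then
    echo "soft limit now: $(ulimit -c)"
else
    echo "could not raise the core limit; check the hard limit with 'ulimit -H -c'" >&2
fi

# Show where the kernel writes core files (Linux only).
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```

The limit is inherited by child processes, so the cluster would need to be started from a shell where the limit has already been raised. If core_pattern begins with a pipe character, the core is handed to a helper program (for example systemd-coredump) instead of being written to the working directory.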

Operating System

Rocky Linux 9 (should be platform independent)

Anything else

Additional Context
The error occurs during the append-only truncate replay operation (ao_truncate_replay), suggesting potential issues with either:

  • Invalid segment file state during replay
  • Corruption in the XLOG records
  • Incorrect handling of append-only segment files during mirror synchronization

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!
