[CORE-14558] storage: prevent race between disk_log_appender and remove_prefix_full_segments() #28472
Conversation
Pull Request Overview
This PR adjusts the logging severity for batch write errors in the storage layer based on whether the error occurred during system shutdown. Errors during normal operation remain at ERROR level, while shutdown-related errors are downgraded to WARN to avoid alarming users about expected behavior during shutdown.
- Conditionally sets the log level to `WARN` for shutdown exceptions and `ERROR` for other exceptions
- Uses `ssx::is_shutdown_exception()` to distinguish shutdown-related errors
- Replaces the direct `stlog.error()` call with the level-conditional `vlogl()` macro
nvartolomei
left a comment
please see the comments first
@WillemKauf do you happen to know what gate is getting closed? I'm suspicious that we close storage stuff with work in-flight. In most cases we'd want this to be the other way around: cancel work, wait for cancellation, then shut down storage/segments in the correct order.
Force-pushed cc7bb40 to 1770323
@nvartolomei I believe it is the
I agree that perhaps the gate should be held/kept open by the appender if it is already in progress.
@WillemKauf storage is one of the first services to start, and hence I'd assume one of the last to stop. Something is leaking continuations, then? My concern is that they, in this case, get an exception on the gate, but what if instead they access some deallocated memory in other instances? I.e., when the continuation is scheduled slightly later.
Ah, I think I see it.
@WillemKauf this sounds like a potential bug? Should we ensure release of the appender is initiated before closing and takes a gate? Current behavior seems spaghetti 🤔
@nvartolomei yes, for sure this is spaghetti. Pushing an update soon; let me know what you think.
Force-pushed 1770323 to 2ef42a9
@nvartolomei I don't think we should force a log appender to hold a
Title changed: "storage: log batch write errors at WARN iff shutdown" → "storage: wrap maybe_roll_unlocked in as_future() in disk_log_appender"
nvartolomei
left a comment
From phone. Will have a better chance to look at this on Thursday at the earliest.
src/v/storage/disk_log_appender.cc (Outdated)
```diff
-    co_await _log.maybe_roll_unlocked(_last_term, _idx);
+    auto roll_fut = co_await ss::coroutine::as_future(
+      _log.maybe_roll_unlocked(_last_term, _idx));
+    if (roll_fut.failed()) {
```
This is a large catch-all. I'm not comfortable with a catch-all that continues as if nothing happened. If we expect a benign race, can't we handle it at a lower level? I.e., what if storage is indeed stopping (not sure it's possible today, but it may be another race introduced in the future)? Would it spin in this while loop forever?
If it helps, this race (between disk_log_appender and a segment close) has always existed; the only difference in recent history is that release_appender_in_background() can now throw, leading to this new, narrow race condition. The best thing to do here is to ignore the closing of the old segment and attempt to re-roll with the log.
This exceptional path is also not what typically saves us from a spin loop, nor is it the typical exit path, FYI.
And to be honest, I think a lot of this spaghettiness comes from the fact that the disk_log_appender directly interfaces with a segment obtained from the log instead of opaquely dealing with the log itself. It makes the object/behavior very difficult to reason about indeed.
I also need to stare at this a bit more, but I wonder if we should be treating the segment_roll_lock as a general lock around the last segment (regardless of whether it has an appender or not). In which case, maybe we should try to acquire the segment roll lock before calling _segs.pop_back() on the last segment (assuming that's what's happening here).
> but I wonder if we should be treating...
That feels pretty sane. It feels like there are a bunch of overlapping hoops serving as concurrency control in local storage and this gap isn't yet covered.
It also seems like, even if this roll were wrapped in a try/catch block, we may just end up throwing when attempting to append to a closed segment anyway in `append_batch_to_segment`.
I can try to prototype a solution which requires segment_roll_lock be held when popping back a segment from the log.
Hmm, actually, not sure this is the problem yet. Need to think about this more.
Does this commit 59dea3e pass the sniff test?
CI test results
- test results on build#76043
- test results on build#76173
- test results on build#76180
- test results on build#76227
Instead, log batch append errors in the `disk_log_appender` at `ERROR`.
A useful sanity check to prevent any potential concurrency issues.
Force-pushed 2ef42a9 to 1f5b520
Title changed: "storage: wrap maybe_roll_unlocked in as_future() in disk_log_appender" → "storage: prevent race between disk_log_appender and remove_prefix_full_segments()"
Force-pushed 1f5b520 to 9b01363
Force-pushed 8410e99 to 59dea3e
Force-pushed 59dea3e to 70fd3e4
/ci-repeat 1
```cpp
    // concurrency issues.
    std::optional<ssx::semaphore_units> seg_rolling_units;
    if (ptr->has_appender()) {
        seg_rolling_units = co_await _segments_rolling_lock.get_units();
```
if we happen to be waiting for log::close(), should we catch the broken semaphore exception here and quietly return?
We are currently relying on the thrown exception to break out of the ss::do_until() loop (unless we also make that a bit more intelligent: catching and silencing the exception without other changes would cause an infinite loop). The exception is eventually caught further up the chain (I think all the way in log_eviction_stm).
cool yeah, that's what i was asking. i didn't see anything else that would throw in this path at a glance, so this is basically new behavior? maybe it makes sense to add "closed" to the do_until condition instead, but idk.
Yeah, I agree that having an exceptional path here does seem like new behavior. It feels like similar behavior regardless of whether we catch and silently return on an exception here OR allow it to be thrown and bubble up the chain, but I'm still a little unsure.
Happy to get other opinions here and a sanity check on any potential side effects of this change (i.e. previously, if this were to race with a close() operation, it seems that we would still be able to prefix-truncate the entire log if necessary).
I think this is fine, because if we're racing with a close, the log will have still persisted the start offset before getting here, and subsequent usage should be quite limited given the log is closing.
Is this PR overall too complex, then? Hitting a vassert is bad, but if we have your PR to downgrade those to a throw, are we happy to accept that there may be concurrency issues during shutdown which are fairly benign after all?
oleiman
left a comment
looks pretty much legit to me. might want a second set of eyes
…ve()`...
...in `disk_log_impl`
Without any sort of safety guarantees around mutual exclusion
of these functions, we can easily end up with an assert failure like
```
ERROR 2025-11-13 15:50:46,593 [shard 0:main] assert - Assert failure: (src/v/storage/disk_log_impl.cc:2084) '!_closed' flush on closed log - {offsets: {start_offset:850, committed_offset:849, committed_offset_term:-9223372036854775808, dirty_offset:849, dirty_offset_term:-9223372036854775808}, is_closed: true, segments: [{size: 0, []}], config: {ntp: {test.log_builder/OOvVPTBb/0}, base_dir: /home/willem/.cache/bazel/_bazel_willem/6fc2d9663f4e6fbf4046ab3d95678674/sandbox/linux-sandbox/12837/execroot/_main/_tmp/efa90c48b51a675ce0846aa066568b84/test.dir_8IlPIe, overrides: nullptr, revision: 0, topic_revision: -9223372036854775808, remote_revision: 0}}
```
Due to the call to `flush()` in `truncate_prefix()`.
Wait for and break the `_segment_rewrite_lock()` in `disk_log_impl::close()`
and `remove()` to avoid this issue.
Force-pushed 70fd3e4 to 6059c5b
…x_full_segments()`

We will be calling `remove_segment_permanently()` on what is potentially the active segment in `remove_prefix_full_segments()`. If there is a concurrent append occurring, we may get into a race with the `disk_log_appender`, since we return the units in `lock_holder` (i.e. the segment's write lock) while the future for `remove_segment_permanently()` is still unresolved. In this case, we need to obtain units from `_segments_rolling_lock` to prevent any concurrency issues, which manifest as either a thrown exception due to gate closure or an inability to add batches to a closed `segment`.
For use in a non-seastar thread context.
A test which validates potential races between `disk_log_appender`, a `truncate_prefix()` operation, and a `close()` operation in a concurrent context. Without the previous two commits, this test would occasionally demonstrate assert failures around operations on closed logs.
Force-pushed 6059c5b to 799ff14
```cpp
    && !_eviction_monitor->promise.get_future().available()) {
        _eviction_monitor->promise.set_exception(segment_closed_exception());
    }
```
nit: not from this PR (maybe as a follow-up?), but let's move the "closing log" line up top, before we do any awaits (I can imagine this being extra hard to debug without realizing we're even in this call). And maybe update the text here to indicate that we've waited for locks.
Agreed, this is a good follow-up.
```cpp
    }

    ss::abort_source as;
    mutex log_mutex{"e2e_test::log_mutex"};
```
I'm a little confused about the purpose of the mutex here -- is it meant to emulate a specific call pattern (like certain calls must not overlap), or is it just another layer of delaying the calls in funky ways?
The mutex here is just to prevent the append fiber from trying to `make_appender` on a closed log, as well as to avoid a prefix truncation on a closed log (both of which `vassert()`, at least before your PR lands).
/backport v25.3.x
/backport v25.2.x
CI looks green, no new flakes or suspicious storage-related regressions. Merging. Let's see how these changes do in CI before finalizing backports.
Failed to create a backport PR to v25.2.x branch. I tried:
New Cover Letter
Here be dragons. Read the individual commit messages to understand the races and their fixes.
Old Cover Letter
(This only covers one of the discovered issues/race conditions)
This can race with a `segment::close()` operation and result in an exception being thrown. Or, I hypothesize, we could end up attempting to append to a closed segment, which would also be an exceptional case.
Fixes https://redpandadata.atlassian.net/browse/CORE-14558.
Backports Required
Release Notes
Bug Fixes
Fixed an issue in the `storage` layer that could lead to attempting to append to a closed segment or a failed assert