Skip to content

CORE-15587 proto: allow large TLS certs in shadow link#29718

Merged
pgellert merged 2 commits intoredpanda-data:devfrom
pgellert:sl/allow-large-tls-strings
Mar 3, 2026
Merged

CORE-15587 proto: allow large TLS certs in shadow link#29718
pgellert merged 2 commits intoredpanda-data:devfrom
pgellert:sl/allow-large-tls-strings

Conversation

@pgellert
Copy link
Copy Markdown
Contributor

Shadow link TLS PEM fields (ca, key, cert) were limited to 128KiB
by proto serde's string size limit. Large CA bundles with multiple
certificates could exceed this, causing deserialization failures.

Add (redpanda.core.pbgen.iobuf) = true annotation to these fields
in TLSPEMSettings proto. Update shadow_link converter to use
iobuf_to_string() for conversions. This allows arbitrarily large
PEM data while maintaining sstring API compatibility.

Fixes https://redpandadata.atlassian.net/browse/CORE-15587

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

Improvements

  • Shadow Linking: allow creating shadow links with TLS settings where the CA/key/cert strings exceed 128KiB.

@pgellert pgellert requested review from a team, bharathv and rockwotj February 26, 2026 15:13
@pgellert pgellert self-assigned this Feb 26, 2026
@pgellert pgellert requested review from Copilot and nguyen-andrew and removed request for a team February 26, 2026 15:13
The lookup_field() generator was missing .share() for string fields
with (redpanda.core.pbgen.iobuf) = true. Since iobuf is move-only,
this caused compile errors. Fix by checking isIOBuf(f) alongside the
existing BytesKind check.
@pgellert pgellert force-pushed the sl/allow-large-tls-strings branch from d42881c to e27e32b Compare February 26, 2026 15:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the 128KiB limitation on TLS PEM certificate fields in shadow link configurations by using iobuf-backed protobuf fields instead of regular strings. The change enables large CA bundles with multiple certificates to be used in shadow link TLS settings without deserialization failures.

Changes:

  • Added (redpanda.core.pbgen.iobuf) = true annotation to ca, key, and cert fields in TLSPEMSettings proto
  • Updated pbgen code generator to handle iobuf-backed string fields with .share() calls in lookup_field
  • Modified shadow_link converter to use iobuf_to_string() helper for converting iobuf PEM data to sstring

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
proto/redpanda/core/common/v1/tls.proto Added iobuf annotations to ca, key, and cert string fields in TLSPEMSettings
bazel/pbgen/main.go Extended lookup_field generation to call .share() for iobuf-backed string fields
src/v/serde/protobuf/tests/codegen_test.proto Added test message IOBufStringField to verify iobuf annotation codegen
src/v/serde/protobuf/tests/goldens/codegen_test.proto.h Generated header for IOBufStringField test message
src/v/serde/protobuf/tests/goldens/codegen_test.proto.cc Generated implementation with correct .share() usage in lookup_field
src/v/serde/protobuf/tests/round_trip_test.cc Added test verifying 150KiB iobuf-backed field roundtrips correctly
src/v/redpanda/admin/services/shadow_link/converter.cc Implemented iobuf_to_string() helper and updated all TLS PEM conversions
src/v/redpanda/admin/services/shadow_link/BUILD Added iobuf_parser dependency
src/v/redpanda/admin/services/shadow_link/tests/converter_test.cc Updated tests to use iobuf::from() for TLS PEM test data
Comments suppressed due to low confidence (1)

src/v/redpanda/admin/services/shadow_link/tests/converter_test.cc:471

  • The converter_test lacks coverage for the primary issue this PR addresses: TLS certificates exceeding 128KiB. While the protobuf layer is tested with a 150KiB string in round_trip_test.cc, there is no end-to-end test verifying that large TLS PEM data (ca, key, cert) can be successfully converted from proto to metadata and back. Consider adding a test case that exercises the iobuf_to_string conversion with TLS data exceeding the previous 128KiB limit to ensure the complete flow works correctly.
TEST(converter_test, create_with_tls_value) {
    const auto name = "test-link";
    const auto ca = "ca";
    const auto key = "key";
    const auto cert = "cert";
    proto::admin::shadow_link shadow_link;
    proto::admin::create_shadow_link_request req;
    proto::admin::shadow_link_configurations shadow_link_configurations;
    proto::admin::shadow_link_client_options shadow_link_client_options;
    proto::common::tls_settings tls_settings;
    proto::common::tlspem_settings tls_pem_settings;
    tls_pem_settings.set_ca(iobuf::from(ca));
    tls_pem_settings.set_key(iobuf::from(key));
    tls_pem_settings.set_cert(iobuf::from(cert));
    tls_settings.set_tls_pem_settings(std::move(tls_pem_settings));

    shadow_link_client_options.set_tls_settings(std::move(tls_settings));
    shadow_link_client_options.set_bootstrap_servers({"localhost:9092"});
    shadow_link_configurations.set_client_options(
      std::move(shadow_link_client_options));

    shadow_link.set_configurations(std::move(shadow_link_configurations));
    shadow_link.set_name(ss::sstring{name});
    req.set_shadow_link(std::move(shadow_link));

    auto md = admin::convert_create_to_metadata(std::move(req));
    ASSERT_TRUE(md.connection.ca.has_value());
    ASSERT_TRUE(
      std::holds_alternative<cluster_link::model::tls_value>(
        md.connection.ca.value()));
    EXPECT_EQ(
      std::get<cluster_link::model::tls_value>(md.connection.ca.value()), ca);
    ASSERT_TRUE(md.connection.key.has_value());
    ASSERT_TRUE(
      std::holds_alternative<cluster_link::model::tls_value>(
        md.connection.key.value()));
    EXPECT_EQ(
      std::get<cluster_link::model::tls_value>(md.connection.key.value()), key);
    ASSERT_TRUE(md.connection.cert.has_value());
    ASSERT_TRUE(
      std::holds_alternative<cluster_link::model::tls_value>(
        md.connection.cert.value()));
    EXPECT_EQ(
      std::get<cluster_link::model::tls_value>(md.connection.cert.value()),
      cert);
}

Comment on lines +62 to +72
/// Converts an iobuf to ss::sstring for TLS PEM data.
///
/// Note: This creates a contiguous allocation which may exceed 128KiB for large
/// CA bundles. While large allocations are generally discouraged in Seastar,
/// this is acceptable here because the Seastar TLS layer (set_x509_key, etc.)
/// requires linearized certificate data anyway.
ss::sstring iobuf_to_string(const iobuf& buf) {
auto sz = buf.size_bytes();
iobuf_const_parser p(buf);
return p.read_string(sz);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of crashing, can we temporarily disable the crash on oversized alloc behavior in our allocator and instead catch and rethrow the bad_alloc as an rpc error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, I've added this now

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider making this a method in iobuf proper, but otherwise this LGTM

@vbotbuildovich
Copy link
Copy Markdown
Collaborator

CI test results

test results on build#81144
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
TopicRecreateTest test_cloud_topic_recreation_while_producing null integration https://buildkite.com/redpanda/redpanda/builds/81144#019c9a95-7c67-48ff-9918-2be1c36b6883 FLAKY 18/21 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0499, p0=0.2631, reject_threshold=0.0100. adj_baseline=0.1422, p1=0.4436, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_cloud_topic_recreation_while_producing

@pgellert pgellert force-pushed the sl/allow-large-tls-strings branch from e27e32b to 289ff8e Compare March 2, 2026 10:18
@pgellert
Copy link
Copy Markdown
Contributor Author

pgellert commented Mar 2, 2026

force-push: address PR feedback (throw on bad alloc)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (4)

src/v/redpanda/admin/services/shadow_link/converter.cc:92

  • iobuf_to_string can throw serde::pb::rpc::resource_exhausted_exception, which will bypass the std::invalid_argument catch-and-wrap in convert_create_to_metadata. If callers/documentation assume only invalid_argument_exception from conversion, this introduces a new exception contract. Consider updating the conversion API docs/handling to reflect/translate this error consistently.
    } catch (const std::bad_alloc&) {
        throw serde::pb::rpc::resource_exhausted_exception(
          ssx::sformat(
            "TLS certificate data too large to linearize ({} bytes)", sz));
    }

src/v/redpanda/admin/services/shadow_link/tests/converter_test.cc:440

  • These tests cover TLS PEM handling but still only exercise very small CA/key/cert payloads. Since the regression here is specifically about values exceeding the 128KiB string allocation limit, adding a test that constructs a CA/key/cert >128KiB (and verifies conversion succeeds) would better lock in the intended behavior for shadow link creation.
TEST(converter_test, create_with_tls_value) {
    const auto name = "test-link";
    const auto ca = "ca";
    const auto key = "key";
    const auto cert = "cert";
    proto::admin::shadow_link shadow_link;
    proto::admin::create_shadow_link_request req;
    proto::admin::shadow_link_configurations shadow_link_configurations;
    proto::admin::shadow_link_client_options shadow_link_client_options;
    proto::common::tls_settings tls_settings;
    proto::common::tlspem_settings tls_pem_settings;
    tls_pem_settings.set_ca(iobuf::from(ca));
    tls_pem_settings.set_key(iobuf::from(key));
    tls_pem_settings.set_cert(iobuf::from(cert));
    tls_settings.set_tls_pem_settings(std::move(tls_pem_settings));

proto/redpanda/core/common/v1/tls.proto:64

  • By making ca/cert iobuf-backed, these fields can now be much larger than before. Code paths that log full request objects (e.g., create_shadow_link: {} formatting) may end up emitting very large log lines containing full PEM material. Consider adding debug_redact (or another truncation/redaction mechanism) for these fields, or ensuring request logging avoids printing the raw PEM payloads.
    // The CA
    string ca = 1 [(redpanda.core.pbgen.iobuf) = true];
    // Key and Cert are optional but if one is provided, then both must be
    // The key
    string key = 2 [
        (google.api.field_behavior) = INPUT_ONLY,
        debug_redact = true,
        (redpanda.core.pbgen.iobuf) = true
    ];
    // The SHA-256 of the key, in base64 format
    string key_fingerprint = 3 [(google.api.field_behavior) = OUTPUT_ONLY];
    // The cert
    string cert = 4 [(redpanda.core.pbgen.iobuf) = true];

src/v/redpanda/admin/services/shadow_link/converter.cc:83

  • iobuf_to_string temporarily flips Seastar's global (per-shard) abort-on-allocation-failure flag. Because other fibers on the same shard may interleave while this flag is disabled, unrelated allocations could start throwing std::bad_alloc instead of aborting, changing failure mode outside this conversion. Consider avoiding global state changes (e.g., fail fast on an explicit size cap / preflight, or introduce a scoped helper in Seastar/ssx that restores the previous value without affecting other tasks).
    // Temporarily disable crash-on-allocation-failure so we can convert
    // std::bad_alloc to an RPC error instead of crashing the broker.
    ss::memory::set_abort_on_allocation_failure(false);
    auto guard = ss::defer([]() {
        ss::memory::set_abort_on_allocation_failure(
          config::shard_local_cfg().memory_abort_on_alloc_failure());
    });

Shadow link TLS PEM fields (ca, key, cert) were limited to 128KiB
by proto serde's string size limit. Large CA bundles with multiple
certificates could exceed this, causing deserialization failures.

Add (redpanda.core.pbgen.iobuf) = true annotation to these fields
in TLSPEMSettings proto. Update shadow_link converter to use
iobuf_to_string() for conversions. This allows arbitrarily large
PEM data while maintaining sstring API compatibility.
@pgellert pgellert force-pushed the sl/allow-large-tls-strings branch from 289ff8e to 34090ee Compare March 2, 2026 11:04
@pgellert
Copy link
Copy Markdown
Contributor Author

pgellert commented Mar 2, 2026

force-push: fix clang tidy failure by switching to scoped_system_alloc_fallback

@pgellert pgellert merged commit b5a57c4 into redpanda-data:dev Mar 3, 2026
28 checks passed
@vbotbuildovich
Copy link
Copy Markdown
Collaborator

/backport v25.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants