[v25.3.x] CORE-14479 schema_registry: fix avro external reference collection #29627

Merged
pgellert merged 5 commits into redpanda-data:v25.3.x from pgellert:backport/avro-ref-fix
Feb 26, 2026

Conversation

@pgellert
Contributor

@pgellert pgellert commented Feb 18, 2026

Manual backport of #28780

See the commit messages for details on what changes compared to the original PR. The main difference is that the new (potentially breaking) behaviour is now hidden behind a cluster config, schema_registry_avro_use_named_references, which is disabled by default so that this bug fix can be backported safely.

Output of git range-diff f579884~4..f579884 HEAD~5..HEAD to show the diff in this commit range: https://gist.github.com/pgellert/84c33796b3c02fbeea1a11fbe06f4e48

Relates to #29611

Fixes CORE-15528

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

Bug Fixes

  • Schema Registry: Fixes a bug in Avro schema compilation where external references were not properly resolved from the schema store, causing compilation failures, compatibility check failures, or server-side schema ID validation failures for schemas with reference dependencies. For v25.3.x patch versions, the fix takes effect only if the new cluster config schema_registry_avro_use_named_references is enabled; starting with v26.1.1, the new behaviour is always enabled.

(cherry picked from commit 8c5b546)

Manual backport changes:
- update bazel lockfile for v25.3

For consistency across the codebase, use relative references for bazel
dependencies that refer to targets that are in the same module.

(cherry picked from commit 38af379)

Various tests require a minimal store fixture for testing, so extract a
shared implementation that can be reused in existing tests and in the
one I am about to introduce.

(cherry picked from commit 7b9bf85)

Previously, avro external references were not handled correctly during
schema processing. The old implementation attempted to build schemas by
concatenating them, which resulted in incorrect parsing - for example, a
schema A that references B would be parsed as schema B (the first schema
in the concatenated result).

This commit fixes the reference collection logic to properly traverse
and collect all external schema references, and uses
compileJsonSchemaWithNamedReferences to parse schemas with named
references correctly.
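
The traversal described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `toy_store`, `collect_references` and the subject names used with it are hypothetical, and the real implementation reads from the schema store and passes the collected schemas to compileJsonSchemaWithNamedReferences. The sketch walks the references depth-first, emits every dependency before the schema that needs it, and collects a shared reference only once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the schema store: subject -> subjects it references.
using toy_store = std::map<std::string, std::vector<std::string>>;

namespace detail {
// Depth-first visit that appends each subject after all of its references.
inline void visit(
  const toy_store& store,
  const std::string& subject,
  std::set<std::string>& seen,
  std::vector<std::string>& out) {
    if (!seen.insert(subject).second) {
        return; // already collected (shared or cyclic reference)
    }
    auto it = store.find(subject);
    if (it != store.end()) {
        for (const auto& ref : it->second) {
            visit(store, ref, seen, out);
        }
    }
    out.push_back(subject);
}
} // namespace detail

// Returns the transitive references of `root`, dependencies first,
// excluding `root` itself.
inline std::vector<std::string>
collect_references(const toy_store& store, const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> out;
    detail::visit(store, root, seen, out);
    assert(!out.empty() && out.back() == root); // root is always emitted last
    out.pop_back();
    return out;
}
```

With a store in which Person references Address and Phone, and both of those reference Country, `collect_references(store, "Person")` yields Country, Address, Phone: each schema appears once and after everything it depends on.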

As a result:

- Avro schemas with external references are now compiled correctly. This
  affects compatibility checks and server-side schema ID validation.
- The schema definition returned by the schema registry API endpoints is
  unchanged.
- The way schemas are serialized onto the schemas topic is unchanged.

Manual backport change:

- The fix is gated by the schema_registry_avro_use_named_references
  config which defaults to false, preserving existing behavior. When
  enabled, schemas with external references are compiled using
  compileJsonSchemaWithNamedReferences instead of schema concatenation,
  which fixes compatibility checks and schema validation issues.
- Adds a test for the above gated behaviour.
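
To make the gating concrete, here is a heavily simplified sketch of the two code paths. It is an assumption-laden toy, not the Redpanda code: `toy_schema`, `compile_concatenated`, `compile_with_named_references` and `compile` are hypothetical names, a schema is reduced to its fully qualified name, and the "com.example.Person" name used with it is illustrative. It only models the observable difference described in the commit message: the legacy path returns the first schema of the concatenation, while the gated path returns the referencing schema.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified schema: only the fully qualified name matters here.
struct toy_schema {
    std::string name;
};

// Legacy path (flag disabled): the references and the referencing schema are
// concatenated, and the first schema of the concatenation is what comes back.
inline toy_schema compile_concatenated(
  const std::vector<toy_schema>& references, const toy_schema& root) {
    std::vector<toy_schema> all(references);
    all.push_back(root);
    assert(!all.empty()); // holds because root was just appended
    return all.front();
}

// Gated path (flag enabled): references are only resolvable by name, and the
// referencing schema itself is the result.
inline toy_schema compile_with_named_references(
  const std::vector<toy_schema>& /*references*/, const toy_schema& root) {
    return root;
}

// Gate: the flag defaults to false, so callers keep the legacy behaviour
// unless they opt in.
inline toy_schema compile(
  const std::vector<toy_schema>& references,
  const toy_schema& root,
  bool use_named_references = false) {
    return use_named_references
             ? compile_with_named_references(references, root)
             : compile_concatenated(references, root);
}
```

Calling `compile` for a Person schema that references com.example.Address returns the Address schema by default and the Person schema once the flag is passed as true, which mirrors what the gated test checks.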

(cherry picked from commit f579884)

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 18, 2026

CI test results

test results on build#80655
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| invalid_describe_configs_test | bad_describe_config_response | | unit | https://buildkite.com/redpanda/redpanda/builds/80655#019c7101-212b-4359-8173-5e34b06e7f96 | FAIL | 0/1 | | |
test results on build#80772
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ClusterLinkingOMBTest | test_omb | null | integration | https://buildkite.com/redpanda/redpanda/builds/80772#019c76b8-18d8-4e82-bb01-57d0a2de9214 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ClusterLinkingOMBTest&test_method=test_omb |
| ClusterLinkingOMBTest | test_omb | null | integration | https://buildkite.com/redpanda/redpanda/builds/80772#019c76c9-3bc5-4a55-b5b1-916ee055fdb0 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ClusterLinkingOMBTest&test_method=test_omb |
| DatalakeOMBTest | basic_workload_linear_20_test | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/80772#019c76b8-18da-47c7-ae9d-abbb911a1650 | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeOMBTest&test_method=basic_workload_linear_20_test |
| DatalakeOMBTest | basic_workload_linear_20_test | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/80772#019c76c9-3bc7-4925-9b30-010841f1bbca | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeOMBTest&test_method=basic_workload_linear_20_test |
| PartitionReassignmentsTest | test_reassignments_cancel | null | integration | https://buildkite.com/redpanda/redpanda/builds/80772#019c76c9-3bc4-4850-91c9-4b436198b430 | FLAKY | 15/21 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0597, p0=0.0055, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionReassignmentsTest&test_method=test_reassignments_cancel |
| OpenBenchmarkSelfTest | test_default_omb_configuration | {"driver": "SIMPLE_DRIVER", "workload": "SIMPLE_WORKLOAD"} | integration | https://buildkite.com/redpanda/redpanda/builds/80772#019c76b8-18d5-4f0f-9a24-38bc562f3d52 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=OpenBenchmarkSelfTest&test_method=test_default_omb_configuration |
test results on build#81128
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TestReadReplicaService | test_writes_forbidden | {"cloud_storage_type": 2, "partition_count": 10} | integration | https://buildkite.com/redpanda/redpanda/builds/81128#019c99af-391d-4277-b4ca-c365f60b34af | FLAKY | 8/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TestReadReplicaService&test_method=test_writes_forbidden |
| UpgradeAndCheckRecoveryReads | test_basic_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/81128#019c99af-391a-449e-98bd-a87a4309121e | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=UpgradeAndCheckRecoveryReads&test_method=test_basic_upgrade |

Comment on lines +384 to +386
// Explicitly ensure the legacy behavior is active (default is false)
config::shard_local_cfg()
.schema_registry_avro_use_named_references.reset();
Member

nit: I think it would be nice to explicitly set it to false, or reset it + assert/require that the default is false
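
The two options in this suggestion could look roughly like the sketch below. It does not use the real config types: `toy_property` is a hypothetical stand-in for a config property with a default and a current value, and `force_legacy_explicit` and `force_legacy_via_reset` are illustrative helper names. The point is only that a bare reset relies on the default being false, so either the value is pinned explicitly or the default is checked alongside the reset.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a config property with a default and a current value.
template<typename T>
class toy_property {
public:
    explicit toy_property(T def)
      : _default(def)
      , _value(std::move(def)) {}

    const T& operator()() const { return _value; }
    const T& default_value() const { return _default; }
    void set_value(T v) { _value = std::move(v); }
    void reset() { _value = _default; }

private:
    T _default;
    T _value;
};

// Option 1: pin the flag to false regardless of what the default is.
inline void force_legacy_explicit(toy_property<bool>& flag) {
    flag.set_value(false);
}

// Option 2: reset to the default and check that the default really is false,
// so the test fails loudly if the default ever changes.
inline void force_legacy_via_reset(toy_property<bool>& flag) {
    flag.reset();
    assert(!flag.default_value());
    assert(!flag());
}
```

Option 1 keeps the test independent of the default; option 2 additionally documents, and enforces, the assumption that the default is false.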

// schema in the concatenation (Address), not from Person.
auto compiled = register_schema(
subject("PersonSubject"), referencing_schema, schema_version{1});
EXPECT_EQ(compiled.name(), "com.example.Address");
Member

nit: can add const auto ref_name = "com.example.Address"; replace this and line 405 with ref_name.

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 19, 2026

Retry command for Build#80655

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_linking_omb_test.py::ClusterLinkingOMBTest.test_omb
tests/rptest/tests/datalake/datalake_omb_test.py::DatalakeOMBTest.basic_workload_linear_20_test@{"cloud_storage_type":1}
tests/rptest/tests/services_self_test.py::OpenBenchmarkSelfTest.test_default_omb_configuration@{"driver":"SIMPLE_DRIVER","workload":"SIMPLE_WORKLOAD"}

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 19, 2026

Retry command for Build#80772

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/services_self_test.py::OpenBenchmarkSelfTest.test_default_omb_configuration@{"driver":"SIMPLE_DRIVER","workload":"SIMPLE_WORKLOAD"}
tests/rptest/tests/cluster_linking_omb_test.py::ClusterLinkingOMBTest.test_omb
tests/rptest/tests/datalake/datalake_omb_test.py::DatalakeOMBTest.basic_workload_linear_20_test@{"cloud_storage_type":1}
tests/rptest/tests/partition_reassignments_test.py::PartitionReassignmentsTest.test_reassignments_cancel

@vbotbuildovich
Collaborator

Retry command for Build#81128

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/read_replica_e2e_test.py::TestReadReplicaService.test_writes_forbidden@{"cloud_storage_type":2,"partition_count":10}

@pgellert
Contributor Author

/ci-repeat 1
skip-redpanda-build
skip-units
tests/rptest/tests/read_replica_e2e_test.py::TestReadReplicaService.test_writes_forbidden@{"cloud_storage_type":2,"partition_count":10}

@pgellert pgellert merged commit b6e0536 into redpanda-data:v25.3.x Feb 26, 2026
19 of 21 checks passed
@tyson-redpanda tyson-redpanda modified the milestones: v25.3.x-next, v25.3.10 Mar 4, 2026