iceberg: a couple fixes found working with Glue + Redshift#27188

Merged
andrwng merged 2 commits into redpanda-data:dev from andrwng:iceberg-glue-redshift
Aug 11, 2025

@andrwng andrwng commented Aug 8, 2025

See individual commits.

We will need follow-up work so that topic deletion does not require manual intervention. In the meantime, this change at least allows a cluster to become functional again after the Iceberg topic has already been deleted.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

Bug Fixes

  • Redpanda now serializes an empty 'partitions' field in Iceberg manifest lists as an empty array instead of null, which allows certain engines (e.g. AWS Redshift) to query Iceberg tables that use the empty partition spec.

andrwng added 2 commits August 8, 2025 14:48
Glue doesn't support the purge option when dropping tables, returning a
400 error immediately, even after the underlying table has been dropped:

```
WARN  2025-08-08 19:00:21,698 [shard 0:main] iceberg - catalog_client.cc:367 - [/iceberg/v1/catalogs/12345/namespaces/redpanda/tables/table?purgeRequested=true] error: http_call_error: Bad Request, message: '{"error":{"code":400,"message":"PurgeRequested cannot be true for Glue iceberg tables.","type":"InvalidInputException"}}'
```

When in this state, translation for this topic and all other topics of
the same name is stuck indefinitely, because when
redpanda.iceberg.delete is true, we will try to drop the table of older
versions of the topic before allowing creation of the new table.

This commit adds a stop-gap so that a cluster can at least get unstuck
once the table is deleted manually. When dropping, we first check whether
the table to be dropped has already been deleted.
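
The stop-gap described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Redpanda change: `FakeGlueCatalog`, `TableNotFound`, `BadRequest`, and `drop_table_checked` are hypothetical names, and the fake catalog only mimics the behavior reported in the log (rejecting any drop that requests a purge).

```python
class TableNotFound(Exception):
    """Hypothetical error: the catalog has no such table."""


class BadRequest(Exception):
    """Hypothetical error standing in for the HTTP 400 response."""


class FakeGlueCatalog:
    """In-memory stand-in for a catalog that rejects purge requests."""

    def __init__(self, tables):
        self.tables = set(tables)

    def load_table(self, name):
        if name not in self.tables:
            raise TableNotFound(name)
        return name

    def drop_table(self, name, purge):
        # Mimics the reported behavior: the purge flag is rejected up front,
        # whether or not the table still exists.
        if purge:
            raise BadRequest(
                "PurgeRequested cannot be true for Glue iceberg tables."
            )
        if name not in self.tables:
            raise TableNotFound(name)
        self.tables.remove(name)


def drop_table_checked(catalog, name, purge=True):
    """Check for the table first; only issue the drop if it still exists."""
    try:
        catalog.load_table(name)
    except TableNotFound:
        # Someone already deleted the table manually, so there is nothing
        # left to drop and the caller can proceed.
        return "already_deleted"
    catalog.drop_table(name, purge=purge)
    return "dropped"
```

With this shape, a drop with purge requested still fails while the table exists, but succeeds as a no-op once an operator has removed the table by hand.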

Redshift complains when the 'partitions' field in the manifest list is
null:

```
ERROR: Wrong type in Avro file. Detail: ----------------------------------------------- error: Wrong type in Avro file. code: 15003 context: Field: partitions. Expected: 12. Got: 7 query: -1[child_sequence:1] location: avro_utils.hpp:55 process: padbmaster [pid=1073913955] ----------------------------------------------- [ErrorId: 1-68963282-20644b700a198f806702f6aa]
```

This fix has Redpanda serialize empty 'partitions' as an empty array.
Tested manually against Glue and Redshift.
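
To illustrate the difference on the wire, here is a small self-contained Python sketch of how an optional Avro array is encoded as null versus as an empty array. It assumes the 'partitions' field is an Avro union with the null branch at index 0 and the array branch at index 1; the helper names are hypothetical and this is not Redpanda's serializer.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Assumed union layout for the field: ["null", array<field_summary>].
NULL_BRANCH = 0
ARRAY_BRANCH = 1


def encode_partitions(summaries):
    """Encode the optional array; items are passed in already encoded."""
    if summaries is None:
        # A union value is its branch index followed by the value;
        # null itself occupies zero bytes.
        return zigzag_varint(NULL_BRANCH)
    out = zigzag_varint(ARRAY_BRANCH)
    if summaries:
        out += zigzag_varint(len(summaries))  # one block holding all items
        for item in summaries:
            out += item
    out += zigzag_varint(0)  # a zero block count terminates the array
    return out


def partitions_for_manifest(summaries):
    """Sketch of the fix: never emit null, emit an empty array instead."""
    return encode_partitions(summaries if summaries is not None else [])
```

Under these assumptions the null form is the single byte `00`, while the empty-array form is the two bytes `02 00`, so a reader that expects the array branch sees an array in the second case.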
@andrwng andrwng enabled auto-merge August 8, 2025 23:22

vbotbuildovich commented Aug 8, 2025

Retry command for Build#70487

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/self_test_test.py::SelfTestTest.test_self_test_unknown_test_type
tests/rptest/tests/raft_availability_test.py::RaftAvailabilityTest.test_leader_transfers_recovery@{"acks":-1}
tests/rptest/tests/leadership_transfer_test.py::MultiTopicAutomaticLeadershipBalancingTest.test_topic_aware_rebalance
tests/rptest/tests/datalake/disk_budget_test.py::DatalakeDiskUsageTest.test_idle_finish@{"cloud_storage_type":1,"concurrent_translations":4,"num_partitions":10}
tests/rptest/tests/datalake/disk_budget_test.py::DatalakeDiskUsageTest.test_idle_finish@{"cloud_storage_type":1,"concurrent_translations":4,"num_partitions":40}
tests/rptest/tests/workload_upgrade_runner_test.py::RedpandaUpgradeTest.test_workloads_through_releases@{"cloud_storage_type":1}

@vbotbuildovich

CI test results

test results on build#70487
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FeaturesMultiNodeTest | test_license_upload_and_query | null | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f67c-4693-9c49-a8c7483400e5 | FLAKY | 16/21 | upstream reliability is '100.0'. current run reliability is '76.19047619047619'. drift is 23.80952 and the allowed drift is set to 50. The test should PASS |
| DataMigrationsApiTest | test_creating_and_listing_migrations | null | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bca-0d3d-4ce6-b416-49516a5809c3 | FLAKY | 20/21 | upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS |
| DatalakeDiskUsageTest | test_idle_finish | {"cloud_storage_type": 1, "concurrent_translations": 4, "num_partitions": 10} | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f67b-4c7c-bf7b-4d8e6e99cb02 | FAIL | 0/21 | The test has failed across all retries |
| DatalakeDiskUsageTest | test_idle_finish | {"cloud_storage_type": 1, "concurrent_translations": 4, "num_partitions": 40} | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f67c-4df9-973a-1a406255bf38 | FAIL | 0/21 | The test has failed across all retries |
| MultiTopicAutomaticLeadershipBalancingTest | test_topic_aware_rebalance | null | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f679-4856-8e6e-85a57088231b | FAIL | 0/21 | The test has failed across all retries |
| RaftAvailabilityTest | test_leader_transfers_recovery | {"acks": -1} | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f67e-4aa4-8899-42812a4f0ec9 | FAIL | 0/21 | The test has failed across all retries |
| RaftAvailabilityTest | test_leader_transfers_recovery | {"acks": -1} | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bca-0d3d-4fc7-8f45-106fade07389 | FAIL | 0/21 | The test has failed across all retries |
| SelfTestTest | test_self_test_unknown_test_type | null | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f67e-4aa4-8899-42812a4f0ec9 | FAIL | 0/21 | The test has failed across all retries |
| SelfTestTest | test_self_test_unknown_test_type | null | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bca-0d39-4b36-9294-602bd0bc20fd | FAIL | 0/21 | The test has failed across all retries |
| RedpandaUpgradeTest | test_workloads_through_releases | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/70487#01988bc7-f679-4aae-983f-c7b99ca710bd | FAIL | 0/21 | The test has failed across all retries |
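
The drift figures in the FLAKY rows are consistent with simple arithmetic on the pass counts. The Python sketch below reproduces them, assuming drift is the upstream reliability minus the current run's pass percentage and that a test is accepted when the drift does not exceed the allowed value; the function names are hypothetical and the CI bot's actual rule may differ.

```python
def reliability_percent(passed: int, total: int) -> float:
    """Pass rate of the current run, as a percentage."""
    return 100.0 * passed / total


def drift(upstream: float, passed: int, total: int) -> float:
    """Assumed definition: upstream reliability minus current reliability."""
    return upstream - reliability_percent(passed, total)


def within_allowed_drift(upstream, passed, total, allowed=50.0) -> bool:
    """Assumed acceptance rule: drift must not exceed the allowed drift."""
    return drift(upstream, passed, total) <= allowed


# 16 of 21 runs passed against an upstream reliability of 100.0
print(round(reliability_percent(16, 21), 5))
print(round(drift(100.0, 16, 21), 5))
print(within_allowed_drift(100.0, 16, 21))
```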

@andrwng andrwng merged commit c682953 into redpanda-data:dev Aug 11, 2025
19 checks passed
@vbotbuildovich

/backport v25.2.x

@vbotbuildovich

/backport v25.1.x
