Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Hi Everyone!
I've encountered a memory leak while using pyiceberg in an Ubuntu aarch64 slim container. There is no aarch64 wheel available, so pyiceberg/avro/decoder_fast.pyi is not built and pyiceberg falls back to the pure Python implementation. I get the following warning:
/app/.venv/lib/python3.12/site-packages/pyiceberg/avro/decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation
warnings.warn("Falling back to pure Python Avro decoder, missing Cython implementation")
At first I thought StreamingBinaryDecoder (the fallback for decoder_fast) was the culprit, but the issue persists in the amd64 image, where decoder_fast is compiled.
I'm writing pyarrow tables to AWS Glue repeatedly, and the process crashes with OOM after a couple of writes in the container. Each write leaves ~200 MB of residue in memory, even after the Iceberg Table object is deleted.
It works fine on macOS, where the memory is freed.
iceberg_table = self.glue_catalog.load_table(
    (self.database, self.table_name)
)
iceberg_table.append(pa_table)
del iceberg_table
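To put a number on the residue per write, the process RSS can be sampled after each iteration. The following is a minimal sketch, not the code I actually ran: `rss_mb`, `measure_growth`, and `write_once` are hypothetical names, and `write_once` stands in for the `load_table` / `append` / `del` sequence above.

```python
import gc
import os
import resource
import sys


def rss_mb() -> float:
    """Return the resident set size in MB.

    On Linux this is the current RSS from /proc/self/statm. Elsewhere it
    falls back to the peak RSS reported by getrusage, which never shrinks.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1e6
    except FileNotFoundError:
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
        return peak / 1e6 if sys.platform == "darwin" else peak / 1e3


def measure_growth(write_once, iterations=5):
    """Call write_once repeatedly and record RSS after each call.

    write_once is a hypothetical zero-argument callable wrapping one write.
    A full garbage collection runs before each reading so that unreachable
    Python objects do not inflate the numbers.
    """
    readings = [rss_mb()]
    for _ in range(iterations):
        write_once()
        gc.collect()
        readings.append(rss_mb())
    return readings
```

If the readings keep climbing by a similar amount per iteration even after `gc.collect()`, the memory is being retained somewhere other than in unreachable Python objects.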
Tested in image: ghcr.io/astral-sh/uv:python3.12-bookworm-slim
Test envs: M1 MacBook natively, M1 MacBook with Podman (amd64 emulation), AWS t3.small with Podman
Built: Podman on M1 MacBook
I'm trying to find out the reason for the leak.
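One check that might narrow this down is whether the growth is visible to Python's own allocator at all. The sketch below uses the standard-library `tracemalloc` module to diff the Python heap before and after one step. `python_heap_growth` and `step` are hypothetical names, and `step` would wrap a single write. Memory allocated by native code outside Python's allocator would not show up here, so a large RSS increase with a small diff would point away from Python-level objects.

```python
import tracemalloc


def python_heap_growth(step, top=5):
    """Run step once and return the largest Python-level allocation diffs.

    step is a hypothetical zero-argument callable (for example, one write).
    Only allocations made through Python's memory allocator are tracked.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    step()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to returns StatisticDiff entries sorted by the biggest change.
    return after.compare_to(before, "lineno")[:top]


def report(diffs):
    """Format each diff as 'location: +N bytes' for quick inspection."""
    return [f"{d.traceback}: {d.size_diff:+d} bytes" for d in diffs]
```

Calling `report(python_heap_growth(step))` gives a short list of source lines with the largest change in allocated bytes.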
Willingness to contribute