
Glue scan with filter throws list index out of range #1804

@Cabeda


Apache Iceberg version

0.9.0 (latest release)

Please describe the bug 🐞

Hi,

Not sure if this is a bug, but at worst this might be something for others to look into in the future.

I've created a table using PyIceberg as follows:

```python
from pyiceberg.schema import Schema
from pyiceberg.types import BooleanType, NestedField, StringType, TimestampType

schema = Schema(
    NestedField(field_id=1, name="bk_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="inference_date", field_type=TimestampType(), required=False),
    NestedField(field_id=3, name="verified", field_type=BooleanType(), required=False),
    NestedField(field_id=4, name="id", field_type=StringType(), required=True),
)
```

I've been able to append to the table multiple times using PyIceberg without issues.

Now, to run some tests and prepare for using the new upsert operation, I decided to append a row with `id = 'dummy_id'` and then run a scan filtering on it. When I run the query through AWS Athena, I can see the row. However, when I scan with `dummy = table.scan(row_filter=EqualTo("id", 'dummy_id'))`, I get a `list index out of range` error. It seems that PyIceberg isn't able to retrieve the row.
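For context, a filtered scan does not follow the same path as reading the whole table: before reading any rows, PyIceberg evaluates the row filter against the per-file column statistics (lower/upper bounds) recorded in the manifests and skips files that cannot contain a match. The stdlib-only sketch below is a simplified model of that pruning for an `EqualTo` predicate. `DataFileStats`, `might_match_equal_to`, and `plan_files` are hypothetical names, not PyIceberg's API, and real Iceberg evaluation also handles null counts, truncated bounds, and partition pruning.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataFileStats:
    """Hypothetical stand-in for one data file entry in a manifest."""
    path: str
    # column name -> (lower_bound, upper_bound); a missing column means no stats were recorded
    bounds: Dict[str, Tuple[str, str]]


def might_match_equal_to(stats: DataFileStats, column: str, value: str) -> bool:
    """Return False only when the bounds prove no row can equal `value`."""
    if column not in stats.bounds:
        # Without statistics the file cannot be ruled out, so it must be read.
        return True
    lower, upper = stats.bounds[column]
    return lower <= value <= upper


def plan_files(files: List[DataFileStats], column: str, value: str) -> List[str]:
    """Keep only the files whose statistics allow a match (inclusive evaluation)."""
    return [f.path for f in files if might_match_equal_to(f, column, value)]


files = [
    DataFileStats("data/a.parquet", {"id": ("aaa", "ccc")}),
    DataFileStats("data/b.parquet", {"id": ("dummy_id", "dummy_id")}),
    DataFileStats("data/c.parquet", {}),
]

print(plan_files(files, "id", "dummy_id"))  # → ['data/b.parquet', 'data/c.parquet']
```

Here `data/a.parquet` is skipped because `"dummy_id"` sorts after its upper bound `"ccc"`, while the file without statistics is kept because it cannot be ruled out.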

Here is the code I set up to reproduce the issue:

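```python
import pandas as pd
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# One-row test table; "id" is the column the scan filters on.
df = pa.Table.from_pydict(
    {
        "bk_id": ["BK123456"],
        "inference_date": [pd.Timestamp.now()],
        "verified": [False],
        "id": ["dummy_id"],
    }
)

# warehouse_path is the S3 location of the Glue warehouse (defined elsewhere).
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "warehouse": warehouse_path,
        "downcast-ns-timestamp-to-us-on-write": True,
    },
)

table_identifier = "database_name.table_name"
table = catalog.load_table(table_identifier)

table.append(df)

# Filtered scan that fails with "list index out of range".
dummy = table.scan(row_filter=EqualTo("id", "dummy_id"))
dummy.to_arrow()
```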

Is there something I'm doing wrong?

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
