Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Hi,
I'm not sure whether this is a bug, but at worst it may be useful for others to look into in the future.
I created a table as follows using pyiceberg:
from pyiceberg.schema import Schema
from pyiceberg.types import BooleanType, NestedField, StringType, TimestampType

schema = Schema(
    NestedField(field_id=1, name="bk_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="inference_date", field_type=TimestampType(), required=False),
    NestedField(field_id=3, name="verified", field_type=BooleanType(), required=False),
    NestedField(field_id=4, name="id", field_type=StringType(), required=True),
)
I've been able to do multiple appends to the table using pyiceberg with no issues.
Now, to run some tests and prepare to use the new upsert operation, I decided to append a row with id = 'dummy_id' and then run a scan filtering on it. When I run the scan through AWS Athena I see the row. However, when I run the scan with dummy = table.scan(row_filter=EqualTo("id", 'dummy_id')), I get "list index out of range". This seems to be because pyiceberg isn't able to retrieve the row.
Here is the code I have set up to reproduce the issue:
import pandas as pd
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import EqualTo

# Single-row table matching the schema above
df = pa.Table.from_pydict(
    {
        "bk_id": ["BK123456"],
        "inference_date": [pd.Timestamp.now()],
        "verified": [False],
        "id": ["dummy_id"],
    }
)

# warehouse_path is the warehouse location, defined elsewhere
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "warehouse": warehouse_path,
        "downcast-ns-timestamp-to-us-on-write": True,
    },
)

table_identifier = "database_name.table_name"
table = catalog.load_table(table_identifier)
table.append(df)

# The scan is built without error; to_arrow() is the last call before the failure
dummy = table.scan(row_filter=EqualTo("id", "dummy_id"))
dummy.to_arrow()
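In case it helps narrow down where the error is raised, here is a minimal sketch for capturing the full traceback instead of only the message. `capture_traceback` is a hypothetical helper written for this report, not part of pyiceberg, and the `[][0]` lookup is only a stand-in that raises the same kind of error; with the real table the call would be `capture_traceback(lambda: dummy.to_arrow())`.

```python
import traceback
from typing import Any, Callable, Optional, Tuple


def capture_traceback(fn: Callable[[], Any]) -> Tuple[Optional[Any], Optional[str]]:
    """Call fn() and return (result, None), or (None, traceback_text) if it raises."""
    try:
        return fn(), None
    except Exception:
        # format_exc() returns the full stack, including the frame that raised
        return None, traceback.format_exc()


# Stand-in for the failing call: indexing an empty list raises IndexError.
result, tb = capture_traceback(lambda: [][0])
print(tb)
```

The printed traceback shows which internal frame performs the failing index lookup, which is more useful in a report than the bare message.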
Is there something I'm doing wrong?
Willingness to contribute