Skip to content

Conversation

@aihuaxu
Copy link
Contributor

@aihuaxu aihuaxu commented Jul 30, 2025

This is generated from apache/iceberg#13708 which includes the following changes on top of #90:

  • Add variant logical type
  • Update the decimal test data to align with the spec. E.g., the value with precision in [1,9] uses decimal, so 123456.7890 should uses decimal8 instead of decimal4. Update to use 12345.6789 for decimal4 test.
  • Mark some tests as invalid since they don't align with the spec and the readers can choose different implementation.

I have tested from Parquet-Java against this data (https://github.com/apache/parquet-java/pull/3258/files).
Here shows an example of the schema with variant logical type annotated.

Schema:
message table {
  required int32 id = 1;
  optional group var (VARIANT(1)) = 2 {
    required binary metadata;
    optional binary value;
    optional group typed_value (LIST) {
      repeated group list {
        required group element {
          optional binary value;
          optional binary typed_value (STRING);
        }
      }
    }
  }
}

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Jul 30, 2025

cc @rdblue, @zeroshade. This is the same data set but I have included parquet logical type in the schema.

@xxubai
Copy link
Contributor

xxubai commented Aug 1, 2025

Do we have any python script to generate shredding variants

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Aug 2, 2025

Do we have any python script to generate shredding variants

We don't have right now. We generated from Iceberg test cases.

@xxubai
Copy link
Contributor

xxubai commented Aug 2, 2025

We don't have right now. We generated from Iceberg test cases.

OK. Anyway, this is very helpful for shredding variant testing in other projects.

@aihuaxu aihuaxu force-pushed the variant-shared-testing branch 2 times, most recently from cbb19c1 to 560a76e Compare August 14, 2025 20:55
@zeroshade
Copy link
Member

@aihuaxu You updated the descriptions in shredded_variant/cases.json but the files that actually have the variant logical type in them are in data/shredded_variant/ can we consolidate to either one directory or the other?

@aihuaxu aihuaxu force-pushed the variant-shared-testing branch 2 times, most recently from 0af1223 to 0f5bf3b Compare August 15, 2025 03:53
@aihuaxu aihuaxu force-pushed the variant-shared-testing branch from 0f5bf3b to f422987 Compare August 15, 2025 04:17
@alamb
Copy link
Contributor

alamb commented Aug 15, 2025

My understanding is that this PR would replace #90 , is that correct?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @aihuaxu

I can't test with these values as we are still working on the shredding implementation in arrow-rs (see apache/arrow-rs#8150 and related tickets) but they look reasonable to me

I defer to @zeroshade to check with the Go implementation

cc @carpecodeum and @mprammer and @scovich

"parquet_file" : "case-127.parquet",
"error_message" : "Unsupported shredded value type: INTEGER(32,false)"
}, {
"case_number" : 128,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message looks clear to me

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Aug 15, 2025

My understanding is that this PR would replace #90 , is that correct?

That is correct. I build on top of #90 to add variant logical type and made small changes (update decimal test cases and change test case with INVALID).

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Aug 15, 2025

@zeroshade Can you check with GO implementation? I have tested with Java implementation and it works fine.

@zeroshade
Copy link
Member

Hey @aihuaxu I've filed #94 with the files generated by Go, also fixing the issues you identified.

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Aug 18, 2025

Hey @aihuaxu I've filed #94 with the files generated by Go, also fixing the issues you identified.

@zeroshade I have verified from parquet-java and it works. Thanks.

@aihuaxu
Copy link
Contributor Author

aihuaxu commented Aug 18, 2025

@zeroshade and @zeroshade can we merge this PR? Seems we have done enough cross-language check.

@carpecodeum
Copy link

arrow-rs is also going ahead with these files as the core test cases - apache/arrow-rs#8104

@alamb
Copy link
Contributor

alamb commented Aug 18, 2025

@zeroshade and @zeroshade can we merge this PR? Seems we have done enough cross-language check.

I think it is a good idea. However, I suspect that @zeroshade may not have the permissions to do so. I think it will require someone from this group (@emkornfield @julienledem @rdblue etc)

https://people.apache.org/phonebook.html?unix=parquet

@wgtmac
Copy link
Member

wgtmac commented Aug 21, 2025

Thanks all for validation! Let me merge this.

@wgtmac wgtmac merged commit a3d96a6 into apache:master Aug 21, 2025
@alamb
Copy link
Contributor

alamb commented Aug 21, 2025

Thank you @wgtmac and @aihuaxu and @zeroshade

alamb added a commit to apache/arrow-rs that referenced this pull request Sep 16, 2025
…ing` data (#8325)

# Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.

- Closes #8084

# Rationale for this change

Now that we have merged the upstream parquet-variant tests:
- apache/parquet-testing#91

We can test how far we are from the rust variant implementation working
for all the values

This PR updates the test harness added
#8104 by @carpecodeum to use the
final parquet files and the currnet APIs

# What changes are included in this PR?

1. Update parquet-testing pin
2. Update the test harness to use the standard rust test runner
(`#[test]`) rather than a custom main function
3. Added links to follow on tickets

You can run this test manually like this:

```shell
cargo test --all-features --test variant_integration

...
running 138 tests
test test_variant_integration_case_106 ... ok
test test_variant_integration_case_107 ... ok
test test_variant_integration_case_109 ... ok
test test_variant_integration_case_110 ... ok
..
test test_variant_integration_case_90 ... ok
test test_variant_integration_case_91 ... ok
test test_variant_integration_case_93 ... ok
test test_variant_integration_case_83 - should panic ... ok
test test_variant_integration_case_84 - should panic ... ok
```

# Are these changes tested?
Yes this is all tests

# Are there any user-facing changes?
No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants