@aihuaxu aihuaxu commented Aug 3, 2025

Rationale for this change

Based on the test in apache/arrow-go#455, which validates the cross-language variant implementation, we may need to update some wording in the spec to make it clearer.

What changes are included in this PR?

  • In a variant array, either the value field or the typed_value field can be omitted from array elements, but not both.

Do these changes have PoC implementations?

@aihuaxu aihuaxu requested a review from rdblue August 5, 2025 16:30
@aihuaxu aihuaxu force-pushed the clarify-variant-shredding-spec branch from b1ad692 to 28e8dc2 Compare August 5, 2025 19:11
The element's `value` field stores the element as Variant-encoded `binary` when the `typed_value` is not present or cannot represent it.
The `typed_value` field may be omitted when not shredding elements as a specific type.
When `typed_value` is omitted, `value` must be `required`.
The `value` field may be omitted when shredding elements as a specific type.
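
As a rough illustration of the rule this PR describes (either field may be omitted from an array element, but not both), here is a minimal Python sketch. The function name `element_fields_valid` and the list-of-field-names model are hypothetical and not part of any Parquet library; the sketch checks only field presence and deliberately does not model `optional` versus `required` repetition.

```python
from typing import Iterable


def element_fields_valid(field_names: Iterable[str]) -> bool:
    """Return True if an array element group keeps at least one of
    `value` / `typed_value` (either may be omitted, but not both).

    Hypothetical helper: it checks field presence only and ignores
    whether each field is declared optional or required.
    """
    names = set(field_names)
    return bool(names & {"value", "typed_value"})


# The four possible layouts of an element group.
layouts = {
    "both fields": ["value", "typed_value"],
    "value only": ["value"],
    "typed_value only": ["typed_value"],
    "neither field": [],
}

for label, fields in layouts.items():
    print(f"{label}: {'valid' if element_fields_valid(fields) else 'invalid'}")
```

Under these assumptions, only the layout with neither field is rejected.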
Contributor

It is fine to say that one of the fields must be present and that value can be omitted (though it is not necessary).

Member

Should we also clarify whether they can/should be optional vs required or either?

Contributor Author

@aihuaxu aihuaxu Aug 15, 2025

Thanks for reviewing. On line 68, we have the statement

Both value and typed_value are optional fields used together to encode a single value.

which covers value shredding overall. I don't think we need to repeat it in different places.

Member

Does that mean the test cases that declare these fields as required should also get notes saying they are invalid according to the spec?

Contributor Author

I don't think we have such a case in the tests. The case we have is that the group should be required but is passed in as optional. For example:

required group event_type {
  optional binary value;
  optional binary typed_value (STRING);
}


rdblue commented Aug 15, 2025

Thanks for the review, @zeroshade! I'll merge this.

@rdblue rdblue merged commit 3c7b130 into apache:master Aug 15, 2025
jiayuasu pushed a commit to jiayuasu/parquet-format that referenced this pull request Oct 6, 2025