GH-49746: [Python] Add row_splits/offsets methods for VariableShapeTensorArray #49747

Draft
rok wants to merge 1 commit into apache:main from rok:38007_python_row_splits
rok (Member) commented Apr 14, 2026

Rationale for this change

VariableShapeTensorArray stores ragged tensor data as a StructArray of {data: list, shape: fixed_size_list[ndim]}. Users coming from ML frameworks (notably TensorFlow RaggedTensor) may already have their data laid out as a flat values buffer plus row boundaries (row_splits), and want to construct or inspect a VariableShapeTensorArray directly in that form, without materializing a Python-level list of ndarrays.
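To make the layout above concrete, here is a minimal pure-Python sketch of the row_splits convention used by TensorFlow RaggedTensor, where row i spans values[row_splits[i]:row_splits[i + 1]]. The helper names rows_from_row_splits and row_splits_from_rows are hypothetical and are not the methods added by this PR; the sketch covers only the 1-D case and assumes each tensor's shape is simply [len(data)].

```python
from itertools import accumulate


def rows_from_row_splits(values, row_splits):
    """Split a flat list into rows using TensorFlow-style row_splits.

    Hypothetical helper for illustration only, not part of PyArrow.
    """
    if not row_splits or row_splits[0] != 0 or row_splits[-1] != len(values):
        raise ValueError("row_splits must start at 0 and end at len(values)")
    if any(a > b for a, b in zip(row_splits, row_splits[1:])):
        raise ValueError("row_splits must be non-decreasing")
    # Row i is the half-open slice values[row_splits[i]:row_splits[i + 1]].
    return [values[a:b] for a, b in zip(row_splits, row_splits[1:])]


def row_splits_from_rows(rows):
    """Inverse direction: flatten rows and record cumulative boundaries."""
    values = [v for row in rows for v in row]
    row_splits = [0, *accumulate(len(row) for row in rows)]
    return values, row_splits


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
row_splits = [0, 2, 2, 6]  # three rows; the middle one is empty

rows = rows_from_row_splits(values, row_splits)

# Assumption: for 1-D tensors, the struct storage described above would
# hold one (data, shape) pair per row with shape == [len(data)].
storage = [{"data": row, "shape": [len(row)]} for row in rows]

# Round trip back to the flat representation.
flat_values, splits = row_splits_from_rows(rows)
```

A row_splits list always has one more entry than there are rows, so an empty row shows up as two equal consecutive boundaries rather than as a missing entry.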

What changes are included in this PR?

Add four methods: from_row_splits, from_offsets, to_row_splits, and to_offsets.

Are these changes tested?

Python tests are included.

Are there any user-facing changes?

Yes, new methods are added.

…/from_offsets methods to VariableShapeTensor Python bindings

Add PyArrow bindings for the VariableShapeTensor extension type,
including VariableShapeTensorType, VariableShapeTensorArray, and
VariableShapeTensorScalar with support for converting to/from
NumPy tensors.
github-actions bot added the "awaiting committer review" label Apr 14, 2026
@github-actions

⚠️ GitHub issue #49746 has been automatically assigned in GitHub to PR creator.
