[Data] - schema() handle pd.ArrowDtype -> pyarrow type conversion #57057

Merged
alexeykudinkin merged 1 commit into ray-project:master from goutamvenkat-anyscale:goutam/handle_pd_arrow_dtype
Oct 1, 2025
Conversation

@goutamvenkat-anyscale
Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale commented Sep 30, 2025

Why are these changes needed?

When a schema contains pd.ArrowDtype columns, the existing pa.from_numpy_dtype(dtype) call in the schema function fails, because pd.ArrowDtype is a pandas extension dtype rather than a NumPy dtype.

Related issue number

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run pre-commit jobs to lint the changes in this PR. (pre-commit setup)
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [x] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

Note

Schema.types now converts pandas ArrowDtype to pyarrow types (including within TensorDtype), with unit tests validating dtype conversion.

  • Schema/types conversion
    • Add _convert_to_pa_type to map pandas.ArrowDtype and numpy dtype to pyarrow types.
    • Use helper for both generic column dtypes and TensorDtype._dtype (works with ArrowTensorType/ArrowTensorTypeV2).
    • Import pandas to detect pd.ArrowDtype.
  • Tests
    • Add parametric test ensuring Schema.types returns correct pyarrow types for pd.ArrowDtype and numpy dtypes.
    • Minor test imports updated (e.g., pyarrow, Schema).

Written by Cursor Bugbot for commit 243cdd6.

Signed-off-by: Goutam V. <goutam@anyscale.com>
@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner September 30, 2025 22:53
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses an issue where pd.ArrowDtype was not handled properly when determining a dataset's schema types. The introduction of the _convert_to_pa_type helper function is a clean solution, and the accompanying test effectively validates the fix. I've added one suggestion to make the new helper function even more robust by handling raw pyarrow.DataType instances, which seems to be a possibility based on existing code patterns.

Comment on lines +6420 to +6423

    def _convert_to_pa_type(dtype: Union[np.dtype, pd.ArrowDtype]) -> pa.DataType:
        if isinstance(dtype, pd.ArrowDtype):
            return dtype.pyarrow_dtype
        return pa.from_numpy_dtype(dtype)

medium

This function correctly handles pd.ArrowDtype. To make it more robust, consider also handling raw pyarrow.DataType instances. It appears TensorDtype can sometimes be constructed with a pyarrow.DataType, which would cause a TypeError here because pa.from_numpy_dtype does not accept it. The generic except Exception block then silently catches that error and the type becomes None, which can hide underlying issues. Handling pyarrow.DataType explicitly would prevent this.

Suggested change

    - def _convert_to_pa_type(dtype: Union[np.dtype, pd.ArrowDtype]) -> pa.DataType:
    -     if isinstance(dtype, pd.ArrowDtype):
    -         return dtype.pyarrow_dtype
    -     return pa.from_numpy_dtype(dtype)
    + def _convert_to_pa_type(dtype: Union[np.dtype, pd.ArrowDtype, "pa.DataType"]) -> "pa.DataType":
    +     if isinstance(dtype, pd.ArrowDtype):
    +         return dtype.pyarrow_dtype
    +     if isinstance(dtype, pa.DataType):
    +         return dtype
    +     return pa.from_numpy_dtype(dtype)

@goutamvenkat-anyscale goutamvenkat-anyscale added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Sep 30, 2025
@goutamvenkat-anyscale goutamvenkat-anyscale changed the title from "[Data] - schema() handle pd.ArrowDtype -> pyarrow type" to "[Data] - schema() handle pd.ArrowDtype -> pyarrow type conversion" Sep 30, 2025
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) September 30, 2025 23:16
@alexeykudinkin alexeykudinkin merged commit 2d9d528 into ray-project:master Oct 1, 2025
7 checks passed
@goutamvenkat-anyscale goutamvenkat-anyscale deleted the goutam/handle_pd_arrow_dtype branch October 1, 2025 00:12
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
liulehui pushed a commit to liulehui/ray that referenced this pull request Oct 9, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Josh Kodi <joshkodi@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Labels

data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

3 participants