Skip to content

enh: Adds support for Time datatype#2113

Merged
dangotbanned merged 15 commits intonarwhals-dev:mainfrom
marvinl803:issue-1989
Mar 17, 2025
Merged

enh: Adds support for Time datatype#2113
dangotbanned merged 15 commits intonarwhals-dev:mainfrom
marvinl803:issue-1989

Conversation

@marvinl803
Copy link
Copy Markdown
Contributor

@marvinl803 marvinl803 commented Feb 28, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I have added support for the Polars Time datatype. I wanted to implement this first to ensure there are no issues before proceeding with the Binary datatype, as mentioned in the issue.

Please let me know if any changes or fixes are needed. Thank you!

@marvinl803 marvinl803 marked this pull request as ready for review February 28, 2025 17:09
Copy link
Copy Markdown
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @marvinl803 - This is a good start 🙏🏼

I would like to aim to support the following in this PR:

@marvinl803
Copy link
Copy Markdown
Contributor Author

Thanks for the contribution @marvinl803 - This is a good start 🙏🏼

I would like to aim to support the following in this PR:

Hi @FBruzzesi,

I have added the test cases where you suggested and also implemented support for PyArrow and DuckDB. However, I encountered an issue in narwhals/dtypes.py—I wasn’t sure how to provide an example for the docs using DuckDB.

Additionally, I added Time to all the other supported types in narwhals_to_native_dtype, except for PyArrow. I’m uncertain whether we should return time32 or time64 in that case. Could you provide some guidance on how best to handle this?

I appreciate your help!

@FBruzzesi
Copy link
Copy Markdown
Member

I have added the test cases where you suggested and also implemented support for PyArrow and DuckDB. However, I encountered an issue in narwhals/dtypes.py—I wasn’t sure how to provide an example for the docs using DuckDB.

Thanks for your adjustments - I think we are very close to have this ready. Regarding the narwhals failing tests:

  • One is because of coverage: conversion from narwhals to native has no test yet
  • The other one if because of top level imports - try use constructor and mark unsupported backends as xfail rather than manually instantiating dataframes

Additionally, I added Time to all the other supported types in narwhals_to_native_dtype, except for PyArrow. I’m uncertain whether we should return time32 or time64 in that case. Could you provide some guidance on how best to handle this?

I think we should convert narwhals Time to pyarrow time64 since that's what polars Time mentions:

The underlying representation of this type is a 64-bit signed integer. The integer indicates the number of nanoseconds since midnight.


As a side note, we will need to make PRs in shiny and marimo to fix their failing tests

Comment on lines +415 to +416
if dtype.startswith("time") and dtype.endswith("[pyarrow]"):
return dtypes.Time()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern is time<32|64>[<unit>][pyarrow]

Example:

from datetime import time
import pandas as pd
import pyarrow as pa

data = {"a": [time(12, 0, 0), time(12, 0, 5)]}

pd.DataFrame(data).convert_dtypes(dtype_backend="pyarrow").astype(pd.ArrowDtype(pa.time64("ns"))).dtypes
a    time64[ns][pyarrow]
dtype: object

if isinstance_or_issubclass(
dtype, (dtypes.Struct, dtypes.Array, dtypes.List, dtypes.Time)
):
if implementation is Implementation.PANDAS and backend_version >= (2, 2):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but... should we check modin backed by pyarrow?

dtypes.UInt8,
dtypes.Enum,
dtypes.Categorical,
dtypes.Time,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a datatype in pyspark for this, so it will just end up in the unsupported list

@FBruzzesi
Copy link
Copy Markdown
Member

FBruzzesi commented Mar 6, 2025

@marvinl803 I added a few commits to reach coverage and add pandas support (if backed by pyarrow dtype).

I think this is ready to merge, but we should have ready the PRs to address marimo and shiny whenever we press the merge button. I will try to find a bit of time tomorrow (no promises 🙈)


Plotly failure is unrelated, they simply reworked some deps files

Comment on lines +291 to +301
def test_cast_time(request: pytest.FixtureRequest, constructor: Constructor) -> None:
if "pandas" in str(constructor) and PANDAS_VERSION < (2, 2):
request.applymarker(pytest.mark.xfail)

if any(backend in str(constructor) for backend in ("dask", "pyspark", "modin")):
request.applymarker(pytest.mark.xfail)

data = {"a": [time(12, 0, 0), time(12, 0, 5)]}
df = nw.from_native(constructor(data))
result = df.select(nw.col("a").cast(nw.Time()))
assert result.collect_schema() == {"a": nw.Time()}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this is working for the Expr.cast case!

Could we also gets some tests to see how nw.Time interacts with Expr.dt methods?
I'm thinking we should aim to support anything that doesn't mention date, duration, time_zone components

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The polars Temporal docs usually specify which subset of TemporalType they work with

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only found polars.Expr.dt.to_string mention Time so far

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

@FBruzzesi FBruzzesi Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might get more complex that expected: pandas does not access pyarrow time via .dt:

from datetime import time
import pandas as pd
import pyarrow as pa

data = {"a": [time(12, 0, 0), time(12, 0, 5)]}

(
 pd.DataFrame(data)
 .convert_dtypes(dtype_backend="pyarrow")
 .astype(pd.ArrowDtype(pa.time64("ns")))
 ["a"].dt
)

AttributeError: Can only use .dt accessor with datetimelike values

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FBruzzesi we don't have to resolve all the issues in this PR - but I think it would still be worth identifying them for follow-ups.

Before reading through (#2113 (comment)) I wasn't aware of how little support there was for Time

@marvinl803
Copy link
Copy Markdown
Contributor Author

marvinl803 commented Mar 7, 2025

@marvinl803 I added a few commits to reach coverage and add pandas support (if backed by pyarrow dtype).

I think this is ready to merge, but we should have ready the PRs to address marimo and shiny whenever we press the merge button. I will try to find a bit of time tomorrow (no promises 🙈)

Plotly failure is unrelated, they simply reworked some deps files

Got it! Thanks for your help and the updates, @FBruzzesi. I'm not too familiar with Marimo and Shiny, but happy to assist however I can. Let me know if there's anything I can do! 👍

@FBruzzesi
Copy link
Copy Markdown
Member

FBruzzesi commented Mar 7, 2025

Got it! Thanks for your help and the updates, @FBruzzesi. I'm not too familiar with Marimo and Shiny, but happy to assist however I can. Let me know if there's anything I can do! 👍

Hey @marvinl803 I opened one PR in each repo, and forgot to mention it. You should be able to see the link somewhere here (I am from mobile and it's tricky to find the links) 😊

@marvinl803
Copy link
Copy Markdown
Contributor Author

Hey @marvinl803 I opened one PR in each repo, and forgot to mention it. You should be able to see the link somewhere here (I am from mobile and it's tricky to find the links) 😊

No problem, @FBruzzesi! I found them—I'll check them out.

@FBruzzesi
Copy link
Copy Markdown
Member

@MarcoGorelli I have a PR open in marimo that provide a fix, yet we will need a release - either we can merge this PR and make a patch release, or merge it on Monday before the minor release (I would prefer this second option) 😁

@FBruzzesi FBruzzesi changed the title enh: Adds support for Polars Time datatype enh: Adds support for Time datatype Mar 17, 2025
Copy link
Copy Markdown
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton for the contribution (and your patience) @marvinl803

Let's ship this to include this feature in today's release
WooblesTheWooblesGIF

P.S. @dangotbanned I was not able to solve the mypy issues is CI 🥲

@dangotbanned
Copy link
Copy Markdown
Member

dangotbanned commented Mar 17, 2025

P.S. @dangotbanned I was not able to solve the mypy issues is CI 🥲

@FBruzzesi Let me take a quick look before merging 🤝

@dangotbanned
Copy link
Copy Markdown
Member

I've opened a PR upstream for another fix

@FBruzzesi
Copy link
Copy Markdown
Member

Thanks @dangotbanned , shall we merge?

@dangotbanned dangotbanned merged commit 806f56b into narwhals-dev:main Mar 17, 2025
27 of 28 checks passed
mscolnick added a commit to marimo-team/marimo that referenced this pull request Mar 18, 2025
## 📝 Summary

[Narwhals#2113](narwhals-dev/narwhals#2113) is
going to introduce `Time` dtype, this PR prepares for that


## 📋 Checklist

- [x] I have read the [contributor
guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] For large changes, or changes that affect the public API: this
change was discussed or approved through an issue, on
[Discord](https://marimo.io/discord?ref=pr), or the community
[discussions](https://github.com/marimo-team/marimo/discussions) (Please
provide a link if applicable).
- [ ] I have added tests for the changes made.
- [x] I have run the code and verified that it works as expected.

## 📜 Reviewers

@akshayka OR @mscolnick

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Myles Scolnick <myles@marimo.io>
dangotbanned added a commit that referenced this pull request Aug 6, 2025
Both `pa.date32()` and `pa.time64("ns")` are currently supported in `modin`, but weren't included in #2113
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dtypes enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants