chore(typing): Use SQLFrame instead of PySpark for _spark_like internally#2190
Conversation
narwhals/_spark_like/dataframe.py
Outdated
| SQLFrameDataFrame: TypeAlias = _SQLFrameDataFrame[Any, Any, Any, Any, Any] | ||
| _NativeDataFrame: TypeAlias = "DataFrame | SQLFrameDataFrame" | ||
| NativeFrame = BaseDataFrame[Any, Any, Any, Any, Any] | ||
| NativeSession = _BaseSession[Any, Any, Any, Any, Any, Any, Any] |
There was a problem hiding this comment.
Don't think I'll ever get used to seeing 7 TypeVars 😂
|
Thanks for the updates @MarcoGorelli Do you mind holding off on merging? |
- `mypy` just needed the `else` block - but this is much more maintainable
Need it to start a thread
- Added in narwhals-dev#2187, narwhals-dev#2051 - No longer needed since `sqlframe` only typing
narwhals/translate.py
Outdated
| native_object, # NOTE: this is leaking the internal lie | ||
| # error: Argument of type "DataFrame" cannot be assigned to parameter "native_dataframe" of type "SQLFrameDataFrame" in function "__init__" | ||
| # "DataFrame" is not assignable to "BaseDataFrame[Any, Any, Any, Any, Any]" (reportArgumentType) |
There was a problem hiding this comment.
@MarcoGorelli this seems like it'll impact LazyFrame[T] - since we need to say that'll only be SQLFrameDataFrame for SparkLikeLazyFrame
There was a problem hiding this comment.
Turned out that while Series isn't an issue - Frame may well be
There was a problem hiding this comment.
how so?
I tried
# ruff: noqa
from __future__ import annotations
import narwhals as nw
import pandas as pd
import polars as pl
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb.dataframe import DuckDBDataFrame
import sqlframe.duckdb.functions as F
from pyspark.sql.dataframe import DataFrame as SparkDataFrame
def func(a: SparkDataFrame) -> None:
reveal_type(nw.from_native(a).to_native())
def func2(a: DuckDBDataFrame) -> None:
reveal_type(nw.from_native(a).to_native())and get
/home/marcogorelli/polars-api-compat-dev/t.py:14:17 - information: Type of "nw.from_native(a).to_native()" is "DataFrame"
/home/marcogorelli/polars-api-compat-dev/t.py:16:17 - information: Type of "nw.from_native(a).to_native()" is "DuckDBDataFrame"
There was a problem hiding this comment.
Try changing that to this and you'll see it matches DataFrame and not LazyFrame
def func(a: SparkDataFrame) -> None:
b = nw.from_native(a)
reveal_type(b)
c = b.to_native()
reveal_type(c)Type of "b" is "narwhals.dataframe.DataFrame[pyspark.sql.dataframe.DataFrame]"
Bit confused on this one though 🤔
Type of "c" is "DataFrame"
There was a problem hiding this comment.
@MarcoGorelli just in case you missed this while writing #2190 (comment)
There was a problem hiding this comment.
isn't this an issue on main as well:
(.310venv) marcogorelli@DESKTOP-U8OKFP3:~/polars-api-compat-dev$ cat t.py
# ruff: noqa
from __future__ import annotations
import narwhals as nw
import pandas as pd
import polars as pl
from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb.dataframe import DuckDBDataFrame
import sqlframe.duckdb.functions as F
from pyspark.sql.dataframe import DataFrame as SparkDataFrame
def func(a: SparkDataFrame) -> None:
reveal_type(nw.from_native(a).to_native())
reveal_type(nw.from_native(a))
def func2(a: DuckDBDataFrame) -> None:
reveal_type(nw.from_native(a).to_native())
reveal_type(nw.from_native(a))
(.310venv) marcogorelli@DESKTOP-U8OKFP3:~/polars-api-compat-dev$ pyright t.py
/home/marcogorelli/polars-api-compat-dev/t.py
/home/marcogorelli/polars-api-compat-dev/t.py:14:17 - information: Type of "nw.from_native(a).to_native()" is "DataFrame"
/home/marcogorelli/polars-api-compat-dev/t.py:15:17 - information: Type of "nw.from_native(a)" is "narwhals.dataframe.DataFrame[pyspark.sql.dataframe.DataFrame]"
/home/marcogorelli/polars-api-compat-dev/t.py:17:17 - information: Type of "nw.from_native(a).to_native()" is "DuckDBDataFrame"
/home/marcogorelli/polars-api-compat-dev/t.py:18:17 - information: Type of "nw.from_native(a)" is "DataFrame[DuckDBDataFrame]"
0 errors, 0 warnings, 4 informations There was a problem hiding this comment.
isn't this an issue on main as well
Yeah I also noticed it seems to happen for any LazyFrame when passed without any keywords
More motivation towards #2116 I suppose
|
@MarcoGorelli more of a general note A lot of the stuff I've touched is because I still have We still need to have that factored into the CI - even if you want to speed things up locally. |
could you elaborate please? can't we run pyspark tests in CI, but just use sqlframe for internal type checking? |
Sure thing narwhals/narwhals/translate.py Line 732 in af20c95 Which is a result of this change This is what I meant in (#2185 (comment)) when I said
This is what I want - but the fact that >>> pyright
c:\Users\danie\Documents\GitHub\narwhals\narwhals\translate.py
c:\Users\danie\Documents\GitHub\narwhals\narwhals\translate.py:732:17 - error: Argument of type "DataFrame" cannot be assigned to parameter "native_dataframe" of type "SQLFrameDataFrame" in function "__init__"
"DataFrame" is not assignable to "BaseDataFrame[Any, Any, Any, Any, Any]" (reportArgumentType)
1 error, 0 warnings, 0 informations I'm thinking the problem is in these changes: https://github.com/narwhals-dev/narwhals/pull/2190/files#diff-8da2d7afa6668fbe866ba5f8a8f24dd448aa57df874700c659e8ad93abf29478 |
|
is it ok if we just From having tried #2190 (comment), it looks like it all works fine with pyspark installed from the user's perspective |
Yeah that's fine if it isn't causing an issue externally. But we do need to have |
| validate_column_names: bool = False, | ||
| ) -> None: | ||
| self._native_frame = native_dataframe | ||
| self._native_frame: SQLFrameDataFrame = native_dataframe |
There was a problem hiding this comment.
I added an annotation to try and trigger an [arg-type] from mypy.
Still not sure why pyright is the only one to detect (d5e899a)
There was a problem hiding this comment.
I'm happy with where everything is at now @MarcoGorelli
Thanks for proposing and working on this 🎉
_spark_like internally_spark_like internally
|
Awesome, thanks for your review and help here Dan, much appreciated! 🙏 |



closes #2185
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below