This repository was archived by the owner on Mar 3, 2026. It is now read-only.

feat: python udf implementation #703

Merged
jordanrfrazier merged 18 commits into main from python-udf/python-trait-implementation-udf
Aug 28, 2023


@jordanrfrazier
Collaborator

@jordanrfrazier jordanrfrazier commented Aug 23, 2023

Adds the ability for users to define Python user-defined functions (UDFs) that operate on Pandas Series and interoperate with Timestreams. For example:

@kd.udf("add<N: number>(x: N, y: N) -> N")
def add(x: pd.Series, y: pd.Series) -> pd.Series:
    # Return the element-wise sum; without `return` the function yields None.
    return x + y

The UDF can then be called directly on Timestreams:

add(Foo.m, Foo.n)

There are two limitations to consider:

  1. UDFs operate on whole Series only; per-element functions are not supported.
  2. Writing the annotation requires knowing Fenl syntax and types.

The annotation requirement could be relaxed so that only the return type is supplied and the argument types are checked at runtime. This is similar to how PySpark handles UDFs.
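
As a rough illustration of that idea, here is a minimal sketch in plain pandas. The decorator name `udf_with_return_type` is hypothetical and is not part of this PR or of the `kd` API; it only shows what "declare the return type, check the arguments at call time" could look like.

```python
import functools

import pandas as pd


def udf_with_return_type(return_dtype: str):
    """Hypothetical decorator: only the return dtype is declared up front."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Argument types are not declared, so validate them at call time.
            for position, arg in enumerate(args):
                if not isinstance(arg, pd.Series):
                    raise TypeError(
                        f"argument {position} must be a pandas Series, "
                        f"got {type(arg).__name__}"
                    )
                if not pd.api.types.is_numeric_dtype(arg.dtype):
                    raise TypeError(
                        f"argument {position} has non-numeric dtype {arg.dtype}"
                    )
            # Coerce the result to the single dtype the user declared.
            return func(*args).astype(return_dtype)

        return wrapper

    return decorator


@udf_with_return_type("float64")
def add(x, y):
    return x + y
```

With this sketch, `add(pd.Series([1, 2]), pd.Series([3, 4]))` returns a `float64` Series, while passing a Series of strings raises `TypeError` before the function body runs.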

Polars offers another option: it splits UDF calls into apply (working per element) and map (working over whole arrays). Per-element operations are noted to be slow, which is expected because each value goes through a separate Python call.
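
The difference between the two styles can be sketched with plain pandas rather than Polars. The helper names `apply_per_element` and `apply_to_series` are made up for illustration and do not correspond to any API in this PR.

```python
import pandas as pd


def apply_per_element(func, series: pd.Series) -> pd.Series:
    # Calls `func` once per value in a Python-level loop.
    return pd.Series([func(value) for value in series], index=series.index)


def apply_to_series(func, series: pd.Series) -> pd.Series:
    # Calls `func` once with the whole Series, so the work stays vectorized.
    return func(series)


values = pd.Series([1.0, 2.0, 3.0])
doubled_per_element = apply_per_element(lambda v: v * 2, values)
doubled_whole_series = apply_to_series(lambda col: col * 2, values)
```

Both calls produce the same result; the per-element version simply pays Python call overhead for every row, which is the cost noted above.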

Closes #698

@cla-bot cla-bot bot added the cla-signed Set when all authors of a PR have signed our CLA label Aug 23, 2023
@jordanrfrazier jordanrfrazier marked this pull request as ready for review August 25, 2023 03:15
@bjchambers bjchambers changed the title from "draft: python udf implementation" to "feat: python udf implementation" Aug 25, 2023
@github-actions github-actions bot added the enhancement New feature or request label Aug 25, 2023
@jordanrfrazier jordanrfrazier force-pushed the python-udf/python-trait-implementation-udf branch from d3a595d to 0683b00 Compare August 28, 2023 16:41
@jordanrfrazier jordanrfrazier added this pull request to the merge queue Aug 28, 2023
Merged via the queue into main with commit 4caa780 Aug 28, 2023
@jordanrfrazier jordanrfrazier deleted the python-udf/python-trait-implementation-udf branch August 28, 2023 19:14

Labels

cla-signed (Set when all authors of a PR have signed our CLA), enhancement (New feature or request), sparrow

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python UDF

2 participants