added Builder.with_materializers() by zilto · Pull Request #911 · apache/hamilton

zilto · 2024-05-16T13:42:02Z

Following #877

Added the ability to add materializer nodes to the FunctionGraph at Driver build time.

Use cases:

Easier to view materializers as part of .display_all_functions()
Allows calling materializers by name with .execute(). This would let some users ignore .materialize() completely
Possible to combine "static" materializers at build time and "dynamic" materializers at execution time
Catch invalid dataflows earlier than dr.materialize()

Changes

Builder now accepts .with_materializers(*materializers)
Builder.build() adds materializer nodes after creating the Driver and returns the updated Driver. The logic is derived from Driver.materialize()
Builder.copy() copies the materializers
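
For readers skimming the thread, here is a minimal sketch of the builder flow described in the changes above. ToyBuilder, ToyDriver, and the string node names are hypothetical stand-ins, not Hamilton's actual classes. It only illustrates accumulating materializers, copying them with the builder, and attaching them at build time.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyDriver:
    """Hypothetical stand-in for a Driver: holds the node names of its graph."""
    nodes: list


@dataclass
class ToyBuilder:
    """Hypothetical stand-in for Builder; not Hamilton's implementation."""
    functions: list = field(default_factory=list)
    materializers: list = field(default_factory=list)

    def with_functions(self, *names):
        self.functions.extend(names)
        return self

    def with_materializers(self, *materializers):
        # Accumulate "static" materializers; they are attached at build time.
        self.materializers.extend(materializers)
        return self

    def copy(self):
        # A deep copy gets its own list, so later additions do not leak back.
        return copy.deepcopy(self)

    def build(self):
        # Materializer nodes become part of the graph the driver exposes.
        return ToyDriver(nodes=[*self.functions, *self.materializers])


builder = ToyBuilder().with_functions("raw_df", "clean_df").with_materializers("save_clean_df")
clone = builder.copy().with_materializers("save_raw_df")

print(builder.build().nodes)
print(clone.build().nodes)
```

The clone keeps the original materializer and adds its own, while the original builder is unaffected.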

How I tested this

Added tests to check materializer nodes (savers and loaders) are properly added
Added a test that "static" and "dynamic" materializers can be used together

Notes

Each time materializers are added, post_graph_construct hooks are triggered
Should we include materializers in the version hashing of the dataflow? Should we treat "static" and "dynamic" differently? IMO, we might want to create two versions: one for "the dataflow transform, ignoring materializers" and one for "all nodes", since they answer different equality / diffing questions
There's duplicate logic between Builder.build() and Driver.materialize(). A better approach could be to have Driver.add_materializers() and Driver.execute_materializers(). Both Builder.build() and Driver.materialize() could call Driver.add_materializers()
Related to #713 (Allow materializer targets to specify inputs): users need to be aware that materializers need a valid target or dependencies (type and name) to connect to.
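
To make the two-version idea in the notes concrete, here is a rough sketch. The graph_version function and the node dictionaries are assumptions made for illustration; this is not how Hamilton computes its graph version.

```python
import hashlib


def graph_version(node_sources):
    """Hash node names and sources in sorted order so insertion order does not matter."""
    digest = hashlib.sha256()
    for name in sorted(node_sources):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(node_sources[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


# Hypothetical graph: two transform nodes plus one materializer node.
transforms = {
    "raw_df": "def raw_df(): ...",
    "clean_df": "def clean_df(raw_df): ...",
}
materializers = {"save_clean_df": "to.json(dependencies=['clean_df'])"}

# Version 1: the dataflow transform, ignoring materializers.
transform_version = graph_version(transforms)
# Version 2: all nodes, materializers included.
full_version = graph_version({**transforms, **materializers})

# Swapping the materializer changes the full version but not the transform version.
other_materializers = {"save_clean_df": "to.csv(dependencies=['clean_df'])"}
other_full_version = graph_version({**transforms, **other_materializers})
```

With two hashes, "did the transforms change?" and "did anything change?" can be answered separately.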

elijahbenizzy · 2024-05-17T18:02:06Z

OK, I like the API. I think the implementation is overly built towards the current way it works. From what I understand:

This builds the standard driver
This then appends the materializers
This is due to the idea of keeping the graph version consistent (right?)
It ends up calling post_graph_construct twice (once in the construction of the driver, once when the materializers are appended)
Has a final graph version with materializers

IMO this should:

Build the standard driver with the materializers
Call post_graph_construct only once
Have only one graph version

So, how would this work?

Idea:

Take the code out of the driver constructor
Put it in a static function
Make all driver parameters optional (e.g. have an internal one, _graph)
Have two paths: one if you want to construct the graph outside, and the other if you want to do it inside

Alternatively (and this is cleaner, but we have to be careful with some of the SDK stuff, or just say only use the builder/adapter API):

Create a superclass (DriverBase, that takes in graph/other info)
Move just the constructor (currently compatible) into a subclass Driver (user-facing, for backwards compatibility with the non-builder API)
Have the Builder build it using the DriverBase API
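
A rough sketch of what this superclass split could look like. DriverBase comes from the suggestion above; build_graph, the list-of-names "modules", and the simplified Builder are hypothetical and only illustrate the shape, not Hamilton's real signatures.

```python
def build_graph(modules, materializers=()):
    """Hypothetical helper: flatten modules into node names, then append materializers."""
    nodes = [name for module in modules for name in module]
    return [*nodes, *materializers]


class DriverBase:
    """Takes an already-constructed graph plus other info."""

    def __init__(self, graph, adapters=()):
        self.graph = graph
        self.adapters = list(adapters)


class Driver(DriverBase):
    """User-facing subclass that keeps the existing (config, *modules) constructor."""

    def __init__(self, config, *modules):
        self.config = config
        super().__init__(build_graph(modules))


class Builder:
    """Simplified builder that constructs the graph itself and uses DriverBase."""

    def __init__(self):
        self.modules = []
        self.materializers = []

    def with_modules(self, *modules):
        self.modules.extend(modules)
        return self

    def with_materializers(self, *materializers):
        self.materializers.extend(materializers)
        return self

    def build(self):
        # The graph is built once, with materializers already included.
        return DriverBase(build_graph(self.modules, self.materializers))


legacy = Driver({}, ["raw_df", "clean_df"])
built = Builder().with_modules(["raw_df", "clean_df"]).with_materializers("save_clean_df").build()
```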

Then we can put more logic in the builder. Perhaps the simplest solution might be:

Pass in materializers to the driver, default to empty (see the _ parameters)
Build it in the driver constructor

I think this is the easiest, and will save on complexity.
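
A small sketch of why the constructor route only fires the hook once. SketchDriver, CountingHook, and the _materializers parameter name are assumptions for illustration; the real Driver constructor takes different arguments.

```python
class CountingHook:
    """Counts how many times post_graph_construct fires."""

    def __init__(self):
        self.calls = 0

    def post_graph_construct(self, graph):
        self.calls += 1


class SketchDriver:
    """Hypothetical driver; not Hamilton's implementation."""

    def __init__(self, functions, hook, _materializers=()):
        # Build the complete graph first, then notify the hook exactly once.
        self.graph = [*functions, *_materializers]
        self.hook = hook
        hook.post_graph_construct(self.graph)

    def add_materializers(self, *materializers):
        # Two-step alternative: changing the graph afterwards re-triggers the hook.
        self.graph = [*self.graph, *materializers]
        self.hook.post_graph_construct(self.graph)


# Two-step approach: build the driver, then append the materializers.
two_step_hook = CountingHook()
two_step = SketchDriver(["raw_df", "clean_df"], two_step_hook)
two_step.add_materializers("save_clean_df")

# Constructor approach: materializers are passed in, defaulting to empty.
one_step_hook = CountingHook()
one_step = SketchDriver(["raw_df", "clean_df"], one_step_hook, _materializers=("save_clean_df",))

print(two_step_hook.calls, one_step_hook.calls)
```

Both drivers end up with the same graph; only the number of hook invocations differs.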

Regarding multiple versions: that is, IMO, too complex for now. Should we add it later?

skrawcz · 2024-05-18T04:09:24Z

I'm in favor of seeing if we can make it part of the constructor with this approach. That way post_graph_construct is only called once.

skrawcz

I think this looks good. Just need to fix the pre-commit hook.

added Builder.with_materializers()

3ae77be

zilto added the enhancement label (New feature or request) May 16, 2024

zilto requested a review from elijahbenizzy May 16, 2024 13:42

apache deleted a comment from sweep-ai-deprecated bot May 18, 2024

zilto added 2 commits May 18, 2024 20:08

moved logic to Driver __init__

85b7634

added type annotation

7fa7ef4

skrawcz reviewed May 19, 2024

zilto added 2 commits May 21, 2024 11:51

made materializers an internal attribute

0b664fc

updated docs; pre-commit hook; typing bugfix

99f7e89

skrawcz approved these changes May 21, 2024

skrawcz merged commit 2924ed0 into main May 21, 2024

skrawcz deleted the feat/with-materializers branch May 21, 2024 21:17

skrawcz mentioned this pull request Jul 18, 2024

Allow materializer targets to specify inputs #713

Closed

skrawcz merged 5 commits into main from feat/with-materializers
