Add Builder.with_materializers() #877

@zilto

Following the discussion in #816, it would be beneficial to allow materializer nodes (both DataLoader and DataSaver) to be defined statically at the Driver level:

  • The nodes can be called directly via .execute().
  • Materializers appear in HamiltonGraph and in visualizations even if they aren't executed.
  • The DAG, including the materializers, can be validated before execution.

Solution 1

from hamilton import driver
from hamilton.io.materialization import to

dr = (
  driver.Builder()
  .with_modules(...)
  .with_materializers(
    to.dlt(
      id="features_duckdb",
      dependencies=["features_df"],
      destination=duckdb_dest(...),
    )
  )
  .build()
)
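
To make the intended benefits concrete, here is a toy sketch of what registering materializers at build time could enable. `ToyBuilder`, `ToySaver`, and `with_nodes` are hypothetical stand-ins written for this illustration; none of this is Hamilton code, and the real Builder would work on full node objects rather than names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToySaver:
    """Hypothetical stand-in for a materializer such as to.dlt(...)."""
    id: str
    dependencies: tuple[str, ...]


@dataclass
class ToyBuilder:
    """Hypothetical stand-in for driver.Builder; not Hamilton code."""
    nodes: set[str] = field(default_factory=set)
    savers: list[ToySaver] = field(default_factory=list)

    def with_nodes(self, *names: str) -> "ToyBuilder":
        self.nodes.update(names)
        return self

    def with_materializers(self, *savers: ToySaver) -> "ToyBuilder":
        self.savers.extend(savers)
        return self

    def build(self) -> set[str]:
        # Check every materializer against the known nodes before anything runs.
        for saver in self.savers:
            missing = [d for d in saver.dependencies if d not in self.nodes]
            if missing:
                raise ValueError(f"{saver.id!r} depends on unknown nodes: {missing}")
        # Materializers become ordinary graph members, addressable by id.
        return self.nodes | {s.id for s in self.savers}


graph = (
    ToyBuilder()
    .with_nodes("raw_df", "features_df")
    .with_materializers(ToySaver(id="features_duckdb", dependencies=("features_df",)))
    .build()
)
```

Because the saver is known at build time, a misspelled dependency fails in `build()` rather than at execution, and the saver's id is part of the graph whether or not it is ever requested.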

Solution 2

An alternative would be to allow materializers to be imported and added via .with_modules(). For example, production_materializers.py contains:

# production_materializers.py
from hamilton.io.materialization import to

to.dlt(
  id="features__duckdb",
  dependencies=["features_df"],
  destination=duckdb_dest(...),
)

The module is then passed to the Builder alongside the dataflow:

from hamilton import driver
import dataflow
import production_materializers

dr = driver.Builder().with_modules(dataflow, production_materializers).build()

For basic to.parquet() usage, it might be more efficient to store simple Python functions that call DataFrame.to_parquet() in a module to enable this pattern. More powerful materializers (e.g., dlt) would benefit from this approach, though.
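
A minimal sketch of the plain-function alternative for the simple Parquet case. The node name `features_parquet` and the `features_path` input are hypothetical names chosen for illustration; `features_df` is the upstream node used in the examples above.

```python
import pandas as pd


def features_parquet(features_df: pd.DataFrame, features_path: str) -> str:
    """Write features_df to Parquet and return the path that was written."""
    # DataFrame.to_parquet needs a Parquet engine (pyarrow or fastparquet) installed.
    features_df.to_parquet(features_path)
    return features_path
```

Placed in a module passed to .with_modules(), a function like this is an ordinary node: it appears in the graph and can be requested through .execute(), with `features_path` supplied as an input.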

Labels: enhancement
