ValueError When Applying Multiple Data Validation Decorators to the Same Function #950

@rohithrockzz

Description:

I encountered an issue while trying to apply multiple data validation decorators to a single function in the Hamilton DAG framework. Specifically, I am trying to validate different columns of a DataFrame using multiple instances of the @check_output_custom decorator. However, I receive a ValueError indicating that the function cannot be defined more than once.

Steps to Reproduce:

  1. Define a function to process a DataFrame.
  2. Apply multiple @check_output_custom decorators to the function, each with different validation parameters.
  3. Attempt to run the decorated function.

Example code snippet:

First snippet (two different validator classes):

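from typing import List
from pyspark.sql import DataFrame  # assumed from the validator names; the report does not show the import
from hamilton.function_modifiers import check_output_custom
# The two validator classes are custom; their definitions are not shown in this report.

@check_output_custom(CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass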

This raises the error:

ValueError: Cannot define function process_order_data_raw more than once. Already defined by function <function process_order_data

Second snippet (the same validator class applied to two different columns):

@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="ProductID", allowed_values=[10, 20, 30], importance="warn"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass

This raises the error:

ValueError: Cannot define function process_order_data_CategoricalValuesValidator more than once. Already defined by function <function process_order_data
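
One way to read the two errors is that each decorator contributes helper names derived from the function name (a `_raw` name plus one name per validator), so stacking two decorators produces duplicates. The sketch below is a toy model of that reading, not Hamilton's actual implementation: `check_output_toy`, `generated_names`, and `colliding_names` are hypothetical, and the naming scheme is an assumption inferred only from the error messages above. The name "CompositePrimaryKeyValidator" is likewise a guess, since the report does not show what that validator calls itself.

```python
from collections import Counter
from typing import Callable, List


def check_output_toy(validator_name: str) -> Callable:
    """Hypothetical decorator: records a validator name on the function."""

    def decorate(fn: Callable) -> Callable:
        # Decorators apply bottom-up, so the innermost name is recorded first.
        fn.validator_names = getattr(fn, "validator_names", []) + [validator_name]
        return fn

    return decorate


def generated_names(fn: Callable) -> List[str]:
    """Names each decorator would contribute under the ASSUMED naming scheme."""
    names = []
    for validator_name in fn.validator_names:
        names.append(f"{fn.__name__}_raw")               # unvalidated result
        names.append(f"{fn.__name__}_{validator_name}")  # validation step
    return names


def colliding_names(fn: Callable) -> List[str]:
    """Every generated name that appears more than once, sorted."""
    counts = Counter(generated_names(fn))
    return sorted(name for name, count in counts.items() if count > 1)


def make_first_case() -> Callable:
    # Mirrors the first snippet: two different validator classes.
    @check_output_toy("CompositePrimaryKeyValidator")  # guessed name
    @check_output_toy("CategoricalValuesValidator")
    def process_order_data():
        return None

    return process_order_data


def make_second_case() -> Callable:
    # Mirrors the second snippet: the same validator class used twice.
    @check_output_toy("CategoricalValuesValidator")
    @check_output_toy("CategoricalValuesValidator")
    def process_order_data():
        return None

    return process_order_data


first_collisions = colliding_names(make_first_case())
second_collisions = colliding_names(make_second_case())
```

Under this model the first case collides only on `process_order_data_raw`, while the second case collides on both `process_order_data_raw` and `process_order_data_CategoricalValuesValidator`. The name reported in each real error is among those collisions, but the sketch does not try to model which colliding name gets reported first.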

Expected Behavior

Applying multiple @check_output_custom decorators to a single function should allow for different validation checks on various columns of the DataFrame without raising a ValueError.

Actual Behavior

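A ValueError is raised, reporting that a name derived from the function cannot be defined more than once: process_order_data_raw in the first snippet and process_order_data_CategoricalValuesValidator in the second.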

Library & System Information

python version = 3.9.5
hamilton library version = 1.65.0

Additional Context:

This issue prevents the application of multiple validators to a single function, which is necessary for comprehensive data validation in our use case. It would be helpful if the framework could support multiple validators on the same function without raising errors.

Thank you for your attention to this issue.

Labels: triage (label for issues that need to be triaged)