Description:
I encountered an issue while trying to apply multiple data validation decorators to a single function in the Hamilton DAG framework. Specifically, I am trying to validate different columns of a DataFrame using multiple instances of the @check_output_custom decorator. However, I receive a ValueError indicating that the function cannot be defined more than once.
Steps to Reproduce:
- Define a function to process a DataFrame.
- Apply multiple @check_output_custom decorators to the function, each with different validation parameters.
- Attempt to run the decorated function.
Example code snippet:
1st issue code snippet:
from typing import List
from hamilton.function_modifiers import check_output_custom
from pyspark.sql import DataFrame
# The two validator classes below are assumed to be defined or imported elsewhere.
@check_output_custom(CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass
This raises the error:
ValueError: Cannot define function process_order_data_raw more than once. Already defined by function <function process_order_data
2nd issue code snippet:
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="ProductID", allowed_values=[10, 20, 30], importance="warn"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass
This raises the error:
ValueError: Cannot define function process_order_data_CategoricalValuesValidator more than once. Already defined by function <function process_order_data
Expected Behavior
Applying multiple @check_output_custom decorators to a single function should allow for different validation checks on various columns of the DataFrame without raising a ValueError.
Actual Behavior
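A ValueError is raised, reporting that a generated name cannot be defined more than once: process_order_data_raw when the two validators are of different classes, and process_order_data_CategoricalValuesValidator when both are of the same class.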
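One way to read the two error messages is as name collisions: each decorator seems to generate extra names derived from the function name, and stacking decorators generates the same name twice. The sketch below is a toy model of that reading in plain Python, not Hamilton's actual implementation. The naming scheme (one "<function>_raw" name plus one "<function>_<validator>" name per decorator), both helper functions, and the validator name "CompositePrimaryKeyValidator" are assumptions inferred from the error text above.

```python
from collections import Counter
from typing import List


def generated_node_names(fn_name: str, validator_names: List[str]) -> List[str]:
    """Names that one hypothetical validation decorator would generate."""
    # ASSUMPTION: one "<fn>_raw" name plus one "<fn>_<validator>" name per
    # validator. This mirrors the names in the error messages, nothing more.
    names = [f"{fn_name}_raw"]
    names += [f"{fn_name}_{validator}" for validator in validator_names]
    return names


def colliding_node_names(fn_name: str, stacked: List[List[str]]) -> List[str]:
    """Return, sorted, every generated name that appears more than once.

    Each inner list holds the validator names passed to one decorator.
    """
    counts = Counter(
        name
        for validator_names in stacked
        for name in generated_node_names(fn_name, validator_names)
    )
    return sorted(name for name, count in counts.items() if count > 1)


# First snippet: a different validator class in each decorator.
first = colliding_node_names(
    "process_order_data",
    [["CompositePrimaryKeyValidator"], ["CategoricalValuesValidator"]],
)

# Second snippet: the same validator class in both decorators.
second = colliding_node_names(
    "process_order_data",
    [["CategoricalValuesValidator"], ["CategoricalValuesValidator"]],
)

print(first)
print(second)
```

In this toy model the first case collides only on process_order_data_raw, while the second also collides on process_order_data_CategoricalValuesValidator. That is consistent with the two messages reported here, although the model does not say which collision the library would report first.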
Library & System Information
python version = 3.9.5
hamilton library version = 1.65.0
Additional Context:
This issue prevents the application of multiple validators to a single function, which is necessary for comprehensive data validation in our use case. It would be helpful if the framework could support multiple validators on the same function without raising errors.
Thank you for your attention to this issue.