Skip to content

Add DataValidator class#724

Merged
cpaniaguam merged 41 commits intomainfrom
722-refactor-data-validation-preprocessing-into-a-dedicated-module
Jun 11, 2025
Merged

Add DataValidator class#724
cpaniaguam merged 41 commits intomainfrom
722-refactor-data-validation-preprocessing-into-a-dedicated-module

Conversation

@cpaniaguam
Copy link
Copy Markdown
Collaborator

@cpaniaguam cpaniaguam commented Jun 9, 2025

This pull request introduces a new DataValidator class to handle data validation and preprocessing for HSSM models. The DataValidator class centralizes functionality related to data sanity checks, handling missing data, and deadline processing. Additionally, the HSSM class now inherits from DataValidator, simplifying its implementation by delegating these responsibilities. Comprehensive tests for the DataValidator class have been added to ensure robustness.

New DataValidator Class:

  • src/hssm/data_validator.py: Introduced the DataValidator class with methods for validating data fields, handling missing data and deadlines, and updating extra fields. This class encapsulates data preprocessing logic for HSSM models.

Integration with HSSM:

  • src/hssm/hssm.py:
    • Modified HSSM to inherit from DataValidator, enabling reuse of data validation and preprocessing methods.
    • Replaced redundant methods in HSSM with calls to DataValidator methods, such as _handle_missing_data_and_deadline and _set_missing_data_and_deadline. [1] [2]

Test Coverage:

  • tests/test_data_validator.py: Added extensive unit tests for the DataValidator class, covering its constructor, field validation, data sanity checks, handling of missing data and deadlines, and integration with external fields.

@cpaniaguam cpaniaguam linked an issue Jun 9, 2025 that may be closed by this pull request
7 tasks
cpaniaguam added 28 commits June 9, 2025 12:38
- Add static method check_fields
- Refactor _check_extra_fields, _pre_check_data_sanity using check_fields
…ks for missing, NaN, and invalid response times
…eters

They are set as default in DataValidator constructor
@cpaniaguam cpaniaguam changed the title Add DataValidator class constructor Add DataValidator class Jun 10, 2025
@cpaniaguam cpaniaguam requested a review from Copilot June 10, 2025 16:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new DataValidator class that encapsulates data validation and preprocessing functionality for HSSM models, and integrates it by having HSSM inherit from DataValidator.

  • Introduces DataValidator in src/hssm/data_validator.py with methods to validate and preprocess data.
  • Updates HSSM in src/hssm/hssm.py to delegate data handling to DataValidator.
  • Adds comprehensive unit tests in tests/test_data_validator.py to ensure robustness.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/test_data_validator.py Adds unit tests for data validation, though one helper isn’t decorated as a fixture.
src/hssm/hssm.py Integrates DataValidator and removes redundant methods.
src/hssm/data_validator.py Introduces the new DataValidator class with data sanity checks and preprocessing logic.

@cpaniaguam cpaniaguam marked this pull request as ready for review June 10, 2025 16:44
@cpaniaguam cpaniaguam self-assigned this Jun 10, 2025
Copy link
Copy Markdown
Member

@AlexanderFengler AlexanderFengler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great honestly.
Already polished, didn't see any obvious problem here.
Thanks @cpaniaguam

# but is expected to exist in subclasses (e.g., HSSM).
# The 'type: ignore[attr-defined]' comment tells mypy to ignore the missing
# attribute error here and avoid moving this method to the HSSM class.
self.model_distribution.extra_fields = [ # type: ignore[attr-defined]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I resolved cases like this with if hasattr() before. But not sure what the recommended pattern is.

@AlexanderFengler
Copy link
Copy Markdown
Member

@cpaniaguam before merging, could you take 3.12 out of the testing matrix here to make sure all the other stuff runs to completion?

We will then bring 3.12 back via another PR, in the context of fixing the dependency situation.

Remove Python 3.12 from test matrix in GitHub Actions workflow
@codecov
Copy link
Copy Markdown

codecov bot commented Jun 11, 2025

Codecov Report

Attention: Patch coverage is 99.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/hssm/data_validator.py 97.61% 2 Missing ⚠️
Files with missing lines Coverage Δ
src/hssm/hssm.py 77.36% <100.00%> (-0.53%) ⬇️
tests/test_data_sanity.py 100.00% <100.00%> (+2.17%) ⬆️
tests/test_data_validator.py 100.00% <100.00%> (ø)
src/hssm/data_validator.py 97.61% <97.61%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cpaniaguam cpaniaguam merged commit 0e352ac into main Jun 11, 2025
5 of 6 checks passed
@cpaniaguam cpaniaguam deleted the 722-refactor-data-validation-preprocessing-into-a-dedicated-module branch June 11, 2025 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor Data Validation & Preprocessing into a Dedicated Module

3 participants