-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics #2316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
4 tasks
anistark
approved these changes
Sep 27, 2025
Member
anistark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
anistark
pushed a commit
that referenced
this pull request
Oct 1, 2025
…rics (#2320) ## Summary This PR adds persistence capabilities and better string representations for LLM-based metrics, making them easier to save, share, and debug. ## Changes ### 1. Save/Load Functionality - Added `save()` and `load()` methods to `SimpleLLMMetric` and its subclasses (`DiscreteMetric`, `NumericMetric`, `RankingMetric`) - Supports JSON format with optional gzip compression - Handles all prompt types including `Prompt` and `DynamicFewShotPrompt` - Smart defaults: `metric.save()` saves to `./metric_name.json` ### 2. Improved `__repr__` Methods - Clean, informative string representations for both LLM-based and decorator-based metrics - Removed implementation details (memory addresses, `<locals>`, internal attributes) - Smart prompt truncation (80 chars max) - Function signature display for decorator-based metrics **Before:** ```python create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...) ``` **After:** ```python # LLM-based metrics DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...') # Decorator-based metrics summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']] ``` ### 3. Response Model Handling - Added `create_auto_response_model()` factory to mark auto-generated models - Only warns about custom response models during save, not standard ones ## Usage Examples ```python # Save metric with default path metric.save() # → ./response_quality.json # Save with custom path metric.save("custom.json") metric.save("/path/to/metrics/") # → /path/to/metrics/response_quality.json metric.save("compressed.json.gz") # Compressed # Load metric loaded_metric = DiscreteMetric.load("response_quality.json") # For DynamicFewShotPrompt metrics loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings) ``` ## Testing - Comprehensive test suite with 8 tests covering all save/load scenarios - Tests for default paths, directory handling, compression - Tests for all prompt types and metric subclasses ## Dependencies **Note:** This PR builds on #2316 (Fix metric inheritance patterns) and requires it to be merged first. The changes here depend on the cleaned-up metric inheritance structure from that PR. ## Checklist - [x] Tests added - [x] Documentation in docstrings - [x] Backwards compatible (new functionality only) - [x] Follows TDD practices
anistark
added a commit
that referenced
this pull request
Nov 17, 2025
…om class-instantiated metrics (#2316) **Primary Motivation**: This PR fixes a fundamental inheritance pattern issue in the metrics system where factory-created metrics (via `@discrete_metric`, `@numeric_metric`, etc.) and class-instantiated metrics (via `DiscreteMetric()`, `NumericMetric()`, etc.) should have different base classes but were incorrectly sharing the same inheritance hierarchy. **The Problem**: - Factory-created metrics should inherit from `SimpleBaseMetric` (lightweight, decorator-based) - Class-instantiated metrics should inherit from `SimpleLLMMetric` (LLM-enabled, full-featured) - Previously, both paths incorrectly inherited from the same base classes, creating confusion and incorrect behavior **The Solution**: • **Separated base classes**: Created `SimpleBaseMetric` (for factory) and `SimpleLLMMetric` (for class instantiation) as distinct, unrelated base classes • **Removed `llm_based.py`**: Consolidated `BaseLLMMetric` and `LLMMetric` into `base.py` as `SimpleBaseMetric` and `SimpleLLMMetric` • **Fixed decorator inheritance**: Factory methods now create metrics that inherit from `SimpleBaseMetric + ValidatorMixin` only • **Fixed class inheritance**: Class-based metrics like `DiscreteMetric` now inherit from `SimpleLLMMetric + ValidatorMixin` • **Added validator system**: Introduced modular validation mixins that work with both inheritance patterns • **Maintained backward compatibility**: Added aliases `BaseMetric = SimpleBaseMetric` and `LLMMetric = SimpleLLMMetric` **Exact Steps Taken**: 1. `7d6de2a` - Updated gitignore for experimental directories 2. `c6101f8` - Renamed classes and established proper naming convention 3. `46450d8` - Refactored decorator and class-based inheritance patterns 4. `a464c37` - Simplified validator system with proper mixins 5. `fe996f6` - Removed `llm_based.py` after consolidation - [ ] Verify factory-created metrics (`@discrete_metric`) inherit from `SimpleBaseMetric` only - [ ] Verify class-instantiated metrics (`DiscreteMetric()`) inherit from `SimpleLLMMetric` - [ ] Test that both patterns work correctly with their respective validation mixins - [ ] Ensure backward compatibility with existing metric imports - [ ] Validate all metric functionality (scoring, async operations, alignment) - [ ] Run full test suite to ensure no regressions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Ani <[email protected]>
anistark
pushed a commit
that referenced
this pull request
Nov 17, 2025
…rics (#2320) ## Summary This PR adds persistence capabilities and better string representations for LLM-based metrics, making them easier to save, share, and debug. ## Changes ### 1. Save/Load Functionality - Added `save()` and `load()` methods to `SimpleLLMMetric` and its subclasses (`DiscreteMetric`, `NumericMetric`, `RankingMetric`) - Supports JSON format with optional gzip compression - Handles all prompt types including `Prompt` and `DynamicFewShotPrompt` - Smart defaults: `metric.save()` saves to `./metric_name.json` ### 2. Improved `__repr__` Methods - Clean, informative string representations for both LLM-based and decorator-based metrics - Removed implementation details (memory addresses, `<locals>`, internal attributes) - Smart prompt truncation (80 chars max) - Function signature display for decorator-based metrics **Before:** ```python create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...) ``` **After:** ```python # LLM-based metrics DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...') # Decorator-based metrics summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']] ``` ### 3. Response Model Handling - Added `create_auto_response_model()` factory to mark auto-generated models - Only warns about custom response models during save, not standard ones ## Usage Examples ```python # Save metric with default path metric.save() # → ./response_quality.json # Save with custom path metric.save("custom.json") metric.save("/path/to/metrics/") # → /path/to/metrics/response_quality.json metric.save("compressed.json.gz") # Compressed # Load metric loaded_metric = DiscreteMetric.load("response_quality.json") # For DynamicFewShotPrompt metrics loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings) ``` ## Testing - Comprehensive test suite with 8 tests covering all save/load scenarios - Tests for default paths, directory handling, compression - Tests for all prompt types and metric subclasses ## Dependencies **Note:** This PR builds on #2316 (Fix metric inheritance patterns) and requires it to be merged first. The changes here depend on the cleaned-up metric inheritance structure from that PR. ## Checklist - [x] Tests added - [x] Documentation in docstrings - [x] Backwards compatible (new functionality only) - [x] Follows TDD practices
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Primary Motivation: This PR fixes a fundamental inheritance pattern issue in the metrics system where factory-created metrics (via
@discrete_metric,@numeric_metric, etc.) and class-instantiated metrics (viaDiscreteMetric(),NumericMetric(), etc.) should have different base classes but were incorrectly sharing the same inheritance hierarchy.The Problem:
SimpleBaseMetric(lightweight, decorator-based)SimpleLLMMetric(LLM-enabled, full-featured)The Solution:
• Separated base classes: Created
SimpleBaseMetric(for factory) andSimpleLLMMetric(for class instantiation) as distinct, unrelated base classes• Removed
llm_based.py: ConsolidatedBaseLLMMetricandLLMMetricintobase.pyasSimpleBaseMetricandSimpleLLMMetric• Fixed decorator inheritance: Factory methods now create metrics that inherit from
SimpleBaseMetric + ValidatorMixinonly• Fixed class inheritance: Class-based metrics like
DiscreteMetricnow inherit fromSimpleLLMMetric + ValidatorMixin• Added validator system: Introduced modular validation mixins that work with both inheritance patterns
• Maintained backward compatibility: Added aliases
BaseMetric = SimpleBaseMetricandLLMMetric = SimpleLLMMetricExact Steps Taken:
7d6de2a- Updated gitignore for experimental directoriesc6101f8- Renamed classes and established proper naming convention46450d8- Refactored decorator and class-based inheritance patternsa464c37- Simplified validator system with proper mixinsfe996f6- Removedllm_based.pyafter consolidationTest plan
@discrete_metric) inherit fromSimpleBaseMetriconlyDiscreteMetric()) inherit fromSimpleLLMMetric🤖 Generated with Claude Code
Co-Authored-By: Claude [email protected]