fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics #2316

jjmachan · 2025-09-26T02:47:49Z

Summary

Primary Motivation: This PR fixes a fundamental inheritance pattern issue in the metrics system where factory-created metrics (via @discrete_metric, @numeric_metric, etc.) and class-instantiated metrics (via DiscreteMetric(), NumericMetric(), etc.) should have different base classes but were incorrectly sharing the same inheritance hierarchy.

The Problem:

Factory-created metrics should inherit from SimpleBaseMetric (lightweight, decorator-based)
Class-instantiated metrics should inherit from SimpleLLMMetric (LLM-enabled, full-featured)
Previously, both paths incorrectly inherited from the same base classes, creating confusion and incorrect behavior

The Solution:
• Separated base classes: Created SimpleBaseMetric (for factory) and SimpleLLMMetric (for class instantiation) as distinct, unrelated base classes
• Removed llm_based.py: Consolidated BaseLLMMetric and LLMMetric into base.py as SimpleBaseMetric and SimpleLLMMetric
• Fixed decorator inheritance: Factory methods now create metrics that inherit from SimpleBaseMetric + ValidatorMixin only
• Fixed class inheritance: Class-based metrics like DiscreteMetric now inherit from SimpleLLMMetric + ValidatorMixin
• Added validator system: Introduced modular validation mixins that work with both inheritance patterns
• Maintained backward compatibility: Added aliases BaseMetric = SimpleBaseMetric and LLMMetric = SimpleLLMMetric

Exact Steps Taken:

7d6de2a - Updated gitignore for experimental directories
c6101f8 - Renamed classes and established proper naming convention
46450d8 - Refactored decorator and class-based inheritance patterns
a464c37 - Simplified validator system with proper mixins
fe996f6 - Removed llm_based.py after consolidation

Test plan

Verify factory-created metrics (@discrete_metric) inherit from SimpleBaseMetric only
Verify class-instantiated metrics (DiscreteMetric()) inherit from SimpleLLMMetric
Test that both patterns work correctly with their respective validation mixins
Ensure backward compatibility with existing metric imports
Validate all metric functionality (scoring, async operations, alignment)
Run full test suite to ensure no regressions

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

anistark

Looks great.

…rics (#2320) ## Summary This PR adds persistence capabilities and better string representations for LLM-based metrics, making them easier to save, share, and debug. ## Changes ### 1. Save/Load Functionality - Added `save()` and `load()` methods to `SimpleLLMMetric` and its subclasses (`DiscreteMetric`, `NumericMetric`, `RankingMetric`) - Supports JSON format with optional gzip compression - Handles all prompt types including `Prompt` and `DynamicFewShotPrompt` - Smart defaults: `metric.save()` saves to `./metric_name.json` ### 2. Improved `__repr__` Methods - Clean, informative string representations for both LLM-based and decorator-based metrics - Removed implementation details (memory addresses, `<locals>`, internal attributes) - Smart prompt truncation (80 chars max) - Function signature display for decorator-based metrics **Before:** ```python create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...) ``` **After:** ```python # LLM-based metrics DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...') # Decorator-based metrics summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']] ``` ### 3. Response Model Handling - Added `create_auto_response_model()` factory to mark auto-generated models - Only warns about custom response models during save, not standard ones ## Usage Examples ```python # Save metric with default path metric.save() # → ./response_quality.json # Save with custom path metric.save("custom.json") metric.save("/path/to/metrics/") # → /path/to/metrics/response_quality.json metric.save("compressed.json.gz") # Compressed # Load metric loaded_metric = DiscreteMetric.load("response_quality.json") # For DynamicFewShotPrompt metrics loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings) ``` ## Testing - Comprehensive test suite with 8 tests covering all save/load scenarios - Tests for default paths, directory handling, compression - Tests for all prompt types and metric subclasses ## Dependencies **Note:** This PR builds on #2316 (Fix metric inheritance patterns) and requires it to be merged first. The changes here depend on the cleaned-up metric inheritance structure from that PR. ## Checklist - [x] Tests added - [x] Documentation in docstrings - [x] Backwards compatible (new functionality only) - [x] Follows TDD practices

…om class-instantiated metrics (#2316) **Primary Motivation**: This PR fixes a fundamental inheritance pattern issue in the metrics system where factory-created metrics (via `@discrete_metric`, `@numeric_metric`, etc.) and class-instantiated metrics (via `DiscreteMetric()`, `NumericMetric()`, etc.) should have different base classes but were incorrectly sharing the same inheritance hierarchy. **The Problem**: - Factory-created metrics should inherit from `SimpleBaseMetric` (lightweight, decorator-based) - Class-instantiated metrics should inherit from `SimpleLLMMetric` (LLM-enabled, full-featured) - Previously, both paths incorrectly inherited from the same base classes, creating confusion and incorrect behavior **The Solution**: • **Separated base classes**: Created `SimpleBaseMetric` (for factory) and `SimpleLLMMetric` (for class instantiation) as distinct, unrelated base classes • **Removed `llm_based.py`**: Consolidated `BaseLLMMetric` and `LLMMetric` into `base.py` as `SimpleBaseMetric` and `SimpleLLMMetric` • **Fixed decorator inheritance**: Factory methods now create metrics that inherit from `SimpleBaseMetric + ValidatorMixin` only • **Fixed class inheritance**: Class-based metrics like `DiscreteMetric` now inherit from `SimpleLLMMetric + ValidatorMixin` • **Added validator system**: Introduced modular validation mixins that work with both inheritance patterns • **Maintained backward compatibility**: Added aliases `BaseMetric = SimpleBaseMetric` and `LLMMetric = SimpleLLMMetric` **Exact Steps Taken**: 1. `7d6de2a` - Updated gitignore for experimental directories 2. `c6101f8` - Renamed classes and established proper naming convention 3. `46450d8` - Refactored decorator and class-based inheritance patterns 4. `a464c37` - Simplified validator system with proper mixins 5. `fe996f6` - Removed `llm_based.py` after consolidation - [ ] Verify factory-created metrics (`@discrete_metric`) inherit from `SimpleBaseMetric` only - [ ] Verify class-instantiated metrics (`DiscreteMetric()`) inherit from `SimpleLLMMetric` - [ ] Test that both patterns work correctly with their respective validation mixins - [ ] Ensure backward compatibility with existing metric imports - [ ] Validate all metric functionality (scoring, async operations, alignment) - [ ] Run full test suite to ensure no regressions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Ani <[email protected]>

…rics (#2320) ## Summary This PR adds persistence capabilities and better string representations for LLM-based metrics, making them easier to save, share, and debug. ## Changes ### 1. Save/Load Functionality - Added `save()` and `load()` methods to `SimpleLLMMetric` and its subclasses (`DiscreteMetric`, `NumericMetric`, `RankingMetric`) - Supports JSON format with optional gzip compression - Handles all prompt types including `Prompt` and `DynamicFewShotPrompt` - Smart defaults: `metric.save()` saves to `./metric_name.json` ### 2. Improved `__repr__` Methods - Clean, informative string representations for both LLM-based and decorator-based metrics - Removed implementation details (memory addresses, `<locals>`, internal attributes) - Smart prompt truncation (80 chars max) - Function signature display for decorator-based metrics **Before:** ```python create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...) ``` **After:** ```python # LLM-based metrics DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...') # Decorator-based metrics summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']] ``` ### 3. Response Model Handling - Added `create_auto_response_model()` factory to mark auto-generated models - Only warns about custom response models during save, not standard ones ## Usage Examples ```python # Save metric with default path metric.save() # → ./response_quality.json # Save with custom path metric.save("custom.json") metric.save("/path/to/metrics/") # → /path/to/metrics/response_quality.json metric.save("compressed.json.gz") # Compressed # Load metric loaded_metric = DiscreteMetric.load("response_quality.json") # For DynamicFewShotPrompt metrics loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings) ``` ## Testing - Comprehensive test suite with 8 tests covering all save/load scenarios - Tests for default paths, directory handling, compression - Tests for all prompt types and metric subclasses ## Dependencies **Note:** This PR builds on #2316 (Fix metric inheritance patterns) and requires it to be merged first. The changes here depend on the cleaned-up metric inheritance structure from that PR. ## Checklist - [x] Tests added - [x] Documentation in docstrings - [x] Backwards compatible (new functionality only) - [x] Follows TDD practices

jjmachan added 5 commits September 25, 2025 10:53

ignore somethings in examples

7d6de2a

renamed

c6101f8

refactored decorator and classbased

46450d8

simpler validator

a464c37

removed llm_based.py

fe996f6

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Sep 26, 2025

a slash command for claude to create the PR

9f72944

jjmachan requested a review from anistark September 26, 2025 16:31

jjmachan mentioned this pull request Sep 27, 2025

feat: Add save/load functionality and improved repr for LLM-based metrics #2320

Merged

4 tasks

jjmachan changed the title ~~Fix metric inheritance patterns: separate factory-created metrics from class-instantiated metrics~~ fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics Sep 27, 2025

anistark approved these changes Sep 27, 2025

View reviewed changes

Merge branch 'main' into fix/refactor

e128b19

jjmachan merged commit 80e77b6 into main Sep 30, 2025
8 checks passed

jjmachan deleted the fix/refactor branch September 30, 2025 01:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics #2316

fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics #2316

Uh oh!

jjmachan commented Sep 26, 2025

Uh oh!

anistark left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics #2316

fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics #2316

Uh oh!

Conversation

jjmachan commented Sep 26, 2025

Summary

Test plan

Uh oh!

anistark left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants