Fix rebinning of partially occupied histograms#235
Conversation
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
24e15da to
a5613c1
Compare
There was a problem hiding this comment.
Pull request overview
Fixes #234 by correcting how rebin_histogram(..., binning=int) determines target binning for partially occupied MultiIndex (e.g., sparse 2D histograms), avoiding over-fine bins when rebinned.
Changes:
- Adjust
rebin_histogramMultiIndex/int handling to compute a per-dimension “global” targetIntervalIndex, then subset it per group to the occupied range. - Update
_do_rebin_histogramNaN handling by dropping NaNs once up-front (instead of per-interval). - Add regression tests for fully/partially occupied 2D histograms (same/up/down) and document the fix in the changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/pylife/utils/histogram.py |
Fixes bin calculation for sparse MultiIndex histograms when binning is an int; adjusts NaN filtering and a binning validity edge case. |
tests/utils/test_histogram.py |
Adds regression coverage for sparse vs fully occupied 2D histograms across same/up/down rebin scenarios. |
CHANGELOG.md |
Notes the bug fix under upcoming release bug fixes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return histogram.reorder_levels(original_names) | ||
|
|
||
| def _with_range_index(hist, names_to_drop): | ||
| new_hist = hist.copy().reset_index(drop=False) |
There was a problem hiding this comment.
PEP 8 expects two blank lines between top-level function definitions. Please add an extra blank line before the new helper defs so formatting stays consistent with the rest of this module.
| def _with_range_index(hist, names_to_drop): | ||
| new_hist = hist.copy().reset_index(drop=False) | ||
| for level in names_to_drop: |
There was a problem hiding this comment.
The helper parameter name names_to_drop is misleading because the levels are not dropped; they are encoded to integer codes and later restored. Consider renaming it to something like names_to_encode/names_to_map to make the intent clearer.
| def _with_range_index(hist, names_to_drop): | |
| new_hist = hist.copy().reset_index(drop=False) | |
| for level in names_to_drop: | |
| def _with_range_index(hist, names_to_encode): | |
| new_hist = hist.copy().reset_index(drop=False) | |
| for level in names_to_encode: |
| assert rebinned.shape == (8,) | ||
|
|
||
|
|
||
| def test_rebin_irregular_1d_histogam(): |
There was a problem hiding this comment.
Typo in test name: histogam should be histogram for clarity and discoverability.
| pd.testing.assert_series_equal(rebinned, expected) | ||
|
|
||
|
|
||
| def test_rebin_irregular_2d_histogam(): |
There was a problem hiding this comment.
Typo in test name: histogam should be histogram for clarity and discoverability.
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Work around pandas-dev/pandas#64825 Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
1568c69 to
e4ad206
Compare
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def binning_is_overlapping_or_non_monotonic_increasing(): | ||
| if binning.left.min() == binning.right.max(): | ||
| return | ||
| return ( | ||
| len(binning) > 0 | ||
| len(binning) > 1 |
There was a problem hiding this comment.
In _fail_if_binning_invalid(), the early return when binning.left.min() == binning.right.max() skips validation even if binning contains multiple identical zero-length intervals. That allows overlapping/non-monotonic binnings to pass silently; consider removing this early return or restricting it to the single-bin case (e.g., only bypass validation when len(binning) <= 1).
| def _with_range_index(hist, names_to_drop): | ||
| new_hist = hist.copy().reset_index(drop=False) | ||
| for level in names_to_drop: |
There was a problem hiding this comment.
_with_range_index() remaps index levels to integer codes but doesn’t actually drop them; the parameter name names_to_drop is misleading. Renaming it to something like levels_to_remap/levels_to_encode would make the intent clearer and reduce confusion for future maintainers.
| def _with_range_index(hist, names_to_drop): | |
| new_hist = hist.copy().reset_index(drop=False) | |
| for level in names_to_drop: | |
| def _with_range_index(hist, levels_to_remap=None, names_to_drop=None): | |
| if names_to_drop is not None: | |
| warnings.warn( | |
| "'names_to_drop' is deprecated, use 'levels_to_remap' instead.", | |
| DeprecationWarning, | |
| stacklevel=2, | |
| ) | |
| if levels_to_remap is None: | |
| levels_to_remap = names_to_drop | |
| if levels_to_remap is None: | |
| levels_to_remap = [] | |
| new_hist = hist.copy().reset_index(drop=False) | |
| for level in levels_to_remap: |
| rebinned = hi.rebin_histogram(hist, 8, nan_default=True).dropna() | ||
|
|
||
| print(hist.shape, rebinned.shape) | ||
| assert rebinned.shape == (8,) |
There was a problem hiding this comment.
Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove this debug print (the assertion already captures the expected behavior).
| assert rebinned.shape == (8,) | ||
|
|
||
|
|
||
| def test_rebin_irregular_1d_histogam(): |
There was a problem hiding this comment.
Test function name has a typo: histogam → histogram. Renaming improves discoverability and consistency with the rest of the test suite.
| pd.testing.assert_series_equal(rebinned, expected) | ||
|
|
||
|
|
||
| def test_rebin_irregular_2d_histogam(): |
There was a problem hiding this comment.
Test function name has a typo: histogam → histogram. Renaming improves discoverability and consistency with the rest of the test suite.
|
|
||
| print(rebinned) | ||
| print(expected) | ||
|
|
There was a problem hiding this comment.
Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.
| print(rebinned) | ||
| print(expected) | ||
|
|
There was a problem hiding this comment.
Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.
| print(rebinned) | ||
| print(expected) | ||
|
|
There was a problem hiding this comment.
Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.
| print("###############") | ||
|
|
||
| print(hist) | ||
| print(expected) | ||
| print(rebinned) |
There was a problem hiding this comment.
Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Fix #234