Skip to content

Fix rebinning of partially occupied histograms#235

Merged
johannes-mueller merged 5 commits into
developfrom
234-histogram-rebinning
Apr 9, 2026
Merged

Fix rebinning of partially occupied histograms#235
johannes-mueller merged 5 commits into
developfrom
234-histogram-rebinning

Conversation

@johannes-mueller
Copy link
Copy Markdown
Member

Fix #234

Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes #234 by correcting how rebin_histogram(..., binning=int) determines target binning for partially occupied MultiIndex (e.g., sparse 2D histograms), avoiding over-fine bins when rebinned.

Changes:

  • Adjust rebin_histogram MultiIndex/int handling to compute a per-dimension “global” target IntervalIndex, then subset it per group to the occupied range.
  • Update _do_rebin_histogram NaN handling by dropping NaNs once up-front (instead of per-interval).
  • Add regression tests for fully/partially occupied 2D histograms (same/up/down) and document the fix in the changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/pylife/utils/histogram.py Fixes bin calculation for sparse MultiIndex histograms when binning is an int; adjusts NaN filtering and a binning validity edge case.
tests/utils/test_histogram.py Adds regression coverage for sparse vs fully occupied 2D histograms across same/up/down rebin scenarios.
CHANGELOG.md Notes the bug fix under upcoming release bug fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/pylife/utils/histogram.py Outdated
Comment thread tests/utils/test_histogram.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 250 to +253
return histogram.reorder_levels(original_names)

def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 expects two blank lines between top-level function definitions. Please add an extra blank line before the new helper defs so formatting stays consistent with the rest of this module.

Copilot uses AI. Check for mistakes.
Comment thread src/pylife/utils/histogram.py Outdated
Comment on lines +252 to +254
def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_drop:
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper parameter name names_to_drop is misleading because the levels are not dropped; they are encoded to integer codes and later restored. Consider renaming it to something like names_to_encode/names_to_map to make the intent clearer.

Suggested change
def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_drop:
def _with_range_index(hist, names_to_encode):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_encode:

Copilot uses AI. Check for mistakes.
Comment thread tests/utils/test_histogram.py Outdated
assert rebinned.shape == (8,)


def test_rebin_irregular_1d_histogam():
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name: histogam should be histogram for clarity and discoverability.

Copilot uses AI. Check for mistakes.
Comment thread tests/utils/test_histogram.py Outdated
pd.testing.assert_series_equal(rebinned, expected)


def test_rebin_irregular_2d_histogam():
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name: histogam should be histogram for clarity and discoverability.

Copilot uses AI. Check for mistakes.
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Work around pandas-dev/pandas#64825

Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 333 to +337
def binning_is_overlapping_or_non_monotonic_increasing():
if binning.left.min() == binning.right.max():
return
return (
len(binning) > 0
len(binning) > 1
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _fail_if_binning_invalid(), the early return when binning.left.min() == binning.right.max() skips validation even if binning contains multiple identical zero-length intervals. That allows overlapping/non-monotonic binnings to pass silently; consider removing this early return or restricting it to the single-bin case (e.g., only bypass validation when len(binning) <= 1).

Copilot uses AI. Check for mistakes.
Comment thread src/pylife/utils/histogram.py Outdated
Comment on lines +254 to +256
def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_drop:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_with_range_index() remaps index levels to integer codes but doesn’t actually drop them; the parameter name names_to_drop is misleading. Renaming it to something like levels_to_remap/levels_to_encode would make the intent clearer and reduce confusion for future maintainers.

Suggested change
def _with_range_index(hist, names_to_drop):
new_hist = hist.copy().reset_index(drop=False)
for level in names_to_drop:
def _with_range_index(hist, levels_to_remap=None, names_to_drop=None):
if names_to_drop is not None:
warnings.warn(
"'names_to_drop' is deprecated, use 'levels_to_remap' instead.",
DeprecationWarning,
stacklevel=2,
)
if levels_to_remap is None:
levels_to_remap = names_to_drop
if levels_to_remap is None:
levels_to_remap = []
new_hist = hist.copy().reset_index(drop=False)
for level in levels_to_remap:

Copilot uses AI. Check for mistakes.
Comment on lines +541 to +544
rebinned = hi.rebin_histogram(hist, 8, nan_default=True).dropna()

print(hist.shape, rebinned.shape)
assert rebinned.shape == (8,)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove this debug print (the assertion already captures the expected behavior).

Copilot uses AI. Check for mistakes.
Comment thread tests/utils/test_histogram.py Outdated
assert rebinned.shape == (8,)


def test_rebin_irregular_1d_histogam():
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test function name has a typo: histogamhistogram. Renaming improves discoverability and consistency with the rest of the test suite.

Copilot uses AI. Check for mistakes.
Comment thread tests/utils/test_histogram.py Outdated
pd.testing.assert_series_equal(rebinned, expected)


def test_rebin_irregular_2d_histogam():
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test function name has a typo: histogamhistogram. Renaming improves discoverability and consistency with the rest of the test suite.

Copilot uses AI. Check for mistakes.
Comment on lines +559 to +562

print(rebinned)
print(expected)

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.

Copilot uses AI. Check for mistakes.
Comment on lines +595 to +597
print(rebinned)
print(expected)

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.

Copilot uses AI. Check for mistakes.
Comment on lines +627 to +629
print(rebinned)
print(expected)

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.

Copilot uses AI. Check for mistakes.
Comment on lines +659 to +663
print("###############")

print(hist)
print(expected)
print(rebinned)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid print() statements in tests; they add noise to test output and make failures harder to read. Please remove these debug prints.

Copilot uses AI. Check for mistakes.
Signed-off-by: Johannes Mueller <johannes.mueller4@de.bosch.com>
@johannes-mueller johannes-mueller merged commit 83f2c4c into develop Apr 9, 2026
24 checks passed
@johannes-mueller johannes-mueller deleted the 234-histogram-rebinning branch April 9, 2026 07:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rebinning of partially occupied histogram results in too fine bins

2 participants