I discussed this idea with some community members on Slack; see the thread.
Currently, fit(Histogram, data) lets users control binning either through the number of bins (nbins) or through explicit bin edges. While flexible, this misses a common workflow in scientific computing: specifying bins by a fixed width (e.g., "one bin every 0.5 units") independent of the data range.
Although this can be done manually with fit(Histogram, data, min:width:max), the user then has to compute the bounds and handle edge alignment themselves. For example, the range min:width:max stops at the last value min + k*width that does not exceed max, so the largest values can fall outside the final bin.
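To make the problem concrete, here is a minimal sketch of the manual workaround. The data and variable names are made up for illustration, and it assumes (as I understand the current behaviour) that values outside the supplied edges are silently left out of the counts.

```julia
using StatsBase

data = [0.1, 0.4, 1.2, 2.3]
width = 0.5

# Naive edges: the range ends at the last edge not exceeding the maximum,
# so the edges are 0.1, 0.6, 1.1, 1.6, 2.1 and 2.3 lies beyond the last one.
naive = minimum(data):width:maximum(data)
h_naive = fit(Histogram, data, naive)

# Manual fix: snap the lower edge down to a multiple of `width` and push the
# upper edge one full bin past the bin that contains the maximum.
lo = floor(minimum(data) / width) * width
hi = lo + width * (floor((maximum(data) - lo) / width) + 1)
edges = lo:width:hi
h = fit(Histogram, data, edges)

println(sum(h_naive.weights), " of ", length(data), " points counted with naive edges")
println(sum(h.weights), " of ", length(data), " points counted with adjusted edges")
```

This is the bookkeeping the proposal would hide behind a single argument.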
Reference:
The current fit implementations for Histogram are centered around these methods:
fit(::Type{Histogram{T}},v::AbstractVector, edg::AbstractVector; closed::Symbol=:left) where {T} =
    fit(Histogram{T},(v,), (edg,), closed=closed)
fit(::Type{Histogram{T}},v::AbstractVector; closed::Symbol=:left, nbins=sturges(length(v))) where {T} =
    fit(Histogram{T},(v,); closed=closed, nbins=nbins)
fit(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights, edg::AbstractVector; closed::Symbol=:left) where {T} =
    fit(Histogram{T},(v,), wv, (edg,), closed=closed)
fit(::Type{Histogram{T}},v::AbstractVector, wv::AbstractWeights; closed::Symbol=:left, nbins=sturges(length(v))) where {T} =
    fit(Histogram{T}, (v,), wv; closed=closed, nbins=nbins)

fit(::Type{Histogram}, v::AbstractVector, wv::AbstractWeights{W}, args...; kwargs...) where {W} = fit(Histogram{W}, v, wv, args...; kwargs...)
Proposed Change:
Add a keyword argument or helper type to allow binning by width. Examples:
# Option 1: keyword argument
fit(Histogram, data; width=0.5)
# Option 2: helper type
fit(Histogram, data, BinWidth(0.5))
My proposal is to extend this dispatch pattern. Currently, we have:
1. fit(..., edges) (the user provides the bin edges)
2. fit(..., nbins) (the internal histrange calculates the edges)
I will add a third path:
3. fit(..., binwidth) (a simple range min:binwidth:max is generated and passed to the existing edges method)
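A rough sketch of what this third path could look like is below. Everything here is hypothetical: BinWidth takes its name from Option 2 above, while width_edges and fit_by_width are placeholder names of my own, and none of them exist in StatsBase. Unlike a literal min:binwidth:max, the sketch extends the range by one bin so that the maximum is covered, and it only handles the default closed=:left case.

```julia
using StatsBase

# Hypothetical helper type; it does not exist in StatsBase today.
struct BinWidth{T<:Real}
    width::T
end

# Hypothetical helper: edges start at the data minimum and run far enough
# that the maximum falls inside the last left-closed bin.
# Floating-point rounding at exact multiples of the width is not handled here.
function width_edges(v::AbstractVector{<:Real}, bw::BinWidth)
    bw.width > 0 || throw(ArgumentError("bin width must be positive"))
    lo, hi = extrema(v)
    nbins = floor(Int, (hi - lo) / bw.width) + 1
    return range(lo; step = bw.width, length = nbins + 1)
end

# Third path: build the edges, then reuse the existing edges method of `fit`.
fit_by_width(v::AbstractVector{<:Real}, bw::BinWidth) =
    fit(Histogram, v, width_edges(v, bw))

data = [0.1, 0.4, 1.2, 2.3]
h = fit_by_width(data, BinWidth(0.5))
println(h.edges[1])   # the generated edges
println(h.weights)    # counts per bin
```

Since the new path only produces edges and then calls the existing method, weights and the Histogram{T} variants could be supported the same way without touching the counting code.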
Additional Considerations:
Alignment/anchor: Optional parameter to control where the first bin edge starts.
Variable resolution: Optional method for "equal-count" bins using data quantiles.
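Both considerations reduce to computing a different set of edges, which could be sketched as follows. The function names and the anchor keyword are hypothetical, and the quantile version uses quantile from the Statistics standard library with its default interpolation.

```julia
using Statistics

# Hypothetical helper: fixed-width edges of the form anchor + k * width,
# chosen so that every value in `v` falls inside a left-closed bin.
function anchored_edges(v, width; anchor = zero(width))
    lo, hi = extrema(v)
    k_lo = floor(Int, (lo - anchor) / width)
    k_hi = floor(Int, (hi - anchor) / width) + 1
    return anchor .+ width .* (k_lo:k_hi)
end

# Hypothetical helper: "equal-count" edges taken at evenly spaced quantiles.
# Ties in the data can produce repeated edges, and the extreme values sit
# exactly on the outer edges, so a real implementation needs to decide how
# to treat both cases.
function quantile_edges(v, nbins::Integer)
    return quantile(v, range(0, 1; length = nbins + 1))
end

data = [0.1, 0.4, 1.2, 2.3]
println(collect(anchored_edges(data, 0.5)))                 # edges aligned to 0
println(collect(anchored_edges(data, 0.5; anchor = 0.25)))  # edges shifted by 0.25
println(quantile_edges(collect(1.0:9.0), 4))                # quantile-based edges
```

Either result can be passed to the existing edges method unchanged.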
This simplifies the API for new users, aligns with common libraries (NumPy, matplotlib), and reduces errors in manual bin calculations.
Source of the methods quoted above: StatsBase.jl/src/hist.jl, lines 298 to 307 at commit 7388a8e.