Using intervals to represent histogram binning #514
Comments
To me, this seems like complicating the code for very limited practical benefit. Introducing new types always has some overhead, and I don't really see the benefit in this case. In particular, I question whether allowing flexibility with respect to which endpoints are included is worth the trouble, since the underlying code would have to take this into account. Histograms are for continuous distributions, so it shouldn't really matter. Do you have applications where it would be useful?
Sure - we frequently do detailed analysis of histograms. It would be much more natural if |
I'm well aware that you do a lot of work with histograms, so my question was more whether you could give an example that would help me understand a situation where it would be easier to use intervals. I'd mainly like to understand where the new types would be useful, and I'd like to avoid users having to bother with |
For example, when scanning a histogram for certain patterns (peaks, etc.), one often needs to take the left and right bound of each bin (or its width and center) into account. So you'll have code that looks at each bin and its weight - but currently, we don't have a one-to-one relationship between weight indices and bin indices, so the user has to reconstruct the bin information in an
I think when dealing with histograms, we should be able to have a representation of a bin on an axis - and the natural representation is IMHO an interval. One could even broadcast over bins and weights and stuff like that.
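To make the index mismatch concrete, here is a minimal sketch in plain Julia of the kind of bookkeeping described above. The edge and weight values, and the "density higher than both neighbours" peak criterion, are assumptions chosen for illustration; they are not taken from StatsBase or from anyone's actual analysis code.

```julia
# Illustrative data: n + 1 edges for n weights (values are made up).
edges = [0.0, 1.0, 2.5, 4.0, 5.0]
weights = [3, 7, 2, 5]

# Bin i spans edges[i] to edges[i + 1], so the bounds must be rebuilt by hand.
lefts = @view edges[1:end-1]
rights = @view edges[2:end]

# Assumed peak criterion: an interior bin whose density exceeds both neighbours.
density = weights ./ (rights .- lefts)
peaks = [i for i in 2:length(weights)-1
         if density[i] > density[i-1] && density[i] > density[i+1]]

# Bounds of each peak bin, again reconstructed from two shifted views.
peak_bounds = [(lefts[i], rights[i]) for i in peaks]
```

With a vector of intervals, `lefts`, `rights` and the widths would all come from a single object whose indices line up with `weights`.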
Well,
I don't think users would have to bother with that at all - they would create and use histograms as they do now. The only API-change would be that long-term we'd deprecate accessing |
It would be interesting to try this to see how it works: it would be nice to if the
Would be good to see if this could be built on top of https://github.com/JuliaMath/IntervalSets.jl, which was intended to be a common "interval" package (though I assume more functionality would be required than what is available there). |
Ok, in that case I will at least do a prototype, and then try to convince @andreasnoack based on that. :-) It will take a bit of time, as I have lots of stuff (with deadlines) going on at the moment.
Yes, that's exactly what I have in mind. |
Update: Not abandoned, just deferred due to time constraints. I still definitely plan to pursue this. |
Just a life sign - I'm still interested in this, just overloaded, but it's definitely on my to-do list. |
Haven't forgotten about this. |
Still not forgotten, just overloaded ... |
Sorry for the long silence on this - I've been thinking a bit: Now that view-like wrapper objects don't have to be heap-allocated anymore, we can just add a method See also #650. |
Since I'll have to touch the structure of `Histogram` for #513 anyhow - there's something I've been thinking about for a while, and it might be easiest to do this in one go:

I've always been a bit unhappy about the way histogram edges are handled - the fact that the edge vector is one entry longer than the weights along each axis always struck me as somewhat inelegant. It also makes histogram analysis code a bit awkward at times.
I think the natural representations of bins are intervals. So the binning of an edge of a histogram should be a vector of (usually contiguous) intervals. Internally, it should in general still be stored the current way, as a vector of real numbers (length + 1), but it should look like a vector of intervals. So I'd like to propose the following:
- A new type `ContiguousIntervals` that implements a view of an `AbstractVector{T}` with length L + 1 as a vector of `Interval{:closed,:open,T}` (resp. `Interval{:open,:closed,T}`) with length L. Maybe this could be contributed to `IntervalSets` (@timholy, do you think that would fit in)?
- `Histogram`'s field `edges::E` gets replaced by `binning::E` with `E<:NTuple{N,AbstractVector{<:IntervalSets.Interval}}`. By default, we'd use `ContiguousIntervals`, but we'd allow any vector of intervals.
- We use `Base.getproperty` and friends to provide a virtual property `edges`, for backward compatibility.

Operations like getting all left/right bin edges or all bin centers would simply become broadcasts of `minimum`/`maximum`/`mean` over the binning intervals, etc. Stuff like plotting and histogram analysis code would become simpler and more elegant in general.

I'd like to implement this, but as it's not a trivial change, I'd be glad to get a provisional OK before I go ahead (@nalimilan @ararslan @quinnj?).
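A rough sketch of how the proposal could look, using only Base Julia so that it runs without extra packages. `HalfOpenInterval` is a hypothetical stand-in for `IntervalSets.Interval{:closed,:open,T}`, `center` stands in for `mean` on an interval, and `SketchHistogram` is a hypothetical 1-D toy type, not the real `Histogram`. None of these names exist in StatsBase or IntervalSets; a real implementation would presumably use the `IntervalSets` types and handle N dimensions.

```julia
# Hypothetical stand-in for IntervalSets.Interval{:closed,:open,T}.
struct HalfOpenInterval{T}
    left::T
    right::T
end

Base.in(x, iv::HalfOpenInterval) = iv.left <= x < iv.right
center(iv::HalfOpenInterval) = (iv.left + iv.right) / 2   # plays the role of `mean`

# Presents an edge vector of length L + 1 as a vector of L contiguous intervals.
struct ContiguousIntervals{T,V<:AbstractVector{T}} <: AbstractVector{HalfOpenInterval{T}}
    edges::V
    function ContiguousIntervals(edges::AbstractVector{T}) where {T}
        firstindex(edges) == 1 || throw(ArgumentError("edges must be 1-based"))
        length(edges) >= 2 || throw(ArgumentError("need at least two edges"))
        issorted(edges) || throw(ArgumentError("edges must be sorted"))
        return new{T,typeof(edges)}(edges)
    end
end

Base.size(b::ContiguousIntervals) = (length(b.edges) - 1,)
Base.IndexStyle(::Type{<:ContiguousIntervals}) = IndexLinear()
function Base.getindex(b::ContiguousIntervals, i::Int)
    @boundscheck checkbounds(b, i)
    return HalfOpenInterval(b.edges[i], b.edges[i + 1])
end

# Hypothetical 1-D histogram that stores `binning` instead of `edges`.
struct SketchHistogram{B<:ContiguousIntervals,W<:AbstractVector}
    binning::B
    weights::W
end

# Virtual `edges` property, kept for backward compatibility.
function Base.getproperty(h::SketchHistogram, name::Symbol)
    return name === :edges ? getfield(h, :binning).edges : getfield(h, name)
end
Base.propertynames(::SketchHistogram) = (:binning, :weights, :edges)

h = SketchHistogram(ContiguousIntervals([0.0, 1.0, 2.5, 4.0]), [3, 7, 2])
centers = center.(h.binning)   # broadcast over bins, one entry per weight
```

In this sketch `h.binning` and `h.weights` have the same length, while `h.edges` still returns the underlying vector of length L + 1 for code that relies on the old field.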