fit nbins=2 produces results of nbins=3 #410
Comments
I'm not familiar with the histogram code, but AFAICT that's expected, and I get the same thing with R. The number of bins is only a suggestion, and it's not always possible to achieve the requested number. Maybe the documentation needs to be more explicit (it already says "approximate number of bins"). |
OK, I do not know how R handles the issue. Then, to specify a trivial 2-bin histogram, is one forced to do something like the following? h = fit(Histogram, [0,1,0,0,1,1,1,1], -0.02:0.51:1.0, closed=:right) Why can the function not resolve nbins=2 in this case to any valid edges (the above is just one of infinitely many)? |
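For reference, a minimal sketch of the explicit-edges workaround asked about above. The edge values here are an illustrative choice (picked because they are exactly representable), not the ones from the comment; the point is only that three edges always define exactly two bins.

```julia
using StatsBase

data = [0, 1, 0, 0, 1, 1, 1, 1]

# Three edges define exactly two bins: [-0.5, 0.5) and [0.5, 1.5).
# These particular edge values are an assumption chosen for illustration.
edges = -0.5:1.0:1.5
h = fit(Histogram, data, edges; closed=:left)

# The number of bins is the number of weights, i.e. length(edges) - 1.
println(length(h.weights))
println(h.weights)
```

With explicit edges the bin count is fixed by the caller, so the nbins heuristic never comes into play.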
Just to revive an old issue or "feature": having a function that guesses how many bins to choose if one does not provide anything is useful. BUT if I do provide the number of bins, I want to be sure that I get exactly that number of bins in my histogram. What is the point of not having this behaviour? |
I think this can lead to unnecessary grief for the users: imagine two parties trying to sync bin counts... I think we can make |
@Moelf, could you be a bit more precise about your example? I don't understand how two parties are involved in plotting a histogram. Could you give an example? |
It's common in science and data science that people who processed data independently want to verify that their results are the same (sync). In this case, a naive understanding of |
@Moelf , yes sure! I misunderstood your comment. You suggest |
I fully agree: if the user specifies |
At the very least it would be interesting to start providing an argument to force using exactly the requested number of bins. That wouldn't be breaking. Then in the next breaking release we could change the default if we're happy with how the new argument works. |
It seems to be convention to make 'nice' numbers for bin widths. I like the suggestion by @nalimilan to add a new argument like
|
Hm, having two competing |
I believe the change is not breaking, reason: we said it's approximate anyways, which means
|
I think the cleanest solution would be to have binning algorithms, represented by types/structs that can then have parameters like |
I actually have a histogram package because StatsBase seems reluctant to support more advanced usage such as: hist1 / hist2
# thread-safe
push!(hist, val, weight) |
It would be really nice to support the thread-safe |
Yes, but I think the idea is this is "Base" histogram. Well, I'm happy to be part of StatsBase for sure since I'm already doing ( |
More advanced usage can maybe go in a new issue ... I just want my bin counts 🙃 |
See also #533.
Who said this? Actually at #650 I said almost the contrary. Improvements to histograms in StatsBase would be welcome (though I admit finding reviewers isn't always easy). |
Oh wow @oschulz have those methods not been moved yet? Now is clearly the time :-) |
You're so right, @mkborregaard. We should make it dispatch-based, with binning algorithm structs (so they can, in principle, take parameters), not with keyword symbols like in Plots. And it could tie in with #514 (still on my to-do list). |
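A rough sketch of what a dispatch-based design along these lines could look like. All names here (BinningAlgorithm, ExactBins, SturgesBins, numbins, binedges) are hypothetical and do not exist in StatsBase or Plots; Sturges' rule is used only as a familiar example of a data-driven bin count.

```julia
# Hypothetical names throughout; none of these are part of StatsBase.
abstract type BinningAlgorithm end

# Algorithm with a parameter: the caller demands exactly `nbins` bins.
struct ExactBins <: BinningAlgorithm
    nbins::Int
end

# Parameter-free algorithm: Sturges' rule, ceil(log2(n)) + 1 bins.
struct SturgesBins <: BinningAlgorithm end

# Number of bins each algorithm asks for, given the data.
numbins(alg::ExactBins, x) = alg.nbins
numbins(::SturgesBins, x) = ceil(Int, log2(length(x))) + 1

# Evenly spaced edges spanning the data; n + 1 edges give n bins.
function binedges(alg::BinningAlgorithm, x)
    lo, hi = float.(extrema(x))
    return range(lo, hi; length = numbins(alg, x) + 1)
end

x = [0, 1, 0, 0, 1, 1, 1, 1]
println(length(binedges(ExactBins(2), x)) - 1)
println(length(binedges(SturgesBins(), x)) - 1)
```

Because each algorithm is its own type, adding a new rule means adding one struct and one numbins method, and parameters live on the struct rather than in a growing list of keywords.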
If the number of bins is approximated it would be good to document how it is approximated. A few options I found are:
|
Yes, those are the kinds of methods we have in Plots now and that we should move over to StatsBase (with algorithm structs, not keywords like in Plots). |
Gentle reminder if schedules permit: @oschulz ...
@mkborregaard ...
@oschulz ...
|
Thanks for the reminder! It's been on the back of my mind, I'll try to get on it soonish. |
This code works now: h = fit(Histogram, x, range(minimum(x), stop=maximum(x), length=nbins)). |
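Two caveats seem worth noting when passing a range of edges this way: a range of length nbins contains nbins edges and therefore defines nbins - 1 bins, and with closed=:left a value equal to the last edge falls outside the final bin. The following is a minimal sketch of a helper that accounts for both, assuming evenly spaced bins over the data range. The name fit_exact is hypothetical and not part of StatsBase, and the sketch does not handle data whose values are all equal.

```julia
using StatsBase

# Hypothetical helper, not part of StatsBase.
function fit_exact(x, nbins::Integer)
    lo = float(minimum(x))
    # Nudge the top edge up by one floating-point step so that, with
    # closed=:left, the maximum value falls inside the last bin.
    hi = nextfloat(float(maximum(x)))
    # nbins + 1 edges define exactly nbins bins.
    edges = range(lo, hi; length = nbins + 1)
    return fit(Histogram, x, edges; closed=:left)
end

x = [0, 1, 0, 0, 1, 1, 1, 1]
h = fit_exact(x, 2)
println(length(h.weights))  # number of bins
println(sum(h.weights))     # number of observations that landed in a bin
```

Comparing sum(h.weights) with length(x) is a quick check that no observation was dropped at the edges.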
Hi @huixinzhang, I think the core of the discussion centers on the default response to using the |
When fitting a histogram to data with nbins=2 I get the results corresponding to 3 bins.
Same happens if I pass 0.0:0.5:1.5 and closed=:left instead. I get 3 bins.
I guess this is a bug; otherwise, how can I avoid this issue?
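A minimal reproduction sketch, assuming the same data as in the explicit-edges example earlier in the thread (the original report's data is not shown here). It prints the edges and bin count that fit chooses; the result may differ between StatsBase versions.

```julia
using StatsBase

# Data assumed from the example in the comments, not from the original report.
data = [0, 1, 0, 0, 1, 1, 1, 1]

# nbins is treated as an approximate target, not a hard requirement.
h = fit(Histogram, data; nbins=2)

println(h.edges[1])         # edges actually chosen
println(length(h.weights))  # number of bins actually produced
```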