-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Move histograms to separate package #650
Comments
I'd be willing to take care of the complete transition. |
What would be the advantage of doing this? I mean, most parts of StatsBase could live independently from each other, but that makes it hard for users to find where the functionality lives. |
Hm, I had the impression that other parts of StatsBase were coupled more tightly. And it would make it easier to evolve Histograms since StatsBase is such a central package that 0.x updates with breaking changes take ages to propagate though the ecosystem - so it takes a long time before higher-level packages can access the new version. Far, far fewer packages use histograms than use StatsBase. And just in general I thought more modularity would be a good thing. Also, thought (I speak under correction) that there was a longer term plan to restructure/modularize StatsBase anyhow (move parts to Statistics, other structural changes)? |
Yes the plan has been for a long time to move essential features to Statistics and other parts to specialized packages. So I assume you think histograms belong to the latter rather than the former?
Do you mean that you would like to make breaking changes to histogram code? Otherwise we can tag bugfix releases of StatsBase. We could also tag 1.0 so that we can make minor releases when we add new features. |
Yes, I think both StatsBase and a new "Histograms" package could both depends on Statistics without depending on each other.
Well, #513 and #514 (dragged them on too long, but I really want to work on those now) will require changes to the Histogram struct that may be too substantial for a bugfix release. If there are breaking will depend on how I do it exactly - and also on whether addition of new type parameters (array type, for #513) would be considered breaking or not. This can all be done within StateBase, of course, I just thought that if I'm going to do a bit of an overhaul on |
I can't remember why exactly, but KernelDensity.jl has its own implementation: |
I agree. |
Adding type parameters or changing the internal fields of the struct isn't considered as breaking. Anyway regarding #514 it seems that @andreasnoack wanted to see a proposal first and that didn't happen yet. So better decide whether we want to make changes before considering them as breaking. Also since we want to move to 1.0, being breaking isn't the end of the world, that's a good occasion to stop using 0.x.
"Monster" sounds like an exaggeration. StatsBase seems to load quite quickly, and it's much smaller than many packages. Yes it's a collection of many small functions in different areas, but that makes them easier to discover.
Sure. But KernelDensity already depends on StatsBase so that can perfectly be done now by implementing the relevant functions there. Overall I think the only relevant discussion is about the general organization of the stats packages. If we are certain we don't want histograms to live in Statistics at some point then it could make sense to move them to a separate package. However that's a quite limited set of features so I'm not sure that makes sense. We're not going to create one package for each page in the manual, that would be a nightmare for users and maintainers. |
Thanks!
Sure, I'll have to flesh that out, of course - was planning to do that in the form of a PR.
Yes, I admit that may have been a bit strong. :-) Package load time is about 0.35 s on my system (on v1.6-beta1), which is certainly not horrible (but also not negligible, IMHO).
I agree. I wasn't trying to argue that splitting off histograms is in any way necessary for impromevments like #513 and #514. It's just while I was finally finding some time to get on that, that I realized that a separate Histograms package might be a good idea in general. But this proposal is indeed not coupled to #513 and #514. |
@Moelf , you're also interested in histograms, what's your take on this? |
@oschulz thx for pinning me! I'm leaning towards "yes". I think a few good things can spin out of it: There are quite a few things we want to have in a comprehensive/"SOTA" histogram pkg:
These can be part of the StatsBase but it also may feel too heavy to maintain maybe. An alternative would be to keep some basic things in StatsBase but separate and greatly re-write / extend a new (I actually have a very crude fast 1D histogram implementation w/ error that is precisely based on I have strong interests in contributing to this, I will keep an eye on this thread. |
Thanks for your input, @Moelf ! This would actually be a good motivation for a standalone Histograms package - some of the stuff you want could be done within StatsBase, but the rest would IMHO better fit into a separate package. Maybe your "nice things" could be included too, without making it too heavy. Otherwise more special histogram types could of course go into separate packages, as you suggest. |
I dont know if its necessary to get move histograms into its own package. If it's really self contained than that makes sense. But I don't think it should be a moving target of a package. However I wouldn't mind an overhaul for how histograms support |
Oh, no. But histograms are really pretty much self-contained in StatsBase.
Good point! |
Histograms are kind of a standalone thing in StatsBase right now - they're not used anywhere in StatsBase, and they use little to no functionality from StatsBase themselves. So I propose to factor them out into a separate package "JuliaStats/Histograms.jl".
StatsBase.jl could reexport Histograms.jl for a while, to keep compatibility, and then stop reexporting it with the next breaking release of StatsBase.
@andreasnoack , @simonbyrne , what do you think?
The text was updated successfully, but these errors were encountered: