Eliminate code duplication to make this much easier to maintain #21
Conversation
Ok, I have drastically cleaned this up and it is now passing tests. Shockingly, this should not be a breaking change. I only just became aware of I have reverted most of the Despite this, I only declare the model parameters once, with the exception of the objective, which is set differently for the different types of models. All other defaults I took directly from the xgboost docs. I have not removed any tests, but I have reorganized them a bit to always use I should be done with this now, so please let me know what you think.
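The "declare the model parameters once" approach can be sketched roughly as follows. This is an illustrative sketch only, with made-up names (`SHARED_PARAMS`, `SketchClassifier`, `SketchRegressor`) and only two hyperparameters; it is not the package's actual code, but it shows the general metaprogramming pattern of generating each model struct from a single shared parameter list, overriding only `objective`:

```julia
# Hypothetical sketch: declare the shared hyperparameters once, then generate
# each model struct from that single list, overriding only `objective`.
const SHARED_PARAMS = [:(eta::Float64 = 0.3), :(max_depth::Int = 6)]

for (M, obj) in [(:SketchClassifier, "multi:softprob"),
                 (:SketchRegressor, "reg:squarederror")]
    @eval Base.@kwdef mutable struct $M
        $(SHARED_PARAMS...)        # splice the shared fields into each struct
        objective::String = $obj   # the only per-model difference
    end
end

m = SketchClassifier(max_depth = 4)
(m.eta, m.max_depth, m.objective)  # (0.3, 4, "multi:softprob")
```

With this pattern, adding or changing a default in `SHARED_PARAMS` propagates to every model type automatically, which is the maintenance win this PR is after.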
Great to have this contribution @ExpandingMan . This is one of the first MLJ interfaces ever written and it shows! Will try to review mid-next week, unless @OkonSamuel is able to look at it sooner.
Codecov Report
```
@@            Coverage Diff             @@
##           master      #21       +/-  ##
===========================================
- Coverage   89.74%   56.15%   -33.59%
===========================================
  Files           1        1
  Lines         156      130       -26
===========================================
- Hits          140       73       -67
- Misses         16       57       +41
===========================================
```
FYI, I found XGBoost.jl itself to be similarly wonky, and I have undertaken a major overhaul. I've wound up completely rewriting the package, so I can't say for sure that the maintainers will go for it (I might wind up forking it). Still, I think most of the changes shouldn't be very controversial, and because there are relatively few functions in the external interface, it shouldn't be as breaking as one might expect. Anyway, it'll probably take them a while to review, and if it does merge it won't break this package too badly, so no need to wait on this.
❤️ ❤️ My impression is that XGBoost.jl is looking for maintainers. It may be hard to get a quick review, but ping me in your PR and I will add my enthusiastic support.
Btw, my complete rewrite of XGBoost.jl itself now has this PR. As it is basically a new package, I expect it may take a while to get reviewed, so I'm not sure if it's relevant to this discussion. It shouldn't require many changes to this package despite being so comprehensive. Regardless, I'll make the above changes when I get a chance.
@ExpandingMan Something has gone astray here. The following does not fail on master:

```julia
using MLJ, MLJXGBoostInterface
X, y = @load_crabs;
m = machine(XGBoostClassifier(), X, y)
```

```
julia> fit!(m)
[ Info: Training machine(XGBoostClassifier(test = 1, …), …).
┌ Error: Problem fitting the machine machine(XGBoostClassifier(test = 1, …), …).
└ @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:627
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: Call to XGBoost C function XGBoosterEvalOneIter failed: [10:47:07] /workspace/srcdir/xgboost/src/metric/multiclass_metric.cu:35: Check failed: label_error >= 0 && label_error < static_cast<int32_t>(n_class): MultiClassEvaluation: label must be in [0, num_class), num_class=1 but found 1 in label
Stack trace:
  [bt] (0) 1   libxgboost.dylib  0x00000001499d1965 dmlc::LogMessageFatal::~LogMessageFatal() + 117
  [bt] (1) 2   libxgboost.dylib  0x0000000149aed161 xgboost::metric::MultiClassMetricsReduction<xgboost::metric::EvalMultiLogLoss>::CheckLabelError(int, unsigned long) const + 225
  [bt] (2) 3   libxgboost.dylib  0x0000000149aecff4 xgboost::metric::MultiClassMetricsReduction<xgboost::metric::EvalMultiLogLoss>::CpuReduceMetrics(xgboost::HostDeviceVector<float> const&, xgboost::HostDeviceVector<float> const&, xgboost::HostDeviceVector<float> const&, unsigned long, int) const + 580
  [bt] (3) 4   libxgboost.dylib  0x0000000149aecd32 xgboost::metric::EvalMClassBase<xgboost::metric::EvalMultiLogLoss>::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool) + 1090
  [bt] (4) 5   libxgboost.dylib  0x0000000149ab34b1 xgboost::LearnerImpl::EvalOneIter(int, std::__1::vector<std::__1::shared_ptr<xgboost::DMatrix>, std::__1::allocator<std::__1::shared_ptr<xgboost::DMatrix> > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) + 1793
  [bt] (5) 6   libxgboost.dylib  0x00000001499e73ea XGBoosterEvalOneIter + 522
  [bt] (6) 7   ???               0x0000000149f536cd 0x0 + 5535774413
  [bt] (7) 8   ???               0x0000000149f54499 0x0 + 5535777945
  [bt] (8) 9   ???               0x0000000149f545f0 0x0 + 5535778288

Stacktrace:
  [1] error(::String, ::String, ::String, ::String)
    @ Base ./error.jl:42
  [2] XGBoosterEvalOneIter(handle::Ptr{Nothing}, iter::Int32, dmats::Vector{Ptr{Nothing}}, evnames::Vector{String}, len::UInt64)
    @ XGBoost ~/.julia/packages/XGBoost/D30Xd/src/xgboost_wrapper_h.jl:11
  [3] eval_set(bst::XGBoost.Booster, watchlist::Vector{Tuple{XGBoost.DMatrix, String}}, iter::Int64; feval::Type{Union{}})
    @ XGBoost ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:229
  [4] xgboost(data::Matrix{Float64}, nrounds::Int64; label::Vector{Bool}, param::Vector{Any}, watchlist::Vector{Any}, metrics::Vector{Any}, obj::Type, feval::Type, group::Vector{Any}, kwargs::Base.Pairs{Symbol, Any, NTuple{39, Symbol}, NamedTuple{(:test, :num_round, :booster, :disable_default_eval_metric, :eta, :num_parallel_tree, :gamma, :max_depth, :min_child_weight, :max_delta_step, :subsample, :colsample_bytree, :colsample_bylevel, :colsample_bynode, :lambda, :alpha, :tree_method, :sketch_eps, :scale_pos_weight, :refresh_leaf, :process_type, :grow_policy, :max_leaves, :max_bin, :predictor, :sample_type, :normalize_type, :rate_drop, :one_drop, :skip_drop, :feature_selector, :top_k, :tweedie_variance_power, :objective, :base_score, :eval_metric, :nthread, :silent, :eval), Tuple{Int64, Int64, String, Int64, Float64, Int64, Float64, Int64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, String, Float64, Float64, Int64, String, String, Int64, Int64, String, String, String, Float64, Int64, Float64, String, Int64, Float64, String, Float64, String, Int64, Bool, String}}})
    @ XGBoost ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:187
  [5] fit(model::XGBoostClassifier, verbosity::Int64, X::NamedTuple{(:FL, :RW, :CL, :CW, :BD), NTuple{5, Vector{Float64}}}, y::CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}})
    @ MLJXGBoostInterface ~/MLJ/MLJXGBoostInterface/src/MLJXGBoostInterface.jl:179
  [6] fit_only!(mach::Machine{XGBoostClassifier, true}; rows::Nothing, verbosity::Int64, force::Bool)
    @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:625
  [7] fit_only!
    @ ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:577 [inlined]
  [8] #fit!#61
    @ ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:693 [inlined]
  [9] fit!(mach::Machine{XGBoostClassifier, true})
    @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:691
 [10] top-level scope
    @ REPL[38]:1
```

FYI, I discovered this when running this test from MLJTestIntegration:

```julia
using MLJTestIntegration
using MLJXGBoostInterface
const MLJTest = MLJTestIntegration

MLJTest.test_single_target_classifiers(
    (name="XGBoostClassifier", package_name="XGBoost"),
    level=3,
    verbosity=2,
)
```
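For readers following along: the check that fails in the trace above enforces XGBoost's multiclass label convention, namely integer labels in `[0, num_class)` (the trace shows `num_class=1` while a label of `1` was found). The snippet below is an illustrative sketch with made-up data (not the package's actual fix); it just demonstrates the 0-based encoding the C library expects, given that categorical reference codes in Julia are 1-based:

```julia
# Made-up levels and data for illustration: encode class labels as 0-based
# integers in [0, num_class), the convention XGBoost enforces for multiclass.
class_levels = ["B", "O"]
y = ["B", "O", "O", "B"]

# 1-based index of each value, shifted down by one to get 0-based codes.
labels = [findfirst(==(v), class_levels) - 1 for v in y]
num_class = length(class_levels)

@assert labels == [0, 1, 1, 0]
@assert all(0 .<= labels .< num_class)  # the check that fails in the trace above
```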
I assume Depending on how much effort this looks like, I might wait to see what happens with my XGBoost.jl PR.
Maybe. It's still a bit experimental. But if you could please address the problem raised, as it is not a problem on the existing master branch.
Alright, I have come back to this since XGBoost.jl 2.0 has been tagged. It allows for some further simplifications and is mostly working now, but something is still screwed up for multiclass that's probably going to call for some fixes in XGBoost.jl.
Alright, this should now be completely working with XGBoost.jl 2.0.1. For tests to pass, this PR must be merged and tagged.
@ExpandingMan Appreciate the progress, but could you please respond to the review comments? Then I'll take another look at this. Thanks.
Co-authored-by: Anthony Blaom, PhD <anthony.blaom@gmail.com>
Sorry, I somehow missed the inline comments from before my latest changes. I'll address them when I get a chance later.
Co-authored-by: Okon Samuel <39421418+OkonSamuel@users.noreply.github.com>
Thanks to @OkonSamuel for helping me navigate the feature importance interface. I think I have addressed everything above; if I missed anything, please feel free to let me know. Again, this will fail tests until XGBoost.jl 2.0.1... we badly need someone else with merge permissions in that repo. I know that Aviks has permission, but I'm not sure if he can add these permissions for others; it would be nice if we could at least get @ablaom permission.
@ExpandingMan Thanks for the progress!! I'll wait for 2.0.1 and then review.
@ExpandingMan Thanks for this huge contribution. 🦣 This will make maintenance far easier, moving forward.
Great to see that `save` no longer writes to file to get a persistent representation of the model.
The only remaining issue I can see is that there is no test for feature importances, and I think there is a bug there, which I pointed out, which highlights the need for one.
I've added a unit test for feature importances. Everything is passing again; that should be it.
👍🏾
Needs:
(I haven't tested this yet so I may have a flurry of smaller commits, but I wanted to get this up.)
This PR eliminates most of the code duplication in this package and should make it much easier to maintain. The main thing that has changed is that each model type is now defined exactly once using `Base.@kwdef`. This duplication could be further limited, but I don't want to run afoul of the model interface, and this should already be a huge improvement.

Furthermore, all model arguments, with the exception of `num_round` (which now has its default defined by xgboost), accept `nothing` as an argument, which means that they will use the built-in default. This eliminates the need for manually syncing default parameters between models and with xgboost itself.
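The `nothing`-default scheme described above can be sketched as follows. This is an illustrative sketch only (`SketchBooster` and `kwargs` are made-up names, not the package's actual code): any hyperparameter left as `nothing` is simply omitted from the keyword arguments forwarded to xgboost, so xgboost's own built-in default applies and never needs to be copied into this package.

```julia
# Hypothetical model struct: every tunable field defaults to `nothing`,
# meaning "defer to xgboost's built-in default".
Base.@kwdef mutable struct SketchBooster
    eta::Union{Nothing,Float64} = nothing
    max_depth::Union{Nothing,Int} = nothing
end

# Collect only the explicitly set parameters to forward to xgboost.
kwargs(m) = Dict(p => getfield(m, p)
                 for p in fieldnames(typeof(m)) if getfield(m, p) !== nothing)

kwargs(SketchBooster(max_depth = 4))  # Dict(:max_depth => 4)
```

The payoff is exactly what the description claims: when xgboost changes a default upstream, nothing in this package has to be updated, because unset parameters were never spelled out here in the first place.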