Eliminate code duplication to make this much easier to maintain #21
Conversation
Ok, I have drastically cleaned this up and it is now passing tests. Shockingly, this should not be a breaking change. I only just became aware of I have reverted most of the Despite this, I only declare the model parameters once, with the exception of the objective, which is set differently for the different types of models. All other defaults I took directly from the xgboost docs. I have not removed any tests, but I have reorganized them a bit to always use I should be done with this now, so please let me know what you think.
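The "declare the model parameters once" approach can be sketched roughly as follows. This is an illustrative sketch only, with made-up names (`SHARED_PARAMS`, `SketchClassifier`, `SketchRegressor`) and only two hyperparameters; it is not the package's actual code, but it shows the general metaprogramming pattern of generating each model struct from a single shared parameter list, overriding only `objective`:

```julia
# Hypothetical sketch: declare the shared hyperparameters once, then generate
# each model struct from that single list, overriding only `objective`.
const SHARED_PARAMS = [:(eta::Float64 = 0.3), :(max_depth::Int = 6)]

for (M, obj) in [(:SketchClassifier, "multi:softprob"),
                 (:SketchRegressor, "reg:squarederror")]
    @eval Base.@kwdef mutable struct $M
        $(SHARED_PARAMS...)        # splice the shared fields into each struct
        objective::String = $obj   # the only per-model difference
    end
end

m = SketchClassifier(max_depth = 4)
(m.eta, m.max_depth, m.objective)  # (0.3, 4, "multi:softprob")
```

With this pattern, adding or changing a default in `SHARED_PARAMS` propagates to every model type automatically, which is the maintenance win this PR is after.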
Great to have this contribution @ExpandingMan . This is one of the first MLJ interfaces ever written and it shows! Will try to review mid-next week, unless @OkonSamuel is able to look at it sooner.
Codecov Report
```
@@            Coverage Diff             @@
##           master      #21       +/-  ##
===========================================
- Coverage   89.74%   56.15%   -33.59%
===========================================
  Files           1        1
  Lines         156      130       -26
===========================================
- Hits          140       73       -67
- Misses         16       57       +41
===========================================
```
FYI, I found XGBoost.jl itself to be similarly wonky, and I have undertaken a major overhaul. I've wound up completely rewriting the package, so I can't say for sure that the maintainers will go for it (I might wind up forking it). Still, I think most of the changes shouldn't be very controversial, and because there are relatively few functions in the external interface, it shouldn't be as breaking as one might expect. Anyway, it'll probably take them a while to review, and if it does merge it won't break this package too badly, so no need to wait on this.
❤️ ❤️ My impression is that XGBoost.jl is looking for maintainers. It may be hard to get a quick review, but ping me in your PR and I will add my enthusiastic support.
Btw, my complete rewrite of XGBoost.jl itself now has this PR. As it is basically a new package, I expect it may take a while to get reviewed, so I'm not sure if it's relevant to this discussion. It shouldn't require many changes to this package despite being so comprehensive. Regardless, I'll make the above changes when I get a chance.
@ExpandingMan Something has gone astray here. The following does not fail on master:

```julia
using MLJ, MLJXGBoostInterface
X, y = @load_crabs;
m = machine(XGBoostClassifier(), X, y)
```

```
julia> fit!(m)
[ Info: Training machine(XGBoostClassifier(test = 1, …), …).
┌ Error: Problem fitting the machine machine(XGBoostClassifier(test = 1, …), …).
└ @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:627
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: Call to XGBoost C function XGBoosterEvalOneIter failed: [10:47:07] /workspace/srcdir/xgboost/src/metric/multiclass_metric.cu:35: Check failed: label_error >= 0 && label_error < static_cast<int32_t>(n_class): MultiClassEvaluation: label must be in [0, num_class), num_class=1 but found 1 in label
Stack trace:
  [bt] (0) 1   libxgboost.dylib  0x00000001499d1965 dmlc::LogMessageFatal::~LogMessageFatal() + 117
  [bt] (1) 2   libxgboost.dylib  0x0000000149aed161 xgboost::metric::MultiClassMetricsReduction<xgboost::metric::EvalMultiLogLoss>::CheckLabelError(int, unsigned long) const + 225
  [bt] (2) 3   libxgboost.dylib  0x0000000149aecff4 xgboost::metric::MultiClassMetricsReduction<xgboost::metric::EvalMultiLogLoss>::CpuReduceMetrics(xgboost::HostDeviceVector<float> const&, xgboost::HostDeviceVector<float> const&, xgboost::HostDeviceVector<float> const&, unsigned long, int) const + 580
  [bt] (3) 4   libxgboost.dylib  0x0000000149aecd32 xgboost::metric::EvalMClassBase<xgboost::metric::EvalMultiLogLoss>::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool) + 1090
  [bt] (4) 5   libxgboost.dylib  0x0000000149ab34b1 xgboost::LearnerImpl::EvalOneIter(int, std::__1::vector<std::__1::shared_ptr<xgboost::DMatrix>, std::__1::allocator<std::__1::shared_ptr<xgboost::DMatrix> > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) + 1793
  [bt] (5) 6   libxgboost.dylib  0x00000001499e73ea XGBoosterEvalOneIter + 522
  [bt] (6) 7   ???               0x0000000149f536cd 0x0 + 5535774413
  [bt] (7) 8   ???               0x0000000149f54499 0x0 + 5535777945
  [bt] (8) 9   ???               0x0000000149f545f0 0x0 + 5535778288

Stacktrace:
  [1] error(::String, ::String, ::String, ::String)
    @ Base ./error.jl:42
  [2] XGBoosterEvalOneIter(handle::Ptr{Nothing}, iter::Int32, dmats::Vector{Ptr{Nothing}}, evnames::Vector{String}, len::UInt64)
    @ XGBoost ~/.julia/packages/XGBoost/D30Xd/src/xgboost_wrapper_h.jl:11
  [3] eval_set(bst::XGBoost.Booster, watchlist::Vector{Tuple{XGBoost.DMatrix, String}}, iter::Int64; feval::Type{Union{}})
    @ XGBoost ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:229
  [4] xgboost(data::Matrix{Float64}, nrounds::Int64; label::Vector{Bool}, param::Vector{Any}, watchlist::Vector{Any}, metrics::Vector{Any}, obj::Type, feval::Type, group::Vector{Any}, kwargs::Base.Pairs{Symbol, Any, NTuple{39, Symbol}, NamedTuple{(:test, :num_round, :booster, :disable_default_eval_metric, :eta, :num_parallel_tree, :gamma, :max_depth, :min_child_weight, :max_delta_step, :subsample, :colsample_bytree, :colsample_bylevel, :colsample_bynode, :lambda, :alpha, :tree_method, :sketch_eps, :scale_pos_weight, :refresh_leaf, :process_type, :grow_policy, :max_leaves, :max_bin, :predictor, :sample_type, :normalize_type, :rate_drop, :one_drop, :skip_drop, :feature_selector, :top_k, :tweedie_variance_power, :objective, :base_score, :eval_metric, :nthread, :silent, :eval), Tuple{Int64, Int64, String, Int64, Float64, Int64, Float64, Int64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, String, Float64, Float64, Int64, String, String, Int64, Int64, String, String, String, Float64, Int64, Float64, String, Int64, Float64, String, Float64, String, Int64, Bool, String}}})
    @ XGBoost ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:187
  [5] fit(model::XGBoostClassifier, verbosity::Int64, X::NamedTuple{(:FL, :RW, :CL, :CW, :BD), NTuple{5, Vector{Float64}}}, y::CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}})
    @ MLJXGBoostInterface ~/MLJ/MLJXGBoostInterface/src/MLJXGBoostInterface.jl:179
  [6] fit_only!(mach::Machine{XGBoostClassifier, true}; rows::Nothing, verbosity::Int64, force::Bool)
    @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:625
  [7] fit_only!
    @ ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:577 [inlined]
  [8] #fit!#61
    @ ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:693 [inlined]
  [9] fit!(mach::Machine{XGBoostClassifier, true})
    @ MLJBase ~/.julia/packages/MLJBase/CtxrQ/src/machines.jl:691
 [10] top-level scope
    @ REPL[38]:1
```

FYI, I discovered this when running this test from MLJTestIntegration:

```julia
using MLJTestIntegration
using MLJXGBoostInterface
const MLJTest = MLJTestIntegration

MLJTest.test_single_target_classifiers(
    (name="XGBoostClassifier", package_name="XGBoost"),
    level=3,
    verbosity=2,
)
```
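For readers following along: the check that fails in the trace above enforces XGBoost's multiclass label convention, namely integer labels in `[0, num_class)` (the trace shows `num_class=1` while a label of `1` was found). The snippet below is an illustrative sketch with made-up data (not the package's actual fix); it just demonstrates the 0-based encoding the C library expects, given that categorical reference codes in Julia are 1-based:

```julia
# Made-up levels and data for illustration: encode class labels as 0-based
# integers in [0, num_class), the convention XGBoost enforces for multiclass.
class_levels = ["B", "O"]
y = ["B", "O", "O", "B"]

# 1-based index of each value, shifted down by one to get 0-based codes.
labels = [findfirst(==(v), class_levels) - 1 for v in y]
num_class = length(class_levels)

@assert labels == [0, 1, 1, 0]
@assert all(0 .<= labels .< num_class)  # the check that fails in the trace above
```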
I assume Depending on how much effort this looks like, I might wait to see what happens with my XGBoost.jl PR.
Maybe. It's still a bit experimental. But if you could please address the problem raised, as it is not a problem on the existing master branch.
Alright, I have come back to this since XGBoost.jl 2.0 has been tagged. It allows for some further simplifications and is mostly working now, but something is still screwed up for multiclass that's probably going to call for some fixes in XGBoost.jl.
Alright, this should now be completely working with XGBoost.jl 2.0.1. For tests to pass, this PR must be merged and tagged.
@ExpandingMan Appreciate the progress, but could you please respond to the review comments? Then I'll take another look at this. Thanks.
Co-authored-by: Anthony Blaom, PhD <anthony.blaom@gmail.com>
Sorry, I somehow missed the inline comments from before my latest changes. I'll address them when I get a chance later.
Co-authored-by: Okon Samuel <39421418+OkonSamuel@users.noreply.github.com>
Thanks to @OkonSamuel for helping me navigate the feature importance interface. I think I have addressed everything above; if I missed anything, please feel free to let me know. Again, this will fail tests until XGBoost.jl 2.0.1... we badly need someone else with merge permissions in that repo. I know that Aviks has permission, but I'm not sure if he can add these permissions for others; it would be nice if we could at least get @ablaom permission.
@ExpandingMan Thanks for the progress!! I'll wait for 2.0.1 and then review.
@ExpandingMan Thanks for this huge contribution. 🦣 This will make maintenance far easier, moving forward.
Great to see that `save` no longer writes to file to get a persistent representation of the model.
The only remaining issue I can see is that there is no test for feature importances, and I think there is a bug there, which I pointed out, which highlights the need for one.
I've added a unit test for feature importances. Everything is passing again; that should be it.
👍🏾
Needs:
(I haven't tested this yet so I may have a flurry of smaller commits, but I wanted to get this up.)
This PR eliminates most of the code duplication in this package and should make it much easier to maintain. The main thing that has changed is that each model type is now defined exactly once using `Base.@kwdef`. This duplication could be further limited, but I don't want to run afoul of the model interface, and this should already be a huge improvement.

Furthermore, all model arguments, with the exception of `num_round` (which now has its default defined by xgboost), accept `nothing` as an argument, which means that they will use the built-in default. This eliminates the need for manually syncing default parameters between models and with xgboost itself.
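The `nothing`-default scheme described above can be sketched as follows. This is an illustrative sketch only (`SketchBooster` and `kwargs` are made-up names, not the package's actual code): any hyperparameter left as `nothing` is simply omitted from the keyword arguments forwarded to xgboost, so xgboost's own built-in default applies and never needs to be copied into this package.

```julia
# Hypothetical model struct: every tunable field defaults to `nothing`,
# meaning "defer to xgboost's built-in default".
Base.@kwdef mutable struct SketchBooster
    eta::Union{Nothing,Float64} = nothing
    max_depth::Union{Nothing,Int} = nothing
end

# Collect only the explicitly set parameters to forward to xgboost.
kwargs(m) = Dict(p => getfield(m, p)
                 for p in fieldnames(typeof(m)) if getfield(m, p) !== nothing)

kwargs(SketchBooster(max_depth = 4))  # Dict(:max_depth => 4)
```

The payoff is exactly what the description claims: when xgboost changes a default upstream, nothing in this package has to be updated, because unset parameters were never spelled out here in the first place.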