Serialized Composite Model Fails with XGBoost #927

Closed · pazzo83 opened this issue Aug 18, 2023 · 14 comments · Fixed by #960

@pazzo83 (Collaborator) commented Aug 18, 2023

I am having an issue with a serialized MLJ pipeline that includes an XGBoostClassifier: when I try to run predict on the loaded model, I get the following error:

┌ Error: Failed to apply the operation `predict` to the machine machine(:xg_boost_classifier, …), which receives it's data arguments from one or more nodes in a learning network. Possibly, one of these nodes is delivering data that is incompatible with the machine's model.
│ Model (xg_boost_classifier):
│ input_scitype = Unknown
│ target_scitype = Unknown
│ output_scitype = Unknown
│
│ Incoming data:
│ arg of predict	scitype
│ -------------------------------------------
│ Node @103	Table{AbstractVector{Continuous}}
│
│ Learning network sources:
│ source	scitype
│ -------------------------------------------
│ Source @253	Nothing
│ Source @396	Nothing
└ @ MLJBase ~/.julia/packages/MLJBase/0rn2V/src/composition/learning_networks/nodes.jl:153
ERROR: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
[00:01:27] /workspace/srcdir/xgboost/src/c_api/c_api.cc:913: Booster has not been initialized or has already been disposed.
Stack trace:
  [bt] (0) 1   libxgboost.dylib                    0x000000028780cc4c dmlc::LogMessageFatal::~LogMessageFatal() + 124
  [bt] (1) 2   libxgboost.dylib                    0x0000000287827614 XGBoosterPredictFromDMatrix + 116
  [bt] (2) 3   ???                                 0x00000002a858c0c8 0x0 + 11414323400
  [bt] (3) 4   ???                                 0x00000002a8404194 0x0 + 11412717972
  [bt] (4) 5   libjulia-internal.1.9.dylib         0x00000001048cdacc do_apply + 744
  [bt] (5) 6   ???                                 0x00000002a802c1a0 0x0 + 11408687520
  [bt] (6) 7   ???                                 0x00000002a79c40d8 0x0 + 11401969880
  [bt] (7) 8   ???                                 0x00000002a78e40d4 0x0 + 11401052372
  [bt] (8) 9   libjulia-internal.1.9.dylib         0x00000001048da53c do_call + 188

I've seen this on both Julia 1.8.5 and Julia 1.9.2 using up-to-date versions of MLJ and XGBoost (which I believe was recently updated to version 2.0.0). I've also noticed that MLJXGBoostInterface recently changed the way it serializes XGBoost models.

@pazzo83 (Collaborator, Author) commented Aug 18, 2023

My workaround right now is to serialize the components separately and then use some wrapper code to chain them as a "pipeline" after deserializing (basically my entire pipeline minus the classifier is one component, and the XGBoost model is the other). That works.
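
Roughly, the workaround looks like this (a minimal sketch with illustrative file names and toy data, not my actual pipeline):

using MLJ

# Fit the preprocessing and the classifier as separate machines.
X, y = @load_iris
enc_mach = machine(OneHotEncoder(), X) |> fit!
Xt = MLJ.transform(enc_mach, X)

XGBC = @load XGBoostClassifier
xgb_mach = machine(XGBC(), Xt, y) |> fit!

# Serialize each component on its own.
MLJ.save("encoder.jls", enc_mach)
MLJ.save("xgb.jls", xgb_mach)

# Wrapper that chains the restored machines like a "pipeline".
function predict_pipeline(Xnew)
    enc = machine("encoder.jls")
    clf = machine("xgb.jls")
    return predict(clf, MLJ.transform(enc, Xnew))
end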

@ablaom (Member) commented Jan 17, 2024

Sorry, this one seems to have slipped by. Thanks for reporting this potential bug!

I am not able to reproduce this. For example, the code below works for me. Can I have more context, please?

using MLJBase, MLJModels
booster = (@load XGBoostClassifier)()
pipe = ContinuousEncoder |> booster
X, y = @load_crabs
mach = machine(pipe, X, y) |> fit!
MLJBase.save("junk.jls", mach)
mach2 = machine("junk.jls")
julia> predict(mach2, X)
200-element UnivariateFiniteVector{Multiclass{2}, String, UInt32, Float32}:
 UnivariateFinite{Multiclass{2}}(B=>0.883, O=>0.117)
 UnivariateFinite{Multiclass{2}}(B=>0.883, O=>0.117)
 UnivariateFinite{Multiclass{2}}(B=>0.897, O=>0.103)
 
 UnivariateFinite{Multiclass{2}}(B=>0.0963, O=>0.904)
(jl_XBBUpe) pkg> st
Status `/private/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_XBBUpe/Project.toml`
  [a7f614a8] MLJBase v1.1.0
  [d491faf4] MLJModels v0.16.14
  [54119dfa] MLJXGBoostInterface v0.3.10

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 17 on 12 virtual cores
Environment:
  JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
  JULIA_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_EGLOT_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_NUM_THREADS = 12
  DYLD_LIBRARY_PATH = /usr/local/homebrew/Cellar/libomp/9.0.1/lib/
  JULIA_NIGHTLY_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia

@ablaom (Member) commented Feb 21, 2024

But this is failing for me (adapted from a Julia Discourse post):

using MLJ

XGBC = @load XGBoostClassifier
xgb = XGBC()
ohe = OneHotEncoder()

# Pipeline OneHotEncoder > XGBoost
xgb_pipe = ohe |> xgb

# Setting Target and Features tables:
# y, X = unpack(df, ==(:y_label), col->true)
X, y = @load_iris

train, test = partition(1:length(y), 0.1, shuffle=true)

xgbm = machine(xgb_pipe, X, y, cache=false)
fit!(xgbm, rows=train, verbosity=0)

MLJ.save("mach_xgb_pipe.jls", xgbm)

# Restoring the model and using for predictions:
mach_restored = machine("mach_xgb_pipe.jls")

yhat = predict_mode(mach_restored , selectrows(X, test))

# ┌ Error: Failed to apply the operation `predict` to the machine machine(:xg_boost_classifier, …), which receives it's data arguments from one or more nodes in a learning network. Possibly, one of these nodes is delivering data that is incompatible with the machine's model.
# │ Model (xg_boost_classifier):
# │ input_scitype = Unknown
# │ target_scitype = Unknown
# │ output_scitype = Unknown
#
# │ Incoming data:
# │ arg of predict        scitype
# │ -------------------------------------------
# │ Node @034 → :one_hot_encoder  Table{AbstractVector{Continuous}}
#
# │ Learning network sources:
# │ source        scitype
# │ -------------------------------------------
# │ Source @978   Nothing
# │ Source @367   Nothing
# └ @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:153
# ERROR: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
# [11:50:45] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.
# Stack trace:
#   [bt] (0) 1   libxgboost.dylib                    0x00000001723c8325 dmlc::LogMessageFatal::~LogMessageFatal() + 117
#   [bt] (1) 2   libxgboost.dylib                    0x00000001723e91f9 XGBoosterPredictFromDMatrix + 121
#   [bt] (2) 3   ???                                 0x0000000170d61f13 0x0 + 6188048147


# Stacktrace:
#  [1] _apply(y_plus::Tuple{…}, input::@NamedTuple{…}; kwargs::@Kwargs{})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:159
#  [2] _apply
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:144 [inlined]                                                                                  
#  [3] (::Node{…})(Xnew::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:140
#  [4] output_and_report(signature::MLJBase.Signature{…}, operation::Symbol, Xnew::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/signatures.jl:374
#  [5] predict
#    @ ~/.julia/packages/MLJBase/mIaqI/src/operations.jl:191 [inlined]
#  [6] predict_mode(m::MLJBase.ProbabilisticPipeline{…}, fitresult::MLJBase.Signature{…}, Xnew::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/operations.jl:209
#  [7] predict_mode(mach::Machine{…}, Xraw::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/operations.jl:133
#  [8] top-level scope
#    @ REPL[28]:1

# caused by: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
# [11:50:45] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.  
# Stack trace:
#   [bt] (0) 1   libxgboost.dylib                    0x00000001723c8325 dmlc::LogMessageFatal::~LogMessageFatal() + 117
#   [bt] (1) 2   libxgboost.dylib                    0x00000001723e91f9 XGBoosterPredictFromDMatrix + 121
#   [bt] (2) 3   ???                                 0x0000000170d61f13 0x0 + 6188048147


# Stacktrace:
#     [1] xgbcall(::Function, ::Ptr{Nothing}, ::Vararg{Any})

# < abridged >
(jl_6G9Ncl) pkg> st
Status `/private/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_6G9Ncl/Project.toml`
  [add582a8] MLJ v0.20.2
  [54119dfa] MLJXGBoostInterface v0.3.10

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 17 on 12 virtual cores
Environment:
  JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
  JULIA_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_EGLOT_PATH = /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia
  JULIA_NUM_THREADS = 12
  JULIA_NIGHTLY_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia

@ablaom (Member) commented Feb 21, 2024

@ExpandingMan I wonder if there is something suspect about the way we are currently serialising XGBoost models in MLJXGBoostInterface.jl. Is it really persistent? Or do you have any other ideas?

The problem here is that XGBoost models do serialise on their own, but not inside pipelines.
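
For contrast, a sketch of the non-pipelined case, which works (same model, no encoder):

using MLJ
XGBC = @load XGBoostClassifier
X, y = @load_iris
mach = machine(XGBC(), X, y) |> fit!
MLJ.save("xgb_alone.jls", mach)
mach2 = machine("xgb_alone.jls")
predict_mode(mach2, X)   # no error when the model is not wrapped in a pipeline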

In particular, note the following part of the stack trace:

# ERROR: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
# [11:50:45] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.
# Stack trace:
#   [bt] (0) 1   libxgboost.dylib                    0x00000001723c8325 dmlc::LogMessageFatal::~LogMessageFatal() + 117
#   [bt] (1) 2   libxgboost.dylib                    0x00000001723e91f9 XGBoosterPredictFromDMatrix + 121
#   [bt] (2) 3   ???                                 0x0000000170d61f13 0x0 + 6188048147

@paulotrefosco commented

Hello, @ablaom!

I also ran into this problem. I'll try to provide everything here with a toy example. Since I am not proficient with git/code/etc., let me know if there is anything else I can do to help!

df_example_MLJ.csv

using CSV
using DataFrames
using MLJ   # needed for coerce and the scitypes below

df = CSV.read("df_example_MLJ.csv", DataFrame)

df = coerce(df, :x1 => Continuous,
                :x2 => Multiclass,
                :x3 => Multiclass,
                :x4 => Multiclass,
                :x5 => Continuous,
                :x6 => Continuous,
                :x7 => Continuous,
                :x8 => Multiclass,
                :x9 => Multiclass,
                :y  => OrderedFactor)
schema(df)

XGBC = @load XGBoostClassifier
xgb = XGBC()
ohe = OneHotEncoder()

# Pipeline OneHotEncoder > XGBoost
xgb_pipe = ohe |> xgb

# Setting Target and Features tables:
y, X = unpack(df, ==(:y), col->true)

train, test = partition(1:length(y), 0.1, shuffle=true)

xgbm = machine(xgb_pipe, X, y, cache=false)
fit!(xgbm, rows=train, verbosity=0)

yhat = predict_mode(xgbm, X[test,:])
println(accuracy(yhat, y[test]))

MLJ.save("mach_test_xgb_MLJ.jls", xgbm)
mach_restored = machine("mach_test_xgb_MLJ.jls")

yhat = predict_mode(mach_restored, X[test,:])
println(accuracy(yhat, y[test]))

@ExpandingMan commented

I just ran a bunch of tests of XGBoost.save and XGBoost.load (which are what the interface uses), including the things I thought it might be doing, and it all seems to work fine. I am therefore unable to reproduce this directly with XGBoost.jl.
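
For reference, the kind of round trip I tested looks roughly like this (a sketch; the exact signature for loading from an in-memory buffer may differ):

using XGBoost

X = randn(100, 4)
y = rand(0:1, 100)
booster = xgboost(DMatrix(X, y); num_round=10, objective="binary:logistic")

raw = XGBoost.save(booster, Vector{UInt8})   # serialize to an in-memory buffer
booster2 = XGBoost.load(Booster, raw)        # restore a fresh Booster
XGBoost.predict(booster2, DMatrix(X))        # the restored booster is usable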

@ablaom would it be difficult for you to create an MWE that mimics all the calls to XGBoost.jl but unwrapped from MLJ? In particular, I'm suspicious of what might be going on in machine in these examples, as it is not obvious to me from looking at MLJXGBoostInterface. Also, what does the cache=false keyword argument do?

@ablaom (Member) commented Feb 27, 2024

The reason I suspect the XGBoost interface is that I cannot reproduce this problem with other models. For example, you can replace XGBoostClassifier with SVC from LIBSVM (another wrapped C library) and there is no problem with the code above. (And the value of the cache flag does not matter.)
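
A sketch of that substitution (same pipeline shape as the failing example above):

using MLJ
SVC = @load SVC pkg=LIBSVM
svc_pipe = OneHotEncoder() |> SVC()

X, y = @load_iris
mach = machine(svc_pipe, X, y) |> fit!
MLJ.save("svc_pipe.jls", mach)
mach2 = machine("svc_pipe.jls")
predict(mach2, X)   # no error: the restored pipeline predicts fine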

@ablaom (Member) commented Feb 27, 2024

What does it mean for a booster to be "disposed"?

Booster has not been initialized or has already been disposed.

@ExpandingMan commented

That doesn't look related, but who knows? I wasn't doubting that it's an XGBoost-specific problem; I just can't reproduce it without knowing a lot more about the internals of machine and whatever else.

Again, can you help me reproduce the exact calls that are being made to the booster object in the above script? If you can reproduce the problem with XGBoost.jl alone, I'll be able to figure it out, but I'm still far from knowing what that would look like. I don't know how hard that is to determine, but it's probably worth dedicated effort to make sure this is always as easy as possible to do, for any model.

@ablaom (Member) commented Feb 27, 2024

The problem indeed appears to be on the MLJBase end, after all. Thanks for your investigations.
The issue is specific to XGBoost because this is the only MLJ model that needs to overload MLJModelInterface.save, as its fitresult is not persistent.

It looks like fixing the issue on the MLJBase side involves major gymnastics, and I have to wonder if it's worth the effort and added complexity for one (out of more than 200) MLJ models, a model that is not written in Julia and which has a compelling Julia substitute: EvoTrees.jl is quite mature and still actively developed.

An alternative solution here would be to change the fitresult to include the persistent version (the output of XGBoost.save(booster, Vector{UInt8})) and recreate the Booster object with XGBoost.load in every call to predict (unless there is actually a way to test whether a Booster object is usable). Assuming the impact on performance is rarely significant (which I have not investigated), would this be acceptable?
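
A rough sketch of the idea, stripped of the MLJ plumbing (the helper names here are hypothetical, not actual MLJXGBoostInterface code):

using XGBoost

# Store serialized bytes as the "fitresult" and rebuild a live Booster
# on every predict call, so the fitresult itself is always persistent.
function fit_persistent(X, y; kwargs...)
    booster = xgboost(DMatrix(X, y); kwargs...)
    return XGBoost.save(booster, Vector{UInt8})   # raw bytes survive serialization
end

function predict_persistent(bytes, Xnew)
    booster = XGBoost.load(Booster, bytes)   # recreate the Booster each call
    return XGBoost.predict(booster, DMatrix(Xnew))
end

X = randn(50, 3); y = rand(0:1, 50)
bytes = fit_persistent(X, y; num_round=5, objective="binary:logistic")
predict_persistent(bytes, X)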

@ExpandingMan commented

An alternative solution here would be to change the fitresult to include the persistent version (the output of XGBoost.save(booster, Vector{UInt8})) and recreate the Booster object with XGBoost.load in every call to predict (unless there is actually a way to test whether a Booster object is usable). Assuming the impact on performance is rarely significant (which I have not investigated), would this be acceptable?

Not entirely sure this was directed at me, but yes, I'd say anything that needs to be done in MLJXGBoostInterface.jl to adapt is fine. It doesn't affect anything else, so I don't think there's ever a reason not to implement a fix there.

@ablaom (Member) commented Mar 1, 2024

@paulotrefosco It would be great if you could confirm that MLJBase 1.1.2 resolves your particular example.

@paulotrefosco commented Mar 1, 2024

@paulotrefosco It would be great if you could confirm that MLJBase 1.1.2 resolves your particular example.

Hello @ablaom,
It is working now! Thanks a lot for your efforts!
