Serialized Composite Model Fails with XGBoost #927

Closed · pazzo83 opened this issue Aug 18, 2023 · 14 comments · Fixed by #960

@pazzo83 (Collaborator) commented Aug 18, 2023

I am having an issue with a serialized MLJ pipeline that includes an XGBoostClassifier: when I try to run predict on the loaded model, I get the following error:

┌ Error: Failed to apply the operation `predict` to the machine machine(:xg_boost_classifier, …), which receives it's data arguments from one or more nodes in a learning network. Possibly, one of these nodes is delivering data that is incompatible with the machine's model.
│ Model (xg_boost_classifier):
│ input_scitype = Unknown
│ target_scitype = Unknown
│ output_scitype = Unknown
│
│ Incoming data:
│ arg of predict	scitype
│ -------------------------------------------
│ Node @103	Table{AbstractVector{Continuous}}
│
│ Learning network sources:
│ source	scitype
│ -------------------------------------------
│ Source @253	Nothing
│ Source @396	Nothing
└ @ MLJBase ~/.julia/packages/MLJBase/0rn2V/src/composition/learning_networks/nodes.jl:153
ERROR: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
[00:01:27] /workspace/srcdir/xgboost/src/c_api/c_api.cc:913: Booster has not been initialized or has already been disposed.
Stack trace:
  [bt] (0) 1   libxgboost.dylib                    0x000000028780cc4c dmlc::LogMessageFatal::~LogMessageFatal() + 124
  [bt] (1) 2   libxgboost.dylib                    0x0000000287827614 XGBoosterPredictFromDMatrix + 116
  [bt] (2) 3   ???                                 0x00000002a858c0c8 0x0 + 11414323400
  [bt] (3) 4   ???                                 0x00000002a8404194 0x0 + 11412717972
  [bt] (4) 5   libjulia-internal.1.9.dylib         0x00000001048cdacc do_apply + 744
  [bt] (5) 6   ???                                 0x00000002a802c1a0 0x0 + 11408687520
  [bt] (6) 7   ???                                 0x00000002a79c40d8 0x0 + 11401969880
  [bt] (7) 8   ???                                 0x00000002a78e40d4 0x0 + 11401052372
  [bt] (8) 9   libjulia-internal.1.9.dylib         0x00000001048da53c do_call + 188

I've seen this on both Julia 1.8.5 and Julia 1.9.2 using up-to-date versions of MLJ and XGBoost (which I believe was recently updated to version 2.0.0). I've also noticed that MLJXGBoostInterface recently changed the way it serializes XGBoost models.

@pazzo83 (Collaborator, Author) commented Aug 18, 2023

My workaround right now is to serialize the components separately and then use some wrapper code to chain them as a "pipeline" after deserializing (basically my entire pipeline minus the classifier is one component, and the XGBoost model is the other). That works.
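
Roughly, the workaround looks like this (a minimal sketch with illustrative file names and toy data, not my actual pipeline):

using MLJ

# Fit the preprocessing and the classifier as separate machines.
X, y = @load_iris
enc_mach = machine(OneHotEncoder(), X) |> fit!
Xt = MLJ.transform(enc_mach, X)

XGBC = @load XGBoostClassifier
xgb_mach = machine(XGBC(), Xt, y) |> fit!

# Serialize each component on its own.
MLJ.save("encoder.jls", enc_mach)
MLJ.save("xgb.jls", xgb_mach)

# Wrapper that chains the restored machines like a "pipeline".
function predict_pipeline(Xnew)
    enc = machine("encoder.jls")
    clf = machine("xgb.jls")
    return predict(clf, MLJ.transform(enc, Xnew))
end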

@ablaom (Member) commented Jan 17, 2024

Sorry, this one seems to have slipped by. Thanks for reporting this potential bug!

I am not able to reproduce this. For example, the code below works for me. Can I have more context, please?

using MLJBase, MLJModels
booster = (@load XGBoostClassifier)()
pipe = ContinuousEncoder |> booster
X, y = @load_crabs
mach = machine(pipe, X, y) |> fit!
MLJBase.save("junk.jls", mach)
mach2 = machine("junk.jls")
julia> predict(mach2, X)
200-element UnivariateFiniteVector{Multiclass{2}, String, UInt32, Float32}:
 UnivariateFinite{Multiclass{2}}(B=>0.883, O=>0.117)
 UnivariateFinite{Multiclass{2}}(B=>0.883, O=>0.117)
 UnivariateFinite{Multiclass{2}}(B=>0.897, O=>0.103)
 
 UnivariateFinite{Multiclass{2}}(B=>0.0963, O=>0.904)
(jl_XBBUpe) pkg> st
Status `/private/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_XBBUpe/Project.toml`
  [a7f614a8] MLJBase v1.1.0
  [d491faf4] MLJModels v0.16.14
  [54119dfa] MLJXGBoostInterface v0.3.10

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 17 on 12 virtual cores
Environment:
  JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
  JULIA_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_EGLOT_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_NUM_THREADS = 12
  DYLD_LIBRARY_PATH = /usr/local/homebrew/Cellar/libomp/9.0.1/lib/
  JULIA_NIGHTLY_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia

@ablaom (Member) commented Feb 21, 2024

But this is failing for me (adapted from a Julia Discourse post):

using MLJ

XGBC = @load XGBoostClassifier
xgb = XGBC()
ohe = OneHotEncoder()

# Pipeline OneHotEncoder > XGBoost
xgb_pipe = ohe |> xgb

# Setting Target and Features tables:
# y, X = unpack(df, ==(:y_label), col->true)
X, y = @load_iris

train, test = partition(1:length(y), 0.1, shuffle=true)

xgbm = machine(xgb_pipe, X, y, cache=false)
fit!(xgbm, rows=train, verbosity=0)

MLJ.save("mach_xgb_pipe.jls", xgbm)

# Restoring the model and using for predictions:
mach_restored = machine("mach_xgb_pipe.jls")

yhat = predict_mode(mach_restored , selectrows(X, test))

# ┌ Error: Failed to apply the operation `predict` to the machine machine(:xg_boost_classifier, …), which receives it's data arguments from one or more nodes in a learning network. Possibly, one of these nodes is delivering data that is incompatible with the machine's model.
# │ Model (xg_boost_classifier):
# │ input_scitype = Unknown
# │ target_scitype = Unknown
# │ output_scitype = Unknown
#
# │ Incoming data:
# │ arg of predict        scitype
# │ -------------------------------------------
# │ Node @034 → :one_hot_encoder  Table{AbstractVector{Continuous}}
#
# │ Learning network sources:
# │ source        scitype
# │ -------------------------------------------
# │ Source @978   Nothing
# │ Source @367   Nothing
# └ @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:153
# ERROR: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
# [11:50:45] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.
# Stack trace:
#   [bt] (0) 1   libxgboost.dylib                    0x00000001723c8325 dmlc::LogMessageFatal::~LogMessageFatal() + 117
#   [bt] (1) 2   libxgboost.dylib                    0x00000001723e91f9 XGBoosterPredictFromDMatrix + 121
#   [bt] (2) 3   ???                                 0x0000000170d61f13 0x0 + 6188048147


# Stacktrace:
#  [1] _apply(y_plus::Tuple{…}, input::@NamedTuple{…}; kwargs::@Kwargs{})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:159
#  [2] _apply
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:144 [inlined]                                                                                  
#  [3] (::Node{…})(Xnew::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/nodes.jl:140
#  [4] output_and_report(signature::MLJBase.Signature{…}, operation::Symbol, Xnew::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/composition/learning_networks/signatures.jl:374
#  [5] predict
#    @ ~/.julia/packages/MLJBase/mIaqI/src/operations.jl:191 [inlined]
#  [6] predict_mode(m::MLJBase.ProbabilisticPipeline{…}, fitresult::MLJBase.Signature{…}, Xnew::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/operations.jl:209
#  [7] predict_mode(mach::Machine{…}, Xraw::@NamedTuple{…})
#    @ MLJBase ~/.julia/packages/MLJBase/mIaqI/src/operations.jl:133
#  [8] top-level scope
#    @ REPL[28]:1

# caused by: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
# [11:50:45] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.  
# Stack trace:
#   [bt] (0) 1   libxgboost.dylib                    0x00000001723c8325 dmlc::LogMessageFatal::~LogMessageFatal() + 117
#   [bt] (1) 2   libxgboost.dylib                    0x00000001723e91f9 XGBoosterPredictFromDMatrix + 121
#   [bt] (2) 3   ???                                 0x0000000170d61f13 0x0 + 6188048147


# Stacktrace:
#     [1] xgbcall(::Function, ::Ptr{Nothing}, ::Vararg{Any})

# < abridged >
(jl_6G9Ncl) pkg> st
Status `/private/var/folders/4n/gvbmlhdc8xj973001s6vdyw00000gq/T/jl_6G9Ncl/Project.toml`
  [add582a8] MLJ v0.20.2
  [54119dfa] MLJXGBoostInterface v0.3.10

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 17 on 12 virtual cores
Environment:
  JULIA_LTS_PATH = /Applications/Julia-1.6.app/Contents/Resources/julia/bin/julia
  JULIA_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_EGLOT_PATH = /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia
  JULIA_NUM_THREADS = 12
  JULIA_NIGHTLY_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia

@ablaom (Member) commented Feb 21, 2024

@ExpandingMan I wonder if there is something suspect about the way we are currently serialising XGBoost models in MLJXGBoostInterface.jl. Is it really persistent? Or do you have any other ideas?

The problem here is that XGBoost models do serialise on their own, but not inside pipelines.
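
For contrast, a sketch of the non-pipelined case, which works (same model, no encoder):

using MLJ
XGBC = @load XGBoostClassifier
X, y = @load_iris
mach = machine(XGBC(), X, y) |> fit!
MLJ.save("xgb_alone.jls", mach)
mach2 = machine("xgb_alone.jls")
predict_mode(mach2, X)   # no error when the model is not wrapped in a pipeline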

In particular, note the following part of the stack trace:

# ERROR: XGBoostError: (caller: XGBoosterPredictFromDMatrix)
# [11:50:45] /workspace/srcdir/xgboost/src/c_api/c_api.cc:1059: Booster has not been initialized or has already been disposed.
# Stack trace:
#   [bt] (0) 1   libxgboost.dylib                    0x00000001723c8325 dmlc::LogMessageFatal::~LogMessageFatal() + 117
#   [bt] (1) 2   libxgboost.dylib                    0x00000001723e91f9 XGBoosterPredictFromDMatrix + 121
#   [bt] (2) 3   ???                                 0x0000000170d61f13 0x0 + 6188048147

@paulotrefosco commented

Hello, @ablaom!

I also ran into this problem. I'll try to provide everything here with a toy example. Since I am not proficient with git/code/etc., let me know if there is anything else I can do to help!

df_example_MLJ.csv

using CSV
using DataFrames
using MLJ   # needed for coerce and the scitypes below

df = CSV.read("df_example_MLJ.csv", DataFrame)

df = coerce(df, :x1 => Continuous,
                :x2 => Multiclass,
                :x3 => Multiclass,
                :x4 => Multiclass,
                :x5 => Continuous,
                :x6 => Continuous,
                :x7 => Continuous,
                :x8 => Multiclass,
                :x9 => Multiclass,
                :y  => OrderedFactor)
schema(df)

XGBC = @load XGBoostClassifier
xgb = XGBC()
ohe = OneHotEncoder()

# Pipeline OneHotEncoder > XGBoost
xgb_pipe = ohe |> xgb

# Setting Target and Features tables:
y, X = unpack(df, ==(:y), col->true)

train, test = partition(1:length(y), 0.1, shuffle=true)

xgbm = machine(xgb_pipe, X, y, cache=false)
fit!(xgbm, rows=train, verbosity=0)

yhat = predict_mode(xgbm, X[test,:])
println(accuracy(yhat, y[test]))

MLJ.save("mach_test_xgb_MLJ.jls", xgbm)
mach_restored = machine("mach_test_xgb_MLJ.jls")

yhat = predict_mode(mach_restored, X[test,:])
println(accuracy(yhat, y[test]))

@ExpandingMan commented

I just ran a bunch of tests of XGBoost.save and XGBoost.load (which are what the interface uses), including the things I thought it might be doing, and it all seems to work fine. I am therefore unable to reproduce this directly with XGBoost.jl.
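
For reference, the kind of round trip I tested looks roughly like this (a sketch; the exact signature for loading from an in-memory buffer may differ):

using XGBoost

X = randn(100, 4)
y = rand(0:1, 100)
booster = xgboost(DMatrix(X, y); num_round=10, objective="binary:logistic")

raw = XGBoost.save(booster, Vector{UInt8})   # serialize to an in-memory buffer
booster2 = XGBoost.load(Booster, raw)        # restore a fresh Booster
XGBoost.predict(booster2, DMatrix(X))        # the restored booster is usable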

@ablaom would it be difficult for you to create an MWE that mimics all the calls to XGBoost.jl but unwrapped from MLJ? In particular, I'm suspicious of what might be going on in machine in these examples, as it is not obvious to me from looking at MLJXGBoostInterface. Also, what does the cache=false keyword argument do?

@ablaom (Member) commented Feb 27, 2024

The reason I suspect the XGBoost interface is that I cannot reproduce this problem with other models. For example, you can replace XGBoostClassifier with SVC from LIBSVM (another wrapped C library) and there is no problem with the code above. (And the value of the cache flag does not matter.)
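
A sketch of that substitution (same pipeline shape as the failing example above):

using MLJ
SVC = @load SVC pkg=LIBSVM
svc_pipe = OneHotEncoder() |> SVC()

X, y = @load_iris
mach = machine(svc_pipe, X, y) |> fit!
MLJ.save("svc_pipe.jls", mach)
mach2 = machine("svc_pipe.jls")
predict(mach2, X)   # no error: the restored pipeline predicts fine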

@ablaom (Member) commented Feb 27, 2024

What does it mean for a booster to be "disposed"?

Booster has not been initialized or has already been disposed.

@ExpandingMan commented

That doesn't look related, but who knows? I wasn't doubting that it's an XGBoost-specific problem; I just can't reproduce it without knowing a lot more about the internals of machine and whatever else.

Again, can you help me reproduce the exact calls that are being made to the booster object in the above script? If you can reproduce the problem with XGBoost.jl alone, I'll be able to figure it out, but I'm still far from knowing what that would look like. I don't know how hard that is to determine, but it's probably worth dedicated effort to make sure this is always as easy as possible to do, for any model.

@ablaom (Member) commented Feb 27, 2024

The problem indeed appears to be on the MLJBase end, after all. Thanks for your investigations.
The issue is specific to XGBoost because this is the only MLJ model that needs to overload MLJModelInterface.save, as its fitresult is not persistent.

It looks like fixing the issue on the MLJBase side involves major gymnastics, and I have to wonder if it's worth the effort and added complexity for one (out of more than 200) MLJ models, a model that is not written in Julia and which has a compelling Julia substitute: EvoTrees.jl is quite mature and still actively developed.

An alternative solution here would be to change the fitresult to include the persistent version (the output of XGBoost.save(booster, Vector{UInt8})) and recreate the Booster object with XGBoost.load in every call to predict (unless there is actually a way to test whether a Booster object is usable). Assuming the impact on performance is rarely significant (which I have not investigated), would this be acceptable?
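
A rough sketch of the idea, stripped of the MLJ plumbing (the helper names here are hypothetical, not actual MLJXGBoostInterface code):

using XGBoost

# Store serialized bytes as the "fitresult" and rebuild a live Booster
# on every predict call, so the fitresult itself is always persistent.
function fit_persistent(X, y; kwargs...)
    booster = xgboost(DMatrix(X, y); kwargs...)
    return XGBoost.save(booster, Vector{UInt8})   # raw bytes survive serialization
end

function predict_persistent(bytes, Xnew)
    booster = XGBoost.load(Booster, bytes)   # recreate the Booster each call
    return XGBoost.predict(booster, DMatrix(Xnew))
end

X = randn(50, 3); y = rand(0:1, 50)
bytes = fit_persistent(X, y; num_round=5, objective="binary:logistic")
predict_persistent(bytes, X)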

@ExpandingMan commented

An alternative solution here would be to change the fitresult to include the persistent version (the output of XGBoost.save(booster, Vector{UInt8})) and recreate the Booster object with XGBoost.load in every call to predict (unless there is actually a way to test whether a Booster object is usable). Assuming the impact on performance is rarely significant (which I have not investigated), would this be acceptable?

Not entirely sure this was directed at me, but yes, I'd say anything that needs to be done in MLJXGBoostInterface.jl to adapt is fine. It doesn't affect anything else, so I don't think there's ever a reason not to implement a fix there.

@ablaom (Member) commented Mar 1, 2024

@paulotrefosco It would be great if you could confirm that MLJBase 1.1.2 resolves your particular example.

@paulotrefosco commented Mar 1, 2024

@paulotrefosco It would be great if you could confirm that MLJBase 1.1.2 resolves your particular example.

Hello @ablaom,
It is working now! Thanks a lot for your efforts!
