sklearn2pmml does not work with xgboost >= 2.0.0 #402

Closed
ghost opened this issue Dec 6, 2023 · 9 comments

Comments

@ghost

ghost commented Dec 6, 2023

When trying to generate a PMML for an xgboost model, I get the following error:

Standard output is empty
Standard error:
Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'xgboost.sklearn.XGBClassifier.classes_' not set
	at org.jpmml.python.PythonObject.get(PythonObject.java:82)
	at org.jpmml.python.PythonObject.getList(PythonObject.java:282)
	at org.jpmml.python.PythonObject.getListLike(PythonObject.java:311)
	at sklearn.Classifier.getClasses(Classifier.java:83)
	at sklearn.LabelEncoderClassifier.getClasses(LabelEncoderClassifier.java:40)
	at sklearn.Classifier.encodeLabel(Classifier.java:99)
	at org.jpmml.sklearn.SkLearnEncoder.initLabel(SkLearnEncoder.java:147)
	at sklearn.Composite.initLabel(Composite.java:225)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:103)
	at com.sklearn2pmml.Main.run(Main.java:80)
	at com.sklearn2pmml.Main.main(Main.java:65)

Libraries and versions:

python: 3.12.0
sklearn2pmml: 0.99.3
sklearn: 1.2.2
sklearn_pandas: 2.2.0
pandas: 2.1.3
numpy: 1.26.2
dill: 0.3.7
joblib: 1.3.2
java: 21.0.1
xgboost: 2.0.2

After downgrading xgboost to 1.7.6, everything works fine.

@vruusmann
Member

I don't think that anything major has happened to the xgboost.Booster itself. It's just its Scikit-Learn wrapper (i.e., the XGBClassifier class) that has had some attributes renamed or removed.

Will have to update the JPMML-XGBoost project first...

In the meantime, can you try adding the missing XGBClassifier.classes_ attribute yourself? Should be a Python list of class labels. Perhaps the conversion succeeds after that.

Something like this:

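import numpy
from sklearn.pipeline import Pipeline
from sklearn2pmml import sklearn2pmml
from xgboost import XGBClassifier

# X, y: the training data
classifier = XGBClassifier()

pipeline = Pipeline([
  ("classifier", classifier)
])
pipeline.fit(X, y)

# After fitting, set the missing `classes_` attribute
classifier.classes_ = numpy.unique(y)

sklearn2pmml(pipeline, "XGBClassifier.pmml")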

@ghost
Author

ghost commented Dec 7, 2023

Actually, the classes_ property still exists in XGBClassifier:

@property
def classes_(self) -> np.ndarray:
    return np.arange(self.n_classes_)

It is also accessible read-only in Python via classifier.classes_ (it just cannot be modified or set, since no setter is defined for it). I guess PythonObject.java needs to be changed to handle the @property style.

@vruusmann
Member

vruusmann commented Dec 7, 2023

> I guess PythonObject.java needs to be changed to handle the @property style.

The @property attribute only exists in a "live" Python environment. If you persist an XGBClassifier v2 object, the pickle file does not contain any trace of it.

Yep, and since a virtual XGBClassifier.classes_ attribute is defined (in @property form), you cannot re-assign it manually. I missed that technical reality in my comment above.
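Both points (no trace in the pickle, no re-assignment) can be illustrated in plain Python. The sketch below uses a hypothetical `ToyClassifier`, not the real XGBClassifier, which, like the XGBoost 2.0.X snippet quoted earlier, stores only `n_classes_` and derives `classes_` through a read-only `@property`. Default pickling of such an object serializes its instance `__dict__`, so the property value never reaches the file.

```python
import numpy as np


class ToyClassifier:
    """Hypothetical stand-in for XGBClassifier 2.0.X; not the real class."""

    def __init__(self, n_classes):
        # Regular instance attribute: lives in __dict__, so pickle stores it
        self.n_classes_ = n_classes

    @property
    def classes_(self):
        # Virtual attribute: recomputed on every access, never stored
        return np.arange(self.n_classes_)


clf = ToyClassifier(3)

# The "live" object exposes classes_ ...
print(clf.classes_)  # [0 1 2]

# ... but the instance state that default pickling captures has no trace of it
state = vars(clf)
print(state)  # {'n_classes_': 3}

# Re-assigning the property fails, because no setter is defined
try:
    clf.classes_ = np.array(["a", "b", "c"])
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)  # True
```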

Now, the workaround would be to update the Java handler of the XGBClassifier class to emulate the latest "class labels are integers [0, 1, ..., n - 1]" behaviour.

Another idea would be to define a special-purpose pmml_classes_ attribute, which allows you to override any Python class label list with your own custom list. The JPMML-SkLearn converter could look for this extension attribute on every classifier class, not just XGBClassifier. That's a neat idea!
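Both ideas boil down to a lookup order on the converter side. The real handlers are Java code in the JPMML projects; what follows is only a Python sketch of the idea, operating on a plain dict that stands in for an unpickled estimator's attributes. The function name `resolve_class_labels` and the `n_classes_` fallback key are assumptions for illustration.

```python
from typing import Any, Dict, List


def resolve_class_labels(attrs: Dict[str, Any]) -> List[Any]:
    """Hypothetical converter-side lookup for a classifier's class labels.

    `attrs` stands in for the persisted attributes of an estimator.
    """
    # 1. A user-supplied override takes precedence, for any classifier type
    if "pmml_classes_" in attrs:
        return list(attrs["pmml_classes_"])
    # 2. Estimators that persist a classes_ attribute are read as before
    if "classes_" in attrs:
        return list(attrs["classes_"])
    # 3. Emulate the XGBoost 2.0.X behaviour: labels are 0, 1, ..., n - 1
    if "n_classes_" in attrs:
        return list(range(attrs["n_classes_"]))
    raise ValueError("Attribute 'classes_' not set")


print(resolve_class_labels({"n_classes_": 3}))  # [0, 1, 2]
print(resolve_class_labels({"n_classes_": 2, "pmml_classes_": ["no", "yes"]}))  # ['no', 'yes']
```

Checking the override first is what makes the second idea generic: it applies to every classifier, and the integer emulation only kicks in when nothing better is available.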

@vruusmann
Member

vruusmann commented Dec 7, 2023

> Now, the workaround would be to update the Java handler of the XGBClassifier class to emulate the latest "class labels are integers [0, 1, ..., n - 1]" behaviour.

FYI: I'm currently working on getting Scikit-Learn 1.3.X supported. Should be ready any day now.

After that, I'll do a round of XGBoost and LightGBM library updates, and will fix this particular issue among others. Should be done by end of next week.

@ghost
Author

ghost commented Dec 7, 2023

> Now, the workaround would be to update the Java handler of the XGBClassifier class to emulate the latest "class labels are integers [0, 1, ..., n - 1]" behaviour.

> Another idea would be to define a special-purpose pmml_classes_ attribute, which allows you to override any Python class label list with your own custom list. The JPMML-SkLearn converter could look for this extension attribute on every classifier class, not just XGBClassifier. That's a neat idea!

Both options sound feasible to me; the second one is probably easier to implement and more flexible. There is already a similar construct with pmml_feature_importance_ ...

Thanks a lot for all your great work and quick response!

@vruusmann
Member

The conversion should succeed with SkLearn2PMML 0.100.1 and newer.

@vruusmann
Member

> The conversion should succeed with SkLearn2PMML 0.100.1 and newer.

Damn, forgot that in addition to the JPMML-XGBoost library update, it will be necessary to provide a workaround for the now-missing XGBClassifier.classes_ attribute.

Working on it right now.

@vruusmann
Member

The conversion of XGBoost 2.0.X models should succeed with SkLearn2PMML 0.100.2 and newer.

The XGBClassifier class is missing a persistent classes_ attribute now. As discussed above, it can be worked around by declaring a pmml_classes_ attribute manually:

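from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from xgboost import XGBClassifier

transformer = ...
classifier = XGBClassifier()

pipeline = PMMLPipeline([
    ("transformer", transformer),
    ("classifier", classifier)
])
pipeline.fit(df, df["target"])

# THIS! Make the virtual classes_ property persistent by copying it into a regular attribute
classifier.pmml_classes_ = classifier.classes_

sklearn2pmml(pipeline, "XGBoost.pmml")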
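To check an environment against the version facts reported in this issue (XGBoost 2.0.2 fails with SkLearn2PMML 0.99.3, downgrading XGBoost to 1.7.6 works, and 0.100.2 or newer handles XGBoost 2.0.X), a small helper along these lines may be handy. Both function names are hypothetical, version parsing is deliberately simplified (pre-release suffixes are ignored), and treating every XGBoost version below 2 as working is an extrapolation from the single 1.7.6 report. Even on a passing combination, the pmml_classes_ line above is still required.

```python
import re


def release_tuple(version: str) -> tuple:
    """Parse the leading numeric part of a version string, e.g. "0.100.2" -> (0, 100, 2).

    Simplified on purpose: suffixes such as "rc1" or ".dev0" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_reported_working(xgboost_version: str, sklearn2pmml_version: str) -> bool:
    """Hypothetical check based only on the versions reported in this issue."""
    if release_tuple(xgboost_version)[0] < 2:
        # Assumption: generalized from the report that XGBoost 1.7.6 works
        return True
    # XGBoost 2.0.X needs SkLearn2PMML 0.100.2+ (plus the pmml_classes_ workaround)
    return release_tuple(sklearn2pmml_version) >= (0, 100, 2)


print(is_reported_working("2.0.2", "0.99.3"))   # False
print(is_reported_working("2.0.2", "0.100.2"))  # True
print(is_reported_working("1.7.6", "0.99.3"))   # True
```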

@ghost
Author

ghost commented Dec 19, 2023

Confirmed, it is working again now. Thanks a lot for the quick fix.
