
Issues on multiclass classification classifier with xgboost #7688

Closed
PanZiwei opened this issue Feb 21, 2022 · 6 comments

PanZiwei commented Feb 21, 2022

Hi,
I am interested in developing a multi-class classifier with XGBoost, but I am still a newcomer. I would really appreciate it if you could help with the following issues.

  1. It seems that when the number of classes is greater than 2, the wrapper switches the objective parameter to multi:softprob (unless it is already multi:softmax). But since the code is under @_deprecate_positional_args, does that mean the multi-class classifier is no longer available in current XGBoost?

    if self.n_classes_ > 2:
        # Switch to using a multiclass objective in the underlying XGB instance
        if params.get("objective", None) != "multi:softmax":
            params["objective"] = "multi:softprob"
        params["num_class"] = self.n_classes_

  2. How can I use the latest stable XGBoost (v1.5.2 as of now) to develop a multi-class classifier? Do you think it makes sense to wrap a single-output model with sklearn.multioutput.MultiOutputClassifier, something like classifier = MultiOutputClassifier(XGBClassifier())?

  3. You mentioned in #7309 (Initial support for multi-output regression) that "I think for multi-target classification models, xgboost needs a new interface and potentially lots of refactoring. I will focus on regression for now. Thank you for joining the discussion and feel free to test the new feature. ;-)"
    Can you be more specific about the term multi-target? Does it refer to multi-class, multi-label, or multi-output classification?

Thank you so much for your help.

PanZiwei changed the title from "Multiclass classification with xgboost classifier?" to "Issues on multiclass classification classifier with xgboost" on Feb 21, 2022
trivialfis (Member) commented Feb 21, 2022

How can I use the latest stable XGBoost (v1.5.2 as of now) to develop a multi-class classifier?

https://github.com/dmlc/xgboost/blob/master/demo/guide-python/sklearn_examples.py
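
For reference, a minimal multi-class sketch with the sklearn wrapper, in the spirit of the linked demo (the iris dataset and default settings here are illustrative, not taken from the demo):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # With 3 classes, the sklearn wrapper switches to a multi-class
    # objective on its own; no manual objective/num_class setup is needed.
    clf = XGBClassifier()
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))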

under @_deprecate_positional_args,

That means users need to pass the arguments after the bare * as keyword arguments: fit(X=x, y=y, sample_weight=weight) instead of fit(X, y, weight).

Can you be more specific about the term multi-target?

Apologies for the ambiguity. What I was trying to say is that XGBoost won't support multi-target multi-class classification: that's when the data contains multiple output targets and each target is a classification problem with multiple classes. Multi-class means your data has only one target, which has multiple classes. Multi-label means you have multiple targets, each with 2 classes.
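
To make the distinction concrete, a small sketch of the label shapes involved (the arrays are made up for illustration):

    import numpy as np

    # Multi-class: one target with more than two classes.
    y_multiclass = np.array([0, 2, 1, 2])        # shape (4,)

    # Multi-label: several targets, each with 2 classes.
    y_multilabel = np.array([[0, 1, 1],
                             [1, 0, 0],
                             [0, 0, 1],
                             [1, 1, 0]])         # shape (4, 3)

    # Multi-target multi-class: several targets, each with >2 classes.
    y_multitarget = np.array([[0, 2],
                              [1, 0],
                              [2, 1],
                              [1, 2]])           # shape (4, 2)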

PanZiwei (Author) commented Feb 21, 2022

Thank you so much for the quick response and clarification! Two follow-up questions:

  1. For multi-class in XGBoost, is it one-vs-rest or one-vs-one?
  2. If I want to use the multi-label version, I should install v1.6.0-dev via the nightly builds, right?
    It seems that the XGBoost 1.6.0 dev version begins to support multi-label classification, but v1.6.0 is still under development and the stable 1.5.2 doesn't have the feature yet.
    faaa47c#diff-e0c1153dfc69e8953158acd86131e51590b759923cc9821e2a449ef919d8355a

trivialfis (Member) commented Feb 21, 2022

For multi-class in XGBoost, is it one-vs-rest or one-vs-one?

1-vs-rest

If I want to use the multi-label version, I should install v1.6.0-dev via the nightly builds, right?

Correct. The feature is pretty basic at the moment. You can achieve the same result using sklearn meta estimators.
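
For example, a sketch of the meta-estimator route with the MultiOutputClassifier(XGBClassifier()) combination mentioned in the first post (synthetic data for illustration):

    from sklearn.datasets import make_multilabel_classification
    from sklearn.multioutput import MultiOutputClassifier
    from xgboost import XGBClassifier

    # Synthetic multi-label data: y has shape (n_samples, 3),
    # with one binary column per label.
    X, y = make_multilabel_classification(n_samples=200, n_classes=3,
                                          random_state=0)

    # MultiOutputClassifier fits one XGBClassifier per label column.
    clf = MultiOutputClassifier(XGBClassifier())
    clf.fit(X, y)
    print(clf.predict(X[:5]))    # shape (5, 3): one 0/1 prediction per label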

PanZiwei (Author) commented Feb 23, 2022

Hi @trivialfis
Sorry to bother you again. I am trying to build the best XGBoost classifier for an imbalanced multi-class dataset via hyperparameter tuning, and would really appreciate your help with the following.

  1. Any suggestions on how to handle this?
  2. Can I use scale_pos_weight for multiple classes? I checked https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html#handle-imbalanced-dataset, but it seems that the parameter only handles binary classes, not multiple classes.
  3. Apart from the hyperparameter tuning step, do I also need to set the weight in the fit step, as you suggested in https://discuss.xgboost.ai/t/unbalanced-multi-class-data-using-xgboost/2277/2?

Thank you so much for your help!

hcho3 (Collaborator) commented Feb 23, 2022

@PanZiwei No, you cannot use scale_pos_weight for multiple classes. Please pass the sample_weight argument when calling fit(). sample_weight should contain one weight per data point, and the weights can be adjusted according to the class each data point belongs to.
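
One common way to build such weights is scikit-learn's compute_sample_weight; a sketch, where the data and the "balanced" weighting scheme are illustrative rather than prescribed in this thread:

    from sklearn.datasets import make_classification
    from sklearn.utils.class_weight import compute_sample_weight
    from xgboost import XGBClassifier

    # Synthetic imbalanced 3-class data (the proportions are made up).
    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                               weights=[0.8, 0.15, 0.05], random_state=0)

    # "balanced" gives each sample a weight inversely proportional to its
    # class frequency, so the rare classes count more during training.
    sample_weight = compute_sample_weight(class_weight="balanced", y=y)

    clf = XGBClassifier()
    clf.fit(X, y, sample_weight=sample_weight)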

PanZiwei (Author) commented Feb 23, 2022

What about the max_delta_step parameter suggested in the "Handle Imbalanced Dataset" section of the parameter tuning guide?

So can the class imbalance only be handled in the fit step, rather than in the randomized search step?
