Multi-parameter optimization with custom loss function for probabilistic forecasting #8

Closed · StatMixedML opened this issue Jul 6, 2020 · 12 comments
Labels: help wanted (Extra attention is needed)

@StatMixedML
Owner

Description

Dear community,

I am currently working on a probabilistic extension of XGBoost that models all parameters of a distribution. This makes it possible to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived.

The problem is that XGBoost doesn't allow optimizing over several parameters. Assume we have a Normal distribution y ~ N(µ, σ). So far, my approach is a two-step procedure: I first optimize µ with σ fixed, then optimize σ with µ fixed, and then iterate between the two.

Since this is inefficient, is there any way of simultaneously optimizing both µ and σ using a custom loss function?
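
For concreteness, here is a minimal sketch of what one step of the current two-step procedure looks like as an XGBoost custom objective: µ is optimized under the Normal negative log-likelihood while σ is held fixed. The toy data and all names (e.g. `sigma_fixed`, `nll_grad_mu`) are purely illustrative, not code from this repository.

```python
import numpy as np
import xgboost as xgb

# Toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=500)
dtrain = xgb.DMatrix(X, label=y)

sigma_fixed = 1.5  # current estimate of sigma, held constant during this step

def nll_grad_mu(preds, dtrain):
    """Gradient/hessian of the Normal NLL with respect to mu, sigma fixed."""
    y = dtrain.get_label()
    grad = (preds - y) / sigma_fixed**2           # d NLL / d mu
    hess = np.full_like(y, 1.0 / sigma_fixed**2)  # d^2 NLL / d mu^2
    return grad, hess

booster_mu = xgb.train({"learning_rate": 0.1, "max_depth": 3},
                       dtrain, num_boost_round=100, obj=nll_grad_mu)
```

The second step is the mirror image (optimize σ, or log σ, with µ fixed), and the question is how to avoid this alternation altogether.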

StatMixedML added the help wanted label on Jul 6, 2020
@ja-thomas

I don't think this is really possible.

You get two gradient vectors that point in different directions (and potentially have very different scales). In general you could scalarize them by a (random) convex combination in each iteration, but I have no idea how well this would work, or whether the averages point in the right direction (probably not).
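
Purely as an illustration of that scalarization idea (the normalization and the random weight below are arbitrary choices, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(42)

def combined_gradient(grad_mu, grad_sigma):
    """Random convex combination of the two per-parameter gradient vectors."""
    # Rescale first, since the two gradients can live on very different scales.
    grad_mu = grad_mu / (np.linalg.norm(grad_mu) + 1e-12)
    grad_sigma = grad_sigma / (np.linalg.norm(grad_sigma) + 1e-12)
    lam = rng.uniform()  # weight drawn anew in each boosting iteration
    return lam * grad_mu + (1.0 - lam) * grad_sigma
```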

If you have a look at how xgboost implements multiclass classification - where we also estimate multiple parameters - it just applies one-vs-rest classification and fits nclass trees per iteration, the same way you are fitting two trees for mu and sigma.

If you don't want to iterate between the parameters, you can also adaptively choose which one to update in each iteration (based on the outer loss).

I would be quite interested if there actually is a solution for this.

@StatMixedML
Owner Author

StatMixedML commented Jul 6, 2020

@ja-thomas Thanks for your suggestions, really appreciated!

> You get two gradient vectors that point in different directions (and potentially have very different scales). In general you could scalarize them by a (random) convex combination in each iteration, but I have no idea how well this would work, or whether the averages point in the right direction (probably not).

I recall having tried averaging the gradients some time back, but it didn't work well.

> If you don't want to iterate between the parameters, you can also adaptively choose which one to update in each iteration (based on the outer loss).

I guess you are referring to the paper Gradient boosting for distributional regression: faster tuning and improved variable selection via noncyclical updates, where the outer loss is the negative log-likelihood. I'd rather not follow this approach, as the model should update the parameters simultaneously. But thanks anyway for pointing it out.

> If you have a look at how xgboost implements multiclass classification - where we also estimate multiple parameters - it just applies one-vs-rest classification and fits nclass trees per iteration, the same way you are fitting two trees for mu and sigma.

Using the idea of multiclass classification is also my preferred way of approaching the problem, at least so far. But I haven't really tried it - feel free to open a pull request if you have some spare time ... :-). The fact that we have k = 1, ..., K gradient/hessian vectors, where K is the number of distributional parameters, does indeed make things difficult. Potentially, I would need to modify the way XGBoost training is done, though I am not sure that is the way to go.
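
To make the multiclass-style idea concrete, here is a rough sketch (illustrative only, and version-dependent: it assumes the 1.x-era behaviour where `num_class` makes XGBoost grow one tree per output group and the custom objective returns sample-major flattened gradients; newer releases have native multi-output support). The objective below jointly returns the gradients/hessians of the Normal negative log-likelihood with respect to µ and log σ:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=500)
dtrain = xgb.DMatrix(X, label=y)

def normal_nll_obj(preds, dtrain):
    """Joint gradient/hessian of the Normal NLL w.r.t. (mu, log_sigma)."""
    y = dtrain.get_label()
    preds = preds.reshape(len(y), 2)           # column 0: mu, column 1: log_sigma
    mu, log_sigma = preds[:, 0], preds[:, 1]
    sigma2 = np.exp(2.0 * log_sigma)
    grad = np.empty_like(preds)
    hess = np.empty_like(preds)
    grad[:, 0] = (mu - y) / sigma2             # d NLL / d mu
    hess[:, 0] = 1.0 / sigma2
    grad[:, 1] = 1.0 - (y - mu) ** 2 / sigma2  # d NLL / d log_sigma
    hess[:, 1] = 2.0 * (y - mu) ** 2 / sigma2
    # Assumed layout: sample-major flattening, matching the multiclass convention.
    return grad.reshape(-1), hess.reshape(-1)

params = {"num_class": 2,                      # one tree per distributional parameter
          "disable_default_eval_metric": 1,
          "learning_rate": 0.1, "max_depth": 3}
booster = xgb.train(params, dtrain, num_boost_round=100, obj=normal_nll_obj)
```

Whether the built-in split finding then behaves sensibly when the K outputs are such different quantities (a location and a scale) is exactly the open question.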

@StatMixedML
Owner Author

Any thoughts on incremental training of XGBoost:

  1. Initialize the distributional parameters theta_k, for k = 1, ..., K
  2. For each parameter k, over blocks of boosting iterations:
    2.1 Train for a block of iterations, e.g., m = 1, ..., 50
    2.2 Update each parameter with its fitted values
    2.3 Train the next block, e.g., m = 50, ..., 100
    2.4 Update each parameter with its fitted values
    ...

Each parameter would get updated every m-th iteration. However, I am not sure how this affects hyper-parameter selection using Bayesian Optimization, and whether that is available in mlr, @ja-thomas?
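
A rough sketch of what this incremental scheme could look like with two boosters (one per parameter), each advanced by m rounds while the other parameter's fitted values are held fixed. All names and settings are illustrative only:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=500)
dtrain = xgb.DMatrix(X, label=y)

params = {"learning_rate": 0.1, "max_depth": 3, "base_score": 0.0}
m, n_blocks = 50, 4                        # m boosting rounds per block

mu_hat = np.zeros(len(y))                  # current fitted values per parameter
log_sigma_hat = np.full(len(y), np.log(y.std()))
booster_mu, booster_sigma = None, None

def make_mu_obj(log_sigma_fixed):
    def obj(preds, dtrain):
        y = dtrain.get_label()
        sigma2 = np.exp(2.0 * log_sigma_fixed)
        return (preds - y) / sigma2, 1.0 / sigma2       # grad, hess w.r.t. mu
    return obj

def make_sigma_obj(mu_fixed):
    def obj(preds, dtrain):
        y = dtrain.get_label()
        r2 = (y - mu_fixed) ** 2
        sigma2 = np.exp(2.0 * preds)
        return 1.0 - r2 / sigma2, 2.0 * r2 / sigma2     # grad, hess w.r.t. log_sigma
    return obj

for _ in range(n_blocks):
    # Advance the mu booster by m rounds with log_sigma held fixed, then swap.
    booster_mu = xgb.train(params, dtrain, num_boost_round=m,
                           obj=make_mu_obj(log_sigma_hat), xgb_model=booster_mu)
    mu_hat = booster_mu.predict(dtrain, output_margin=True)
    booster_sigma = xgb.train(params, dtrain, num_boost_round=m,
                              obj=make_sigma_obj(mu_hat), xgb_model=booster_sigma)
    log_sigma_hat = booster_sigma.predict(dtrain, output_margin=True)
```

With m = 1 this degenerates to cycling between the parameters every single iteration; larger m trades faster per-parameter progress against staler values of the fixed parameter, and m itself becomes one more hyper-parameter to tune.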

@ja-thomas

Hi,

just to make sure I get your current approach right:
You fully train your model (for however many iterations / with early stopping?) on the mu gradient/hessian, keeping sigma fixed. Then you plug in the final mu estimate as the new constant mu and fully train sigma. Then you go back to training mu, and so on until convergence?

The incremental approach you just described does the same, but only runs m iterations at a time before moving to the next parameter?

If you want to emulate the regular gamboostLSS behaviour, you would just cycle between mu and sigma after a single iteration, which is probably quite inefficient.

@bryorsnef

Just spit-balling here (really looking forward to trying this library when it is released). With boosting, you have a learning-rate parameter. Is it possible to have some fixed amount of "update" that gets split between the distributional parameters? I.e., estimate mu, estimate sigma, then dynamically decide how much shrinkage applies to each one? I guess that doesn't really solve the simultaneous parameter estimation problem, though.

@palexbg

palexbg commented Jul 7, 2020

Hi Alex,

I have by no means as much understanding of the field as you - I still have to read that paper! ;) - but I recently stumbled upon an alternative approach with the same aim of probabilistic boosting.

https://arxiv.org/pdf/1910.03225.pdf

Would it maybe be of help for your particular problem? At some point they talk about optimizing multiple parameters and handling it with generalized natural gradients (page 2, right column, top).
Cheers

@StatMixedML
Owner Author

StatMixedML commented Jul 7, 2020

Thanks everyone for the discussion so far!

@palexbg I know the NGBoost approach, but thanks for sharing anyway. I'd assume that the way XGBoost deals with the gradient/hessian vectors is different from NGBoost, so I guess the architectures differ, implying that we cannot simply transfer the approach from NGBoost. I am not sure, however - it might be worth exploring.
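
For what it's worth, the core multi-parameter ingredient in the NGBoost paper linked above is the natural gradient, i.e. the ordinary gradient rescaled by the inverse Fisher information, which puts the µ- and σ-gradients on a comparable scale. For the Normal distribution parameterized as (µ, log σ), the Fisher information is diagonal, diag(1/σ², 2), so the rescaling is a one-liner. The sketch below only shows that rescaling, not how it would be wired into XGBoost:

```python
import numpy as np

def natural_gradient_normal(y, mu, log_sigma):
    """Natural gradient of the Normal NLL in the (mu, log_sigma) parameterization."""
    sigma2 = np.exp(2.0 * log_sigma)
    grad_mu = (mu - y) / sigma2                      # ordinary gradient w.r.t. mu
    grad_log_sigma = 1.0 - (y - mu) ** 2 / sigma2    # ordinary gradient w.r.t. log_sigma
    # Multiply by the inverse Fisher information diag(sigma^2, 1/2)
    return np.column_stack([sigma2 * grad_mu,        # = mu - y
                            grad_log_sigma / 2.0])
```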

@StatMixedML
Owner Author

StatMixedML commented Jul 7, 2020

From everything I have read so far, not only in this thread but also in papers on XGBoost, I have the growing impression that I might need to change the way XGBoost is trained in order to efficiently deal with K gradient/hessian vectors, where K is the number of parameters of the distribution.

What do you think?

@StatMixedML
Owner Author

Looping dmlc/xgboost#5859 into the discussion.

@tanvibagla

Hi, can you please tell me which library to use in R to run XGBoostLSS? Thanks!

@StatMixedML
Owner Author

@t0504b See Issue #7.

@StatMixedML
Owner Author

StatMixedML commented Apr 17, 2021

@ja-thomas

> I would be quite interested if there actually is a solution for this.

In fact, quite recently (after pausing the project for about 1.5 years) I managed to estimate all parameters of the Normal distribution without cycling between mu and sigma, i.e., in a single simultaneous estimation run. As such, we now have a more elegant and efficient way of training that can be applied to essentially all available gamlss distribution families. I hope to commit a very early version soon. Since it is based on Python, there will probably only be a Jupyter notebook for now, without a proper package structure.
