
Add mlr algorithm into angel-ps #176

Merged: 6 commits into Angel-ML:branch-1.2.0 from hbghhy:mlr on Aug 21, 2017

Conversation

@hbghhy (Contributor) commented on Aug 18, 2017

The details of the MLR algorithm can be found at http://castellanzhang.github.io/2017/06/01/mlr_plm/,
or in the Alibaba paper 《Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction》.
I implemented a mini-batch GD version of the algorithm in angel-ps. (The paper uses an OWLQN-like optimization method.)
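
For reviewers unfamiliar with the model, here is a minimal NumPy sketch of the MLR (LS-PLM) prediction as I understand it from the paper; the region count m and the names U, W are illustrative, not the identifiers used in this PR's code:

import numpy as np

def mlr_predict(x, U, W):
    """MLR / LS-PLM: p(y=1|x) = sum_i softmax_i(U @ x) * sigmoid(w_i . x).

    x: feature vector of shape (d,)
    U: region (dividing) weights, shape (m, d)
    W: per-region LR (fitting) weights, shape (m, d)
    """
    z = U @ x                                # (m,) region scores
    s = np.exp(z - z.max())                  # stabilized softmax over the m regions
    s /= s.sum()
    sigma = 1.0 / (1.0 + np.exp(-(W @ x)))   # per-region LR predictions
    return float(s @ sigma)

As a sanity check, with all-zero U and W the softmax is uniform and every sigmoid is 0.5, so the prediction is 0.5.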

I compared its performance with LR locally on the a9a.train data (since MLR has more parameters than LR, the regularization setting is different):
original LR:

epoch=19 success. epoch cost 47 ms.train cost 35 ms. validation cost 12 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "0.4104978639734121"
  }
  metrics {
    key: "validate.loss"
    value: "0.4202429934726492"
  }
}

MLR:

epoch=19 success. epoch cost 365 ms.train cost 306 ms. validation cost 59 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "0.4196211116272123"
  }
  metrics {
    key: "validate.loss"
    value: "0.43189225968220946"
  }
}

I also tested it on a more complicated dataset, generated by the following sklearn code:

from sklearn import datasets

X, y = datasets.make_classification(n_samples=100000, n_features=50,
                                    n_informative=30, n_redundant=10, n_clusters_per_class=5,
                                    random_state=42)
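
If anyone wants to reproduce the test, the generated matrix can be written out with sklearn's dump_svmlight_file in libsvm/svmlight format (the same format as a9a.train, if I remember correctly); the file name here is just an example:

from sklearn.datasets import dump_svmlight_file

# Dump the generated data in libsvm format for use as training input.
dump_svmlight_file(X, y, "mlr_test.libsvm")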

Here is the comparison:
original LR:

epoch=19 success. epoch cost 80 ms.train cost 45 ms. validation cost 35 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "7.28320777491272"
  }
  metrics {
    key: "validate.loss"
    value: "7.421573883625887"
  }
}

MLR:

epoch=19 success. epoch cost 681 ms.train cost 507 ms. validation cost 174 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "0.3514853859170796"
  }
  metrics {
    key: "validate.loss"
    value: "0.35450721676624813"
  }
}

Since this data is not linearly separable, LR performs poorly.

Please review my code and test it on a cluster with larger data, and let me know if you find any bugs or have advice for improving the running time. (I previously implemented MLR on Spark with L-BFGS and OWLQN; it ran faster than FM and a bit slower than LR, so there may be ways to speed up my code.)
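
To make the speed discussion concrete, here is a minimal NumPy sketch of the kind of mini-batch gradient step I mean (my own derivation of the MLR log-loss gradient on dense features, with no regularization; all names and the learning rate are illustrative, not the PR's actual code):

import numpy as np

def mlr_minibatch_step(X, y, U, W, lr=0.1):
    """One mini-batch GD step on the MLR log loss; updates U, W in place.

    X: (n, d) mini-batch, y: (n,) labels in {0, 1},
    U, W: (m, d) region and per-region LR weights.
    """
    Z = X @ U.T                                   # (n, m) region scores
    S = np.exp(Z - Z.max(axis=1, keepdims=True))
    S /= S.sum(axis=1, keepdims=True)             # softmax over regions
    Sig = 1.0 / (1.0 + np.exp(-(X @ W.T)))        # (n, m) per-region sigmoids
    p = (S * Sig).sum(axis=1)                     # (n,) p(y=1|x)
    g = (p - y) / np.clip(p * (1.0 - p), 1e-12, None)   # dL/dp of the log loss
    # dp/dw_i = s_i * sig_i * (1 - sig_i) * x ;  dp/du_i = s_i * (sig_i - p) * x
    GW = (g[:, None] * S * Sig * (1.0 - Sig)).T @ X / len(y)
    GU = (g[:, None] * S * (Sig - p[:, None])).T @ X / len(y)
    W -= lr * GW
    U -= lr * GU
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))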

THX

@andyyehoo merged commit 6b7d6db into Angel-ML:branch-1.2.0 on Aug 21, 2017
@andyyehoo (Contributor) commented:
Thanks for your excellent work. :-)

@hbghhy deleted the mlr branch on Aug 22, 2017