
Add mlr algorithm into angel-ps #176

Merged: 6 commits into Angel-ML:branch-1.2.0 from hbghhy:mlr on Aug 21, 2017

Conversation

@hbghhy (Contributor) commented on Aug 18, 2017

The details of the MLR algorithm can be found at http://castellanzhang.github.io/2017/06/01/mlr_plm/,
or in the Alibaba paper 《Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction》.
I implemented a mini-batch GD version of the algorithm in angel-ps. (The paper uses an OWLQN-like optimization method.)
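
For reviewers unfamiliar with the model, here is a minimal NumPy sketch of the MLR (LS-PLM) prediction as I understand it from the paper; the region count m and the names U, W are illustrative, not the identifiers used in this PR's code:

import numpy as np

def mlr_predict(x, U, W):
    """MLR / LS-PLM: p(y=1|x) = sum_i softmax_i(U @ x) * sigmoid(w_i . x).

    x: feature vector of shape (d,)
    U: region (dividing) weights, shape (m, d)
    W: per-region LR (fitting) weights, shape (m, d)
    """
    z = U @ x                                # (m,) region scores
    s = np.exp(z - z.max())                  # stabilized softmax over the m regions
    s /= s.sum()
    sigma = 1.0 / (1.0 + np.exp(-(W @ x)))   # per-region LR predictions
    return float(s @ sigma)

As a sanity check, with all-zero U and W the softmax is uniform and every sigmoid is 0.5, so the prediction is 0.5.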

I compared its performance with LR locally on the a9a.train data (since MLR has more parameters than LR, the regularization setting is different):
original LR:

epoch=19 success. epoch cost 47 ms.train cost 35 ms. validation cost 12 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "0.4104978639734121"
  }
  metrics {
    key: "validate.loss"
    value: "0.4202429934726492"
  }
}

MLR:

epoch=19 success. epoch cost 365 ms.train cost 306 ms. validation cost 59 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "0.4196211116272123"
  }
  metrics {
    key: "validate.loss"
    value: "0.43189225968220946"
  }
}

I also tested it on a more complicated dataset, generated by the following sklearn code:

from sklearn import datasets

X, y = datasets.make_classification(n_samples=100000, n_features=50,
                                    n_informative=30, n_redundant=10, n_clusters_per_class=5,
                                    random_state=42)
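
If anyone wants to reproduce the test, the generated matrix can be written out with sklearn's dump_svmlight_file in libsvm/svmlight format (the same format as a9a.train, if I remember correctly); the file name here is just an example:

from sklearn.datasets import dump_svmlight_file

# Dump the generated data in libsvm format for use as training input.
dump_svmlight_file(X, y, "mlr_test.libsvm")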

Here is the comparison:
original LR:

epoch=19 success. epoch cost 80 ms.train cost 45 ms. validation cost 35 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "7.28320777491272"
  }
  metrics {
    key: "validate.loss"
    value: "7.421573883625887"
  }
}

MLR:

epoch=19 success. epoch cost 681 ms.train cost 507 ms. validation cost 174 ms.
write app report to file successfully jobReport {
  jobState: J_SUCCEEDED
  curIteration: 20
  totalIteration: 100
  diagnostics: ""
  metrics {
    key: "train.loss"
    value: "0.3514853859170796"
  }
  metrics {
    key: "validate.loss"
    value: "0.35450721676624813"
  }
}

Since this data is not linearly separable, LR performs poorly.

Please review my code and test it on a cluster with larger data, and let me know if you find any bugs or have advice for improving the running time. (I previously implemented MLR on Spark with L-BFGS and OWLQN; it ran faster than FM and a bit slower than LR, so there may be ways to speed up my code.)
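
To make the speed discussion concrete, here is a minimal NumPy sketch of the kind of mini-batch gradient step I mean (my own derivation of the MLR log-loss gradient on dense features, with no regularization; all names and the learning rate are illustrative, not the PR's actual code):

import numpy as np

def mlr_minibatch_step(X, y, U, W, lr=0.1):
    """One mini-batch GD step on the MLR log loss; updates U, W in place.

    X: (n, d) mini-batch, y: (n,) labels in {0, 1},
    U, W: (m, d) region and per-region LR weights.
    """
    Z = X @ U.T                                   # (n, m) region scores
    S = np.exp(Z - Z.max(axis=1, keepdims=True))
    S /= S.sum(axis=1, keepdims=True)             # softmax over regions
    Sig = 1.0 / (1.0 + np.exp(-(X @ W.T)))        # (n, m) per-region sigmoids
    p = (S * Sig).sum(axis=1)                     # (n,) p(y=1|x)
    g = (p - y) / np.clip(p * (1.0 - p), 1e-12, None)   # dL/dp of the log loss
    # dp/dw_i = s_i * sig_i * (1 - sig_i) * x ;  dp/du_i = s_i * (sig_i - p) * x
    GW = (g[:, None] * S * Sig * (1.0 - Sig)).T @ X / len(y)
    GU = (g[:, None] * S * (Sig - p[:, None])).T @ X / len(y)
    W -= lr * GW
    U -= lr * GU
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))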

THX

@andyyehoo merged commit 6b7d6db into Angel-ML:branch-1.2.0 on Aug 21, 2017
@andyyehoo (Contributor) commented:
Thanks for your excellent work. :-)

@hbghhy deleted the mlr branch on Aug 22, 2017