Implement secure boost scheme phase 1 - vertical pipeline with hist sync #10037
…ute under secure scenario
…valent to broadcast
…lobal best split, but need to further apply split correctly
Thank you for working on federated learning! Exciting new features.
Initial review as I'm still reading the secure boost paper. It would be great if you could make a summary of the various differences between all data modes.
@@ -401,6 +413,9 @@ class HistEvaluator {
    if (is_col_split_) {
      // With column-wise data split, we gather the best splits from all the workers and update the
      // expand entries accordingly.
      // Note that under secure vertical setting, only the label owner is able to evaluate the split
      // based on the global histogram. The other parties will receive the final best splits
      // allgather is capable of performing this (0-gain entries for non-label owners),
      auto all_entries = AllgatherColumnSplit(entries);
Is this part of the code even useful for passive parties, considering that they don't evaluate splits? If not, it would be much cleaner to skip the call to evaluation altogether. Spreading conditions like if (secure) throughout the code can make it difficult to change.
Currently, the (secure && passive party) case is skipped with "if ((!is_secure_) || (collective::GetRank() == 0)) {". Do you have recommendations on skipping it in other places?
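One way to read the reviewer's suggestion is to keep the secure/passive check in a single named predicate instead of repeating the raw condition at each call site. The sketch below is illustrative only: `PartyContext` and `ShouldEvaluateSplits` are hypothetical names that do not come from this PR, and it assumes, as in the condition quoted above, that rank 0 is the label owner.

```cpp
#include <cassert>

// Hypothetical stand-in for the state the evaluator consults; not an XGBoost type.
struct PartyContext {
  bool is_secure;     // secure vertical mode enabled
  bool is_col_split;  // column-wise (vertical) data split
  int rank;           // assumption: rank 0 is the label owner (active party)
};

// Mirrors the quoted condition `(!is_secure_) || (collective::GetRank() == 0)`:
// in secure mode only the label owner evaluates splits; otherwise every worker does.
inline bool ShouldEvaluateSplits(PartyContext const& ctx) {
  return !ctx.is_secure || ctx.rank == 0;
}
```

A caller would then branch once on `ShouldEvaluateSplits(ctx)` before invoking the evaluation, so passive parties skip the call entirely.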
@@ -190,6 +193,17 @@ class HistogramBuilder {
                reinterpret_cast<double *>(this->hist_[first_nidx].data()), n);
    }

    if (is_distributed_ && is_col_split_ && is_secure_) {
Why do we need to allgather the histogram across workers? I thought we only need to send it to the active worker?
Yes, we only need to collect the histograms at the active party, but my understanding is that we currently do not have a "gather" function to do that. It would be great to have one, similar to broadcast(..., rank), just in reverse.
Thank you for sharing, I can look into a gather function in the future.
I have enabled all the CI pipelines, please don't push until they finish, otherwise a new commit will interrupt the previous run. The PR looks good to me overall and will approve once all tests pass. Please note that after having all the desired features in the feature branch and having a full picture of the code changes, we might do a few rounds of refactors before merging into the master. This way we can unblock these individual PRs while keeping the code maintainable in the future.
sounds good! Thanks a lot. :)
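The difference between the allgather used in this PR and the "gather" primitive discussed in the review (the reverse of a broadcast from a given rank) can be illustrated with a small single-process simulation. This is only a sketch: `SimulatedGather`, `SimulatedAllgather` and `Buffer` are hypothetical names, not part of XGBoost's collective API, and no real communication takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: per_rank[r] is the local histogram buffer of worker r.
using Buffer = std::vector<double>;

// Gather: only `root` receives the concatenation of all local buffers;
// every other rank ends up with an empty receive buffer.
std::vector<Buffer> SimulatedGather(std::vector<Buffer> const& per_rank, std::size_t root) {
  assert(root < per_rank.size());
  std::vector<Buffer> received(per_rank.size());
  for (auto const& local : per_rank) {
    received[root].insert(received[root].end(), local.begin(), local.end());
  }
  return received;
}

// Allgather: every rank receives the same concatenation.
std::vector<Buffer> SimulatedAllgather(std::vector<Buffer> const& per_rank) {
  Buffer all;
  for (auto const& local : per_rank) {
    all.insert(all.end(), local.begin(), local.end());
  }
  return std::vector<Buffer>(per_rank.size(), all);
}
```

With a gather to the active party, passive workers never hold the other parties' histograms, whereas allgather gives every worker a full copy.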
For implementing Vertical Federated Learning with Secure Features, as discussed in #9987.
The first phase is to implement an alternative vertical pipeline that syncs the histograms from the clients to the label owner.
This PR implements this feature as a standalone data mode.
Functional changes are finished; unit tests are currently being added.
Note: phase 2 will add homomorphic encryption (HE) features in an independent PR.
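The phase 1 flow described here can be sketched as follows: each party builds gradient histograms for its own features, the histograms are collected at the label owner, and the label owner alone searches for the best split. This is a minimal single-process sketch, not the code in this PR. `GradPair`, `FeatureHist`, `BestSplit` and `EvaluateOnLabelOwner` are hypothetical names, the histograms are unencrypted (encryption belongs to phase 2), and the gain is the standard regularized score G²/(H + λ) without the constant factor or the complexity penalty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not XGBoost's internal classes.
struct GradPair {
  double grad;
  double hess;
};
using FeatureHist = std::vector<GradPair>;  // one (grad, hess) sum per bin

struct BestSplit {
  int party;
  int feature;
  int bin;
  double gain;
};

inline double Score(double g, double h, double lambda) { return g * g / (h + lambda); }

// Runs on the label owner only, after the histograms of all parties were collected.
// hists[p][f] is the histogram of party p's local feature f.
BestSplit EvaluateOnLabelOwner(std::vector<std::vector<FeatureHist>> const& hists,
                               double lambda) {
  BestSplit best{-1, -1, -1, 0.0};  // gain 0 means "no useful split found"
  for (std::size_t p = 0; p < hists.size(); ++p) {
    for (std::size_t f = 0; f < hists[p].size(); ++f) {
      auto const& hist = hists[p][f];
      double g_total = 0.0, h_total = 0.0;
      for (auto const& b : hist) {
        g_total += b.grad;
        h_total += b.hess;
      }
      double const parent = Score(g_total, h_total, lambda);
      double g_left = 0.0, h_left = 0.0;
      // Split after bin i: bins [0, i] go left, the remaining bins go right.
      for (std::size_t i = 0; i + 1 < hist.size(); ++i) {
        g_left += hist[i].grad;
        h_left += hist[i].hess;
        double const gain = Score(g_left, h_left, lambda) +
                            Score(g_total - g_left, h_total - h_left, lambda) - parent;
        if (gain > best.gain) {
          best.party = static_cast<int>(p);
          best.feature = static_cast<int>(f);
          best.bin = static_cast<int>(i);
          best.gain = gain;
        }
      }
    }
  }
  return best;
}
```

In the actual pipeline, the selected split still has to be communicated back to the other parties, which the diff above does through `AllgatherColumnSplit`.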