Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm-packages] xgboost-spark warning when Spark encryption is turned on #3667

Merged
merged 10 commits into from
Sep 10, 2018

Conversation

jkbradley
Copy link
Contributor

@jkbradley jkbradley commented Sep 4, 2018

Tracking Github issue: #3647

Issue: Apache Spark users running XGBoost on Spark may expect over-the-wire encryption for XGBoost based on Spark confs, but they will not get it and may never know about the security issue.

This PR changes XGBoost.trainDistributed to check the Spark conf spark.ssl.enabled and throw an exception if the user expects encryption over-the-wire.

In the error message, this tells the user how to get around this security check at their own risk.
The user can tell xgboost-spark to ignore spark.ssl.enabled by setting an xgboost-specific conf in the SparkConf: xgboost.spark.ignoreSsl.

This adds 1 unit test with XGBoostClassifier to check this behavior.

s"Spark Conf spark.ssl.enabled=true was overridden with xgboost.spark.ignoreSsl=true.")
} else {
throw new Exception("xgboost-spark found spark.ssl.enabled=true to encrypt data " +
"in transit, but xgboost-spark uses MPI to send non-encrypted data over the wire. " +
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't use MPI to send data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right : P will fix!

@mengxr
Copy link

mengxr commented Sep 5, 2018

LGTM except for one minor message. cc: @CodingCat for making a pass

@CodingCat
Copy link
Member

In Parental leave........will look at it later, sorry for the delay

@jkbradley
Copy link
Contributor Author

The last failure was spurious (from timeout). Pushing NOP update to retest.

@jkbradley
Copy link
Contributor Author

It seems like other Travis builds are having timeout issues too. Will return to this tomorrow.

@jkbradley jkbradley changed the title [WIP][jvm-packages] xgboost-spark warning when Spark encryption is turned on [jvm-packages] xgboost-spark warning when Spark encryption is turned on Sep 7, 2018
@jkbradley
Copy link
Contributor Author

@CodingCat OK definitely understandable that you're busy!

@hcho3 Any chance you'd be able to check this out? I appreciate it!

@hcho3
Copy link
Collaborator

hcho3 commented Sep 7, 2018

@jkbradley Wish I knew anything about Java world hehe :)

Copy link
Member

@CodingCat CodingCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left one minor comment

"To override this protection and still use xgboost-spark at your own risk, " +
"you can set the SparkSession conf to use xgboost.spark.ignoreSsl=true.")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you please make this part as well as the lines until to L210 as a separate function (validation something), as this function has been too long

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thanks for checking this out : )

@CodingCat CodingCat merged commit 14a8b96 into dmlc:master Sep 10, 2018
@CodingCat
Copy link
Member

LGTM. merged to master, thanks!

@jkbradley
Copy link
Contributor Author

Thank you! Btw, do you know what the ETA for a new release of xgboost-spark is?

@jkbradley jkbradley deleted the spark-xgboost-encryption-warning branch September 11, 2018 17:31
CodingCat pushed a commit to CodingCat/xgboost that referenced this pull request Sep 18, 2018
…on (dmlc#3667)

* added test, commented out right now

* reinstated test

* added fix for checking encryption settings

* fix by using RDD conf

* fix compilation

* renamed conf

* use SparkSession if available

* fix message

* nop

* code review fixes
alois-bissuel pushed a commit to criteo-forks/xgboost that referenced this pull request Dec 4, 2018
…on (dmlc#3667)

* added test, commented out right now

* reinstated test

* added fix for checking encryption settings

* fix by using RDD conf

* fix compilation

* renamed conf

* use SparkSession if available

* fix message

* nop

* code review fixes
@lock lock bot locked as resolved and limited conversation to collaborators Dec 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants