[jvm-packages] enable deterministic repartitioning when checkpoint is enabled #4807

Merged (4 commits) on Sep 19, 2019

CodingCat
Member

part of #4786

@CodingCat
Member Author

@trams

Contributor

trams left a comment

LGTM in general.
My only concern is the following:

Why do we need to add a feature value at all to our hash function?
The hashCode of a Vector is a MurmurHash, which is based purely on the values of the vector.

If adding a value from the vector makes the hashing better, why doesn't Murmur do this itself? Is it because Murmur is essentially a streaming hash?

If this method does not make the hashing "better", is there some other reason for it that I am missing?

Why can't we just use the hash value and a HashPartitioner? That should simplify the logic.
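
To make the question concrete, here is a minimal sketch of content-based partitioning, where a row's partition depends only on a hash of its feature values. It is not the code from this PR: it is written in Python rather than Scala/Spark, `zlib.crc32` stands in for MurmurHash, and the names `row_hash`, `partition_id`, and `repartition` are hypothetical.

```python
import struct
import zlib


def row_hash(features):
    """Content-based hash: depends only on the feature values of the row.

    Assumption: CRC32 over the packed doubles is a stand-in for MurmurHash.
    """
    payload = struct.pack(f"<{len(features)}d", *features)
    return zlib.crc32(payload)


def partition_id(features, num_partitions):
    # Same idea as a hash partitioner: non-negative hash modulo partition count.
    return row_hash(features) % num_partitions


def repartition(rows, num_partitions):
    # Assign each row to a partition using only its content, so the
    # assignment does not depend on the order in which rows arrive.
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_id(row, num_partitions)].append(row)
    return parts
```

Under these assumptions, feeding the same rows in a different order gives the same set of rows per partition, although the order of rows within a partition can still differ.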

@CodingCat
Member Author

@trivialfis any idea about the flaky GPU test?

@trivialfis
Member

Looks like a network failure ...

hcho3 mentioned this pull request Sep 16, 2019
CodingCat merged commit fc8c9b0 into dmlc:master Sep 19, 2019
CodingCat deleted the deterministic_partitioning_2 branch September 19, 2019 22:21
lock bot locked as resolved and limited conversation to collaborators Dec 19, 2019