Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Shims v2 for ShuffledBatchRDD #3459

Merged
merged 6 commits into from
Sep 14, 2021

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Sep 13, 2021

This fixes #3372

@revans2 revans2 added the bug Something isn't working label Sep 13, 2021
@revans2 revans2 added this to the Sep 13 - Sep 24 milestone Sep 13, 2021
@revans2 revans2 self-assigned this Sep 13, 2021
@revans2 revans2 requested a review from tgravescs as a code owner September 13, 2021 10:27
@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

build

val n = parent.numPartitions
val result = new Array[Int](n)
for (i <- partitionStartIndices.indices) {
for (i <- 0 until partitionStartIndices.length) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note. There are several changes here, like this one, that make it look like the code is going backwards based off of the IDEs warnings. This is because I made the code match as closely as possible to the Spark 3.2.0-RC2 code. This was so that differences between it and other versions of the code are much more obvious.

numReducers,
context,
sqlMetricsReporter)
val blocksByAddress = shim.getMapSizesByExecutorId(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place that I have really added new code from scratch. The rest has been refactoring existing code and pulling in things either from Spark or from existing code in our plugin. I am not really sure how to trigger this part of the code base to test it. If someone really wants me to get tests for this I'll try to find a way to make sure that I cover all of the existing code paths.

jlowe
jlowe previously approved these changes Sep 13, 2021
@revans2 revans2 marked this pull request as draft September 13, 2021 14:17
@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

Converted to draft because I need to work out some issues with datbricks build.

@revans2 revans2 marked this pull request as ready for review September 13, 2021 16:28
@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

build

1 similar comment
@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

build

@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

Build failed trying to get the code coverage dependency, so I kicked it again.

@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

build

@revans2
Copy link
Collaborator Author

revans2 commented Sep 13, 2021

build

1 similar comment
@pxLi
Copy link
Collaborator

pxLi commented Sep 14, 2021

build

@revans2 revans2 merged commit 126c7d3 into NVIDIA:branch-21.10 Sep 14, 2021
@revans2 revans2 deleted the 3_2_0_shuffled_batch_rdd branch September 14, 2021 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Spark 3.2+
Projects
None yet
4 participants