Auto-dedupe ASM-relocated shim dependencies #3565

Merged 1 commit on Sep 21, 2021


gerashegalov
Collaborator

  1. Remove the Spark version classifier from the package name and use just
    com.nvidia.spark.shaded for ASM-based relocation
  2. Remove com/nvidia/spark/shaded from unshimmed classes
  3. As a result, the relocated classes get auto-deduped in spark3xx-common
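
The PR text does not show the dedupe step itself. As a rough illustration of why dropping the version classifier helps: once the relocated classes share one package name, their bytes can match across shims, and a content-based pass can move them into the common directory. The sketch below is an assumption-laden model of that idea, not the project's build logic; `dedupe_into_common` and `file_digest` are hypothetical names, and the rule "identical in every shim" is an assumption.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def dedupe_into_common(root: Path, shims: List[str], common: str, prefix: str) -> int:
    """Move files under `prefix` that are byte-identical in every shim into `common`.

    Hypothetical helper for illustration only. Returns the number of
    relative paths that were deduplicated.
    """
    # Map each relative path to {shim name: content digest}.
    digests = defaultdict(dict)
    for shim in shims:
        base = root / shim
        pkg = base / prefix
        if not pkg.is_dir():
            continue
        for f in pkg.rglob("*"):
            if f.is_file():
                digests[f.relative_to(base)][shim] = file_digest(f)

    moved = 0
    for rel, by_shim in digests.items():
        present_everywhere = len(by_shim) == len(shims)
        identical = len(set(by_shim.values())) == 1
        if not (present_everywhere and identical):
            continue  # shim-specific bytes stay in the shim directory
        target = root / common / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(root / shims[0] / rel, target)
        for shim in shims:
            (root / shim / rel).unlink()
        moved += 1
    return moved
```

With a version classifier baked into the package name, the same class would land at a different relative path (and carry different constant-pool strings) in each shim, so neither the path match nor the digest match in this sketch would succeed.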

For the minimumFeatureVersionMix profile, consisting of

  • spark302
  • spark311cdh
  • spark312
  • spark320

the jar size goes down from 29M to 17M after compression.

Pre-compression breakdown

$ du -h -s dist/target/parallel-world/spark*/com/nvidia/shaded
372K	dist/target/parallel-world/spark302/com/nvidia/shaded
372K	dist/target/parallel-world/spark311cdh/com/nvidia/shaded
372K	dist/target/parallel-world/spark312/com/nvidia/shaded
372K	dist/target/parallel-world/spark320/com/nvidia/shaded
15M	dist/target/parallel-world/spark3xx-common/com/nvidia/shade

Instead of storing 15M per shim, we end up storing a 312K delta per shim.
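
As a back-of-the-envelope check on the uncompressed footprint, the figures from the `du` listing can be plugged into a small calculation. This is only a sketch: it assumes that each of the four shims previously carried a full copy of roughly 15M, and `shaded_footprint_mib` is a hypothetical helper.

```python
KIB_PER_MIB = 1024


def shaded_footprint_mib(num_shims: int, common_mib: float, per_shim_kib: float) -> float:
    """Total uncompressed size: one shared copy plus a small delta per shim."""
    return common_mib + num_shims * per_shim_kib / KIB_PER_MIB


num_shims = 4  # spark302, spark311cdh, spark312, spark320

# Assumption: before the change, every shim stored its own ~15M copy.
before_mib = num_shims * 15

# After the change: 15M in spark3xx-common plus 372K per shim (du listing).
after_mib = shaded_footprint_mib(num_shims, common_mib=15, per_shim_kib=372)

print(f"before ~ {before_mib} MiB, after ~ {after_mib:.1f} MiB")
# prints: before ~ 60 MiB, after ~ 16.5 MiB
```

These are pre-compression numbers, so they are not directly comparable to the 29M and 17M jar sizes quoted above.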

This will only get better once we have common directories per feature version line:
3.0.x, 3.1.x, 3.2.x

Signed-off-by: Gera Shegalov <gera@apache.org>

@gerashegalov self-assigned this on Sep 21, 2021
@gerashegalov added the build (Related to CI / CD or cleanly building) label on Sep 21, 2021
@gerashegalov added this to the Sep 13 - Sep 24 milestone on Sep 21, 2021
@gerashegalov
Collaborator Author

build

@tgravescs merged commit 99beba1 into NVIDIA:branch-21.10 on Sep 21, 2021
@gerashegalov deleted the dedupeASMShadedClasses branch on September 21, 2021 17:33