
Automatic conversion to shimplified directory structure [databricks] #7222

Merged: 120 commits merged into NVIDIA:branch-23.04 from gerashegalov:generateSymlinks on Mar 6, 2023

Conversation

@gerashegalov (Collaborator) commented Dec 1, 2022

Command executed:

```bash
mvn generate-sources -Dshimplify=true -Dshimplify.move=true
```

Verified with:

1. Build a multi-shim jar

```bash
mvn clean
echo 314 321 331 | xargs -t -n 1 bash -c 'mvn package -pl aggregator -am -Dbuildver=$1 -DskipTests -Dskip -Dmaven.javadoc.skip' '{}'
mvn package -pl dist -Dincluded_buildvers=314,321,331 -Ddist.jar.compress=false
```

2. Run a smoke test for all shims

```bash
SPARK_HOME=~/gits/apache/spark NUM_LOCAL_EXECS=2 PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark314.RapidsShuffleManager PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum
SPARK_HOME=~/dist/spark-3.2.1-bin-hadoop3.2 NUM_LOCAL_EXECS=2 PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark321.RapidsShuffleManager PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum
SPARK_HOME=~/dist/spark-3.3.1-bin-hadoop3 NUM_LOCAL_EXECS=2 PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark331.RapidsShuffleManager PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum
```
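For reference, the `PYSP_TEST_`-prefixed variables in the commands above are understood to be turned into Spark configuration entries by `run_pyspark_from_build.sh`, with the prefix stripped and underscores replaced by dots (so `PYSP_TEST_spark_rapids_shuffle_mode` becomes `spark.rapids.shuffle.mode`). A minimal Python sketch of that assumed mapping — the helper name is hypothetical, not part of the repository:

```python
def pysp_test_env_to_conf(env):
    """Map PYSP_TEST_-prefixed env vars to Spark conf entries.

    Illustrative sketch only: run_pyspark_from_build.sh is assumed to strip
    the PYSP_TEST_ prefix and replace underscores with dots; variables
    without the prefix (e.g. NUM_LOCAL_EXECS, SPARK_HOME) are left to the
    test harness itself rather than passed to Spark.
    """
    prefix = "PYSP_TEST_"
    return {
        name[len(prefix):].replace("_", "."): value
        for name, value in env.items()
        if name.startswith(prefix)
    }
```

Under this mapping, `PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark321.RapidsShuffleManager` becomes the `spark.shuffle.manager` setting that selects the shim-specific shuffle manager being smoke-tested.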

A separate PR will be posted removing the old sparkXYZ.sources definitions for the previous build layout. We'll merge it once we are confident in the robustness of the transition.

Signed-off-by: Gera Shegalov <gera@apache.org>
@gerashegalov (Collaborator, Author)

build

@firestarman (Collaborator) commented Dec 2, 2022

How do we add a new shim with this new comment-based approach?

@GaryShen2008 (Collaborator)

To add support for a new Spark version, will we need to update the comments in all the necessary files one by one?

@res-life (Collaborator) commented Dec 2, 2022

After selecting a Spark3xx profile in the IDEA Maven panel, IDEA does not seem to automatically pick up the code under target/src as sources. The IDEA GUI shows some compile errors even though they are not real errors and the whole project compiles with `mvn clean install`.

@abellina (Collaborator) commented Dec 2, 2022

I do not understand how this patch makes the codebase easier to work with. We have now encoded in a comment the information about which Spark versions each file belongs to. This seems like a burden to maintain, and I do not see what makes the current method so bad that we need to change things this drastically.
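To make the comment-based ownership concrete, here is a sketch of what such a per-file tag and a consumer of it could look like. The `spark-rapids-shim-json-lines` tag name and JSON-lines format below reflect the shimplify design as I understand it but should be treated as an illustrative assumption rather than the authoritative spec, and the parser itself is hypothetical:

```python
import json

# Hypothetical example of a shared source file carrying a shim-ownership
# comment block: one JSON object per line names a buildver that owns the file.
EXAMPLE_SOURCE = """\
/*** spark-rapids-shim-json-lines
{"spark": "321"}
{"spark": "331"}
spark-rapids-shim-json-lines ***/
package com.nvidia.spark.rapids
"""

def owning_shims(source_text):
    """Return the buildvers declared in the file's shim comment block.

    Illustrative only: finds the opening line (ends with the tag) and the
    closing line (starts with the tag), then parses the JSON lines between.
    """
    tag = "spark-rapids-shim-json-lines"
    lines = source_text.splitlines()
    start = next(i for i, l in enumerate(lines) if l.endswith(tag))
    end = next(i for i, l in enumerate(lines) if l.startswith(tag))
    return [json.loads(l)["spark"] for l in lines[start + 1:end]]
```

Under a scheme like this, adding a new shim to a shared file reduces to appending one JSON line (e.g. `{"spark": "340"}`) to its block — mechanical enough to be scriptable by the build rather than done by hand.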

@abellina abellina self-requested a review December 2, 2022 14:35
@gerashegalov gerashegalov changed the title [WIP] Simplify shim source code layout [databricks] Automatic conversion to shimplified directory structure [databricks] Feb 23, 2023
@gerashegalov gerashegalov marked this pull request as ready for review February 23, 2023 22:43
@gerashegalov (Collaborator, Author)

build

@gerashegalov gerashegalov added the task Work required that improves the product but is not user facing label Feb 23, 2023
@gerashegalov gerashegalov self-assigned this Feb 23, 2023
@gerashegalov (Collaborator, Author)

build

(1 similar comment)

@gerashegalov (Collaborator, Author)

build

@jlowe (Member) left a comment

Apologies for the long delay in reviewing this again. I wasn't able to review every single file, but what I did review looks correct to me, and passing tests goes a long way toward verifying this change. Thanks @gerashegalov for all the hard work on this!

@gerashegalov merged commit c087d2e into NVIDIA:branch-23.04 on Mar 6, 2023
@gerashegalov deleted the generateSymlinks branch March 6, 2023 17:58
@mattahrens added the build label (Related to CI / CD or cleanly building) Apr 21, 2023
Labels: build (Related to CI / CD or cleanly building), task (Work required that improves the product but is not user facing)

Successfully merging this pull request may close these issues:

Shimplify: Simplify shim source code management

9 participants