Automatic conversion to shimplified directory structure [databricks] #7222
Conversation
Signed-off-by: Gera Shegalov <gera@apache.org>
build
How do we add a new shim with this new comment-based approach?

To add support for a new Spark version, will we have to update the comments in all the affected files one by one?

After selecting a Spark3xx profile in the IDEA Maven panel, it seems IDEA can't automatically add the code in

I do not understand how this patch makes the codebase easier to work with. Now we have encoded in a comment which version of Spark each file should go to? This seems like a burden to maintain, and I don't see what makes the current method so bad that we need to change things this drastically.
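For context, the comment-based tagging being debated could look roughly like the sketch below. This is purely illustrative: the marker name, JSON shape, and file name are assumptions for the sake of the example, not taken from this thread.

```scala
// SomeShimExec.scala -- hypothetical shimmed source file
/*** spark-rapids-shim-json-lines
{"spark": "321"}
{"spark": "331"}
spark-rapids-shim-json-lines ***/
package com.nvidia.spark.rapids

// With per-file tags like the comment above, build tooling can compute
// which shim source sets a file belongs to. Adding a new Spark version
// then means appending one JSON line to each file that applies to it,
// rather than moving files between version-specific directories.
```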
TODO bug with tests? Signed-off-by: Gera Shegalov <gera@apache.org>
- and fix regex Signed-off-by: Gera Shegalov <gera@apache.org>
Command executed:

```bash
mvn generate-sources -Dshimplify=true -Dshimplify.move=true
```

Verified with:

1) Build a two-shim jar

```bash
mvn package -Dbuildver=321 -DskipTests -Dskip -Dmaven.javadoc.skip -Ddist.jar.compress=false
mvn package -Dbuildver=331 -DskipTests -Dskip -Dmaven.javadoc.skip -Ddist.jar.compress=false
```

2) Run a smoke test for both shims

```bash
SPARK_HOME=~/dist/spark-3.2.1-bin-hadoop3 NUM_LOCAL_EXECS=2 \
  PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 \
  PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark321.RapidsShuffleManager \
  PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 \
  PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 \
  PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 \
  ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum

SPARK_HOME=~/dist/spark-3.3.1-bin-hadoop3 NUM_LOCAL_EXECS=2 \
  PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 \
  PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark331.RapidsShuffleManager \
  PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 \
  PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 \
  PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 \
  ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum
```

Signed-off-by: Gera Shegalov <gera@apache.org>
build

build

build
build
Apologies for the long delay in reviewing this again. I wasn't able to review every single file, but what I did review looks correct to me and tests passing is a huge part of verifying this change. Thanks @gerashegalov for all the hard work on this!
Command executed:
Verified with:
```bash
SPARK_HOME=~/gits/apache/spark NUM_LOCAL_EXECS=2 \
  PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 \
  PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark314.RapidsShuffleManager \
  PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 \
  PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 \
  PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 \
  ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum

SPARK_HOME=~/dist/spark-3.2.1-bin-hadoop3.2 NUM_LOCAL_EXECS=2 \
  PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 \
  PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark321.RapidsShuffleManager \
  PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 \
  PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 \
  PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 \
  ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum

SPARK_HOME=~/dist/spark-3.3.1-bin-hadoop3 NUM_LOCAL_EXECS=2 \
  PYSP_TEST_spark_rapids_shuffle_mode=MULTITHREADED \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_writer_threads=2 \
  PYSP_TEST_spark_rapids_shuffle_multiThreaded_reader_threads=2 \
  PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.spark331.RapidsShuffleManager \
  PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0 \
  PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=0.1 \
  PYSP_TEST_spark_rapids_memory_gpu_allocFraction=0.1 \
  ./integration_tests/run_pyspark_from_build.sh -k test_hash_grpby_sum
```
A separate PR will be posted to remove the old sparkXYZ.sources definitions from the previous build. We'll merge it once we are confident in the robustness of the transition.
Signed-off-by: Gera Shegalov <gera@apache.org>