forked from NVIDIA/spark-rapids
Rebase Databricks 14.3 feature branch to 24.12 #5
Closed
mythrocks wants to merge 63 commits into razajafri:SP-10661-db-14.3 from mythrocks:databricks-14.3-rebased-to-24.12
…arily (NVIDIA#11469) Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
…abricks] (NVIDIA#11466) * Switch to a regular try Signed-off-by: Gera Shegalov <gera@apache.org> * drop Maven tarball Signed-off-by: Gera Shegalov <gera@apache.org> * unused import Signed-off-by: Gera Shegalov <gera@apache.org> * repro Signed-off-by: Gera Shegalov <gera@apache.org> --------- Signed-off-by: Gera Shegalov <gera@apache.org>
…s temporarily (NVIDIA#11469)" (NVIDIA#11473) This reverts commit 5beeba8. Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Signed-off-by: Peixin Li <pxLi@nyu.edu>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
NVIDIA#11449) * Support yyyyMMdd in GetTimestamp operator for LEGACY mode Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com>
… [databricks] (NVIDIA#11462) * Support non-UTC timezone for casting from date type to timestamp type Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com>
* Install cuDF-py against python 3.10 on Databricks Fix on Databricks runtime for : NVIDIA#11394 Enable the udf_cudf_test test case for Databricks-13.3 Rapids 24.10+ drops python 3.9 or below conda packages. ref: https://docs.rapids.ai/notices/rsn0040/ Install cuDF-py packages against python 3.10 and above on Databricks runtime to run UDF cuDF tests, because on DB-13.3 Conda is not installed by default. Signed-off-by: timl <timl@nvidia.com> * Check if 'conda' exists to make the if/else expression more readable Signed-off-by: timl <timl@nvidia.com> --------- Signed-off-by: timl <timl@nvidia.com>
* add parquet column index ut test Signed-off-by: fejiang <fejiang@nvidia.comm> * change Signed-off-by: fejiang <fejiang@nvidia.comm> * added parquet suite Signed-off-by: fejiang <fejiang@nvidia.com> * pom changed Signed-off-by: fejiang <fejiang@nvidia.com> * DeltaEncoding Suite Signed-off-by: fejiang <fejiang@nvidia.com> * enable more suites Signed-off-by: fejiang <fejiang@nvidia.com> * remove ignored case Signed-off-by: fejiang <fejiang@nvidia.com> * format Signed-off-by: fejiang <fejiang@nvidia.com> * added ignored cases Signed-off-by: fejiang <fejiang@nvidia.com> * change to parquet hadoop version Signed-off-by: fejiang <fejiang@nvidia.comm> * remove parquet.version Signed-off-by: fejiang <fejiang@nvidia.comm> * adding scope and classifier Signed-off-by: fejiang <fejiang@nvidia.comm> * pom remove unused Signed-off-by: fejiang <fejiang@nvidia.com> * pom chang3 2.13 Signed-off-by: fejiang <fejiang@nvidia.com> * add schema suite Signed-off-by: fejiang <fejiang@nvidia.comm> * remove dataframe Signed-off-by: fejiang <fejiang@nvidia.comm> * RapidsParquetThriftCompatibilitySuite Signed-off-by: fejiang <fejiang@nvidia.com> * ThriftCompaSuite added Signed-off-by: fejiang <fejiang@nvidia.com> * more suites but the RowIndexSuite one Signed-off-by: fejiang <fejiang@nvidia.com> * formatting issues Signed-off-by: fejiang <fejiang@nvidia.com> * exlude SPARK-36803: Signed-off-by: fejiang <fejiang@nvidia.comm> * setting change Signed-off-by: fejiang <fejiang@nvidia.comm> * setting change Signed-off-by: fejiang <fejiang@nvidia.comm> * adjust order Signed-off-by: fejiang <fejiang@nvidia.comm> * adjust settings Signed-off-by: fejiang <fejiang@nvidia.comm> * adjust settings Signed-off-by: fejiang <fejiang@nvidia.comm> * RapidsParquetThriftCompatibilitySuite settings * known issue added Signed-off-by: fejiang <fejiang@nvidia.com> * format new line Signed-off-by: fejiang <fejiang@nvidia.com> * known issue added Signed-off-by: fejiang <fejiang@nvidia.com> * 
RapidsParquetDeltaByteArrayEncodingSuite Signed-off-by: fejiang <fejiang@nvidia.comm> * RapidsParquetAvroCompatibilitySuite Signed-off-by: fejiang <fejiang@nvidia.comm> * ParquetFiledIdSchemaSuite and Avro suite added * pom Avro suite modified * ParquetFileFormatSuite added * RapidsParquetRebaseDatetimeSuite and QuerySuite added * RapidsParquetSchemaPruningSuite added * setting adjust Signed-off-by: fejiang <fejiang@nvidia.com> * setting adjust Signed-off-by: fejiang <fejiang@nvidia.com> * UT adjuct exclude added Signed-off-by: fejiang <fejiang@nvidia.com> * RapidsParquetThriftCompatibilitySuite adjust setting Signed-off-by: fejiang <fejiang@nvidia.com> * comment Create parquet table with compression Signed-off-by: fejiang <fejiang@nvidia.com> * SPARK_HOME NOT FOUND issue solved. Signed-off-by: fejiang <fejiang@nvidia.com> * enabling more suite Signed-off-by: fejiang <fejiang@nvidia.com> * remove exclude from RapidsParquetFieldIdIOSuite Signed-off-by: fejiang <fejiang@nvidia.com> * formate and remove parquet files Signed-off-by: fejiang <fejiang@nvidia.com> * comment setting Signed-off-by: fejiang <fejiang@nvidia.com> * pom modified and remove unnecess case Signed-off-by: fejiang <fejiang@nvidia.com> --------- Signed-off-by: fejiang <fejiang@nvidia.comm> Signed-off-by: fejiang <fejiang@nvidia.com> Co-authored-by: fejiang <fejiang@nvidia.comm>
Keep the rapids JNI and private dependency version at 24.10.0-SNAPSHOT until the nightly CI for the branch-24.12 branch is complete. Track the dependency update process at: NVIDIA#11492 Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
…0798) * optimzing Expand+Aggregate in sqlw with many count distinct Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> * simplify Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> * add comment Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> * address comments Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> --------- Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
To fix: NVIDIA#11502 Download jars using wget instead of 'mvn dependency:get' to fix 'missing intermediate jars' failures, as we stopped deploying these intermediate jars since version 24.10 Signed-off-by: timl <timl@nvidia.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Support legacy mode for yyyymmdd format Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com>
* quick workaround to make image build work Signed-off-by: Peixin Li <pxLi@nyu.edu> * use mamba directly --------- Signed-off-by: Peixin Li <pxLi@nyu.edu>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* add max memory watermark metric Signed-off-by: Zach Puller <zpuller@nvidia.com> --------- Signed-off-by: Zach Puller <zpuller@nvidia.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Updated parameters to enable file overwriting when dumping. Signed-off-by: ustcfy <yafeng@nvidia.com> * Validate LORE dump root path before execution Signed-off-by: ustcfy <yafeng@nvidia.com> * Add loreOutputRootPathChecked map for tracking lore output root path checks. Signed-off-by: ustcfy <yafeng@nvidia.com> * Delay path and filesystem initialization until actually needed. Signed-off-by: ustcfy <yafeng@nvidia.com> * Add test and update dev/lore.md doc. Signed-off-by: ustcfy <yafeng@nvidia.com> * Format code to ensure line length does not exceed 100 characters Signed-off-by: ustcfy <fengyan_@mail.ustc.edu.cn> * Format code to ensure line length does not exceed 100 characters Signed-off-by: ustcfy <fengyan_@mail.ustc.edu.cn> * Improved resource management by using withResource. Signed-off-by: ustcfy <fengyan_@mail.ustc.edu.cn> * Update docs/dev/lore.md Co-authored-by: Renjie Liu <liurenjie2008@gmail.com> * Improved resource management by using withResource. Signed-off-by: ustcfy <fengyan_@mail.ustc.edu.cn> * Removed for FileSystem instance. Signed-off-by: ustcfy <fengyan_@mail.ustc.edu.cn> * Update docs/dev/lore.md Co-authored-by: Gera Shegalov <gshegalov@nvidia.com> --------- Signed-off-by: ustcfy <yafeng@nvidia.com> Signed-off-by: ustcfy <fengyan_@mail.ustc.edu.cn> Co-authored-by: Renjie Liu <liurenjie2008@gmail.com> Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Peixin Li <pxLi@nyu.edu>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
…lanca timezone and LEGACY mode (NVIDIA#11567) Signed-off-by: Chong Gao <res_life@163.com>
…CI (NVIDIA#11544) Signed-off-by: Chong Gao <res_life@163.com> Co-authored-by: Chong Gao <res_life@163.com>
…IA#11561) Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
* implement watermark Signed-off-by: Zach Puller <zpuller@nvidia.com> * consolidate/fix disk spill metric Signed-off-by: Zach Puller <zpuller@nvidia.com> --------- Signed-off-by: Zach Puller <zpuller@nvidia.com>
…DIA#11569) Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Gera Shegalov <gera@apache.org>
Fix merge conflict with branch-24.10
…A#11559) * Spark 4: Addressed cast_test.py failures. Fixes NVIDIA#11009 and NVIDIA#11530. This commit addresses the test failures in cast_test.py, on Spark 4.0. These generally have to do with changes in behaviour of Spark when ANSI mode is enabled. In these cases, the tests have been split out into ANSI=on and ANSI=off. The bugs uncovered from the tests have been spun into their own issues; fixing all of them was beyond the scope of this change. Signed-off-by: MithunR <mithunr@nvidia.com>
* use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> --------- Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
* avoid long tail tasks due to PrioritySemaphore (NVIDIA#11574) * use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> --------- Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> * addressing jason's comment Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org> --------- Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Fix collection_ops_tests for Spark 4.0. Fixes NVIDIA#11011. This commit fixes the failures in `collection_ops_tests` on Spark 4.0. On all versions of Spark, when a Sequence is collected with rows that exceed MAX_INT, an exception is thrown indicating that the collected Sequence/array is larger than permissible. The different versions of Spark vary in the contents of the exception message. On Spark 4, one sees that the error message now contains more information than all prior versions, including: 1. The name of the op causing the error 2. The errant sequence size This commit introduces a shim to make this new information available in the exception. Note that this shim does not fit cleanly in RapidsErrorUtils, because there are differences within major Spark versions. For instance, Spark 3.4.0-1 have a different message as compared to 3.4.2 and 3.4.3. Likewise, the differences in 3.5.0, 3.5.1, 3.5.2. Signed-off-by: MithunR <mithunr@nvidia.com> * Fixed formatting error. * Review comments. This moves the construction of the long-sequence error strings into RapidsErrorUtils. The process involved introducing many new RapidsErrorUtils classes, and using mix-ins of concrete implementations for the error-string construction. * Added missing shim tag for 3.5.2. * Review comments: Fixed code style. * Reformatting, per project guideline. * Fixed missed whitespace problem. --------- Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: liyuan <yuali@nvidia.com>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
Wait for the pre-merge CI job to SUCCEED Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
* Update latest changelog [skip ci] Update change log with CLI: scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.08,24.10 Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com> * Update changelog Signed-off-by: timl <timl@nvidia.com> * Update changelog Signed-off-by: timl <timl@nvidia.com> --------- Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com> Signed-off-by: timl <timl@nvidia.com> Co-authored-by: timl <timl@nvidia.com>
…-11604 Fix auto merge conflict 11604 [skip ci]
* xfail regexp tests to unblock CI Signed-off-by: Jason Lowe <jlowe@nvidia.com> * Disable failing regexp unit test to unblock CI --------- Signed-off-by: Jason Lowe <jlowe@nvidia.com>
* Remove an unused config shuffle.spillThreads Signed-off-by: Alessandro Bellina <abellina@nvidia.com> * update configs.md --------- Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Needed minor modifications. Signed-off-by: MithunR <mithunr@nvidia.com>
Closing this PR in favour of NVIDIA#11635.
I have rebased `SP-10661-db-14.3` to `branch-24.12`, if only to make the recent `RapidsErrorUtils` refactor available in this branch.