Skip to content

Commit

Permalink
Merge branch 'branch-21.06' of https://github.com/NVIDIA/spark-rapids
Browse files Browse the repository at this point in the history
…into branch-0.6-cloudera
  • Loading branch information
rishi committed May 25, 2021
2 parents fa1f30b + 3f64354 commit c2a6c71
Show file tree
Hide file tree
Showing 137 changed files with 6,156 additions and 1,930 deletions.
7 changes: 3 additions & 4 deletions api_validation/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,10 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.6.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
<version>21.06.0-SNAPSHOT</version>
</parent>
<artifactId>rapids-4-spark-api-validation</artifactId>
<version>0.6.0-SNAPSHOT</version>
<version>21.06.0-SNAPSHOT</version>

<profiles>
<profile>
Expand Down Expand Up @@ -96,7 +95,7 @@
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-shims-aggregator_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<version>21.06.0-SNAPSHOT</version>
<scope>provided</scope>
</dependency>
</dependencies>
Expand Down
5 changes: 2 additions & 3 deletions dist/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,12 @@
<parent>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-parent</artifactId>
<version>0.6.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
<version>21.06.0-SNAPSHOT</version>
</parent>
<artifactId>rapids-4-spark_2.12</artifactId>
<name>RAPIDS Accelerator for Apache Spark Distribution</name>
<description>Creates the distribution package of the RAPIDS plugin for Apache Spark</description>
<version>0.6.0-SNAPSHOT</version>
<version>21.06.0-SNAPSHOT</version>

<dependencies>
<dependency>
Expand Down
6 changes: 4 additions & 2 deletions docs/configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-0.6.0-SNAPSHOT.jar,cudf-0.20-SNAPSHOT-cuda11.jar' \
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar,cudf-21.06.0-SNAPSHOT-cuda11.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
Expand Down Expand Up @@ -49,6 +49,7 @@ Name | Description | Default Value
<a name="shuffle.transport.earlyStart"></a>spark.rapids.shuffle.transport.earlyStart|Enable early connection establishment for RAPIDS Shuffle|true
<a name="shuffle.transport.earlyStart.heartbeatInterval"></a>spark.rapids.shuffle.transport.earlyStart.heartbeatInterval|Shuffle early start heartbeat interval (milliseconds)|5000
<a name="shuffle.transport.maxReceiveInflightBytes"></a>spark.rapids.shuffle.transport.maxReceiveInflightBytes|Maximum aggregate amount of bytes that be fetched at any given time from peers during shuffle|1073741824
<a name="shuffle.ucx.activeMessages.mode"></a>spark.rapids.shuffle.ucx.activeMessages.mode|Set to 'rndv', 'eager', or 'auto' to indicate what UCX Active Message mode to use. We set 'rndv' (Rendezvous) by default because UCX 1.10.x doesn't support 'eager' fully. This restriction can be lifted if the user is running UCX 1.11+.|rndv
<a name="shuffle.ucx.managementServerHost"></a>spark.rapids.shuffle.ucx.managementServerHost|The host to be used to start the management server|null
<a name="shuffle.ucx.useWakeup"></a>spark.rapids.shuffle.ucx.useWakeup|When set to true, use UCX's event-based progress (epoll) in order to wake up the progress thread when needed, instead of a hot loop.|true
<a name="sql.batchSizeBytes"></a>spark.rapids.sql.batchSizeBytes|Set the target number of bytes for a GPU batch. Splits sizes for input data is covered by separate configs. The maximum setting is 2 GB to avoid exceeding the cudf row count limit of a column.|2147483647
Expand Down Expand Up @@ -155,7 +156,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.Ceil"></a>spark.rapids.sql.expression.Ceil|`ceiling`, `ceil`|Ceiling of a number|true|None|
<a name="sql.expression.CheckOverflow"></a>spark.rapids.sql.expression.CheckOverflow| |CheckOverflow after arithmetic operations between DecimalType data|true|None|
<a name="sql.expression.Coalesce"></a>spark.rapids.sql.expression.Coalesce|`coalesce`|Returns the first non-null argument if exists. Otherwise, null|true|None|
<a name="sql.expression.Concat"></a>spark.rapids.sql.expression.Concat|`concat`|String concatenate NO separator|true|None|
<a name="sql.expression.Concat"></a>spark.rapids.sql.expression.Concat|`concat`|List/String concatenate|true|None|
<a name="sql.expression.Contains"></a>spark.rapids.sql.expression.Contains| |Contains|true|None|
<a name="sql.expression.Cos"></a>spark.rapids.sql.expression.Cos|`cos`|Cosine|true|None|
<a name="sql.expression.Cosh"></a>spark.rapids.sql.expression.Cosh|`cosh`|Hyperbolic cosine|true|None|
Expand All @@ -172,6 +173,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.DayOfWeek"></a>spark.rapids.sql.expression.DayOfWeek|`dayofweek`|Returns the day of the week (1 = Sunday...7=Saturday)|true|None|
<a name="sql.expression.DayOfYear"></a>spark.rapids.sql.expression.DayOfYear|`dayofyear`|Returns the day of the year from a date or timestamp|true|None|
<a name="sql.expression.Divide"></a>spark.rapids.sql.expression.Divide|`/`|Division|true|None|
<a name="sql.expression.ElementAt"></a>spark.rapids.sql.expression.ElementAt|`element_at`|Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is map.|true|None|
<a name="sql.expression.EndsWith"></a>spark.rapids.sql.expression.EndsWith| |Ends with|true|None|
<a name="sql.expression.EqualNullSafe"></a>spark.rapids.sql.expression.EqualNullSafe|`<=>`|Check if the values are equal including nulls <=>|true|None|
<a name="sql.expression.EqualTo"></a>spark.rapids.sql.expression.EqualTo|`=`, `==`|Check if the values are equal|true|None|
Expand Down
4 changes: 2 additions & 2 deletions docs/get-started/Dockerfile.cuda
Original file line number Diff line number Diff line change
Expand Up @@ -50,8 +50,8 @@ COPY spark-3.0.2-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.2-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.2-bin-hadoop3.2/data /opt/spark/data

COPY cudf-0.20-SNAPSHOT-cuda11.jar /opt/sparkRapidsPlugin
COPY rapids-4-spark_2.12-0.6.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
COPY cudf-21.06.0-SNAPSHOT-cuda11.jar /opt/sparkRapidsPlugin
COPY rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

RUN mkdir /opt/spark/python
Expand Down
8 changes: 4 additions & 4 deletions docs/get-started/getting-started-on-prem.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,17 +53,17 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep
- CUDA 11.0/11.1/11.2 => classifier cuda11

For example, here is a sample version of the jars and cudf with CUDA 11.0 support:
- cudf-0.20-SNAPSHOT-cuda11.jar
- rapids-4-spark_2.12-0.6.0-SNAPSHOT.jar
- cudf-21.06.0-SNAPSHOT-cuda11.jar
- rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar
jar that your version of the accelerator depends on.


For simplicity export the location to these jars. This example assumes the sample jars above have
been placed in the `/opt/sparkRapidsPlugin` directory:
```shell
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-0.20-SNAPSHOT-cuda11.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-0.6.0-SNAPSHOT.jar
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.06.0-SNAPSHOT-cuda11.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar
```

## Install the GPU Discovery Script
Expand Down
Loading

0 comments on commit c2a6c71

Please sign in to comment.