Update docs for 21.08 release #3080

Merged: 12 commits, Aug 9, 2021
22 changes: 21 additions & 1 deletion docs/FAQ.md
@@ -64,7 +64,8 @@ Spark driver and executor logs with messages that are similar to the following:

### What is the right hardware setup to run GPU accelerated Spark?

Reference architectures should be available around Q1 2021.
GPU-accelerated Spark can run on any NVIDIA GPU of the Pascal architecture or newer, including
Volta, Turing, and Ampere.

### What parts of Apache Spark are accelerated?

@@ -385,6 +386,25 @@ for this issue.
To fix it, either disable the IOMMU or disable pinned memory by setting
[spark.rapids.memory.pinnedPool.size](configs.md#memory.pinnedPool.size) to 0.
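
As a minimal sketch, one way to pass the pinned-pool setting at submit time (the application jar
path is a hypothetical placeholder):

```shell
# Sketch: work around IOMMU-related pinned-memory failures by disabling the
# RAPIDS pinned memory pool. The application jar is a placeholder.
${SPARK_HOME}/bin/spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.memory.pinnedPool.size=0 \
  your-app.jar
```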

### Why am I getting a buffer overflow error when using the KryoSerializer?
A buffer overflow happens when Kryo tries to serialize an object larger than
[`spark.kryoserializer.buffer.max`](https://spark.apache.org/docs/latest/configuration.html#compression-and-serialization),
and it typically surfaces as an error such as:
```
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 636
at com.esotericsoftware.kryo.io.Output.require(Output.java:167)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251)
at com.esotericsoftware.kryo.io.Output.write(Output.java:219)
at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1859)
at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:712)
at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
at java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
...
```
Try increasing
[`spark.kryoserializer.buffer.max`](https://spark.apache.org/docs/latest/configuration.html#compression-and-serialization)
from its default of 64M to something larger, for example 512M.
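
For example, a sketch of raising the limit at submit time (the application jar is a placeholder):

```shell
# Sketch: raise the maximum Kryo serialization buffer from the 64m default.
# Kryo must be the active serializer for this setting to matter.
${SPARK_HOME}/bin/spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer.max=512m \
  your-app.jar
```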

### Is speculative execution supported?

Yes, speculative execution in Spark is fine with the RAPIDS Accelerator plugin.
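
For example, a minimal sketch using Spark's standard speculation setting:

```shell
# Sketch: enable speculative execution alongside the RAPIDS Accelerator.
${SPARK_HOME}/bin/spark-shell --conf spark.speculation=true
```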
@@ -55,7 +55,7 @@ Optional:
You do not need to compile the jar yourself because you can download it from the Maven repository directly.

Here are two options:
1. Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.06.0/)
1. Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.08.0/)

2. Compile the jar from github repo
```bash
# A plausible build sketch, assuming the standard Maven workflow for the
# spark-rapids repository; the exact flags are an assumption, so check the
# repository README before relying on them.
git clone https://github.com/NVIDIA/spark-rapids.git
cd spark-rapids
mvn package -DskipTests
```
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar,cudf-21.08.0-SNAPSHOT-cuda11.jar' \
${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
Suggested change:
- ${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
+ ${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar' \

--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```
2 changes: 1 addition & 1 deletion docs/demo/Databricks/generate-init-script-cuda11.ipynb
@@ -1 +1 @@
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.1/rapids-4-spark_2.12-21.06.1.jar\nsudo wget -O /databricks/jars/cudf-21.06.1-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.06.1/cudf-21.06.1-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
Suggested change (replaces the line above, bumping cuDF to 21.08.2):
+ {"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.2-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.2/cudf-21.08.2-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}

2 changes: 1 addition & 1 deletion docs/demo/Databricks/generate-init-script.ipynb
@@ -1 +1 @@
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.1/rapids-4-spark_2.12-21.06.1.jar\nsudo wget -O /databricks/jars/cudf-21.06.1-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.06.1/cudf-21.06.1-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
Suggested change (replaces the line above, bumping cuDF to 21.08.2):
+ {"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.2-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.2/cudf-21.08.2-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}

6 changes: 6 additions & 0 deletions docs/dev/adaptive-query.md
@@ -1,3 +1,9 @@
---
layout: page
title: Adaptive Query Execution with the RAPIDS Accelerator for Apache Spark
nav_order: 1
parent: Developer Overview
---
# Adaptive Query Execution with the RAPIDS Accelerator for Apache Spark

The benefits of AQE are not specific to CPU execution and can provide
2 changes: 1 addition & 1 deletion docs/dev/nvtx_profiling.md
@@ -1,7 +1,7 @@
---
layout: page
title: NVTX Ranges
nav_order: 2
nav_order: 3
parent: Developer Overview
---
# Using NVTX Ranges with the RAPIDS Plugin for Spark
2 changes: 1 addition & 1 deletion docs/dev/testing.md
@@ -1,7 +1,7 @@
---
layout: page
title: Testing
nav_order: 1
nav_order: 2
parent: Developer Overview
---
An overview of testing can be found within the repository at:
57 changes: 56 additions & 1 deletion docs/download.md
@@ -18,6 +18,61 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.
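
For example, a sketch of submitting both jars with a job instead of preinstalling them (the jar
paths are hypothetical placeholders):

```shell
# Sketch: ship the plugin and cuDF jars with each job rather than
# preinstalling them on every node. Paths are placeholders.
${SPARK_HOME}/bin/spark-submit \
  --jars /path/to/rapids-4-spark_2.12-21.08.0.jar,/path/to/cudf-21.08.0-cuda11.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  your-app.jar
```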

## Release v21.08.0
Hardware Requirements:

The plugin is tested on the following architectures:

GPU Architecture: NVIDIA V100, T4 and A10/A30/A100 GPUs

Software Requirements:

OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8

CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+

Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.6+, Scala 2.12, Java 8

*Some hardware may have a minimum driver version greater than v450.80.02. Check the GPU spec sheet
for your hardware's minimum driver version.

### Download v21.08.0
* Download the [RAPIDS
Accelerator for Apache Spark 21.08.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar)
* Download the [RAPIDS cuDF 21.08.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar)
Suggested change:
- * Download the [RAPIDS cuDF 21.08.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar)
+ * Download the [RAPIDS cuDF 21.08.2 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.2/cudf-21.08.2-cuda11.jar)


This package is built against CUDA 11.2 and has [CUDA forward
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested
on V100, T4, A30 and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which
do not have CUDA forward compatibility (for example, GeForce), CUDA 11.2 is required. Users will
need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.
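
For example, a sketch of fetching the two jars listed above with `wget` (the target directory is
an arbitrary choice):

```shell
# Sketch: download the 21.08.0 release jars into a local plugin directory.
mkdir -p /opt/sparkRapidsPlugin && cd /opt/sparkRapidsPlugin
wget https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar
wget https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar
```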

### Release Notes
New functionality and performance improvements for this release include:
* Handling data sets that spill out of GPU memory for group by and windowing operations
* Running window rank and dense rank operations on the GPU
* Support for the `LEGACY` timestamp
* Unioning of nested structs
* Adoption of UCX 1.11 for improved error handling for RAPIDS Spark Accelerated Shuffle
* Ability to read cached data from the GPU on the supported Databricks runtimes
* Enabling Parquet writing of array data types from the GPU
* Optimized reads for small files for ORC
* Spark Qualification and Profiling Tools
* Additional filtering capabilities
* Reporting on data types
* Reporting on read data formats
* Ability to run the qualification tool on Spark 2.x logs
* Ability to run the tool on Apache Spark 3.x, AWS EMR 6.3.0, Dataproc 2.0, Microsoft Azure, and
Databricks 7.3 and 8.2 logs
* Improved qualification tool performance

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v21.06.1
This is a patch release to address an issue with the plugin in the Databricks 7.3 ML LTS runtime.

@@ -99,7 +154,7 @@ need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.

### Release Notes
New functionality for this release includes:
* Support for running on Cloudera CDP 7.1.7 and Databricks 8.2 ML
* Support for running on Cloudera CDP 7.1.6, CDP 7.1.7 and Databricks 8.2 ML
* New functionality related to arrays:
* Concatenation of array columns
* Casting arrays of floats to arrays of doubles
4 changes: 2 additions & 2 deletions docs/get-started/Dockerfile.cuda
@@ -50,8 +50,8 @@ COPY spark-3.0.2-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.2-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.2-bin-hadoop3.2/data /opt/spark/data

COPY cudf-21.08.0-SNAPSHOT-cuda11.jar /opt/sparkRapidsPlugin
COPY rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
COPY cudf-21.08.0-cuda11.jar /opt/sparkRapidsPlugin
Suggested change:
- COPY cudf-21.08.0-cuda11.jar /opt/sparkRapidsPlugin
+ COPY cudf-21.08.2-cuda11.jar /opt/sparkRapidsPlugin

COPY rapids-4-spark_2.12-21.08.0.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

RUN mkdir /opt/spark/python
6 changes: 3 additions & 3 deletions docs/get-started/getting-started-databricks.md
@@ -48,8 +48,8 @@ cluster.
version:
- [Databricks 7.3 LTS
ML](https://docs.databricks.com/release-notes/runtime/7.3ml.html#system-environment) runs CUDA 10.1
Update 2. Users wishing to try 21.06.1 on Databricks 7.3 LTS ML will need to install the CUDA
11.0 toolkit on the cluster. This can be done with the [generate-init-script-cuda11.ipynb
Update 2. Users wishing to try 21.06.1 or later on Databricks 7.3 LTS ML will need to install the
CUDA 11.0 toolkit on the cluster. This can be done with the [generate-init-script-cuda11.ipynb
](../demo/Databricks/generate-init-script-cuda11.ipynb) init script, which installs both the RAPIDS
Spark plugin and the CUDA 11 toolkit.
- [Databricks 8.2
@@ -110,7 +110,7 @@ Spark plugin and the CUDA 11 toolkit.
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.06.1.jar:/databricks/spark/python
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.08.0.jar:/databricks/spark/python
```

7. Once you’ve added the Spark config, click “Confirm and Restart”.
3 changes: 1 addition & 2 deletions docs/get-started/getting-started-gcp.md
@@ -131,8 +131,7 @@ submitted as a Dataproc job. The mortgage examples we use above are also available as a [Spark
application](https://github.com/NVIDIA/spark-xgboost-examples/tree/spark-3/examples/apps/scala).
After [building the jar
files](https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/getting-started-guides/building-sample-apps/scala.md)
they are available through maven `mvn package -Dcuda.classifier=cuda11-0`. In the 21.06 release,
CUDA 11.0/11.2 will be supported.
they are available through maven `mvn package -Dcuda.classifier=cuda11-0`.

Place the jar file `sample_xgboost_apps-0.2.2.jar` under the `gs://$GCS_BUCKET/scala/` folder by
running `gsutil cp target/sample_xgboost_apps-0.2.2.jar gs://$GCS_BUCKET/scala/`. To do this you
13 changes: 9 additions & 4 deletions docs/get-started/getting-started-on-prem.md
@@ -55,17 +55,17 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep them separate.
- CUDA 11.0/11.1/11.2 => classifier cuda11

For example, here is a sample version of the jars and cudf with CUDA 11.0 support:
- cudf-21.08.0-SNAPSHOT-cuda11.jar
- rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar
- cudf-21.08.0-cuda11.jar
Suggested change:
- - cudf-21.08.0-cuda11.jar
+ - cudf-21.08.2-cuda11.jar

- rapids-4-spark_2.12-21.08.0.jar
jar that your version of the accelerator depends on.


For simplicity export the location to these jars. This example assumes the sample jars above have
been placed in the `/opt/sparkRapidsPlugin` directory:
```shell
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.08.0-SNAPSHOT-cuda11.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.08.0-cuda11.jar
Suggested change:
- export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.08.0-cuda11.jar
+ export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.08.2-cuda11.jar

export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-21.08.0.jar
```

## Install the GPU Discovery Script
@@ -313,6 +313,11 @@ and application.
of the job will run on the GPU then often you can run with less executor heap memory than would
be needed for the corresponding Spark job on the CPU.

If you hit a "com.esotericsoftware.kryo.KryoException: Buffer overflow" error, increase the
[`spark.kryoserializer.buffer.max`](https://spark.apache.org/docs/latest/configuration.html#compression-and-serialization)
setting to a value higher than the default.
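
For example (a sketch; 512m is an arbitrary value to tune for your data):

```shell
# Sketch: pass a larger Kryo buffer limit when launching the shell.
$SPARK_HOME/bin/spark-shell --conf spark.kryoserializer.buffer.max=512m
```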

### Example Command Running on YARN
```shell
$SPARK_HOME/bin/spark-shell --master yarn \
2 changes: 2 additions & 0 deletions docs/supported_ops.md
@@ -20899,6 +20899,7 @@ as `a` don't show up in the table. They are controlled by the rules for
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>SinglePartition$</td>
<td>Single partitioning</td>
@@ -20922,6 +20923,7 @@ as `a` don't show up in the table. They are controlled by the rules for
<td> </td>
<td> </td>
<td> </td>
</tr>
</table>

## Input/Output
6 changes: 3 additions & 3 deletions integration_tests/README.md
@@ -134,7 +134,7 @@ individually, so you don't risk running unit tests along with the integration tests.
http://www.scalatest.org/user_guide/using_the_scalatest_shell

```shell
spark-shell --jars rapids-4-spark-tests_2.12-21.08.0-SNAPSHOT-tests.jar,rapids-4-spark-udf-examples_2.12-21.08.0-SNAPSHOT,rapids-4-spark-integration-tests_2.12-21.08.0-SNAPSHOT-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
spark-shell --jars rapids-4-spark-tests_2.12-21.08.0-tests.jar,rapids-4-spark-udf-examples_2.12-21.08.0,rapids-4-spark-integration-tests_2.12-21.08.0-tests.jar,scalatest_2.12-3.0.5.jar,scalactic_2.12-3.0.5.jar
```

First you import the `scalatest_shell` and tell the tests where they can find the test files you
@@ -157,7 +157,7 @@ If you just want to verify the SQL replacement is working you will need to add t
example assumes CUDA 11.0 is being used.

```
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar,rapids-4-spark-udf-examples_2.12-21.08.0-SNAPSHOT.jar,cudf-21.08.0-SNAPSHOT-cuda11.jar" ./runtests.py
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0.jar,rapids-4-spark-udf-examples_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar" ./runtests.py
Suggested change:
- $SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0.jar,rapids-4-spark-udf-examples_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar" ./runtests.py
+ $SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0.jar,rapids-4-spark-udf-examples_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar" ./runtests.py

```

You don't have to enable the plugin for this to work, the test framework will do that for you.
@@ -256,7 +256,7 @@ To run cudf_udf tests, need following configuration changes:
As an example, here is the `spark-submit` command with the cudf_udf parameter on CUDA 11.0:

```
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar,rapids-4-spark-udf-examples_2.12-21.08.0-SNAPSHOT.jar,cudf-21.08.0-SNAPSHOT-cuda11.jar,rapids-4-spark-tests_2.12-21.08.0-SNAPSHOT.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar" ./runtests.py --cudf_udf
$SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0.jar,rapids-4-spark-udf-examples_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar,rapids-4-spark-tests_2.12-21.08.0.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-21.08.0.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-21.08.0.jar" ./runtests.py --cudf_udf
Suggested change:
- $SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0.jar,rapids-4-spark-udf-examples_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar,rapids-4-spark-tests_2.12-21.08.0.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-21.08.0.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-21.08.0.jar" ./runtests.py --cudf_udf
+ $SPARK_HOME/bin/spark-submit --jars "rapids-4-spark_2.12-21.08.0.jar,rapids-4-spark-udf-examples_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar,rapids-4-spark-tests_2.12-21.08.0.jar" --conf spark.rapids.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.memory.gpu.allocFraction=0.3 --conf spark.rapids.python.concurrentPythonWorkers=2 --py-files "rapids-4-spark_2.12-21.08.0.jar" --conf spark.executorEnv.PYTHONPATH="rapids-4-spark_2.12-21.08.0.jar" ./runtests.py --cudf_udf

```

## Writing tests
@@ -1280,7 +1280,7 @@ object RapidsConf {
|On startup use: `--conf [conf key]=[conf value]`. For example:
|
|```
|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar,cudf-21.08.0-SNAPSHOT-cuda11.jar' \
|${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
Suggested change:
- |${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
+ |${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar' \

|--conf spark.plugins=com.nvidia.spark.SQLPlugin \
|--conf spark.rapids.sql.incompatibleOps.enabled=true
|```
@@ -1603,6 +1603,7 @@ object SupportedOpsDocs {
TypeEnum.values.foreach { _ =>
  println(NotApplicable.htmlTag)
}
println("</tr>")
totalCount += 1
}
}