
Update docs for 21.08 release #3080

Merged
merged 12 commits on Aug 9, 2021
docs/FAQ.md (2 additions, 1 deletion)
@@ -64,7 +64,8 @@ Spark driver and executor logs with messages that are similar to the following:

### What is the right hardware setup to run GPU accelerated Spark?

- Reference architectures should be available around Q1 2021.
+ GPU accelerated Spark can run on any NVIDIA Pascal or better GPU architecture, including Volta,
+ Turing or Ampere.
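
A quick way to verify what a node has (a sketch; `nvidia-smi` ships with the NVIDIA driver):

```bash
# List the GPUs visible on this node; anything Pascal or newer qualifies.
nvidia-smi --query-gpu=name --format=csv,noheader
# Recent drivers can also report the compute capability (Pascal is 6.x):
# nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```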

### What parts of Apache Spark are accelerated?

@@ -55,7 +55,7 @@ Optional:
You do not need to compile the jar yourself because you can download it from the Maven repository directly.

Here are 2 options:
- 1. Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.06.0/)
+ 1. Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.08.0/)

2. Compile the jar from the GitHub repo
```bash
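# The original build commands are collapsed in this diff view; this is a sketch
# only, assuming the tools module lives under tools/ in the NVIDIA/spark-rapids repo.
git clone https://github.com/NVIDIA/spark-rapids.git
cd spark-rapids/tools
mvn -DskipTests package
# The jar should appear under target/, e.g. rapids-4-spark-tools_2.12-<version>.jar
```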
docs/configs.md (1 addition, 1 deletion)
@@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:

```
- ${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar,cudf-21.08.0-SNAPSHOT-cuda11.jar' \
+ ${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.incompatibleOps.enabled=true
```

Suggested change
- ${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
+ ${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar' \
docs/demo/Databricks/generate-init-script-cuda11.ipynb (1 addition, 1 deletion)
@@ -1 +1 @@
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.0/rapids-4-spark_2.12-21.06.0.jar\nsudo wget -O /databricks/jars/cudf-21.06.1-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.06.1/cudf-21.06.1-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
docs/demo/Databricks/generate-init-script.ipynb (1 addition, 1 deletion)
@@ -1 +1 @@
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.0/rapids-4-spark_2.12-21.06.0.jar\nsudo wget -O /databricks/jars/cudf-21.06.1-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.06.1/cudf-21.06.1-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
docs/download.md (57 additions, 2 deletions)
@@ -18,6 +18,61 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or sub
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.

## Release v21.08.0
Hardware Requirements:

The plugin is tested on the following architectures:

GPU Architecture: NVIDIA V100, T4 and A10/A30/A100 GPUs

Software Requirements:

OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8

CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+

Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.6+, Scala 2.12, Java 8

*Some hardware may have a minimum driver version greater than v450.80.02. Check the GPU spec sheet
for your hardware's minimum driver version.

### Download v21.08.0
* Download the [RAPIDS
Accelerator for Apache Spark 21.08.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar)
* Download the [RAPIDS cuDF 21.08.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar)
Suggested change
- * Download the [RAPIDS cuDF 21.08.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar)
+ * Download the [RAPIDS cuDF 21.08.2 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.2/cudf-21.08.2-cuda11.jar)
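
To pull both jars onto a node, for example (URLs from the links above; substitute the 21.08.2 cudf jar if following the suggested change):

```bash
wget https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar
wget https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.0/cudf-21.08.0-cuda11.jar
```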


This package is built against CUDA 11.2 and has [CUDA forward
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested
on V100, T4, A30 and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which
do not have CUDA forward compatibility (for example, GeForce), CUDA 11.2 is required. Users will
need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.
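
A quick check that a node meets that minimum (a sketch; `nvidia-smi` ships with the driver):

```bash
# Should print 450.80.02 or higher.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```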

### Release Notes
New functionality and performance improvements for this release include:
* Handling data sets that spill out of GPU memory for group by and windowing operations
* Running window rank and dense rank operations on the GPU
* Support for the `LEGACY` timestamp
* Unioning of nested structs
* Adoption of UCX 1.11 for improved error handling for RAPIDS Spark Accelerated Shuffle
* Ability to read cached data from the GPU on the supported Databricks runtimes
* Enabling Parquet writing of array data types from the GPU
* Optimized reads for small files for ORC
* Spark Qualification and Profiling Tools (see the example invocation after this list)
  * Additional filtering capabilities
  * Reporting on data types
  * Reporting on read data formats
  * Ability to run the qualification tool on Spark 2.x logs
  * Ability to run the tool on Apache Spark 3.x, AWS EMR 6.3.0, Dataproc 2.0, Microsoft Azure, and
    Databricks 7.3 and 8.2 logs
  * Improved qualification tool performance
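
As an illustration, the qualification tool is typically run against a directory of Spark event logs like so (a sketch; the main class and classpath layout are assumptions based on the tools jar of this era, and paths are illustrative):

```bash
java -cp rapids-4-spark-tools_2.12-21.08.0.jar:$SPARK_HOME/jars/* \
  com.nvidia.spark.rapids.tool.qualification.QualificationMain \
  /path/to/spark-event-logs
```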

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v21.06.0
Starting with release 21.06.0, the project is moving to calendar versioning, with the first two
digits representing the year, the second two digits representing the month, and the last digit
representing the patch version.
@@ -35,7 +90,7 @@ Software Requirements:

CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+

- Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0
+ Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

@@ -57,7 +112,7 @@ need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on

### Release Notes
New functionality for this release includes:
- * Support for running on Cloudera CDP 7.1.7 and Databricks 8.2 ML
+ * Support for running on Cloudera CDP 7.1.6, CDP 7.1.7 and Databricks 8.2 ML
* New functionality related to arrays:
  * Concatenation of array columns
  * Casting arrays of floats to arrays of doubles
docs/get-started/Dockerfile.cuda (2 additions, 2 deletions)
@@ -50,8 +50,8 @@ COPY spark-3.0.2-bin-hadoop3.2/examples /opt/spark/examples
COPY spark-3.0.2-bin-hadoop3.2/kubernetes/tests /opt/spark/tests
COPY spark-3.0.2-bin-hadoop3.2/data /opt/spark/data

- COPY cudf-21.08.0-SNAPSHOT-cuda11.jar /opt/sparkRapidsPlugin
- COPY rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar /opt/sparkRapidsPlugin
+ COPY cudf-21.08.0-cuda11.jar /opt/sparkRapidsPlugin
+ COPY rapids-4-spark_2.12-21.08.0.jar /opt/sparkRapidsPlugin

Suggested change
- COPY cudf-21.08.0-cuda11.jar /opt/sparkRapidsPlugin
+ COPY cudf-21.08.2-cuda11.jar /opt/sparkRapidsPlugin
COPY getGpusResources.sh /opt/sparkRapidsPlugin

RUN mkdir /opt/spark/python
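Once the jars are staged next to the Dockerfile, a build-and-check sketch (the image tag is illustrative):

```bash
# Build the CUDA-enabled Spark image from the directory holding Dockerfile.cuda.
docker build -t spark-rapids:21.08.0 -f Dockerfile.cuda .
# Confirm the plugin jars landed where the Spark configs expect them.
docker run --rm spark-rapids:21.08.0 ls /opt/sparkRapidsPlugin
```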
docs/get-started/getting-started-databricks.md (3 additions, 3 deletions)
@@ -48,8 +48,8 @@ cluster.
version:
- [Databricks 7.3 LTS
ML](https://docs.databricks.com/release-notes/runtime/7.3ml.html#system-environment) runs CUDA 10.1
- Update 2. Users wishing to try 21.06 on Databricks 7.3 LTS ML will need to install the CUDA
- 11.0 toolkit on the cluster. This can be done with the [generate-init-script-cuda11.ipynb
+ Update 2. Users wishing to try 21.06 or higher on Databricks 7.3 LTS ML will need to install the
+ CUDA 11.0 toolkit on the cluster. This can be done with the [generate-init-script-cuda11.ipynb
](../demo/Databricks/generate-init-script-cuda11.ipynb) init script, which installs both the RAPIDS
Spark plugin and the CUDA 11 toolkit.
- [Databricks 8.2
@@ -110,7 +110,7 @@ Spark plugin and the CUDA 11 toolkit.
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
- spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.06.0.jar:/databricks/spark/python
+ spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.08.0.jar:/databricks/spark/python
```

7. Once you’ve added the Spark config, click “Confirm and Restart”.
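
After the restart, a quick sanity check that the jar referenced in the config above is present on the driver (a sketch for a `%sh` notebook cell):

```bash
ls -l /databricks/jars/rapids-4-spark_2.12-21.08.0.jar
```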
docs/get-started/getting-started-gcp.md (1 addition, 2 deletions)
@@ -131,8 +131,7 @@ submitted as a Dataproc job. The mortgage examples we use above are also availa
application](https://github.com/NVIDIA/spark-xgboost-examples/tree/spark-3/examples/apps/scala).
After [building the jar
files](https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/getting-started-guides/building-sample-apps/scala.md)
- they are available through maven `mvn package -Dcuda.classifier=cuda11-0`. In the 21.06 release,
- CUDA 11.0/11.2 will be supported.
+ they are available through maven `mvn package -Dcuda.classifier=cuda11-0`.

Place the jar file `sample_xgboost_apps-0.2.2.jar` under the `gs://$GCS_BUCKET/scala/` folder by
running `gsutil cp target/sample_xgboost_apps-0.2.2.jar gs://$GCS_BUCKET/scala/`. To do this you
@@ -1280,7 +1280,7 @@ object RapidsConf {
|On startup use: `--conf [conf key]=[conf value]`. For example:
|
|```
- |${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0-SNAPSHOT.jar,cudf-21.08.0-SNAPSHOT-cuda11.jar' \
+ |${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
|--conf spark.plugins=com.nvidia.spark.SQLPlugin \
|--conf spark.rapids.sql.incompatibleOps.enabled=true
|```

Suggested change
- |${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.0-cuda11.jar' \
+ |${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar' \