Correct 21.10 docs such as PCBS related FAQ [skip ci] #3815

Merged 5 commits on Oct 15, 2021
10 changes: 5 additions & 5 deletions docs/FAQ.md
@@ -10,7 +10,7 @@ nav_order: 11

### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?

The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, or 3.2.0 of
Apache Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers
internal, the code for those plans can change even between bug-fix releases. As part of our
process, we try to stay on top of these changes and release updates as quickly as possible.
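As a quick sanity check (a sketch, not part of the official docs), you can confirm from `spark-shell` that the cluster is on a supported release before enabling the plugin:

```
// Hypothetical startup check: confirm this Spark release is one the
// 21.10 plugin supports before enabling it.
val supported = Set("3.0.1", "3.0.2", "3.0.3", "3.1.1", "3.1.2", "3.2.0")
require(supported.contains(spark.version),
  s"Spark ${spark.version} is not a supported release for this plugin")
```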
@@ -287,13 +287,13 @@ AdaptiveSparkPlan isFinalPlan=false

### Are cache and persist supported?

Yes, cache and persist are supported. The cache is GPU accelerated but is still stored in host
memory. Please refer to [RAPIDS Cache Serializer](./additional-functionality/cache-serializer.md)
for more details.
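For example (an illustrative sketch, not from the docs), no code changes are needed once the serializer is configured:

```
// Run in spark-shell with ParquetCachedBatchSerializer configured (see the
// linked page). cache() works as usual: batches are compressed on the GPU,
// but the compressed bytes live in host memory.
val df = spark.range(0, 1000000).selectExpr("id", "id % 10 AS bucket")
df.cache()
df.count()  // materializes the cache
```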

### Can I cache data into GPU memory?

No, that is not currently supported.
It would require much larger changes to Apache Spark to be able to support this.

### Is PySpark supported?

13 changes: 5 additions & 8 deletions docs/additional-functionality/cache-serializer.md
@@ -29,21 +29,18 @@ nav_order: 2
`spark.sql.inMemoryColumnarStorage.enableVectorizedReader` will not be honored, as the GPU
data is always read in as columnar. If `spark.rapids.sql.enabled` is set to false,
the cached objects will still be compressed on the CPU as part of the caching process.
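For instance (illustrative only), toggling the vectorized reader setting has no effect on the cache round trip once this serializer is active:

```
// Illustrative: with ParquetCachedBatchSerializer configured, cached data is
// always written and read back as columnar batches, so this setting is
// ignored for cached relations.
spark.conf.set("spark.sql.inMemoryColumnarStorage.enableVectorizedReader", "false")
spark.range(100).cache().count()  // round trip is columnar regardless
```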

Please note that ParquetCachedBatchSerializer doesn't support negative decimal scale, so if
`spark.sql.legacy.allowNegativeScaleOfDecimal` is set to true, ParquetCachedBatchSerializer
should not be used. Using the serializer with negative decimal scales will generate
an error at runtime.
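A defensive startup check along these lines (a sketch; the `require` guard is our own, not part of the plugin) can catch the bad combination early:

```
// Hypothetical guard: fail fast instead of hitting a runtime error later.
val negScale = spark.conf
  .get("spark.sql.legacy.allowNegativeScaleOfDecimal", "false").toBoolean
val pcbs = spark.conf.get("spark.sql.cache.serializer", "") ==
  "com.nvidia.spark.ParquetCachedBatchSerializer"
require(!(negScale && pcbs),
  "ParquetCachedBatchSerializer does not support negative decimal scale")
```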

To use this serializer, please run Spark with the following conf:

```
spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
```
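The same setting can also be applied programmatically (a sketch; `spark.sql.cache.serializer` is a static conf, so it must be set before the SparkSession is created):

```
// Illustrative programmatic equivalent of the spark-shell flag above.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("pcbs-example")  // hypothetical app name
  .config("spark.sql.cache.serializer",
          "com.nvidia.spark.ParquetCachedBatchSerializer")
  .getOrCreate()
```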


## Supported Types

All types are supported on the CPU.
On the GPU, MapType and BinaryType are not supported.
If an unsupported type is encountered, the RAPIDS Accelerator for Apache Spark will fall
back to using the CPU for caching.
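For example (illustrative), a DataFrame with a MapType column can still be cached; it simply is not GPU accelerated:

```
// Illustrative: MapType is unsupported on the GPU, so caching this
// DataFrame falls back to CPU caching; the job still succeeds.
import org.apache.spark.sql.functions.{col, map}

val withMap = spark.range(10).select(map(col("id"), col("id") * 2).as("m"))
withMap.cache().count()
```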

41 changes: 41 additions & 0 deletions docs/download.md
@@ -18,6 +18,47 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.

## Release v21.10.0
Hardware Requirements:

The plugin is tested on the following architectures:

GPU Architecture: NVIDIA V100, T4 and A10/A30/A100 GPUs

Software Requirements:

OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8

CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+

Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.2.0, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.6+, Scala 2.12, Java 8

*Some hardware may have a minimum driver version greater than v450.80.02+. Check the GPU spec sheet
for your hardware's minimum driver version.

### Download v21.10.0
* Download the [RAPIDS
Accelerator for Apache Spark 21.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.10.0/rapids-4-spark_2.12-21.10.0.jar)
* Download the [RAPIDS cuDF 21.10.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.10.0/cudf-21.10.0-cuda11.jar)

This package is built against CUDA 11.2 and has [CUDA forward
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested
on V100, T4, A30 and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which
do not have CUDA forward compatibility (for example, GeForce), CUDA 11.2 is required. Users will
need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.
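As a quick smoke test (a sketch, assuming the jars above were passed via `--jars` and the plugin enabled with `spark.plugins=com.nvidia.spark.SQLPlugin` per the getting-started guide), a simple explain plan should show GPU operators:

```
// Run in spark-shell; illustrative verification, not an official procedure.
val df = spark.range(0, 1000000).selectExpr("SUM(id) AS total")
df.explain()  // a working install shows Gpu* operators, e.g. GpuHashAggregate
```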

### Release Notes
New functionality and performance improvements for this release include:
* TODO1
* TODO2

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v21.08.0
Hardware Requirements:
