Correct 21.10 docs such as PCBS related FAQ [skip ci] (#3815)
* Correct some doc for 21.10

  Signed-off-by: Hao Zhu <hazhu@nvidia.com>

* Add 21.10 release notes

  Signed-off-by: Hao Zhu <hazhu@nvidia.com>

* Add more release notes for 21.10

  Signed-off-by: Hao Zhu <hazhu@nvidia.com>

* Update docs/download.md

  Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>

* Update docs/download.md

  Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com>

Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>
Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com>
3 people authored Oct 15, 2021
1 parent 145a72c commit 1203e6a
Showing 3 changed files with 70 additions and 13 deletions.
12 changes: 7 additions & 5 deletions docs/FAQ.md
@@ -10,7 +10,7 @@ nav_order: 11

### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?

The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.0.3, 3.1.1, or 3.1.2 of
The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, or 3.2.0 of
Apache Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers to
be internal, the code for those plans can change even between bug-fix releases. As part of our
process, we try to stay on top of these changes and release updates as quickly as possible.
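
For reference, a hedged launch sketch against one of these supported versions (the jar names match the 21.10.0 artifacts in the download section, and `com.nvidia.spark.SQLPlugin` is the plugin's entry point; adjust paths and versions to your deployment):

```
spark-shell --jars rapids-4-spark_2.12-21.10.0.jar,cudf-21.10.0-cuda11.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin
```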
@@ -287,13 +287,15 @@ AdaptiveSparkPlan isFinalPlan=false

### Are cache and persist supported?

Yes cache and persist are supported, but they are not GPU accelerated yet. We are working with
the Spark community on changes that would allow us to accelerate compression when caching data.
Yes, cache and persist are supported. The cache is GPU accelerated,
but the cached data is still stored in host memory.
Please refer to [RAPIDS Cache Serializer](./additional-functionality/cache-serializer.md)
for more details.
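
As a minimal spark-shell sketch (assuming the plugin is enabled and the serializer described in the linked page is configured; the data here is illustrative):

```scala
// Assumes spark-shell was launched with the RAPIDS plugin and
// spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer.
val df = spark.range(0, 1000000L).selectExpr("id", "id % 10 AS key")
df.cache()                        // lazy: nothing is cached yet
df.count()                        // first action compresses the batches (on the GPU)
                                  // and keeps them in host memory
df.groupBy("key").count().show()  // answered from the cache
```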

### Can I cache data into GPU memory?

No, that is not currently supported. It would require much larger changes to Apache Spark to be able
to support this.
No, that is not currently supported.
It would require much larger changes to Apache Spark to be able to support this.

### Is PySpark supported?

13 changes: 5 additions & 8 deletions docs/additional-functionality/cache-serializer.md
@@ -29,21 +29,18 @@ nav_order: 2
`spark.sql.inMemoryColumnarStorage.enableVectorizedReader` will not be honored, as the GPU
data is always read in as columnar. If `spark.rapids.sql.enabled` is set to false,
the cached objects will still be compressed on the CPU as part of the caching process.

Please note that ParquetCachedBatchSerializer doesn't support negative decimal scale, so if
`spark.sql.legacy.allowNegativeScaleOfDecimal` is set to true ParquetCachedBatchSerializer
should not be used. Using the serializer with negative decimal scales will generate
an error at runtime.

To use this serializer, please run Spark with the following conf:

```
spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer"
spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
```
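
Once the shell is up, a quick sanity check (a sketch; `spark.sql.cache.serializer` is a static SQL conf, so it has to be set at launch rather than changed at runtime):

```scala
// Read the static conf back to confirm the serializer is active.
spark.conf.get("spark.sql.cache.serializer")
// expected value: com.nvidia.spark.ParquetCachedBatchSerializer
```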


## Supported Types

All types are supported on the CPU, on the GPU, ArrayType, MapType and BinaryType are not
supported. If an unsupported type is encountered the Rapids Accelerator for Apache Spark will fall
All types are supported on the CPU.
On the GPU, MapType and BinaryType are not supported.
If an unsupported type is encountered, the RAPIDS Accelerator for Apache Spark will fall
back to using the CPU for caching.

58 changes: 58 additions & 0 deletions docs/download.md
@@ -18,6 +18,64 @@
cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.

## Release v21.10.0
Hardware Requirements:

The plugin is tested on the following architectures:

GPU Architecture: NVIDIA V100, T4, and A10/A30/A100 GPUs

Software Requirements:

OS: Ubuntu 18.04, Ubuntu 20.04, CentOS 7, or CentOS 8

CUDA & NVIDIA Drivers*: 11.0-11.4 & v450.80.02+

Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.2.0, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0

Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2)

Python 3.6+, Scala 2.12, Java 8

*Some hardware may have a minimum driver version greater than v450.80.02. Check the GPU spec sheet
for your hardware's minimum driver version.

### Download v21.10.0
* Download the [RAPIDS
Accelerator for Apache Spark 21.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.10.0/rapids-4-spark_2.12-21.10.0.jar)
* Download the [RAPIDS cuDF 21.10.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.10.0/cudf-21.10.0-cuda11.jar)

This package is built against CUDA 11.2 and has [CUDA forward
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested
on V100, T4, A30, and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which
do not have CUDA forward compatibility (for example, GeForce), CUDA 11.2 is required. Users will
need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.
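
To verify a node meets the driver floor, one quick check (a sketch; assumes `nvidia-smi` from the driver install is on the PATH):

```
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```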

### Release Notes
New functionality and performance improvements for this release include:
* Support collect_list and collect_set in group-by aggregation
* Support stddev and percentile_approx in group-by aggregation (see the sketch after this list)
* RunningWindow operations on map
* HashAggregate on struct and nested struct
* Sorting on nested structs
* Explode on map, array, struct
* Union-all on map, array and struct of maps
* Parquet writing of map
* ORC reader supports reading map/struct columns
* ORC reader supports decimal64
* Spark Qualification Tool
  * Add conjunction and disjunction filters
  * Filtering specific configuration values
  * Filtering user name
  * Reporting nested data types
  * Reporting write data formats
* Spark Profiling Tool
  * Generating structured output format
  * Improved profiling tool performance
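
A hedged spark-shell sketch of the newly supported group-by aggregations (assumes the 21.10.0 plugin is enabled on Spark 3.1+; the data and column names are illustrative):

```scala
import org.apache.spark.sql.functions._

// Illustrative data: ten keys, each with a column of doubles.
val df = spark.range(0, 1000L).selectExpr("id % 10 AS key", "CAST(id AS DOUBLE) AS v")

df.groupBy("key")
  .agg(
    collect_list("v"),                  // group-by collect_list
    collect_set("v"),                   // group-by collect_set
    stddev("v"),                        // group-by stddev
    expr("percentile_approx(v, 0.5)"))  // approximate median
  .show()
```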

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v21.08.0
Hardware Requirements:

