Releases: NVIDIA/spark-rapids-ml

v24.08.0 release

19 Sep 07:54
7f8e779

Release notes:

  • Removed MAXINT limit on number of non-zero inputs per GPU for sparse logistic regression.
  • IVF-PQ and Cagra were added to the suite of supported approximate nearest neighbor algorithms.
  • Extended benchmarking scripts to be compatible with Databricks runtime 13.3 (with the spark-rapids plugin) and runtimes 14.3 and 15.4 (without the plugin).
  • Included an experimental CLI for no-import-statement-change acceleration of pyspark.ml applications.
  • Fixed a slowdown for inputs with a large number of columns when type conversion is required.
  • Updated RAPIDS dependencies to 24.08.
  • Known issues to be fixed in next release:
    • For sparse logistic regression fit, a low-level C++/CUDA exception is raised if a partition has no non-zero data.
    • Array type inputs with int dtypes are not converted to float, leading to errors in some algorithms (e.g. Cagra ANN).
    • In IVF-PQ based Cagra, the intermediate graph degree must be <= 128 or a low-level C++ exception is raised.
    • The test_sparse_int64 test requires 256GB of host memory to run, not the 128GB stated in the comments.
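
Until the known issues above are fixed, a few checks on the Python side can surface them as readable errors rather than low-level exceptions. The following is a minimal sketch in plain Python, not part of spark-rapids-ml: the helper names are hypothetical, and the parameter key `intermediate_graph_degree` is an assumption about how the Cagra setting is spelled.

```python
# Hypothetical pre-flight checks for the 24.08 known issues.
# None of these helpers are part of spark-rapids-ml.
from typing import Dict, List, Sequence

MAX_INTERMEDIATE_GRAPH_DEGREE = 128  # limit quoted in the known issues


def to_float_rows(rows: Sequence[Sequence[int]]) -> List[List[float]]:
    """Cast integer feature arrays to float before handing them to an algorithm."""
    return [[float(value) for value in row] for row in rows]


def check_intermediate_graph_degree(algo_params: Dict[str, object]) -> None:
    """Raise a Python error if the configured degree exceeds the stated limit."""
    # "intermediate_graph_degree" is an assumed key name, used for illustration.
    degree = algo_params.get("intermediate_graph_degree")
    if degree is not None and int(degree) > MAX_INTERMEDIATE_GRAPH_DEGREE:
        raise ValueError(
            f"intermediate graph degree {degree} exceeds "
            f"{MAX_INTERMEDIATE_GRAPH_DEGREE}"
        )


def empty_partitions(nonzeros_per_partition: Sequence[int]) -> List[int]:
    """Return the indices of partitions that hold no non-zero values."""
    return [i for i, nnz in enumerate(nonzeros_per_partition) if nnz == 0]
```

On a real cluster the per-partition non-zero counts and the dtype cast would be computed on the Spark DataFrame itself; the lists here only stand in for that data.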

pip package available at https://pypi.org/project/spark-rapids-ml/24.08.0/

v24.06.0 release

22 Jul 01:33
c7becc2

Release notes:

  • Double precision support for GPU accelerated logistic regression.
  • Added GPU accelerated IVF-Flat Approximate Nearest Neighbor (ANN) to benchmarking scripts.
  • Improved throughput of GPU accelerated IVF-Flat ANN for large data sets.
  • Update of RAPIDS dependencies to 24.06.

NOTE: For a large number of feature/input columns in float64 type, please use VectorUDT or array type (as opposed to multiple scalar columns) for all algorithms due to a performance issue. This will be resolved in our 24.08 release.
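
One way to follow this note is to pack the scalar feature columns into a single array column before fitting. The sketch below assumes a PySpark DataFrame; the helper names `feature_columns` and `pack_features` and the output column name `features` are hypothetical. PySpark is imported inside the function, so it is only required when the function is actually called.

```python
from typing import List, Sequence


def feature_columns(all_columns: Sequence[str],
                    non_feature_columns: Sequence[str]) -> List[str]:
    """Return the columns to pack, preserving their original order."""
    excluded = set(non_feature_columns)
    return [c for c in all_columns if c not in excluded]


def pack_features(df, columns: Sequence[str], output_col: str = "features"):
    """Replace many scalar columns with one array<double> column.

    `df` is expected to be a pyspark.sql.DataFrame.
    """
    from pyspark.sql.functions import array, col  # lazy import

    packed = array(*[col(c).cast("double") for c in columns])
    kept = [c for c in df.columns if c not in columns]
    return df.select(*kept, packed.alias(output_col))
```

For example, `pack_features(df, feature_columns(df.columns, ["id", "label"]))` would yield a DataFrame with `id`, `label`, and a single `features` array column, which can then be named as the input column of an estimator.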

pip package available at https://pypi.org/project/spark-rapids-ml/24.06.0/

v24.04.0 release

16 May 04:05
df01b39

Release notes:

  • Feature standardization in logistic regression for sparse vectors.
  • GPU accelerated Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm with example notebook.
  • GPU accelerated IVF-Flat Approximate Nearest Neighbor algorithm with example notebook.
  • Stage level scheduling support for Yarn and K8s.
  • Update of RAPIDS dependencies to 24.04.

pip package available at https://pypi.org/project/spark-rapids-ml/24.04.0/

v24.02.0 release

21 Mar 23:45
e0f644d

Release notes:

  • Support feature standardization in logistic regression for dense vectors.
  • Add large scale synthetic sparse data generation for logistic regression testing.
  • Fix tol=0 in KMeans.
  • Add sparse vectors to logistic regression notebook example.
  • Update RAPIDS dependencies to 24.02.
  • Known Issue: RandomForest training will throw an exception if the label column takes on only a single value. This will be fixed in 24.04.
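
Until the RandomForest known issue is fixed, the label column can be checked before training. This is a minimal sketch in plain Python with hypothetical helper names; on a Spark DataFrame the labels would typically come from a distinct query on the label column rather than from a Python list.

```python
from typing import Hashable, Iterable


def distinct_label_count(labels: Iterable[Hashable], stop_at: int = 2) -> int:
    """Count distinct labels, stopping early once `stop_at` have been seen."""
    seen = set()
    for label in labels:
        seen.add(label)
        if len(seen) >= stop_at:
            break
    return len(seen)


def require_multiple_labels(labels: Iterable[Hashable]) -> None:
    """Raise a readable error if the labels take on fewer than two values."""
    if distinct_label_count(labels) < 2:
        raise ValueError(
            "label column has fewer than two distinct values; "
            "RandomForest training is reported to fail on such input"
        )
```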

pip package available at https://pypi.org/project/spark-rapids-ml/24.02.0/

v23.12.0 release

17 Jan 06:10
e8d138b

Release notes:

  • Match Spark's logistic regression fit behavior when data set has only one label value.
  • Support sparse vector based computations through cuML layer in logistic regression fit, transform, and cross validation.
  • Update dataproc benchmark script.
  • Update Azure Databricks instructions.
  • Update RAPIDS dependencies to 23.12.

pip package available at https://pypi.org/project/spark-rapids-ml/23.12.0/

v23.10.0 release

16 Nov 04:16
5f77d4b

Release Notes:

  • L1 and elastic net regularization for GPU accelerated distributed LogisticRegression, with notebook example.
  • More than 2 classes for GPU accelerated distributed LogisticRegression, with notebook example.
  • Optimized fitMultiple api for LogisticRegression.
  • Accelerated cross validation for LogisticRegression and log loss.
  • Output raw prediction column for logistic regression.
  • Updated Databricks init scripts and benchmarking scripts.
  • Improved api docs.
  • Updated RAPIDS dependencies to 23.10.

NOTE: While the runtime is compatible with Spark versions >= 3.3, some scripts in python/tests/ are not compatible with Spark 3.3. This is addressed in 23.12.

pip package available at https://pypi.org/project/spark-rapids-ml/23.10.0/

v23.08.0 release

13 Sep 05:48
5dab107

Release Notes:

  • GPU accelerated distributed Logistic Regression with L2 regularization fit and transform, along with benchmarking and Jupyter notebook examples.
  • GPU accelerated distributed Uniform Manifold Approximation and Projection (UMAP) fit and transform for non-linear dimensionality reduction along with benchmarking and Jupyter notebook examples.
  • Stage level scheduling for training on stand-alone clusters.
  • Improved logging.
  • Preserve input column types during transform.
  • Default to float32 inputs to cuML layer.
  • Support conversion of GPU Logistic Regression models to pySpark ML CPU.
  • Improved local benchmarking script.
  • Updated RAPIDS and RAPIDS Accelerator for Spark dependencies to 23.08.

pip package available at https://pypi.org/project/spark-rapids-ml/23.8.0/

v23.06.0 release

13 Jul 07:25
04dffdf

Release Notes:

  • GPU accelerated CrossValidator for RandomForestClassifier, RandomForestRegressor and LinearRegression, with example notebook
  • Support for CUDA unified virtual memory to allow over-subscription of GPU memory
  • Benchmarking scripts and instructions for AWS EMR
  • Distributed synthetic data generation
  • RandomForest example notebooks
  • Support Spark ML parameters in constructors
  • Improved API docs
  • Updated RAPIDS dependencies to 23.06

pip package available at https://pypi.org/project/spark-rapids-ml/23.6.0/

v23.04.0 release

03 May 19:03
b251734

This release includes:

  • Getting started guide and benchmarking scripts on GCP dataproc
  • Getting started guide on AWS EMR
  • cpu() method to convert models generated by Spark RAPIDS ML to Spark ML models
  • Eliminating the need for CUDA on the driver node
  • Example notebook for k-NN
  • Spark 3.4 compatibility
  • Updating RAPIDS dependencies to 23.04

pip package available at https://pypi.org/project/spark-rapids-ml/23.4.0/

v23.02.0 release

03 Apr 01:09
ab575bc

Added GPU-accelerated PySpark-compatible APIs for the following algorithms:

  • K-Means
  • k-NN
  • LinearRegression
  • PCA
  • RandomForestClassifier
  • RandomForestRegressor

Pip package: https://pypi.org/project/spark-rapids-ml/