diff --git a/docs/cudf/source/cudf_pandas/faq.md b/docs/cudf/source/cudf_pandas/faq.md
index 34b657488c1..5dd33f45216 100644
--- a/docs/cudf/source/cudf_pandas/faq.md
+++ b/docs/cudf/source/cudf_pandas/faq.md
@@ -194,3 +194,42 @@ for testing or benchmarking purposes. To do so, set the
 ```bash
 CUDF_PANDAS_FALLBACK_MODE=1 python -m cudf.pandas some_script.py
 ```
+
+## What is the recommended range of dataset sizes that I can process using `cudf.pandas`?
+
+`cudf.pandas` can process a wide range of dataset sizes. As a _very rough_
+rule of thumb, `cudf.pandas` shines on workflows processing more than
+10,000 to 100,000 rows of data, depending on the algorithms, data types,
+and other factors involved. Below this range, workflows might execute more
+slowly on the GPU than on the CPU because of the cost of data transfers.
+With [a managed memory pool and managed memory prefetching enabled in cudf
+by default](how-it-works.md), you can process datasets larger than GPU
+memory, up to a theoretical limit of the combined CPU and GPU memory size.
+Note, however, that the best performance at large data sizes depends on the
+data and the workflow.
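+
+If you are unsure whether a given workload falls in this range, a quick
+timing check can help. The sketch below is illustrative (the row count and
+column names are placeholders, not recommendations); it enables
+`cudf.pandas` programmatically and times a simple groupby:
+
+```python
+# Enable cudf.pandas before importing pandas so that supported pandas
+# operations run on the GPU.
+import cudf.pandas
+cudf.pandas.install()
+
+import time
+import numpy as np
+import pandas as pd  # now backed by cudf.pandas
+
+# Synthetic data; 1_000_000 rows is illustrative only.
+df = pd.DataFrame({
+    "key": np.random.randint(0, 100, size=1_000_000),
+    "val": np.random.random(1_000_000),
+})
+
+start = time.perf_counter()
+df.groupby("key")["val"].mean()
+print(f"groupby took {time.perf_counter() - start:.4f} s")
+```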