Add a link for Mellanox documentation on RoCE and a pointer to --without-ucx for the MOFED installation script (#2745)

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
abellina authored Jun 21, 2021
1 parent eedd181 commit 269e7a7
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions docs/additional-functionality/rapids-shuffle.md
@@ -50,6 +50,16 @@ The minimum UCX requirement for the RAPIDS Shuffle Manager is
in machines that don't connect their GPUs and NICs to PCIe switches (i.e. directly to the
root-complex).

Other considerations:

- Please refer to [Mellanox documentation](
https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment)
on how to configure RoCE networks (lossless/lossy, QoS, and more)

- We recommend that the `--without-ucx` option is passed when installing MLNX_OFED
(`mlnxofedinstall`). This is because the UCX included in MLNX_OFED does not have CUDA support,
and is likely older than what is available in the UCX repo (see Step 2 below).

If you encounter issues or poor performance, GPUDirectRDMA can be controlled via the
UCX environment variable `UCX_IB_GPU_DIRECT_RDMA=no`, but please
[file a GitHub issue](https://github.com/NVIDIA/spark-rapids/issues) so we can investigate
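The two settings named in this change can be sketched as shell commands. This is a minimal illustration, not part of the commit: it assumes the current directory is an extracted MLNX_OFED bundle containing `mlnxofedinstall`, that a separately installed UCX provides `ucx_info` on the `PATH`, and that the environment variable is set in the shell that launches the affected processes. Only the `--without-ucx` flag and the `UCX_IB_GPU_DIRECT_RDMA` variable come from the documentation text above; the guards and the `grep` check are illustrative.

```shell
# Sketch only: the working directory and the tool locations are assumptions.

# 1. Install MLNX_OFED without its bundled UCX (flag named in the change above).
#    Skipped when the installer is not present in the current directory.
if [ -x ./mlnxofedinstall ]; then
    sudo ./mlnxofedinstall --without-ucx
fi

# 2. After installing UCX separately, check whether the UCX found on PATH
#    reports any CUDA transports. Skipped when ucx_info is not installed.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -d | grep -i cuda || echo "no CUDA transports reported by ucx_info"
fi

# 3. Troubleshooting: turn GPUDirect RDMA off for processes started from this shell.
export UCX_IB_GPU_DIRECT_RDMA=no
echo "UCX_IB_GPU_DIRECT_RDMA=${UCX_IB_GPU_DIRECT_RDMA}"
```

The `export` only affects processes launched from the same shell. In a Spark deployment the variable would also need to reach the executors, for example through Spark's `spark.executorEnv.UCX_IB_GPU_DIRECT_RDMA` setting; whether that is the right mechanism for a given cluster is an assumption to verify.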
