Describe the bug
Hi,
I am trying to run nvm-block-bench on an ASUS ESC8000-E11 that has a V100 GPU and 6x Intel NVMe SSDs. The GPU slot is configured as PCIe5 x16.
To Reproduce
Specifically, I'd like to reproduce this test: #17.
Expected behavior
I expected to see the same linear scaling of bandwidth as reported there.
However, my bandwidth seems to be capped at ~10 GB/s.
Here are some results:
1. When reading only one SSD, I get ~5 GB/s:
run: ./bin/nvm-blockbench --threads=$((1024*1024*4)) --blk_size=64 --reqs=1 --pages=$((1024*1024*4)) --queue_depth=1024 --page_size=4096 --num_blks=2097152 --gpu=4 --n_ctrls=1 --num_queues=128 --random=true --access_type=0
result:
Elapsed Time: 3.10162e+06 Number of Ops: 4194304 Data Size (bytes): 17179869184
Ops/sec: 1.3523e+06 Effective Bandwidth(GB/S): 5.1586
2. When increasing to 2 SSDs, I only get ~8 GB/s:
run: ./bin/nvm-blockbench --threads=$((1024*1024*4)) --blk_size=64 --reqs=1 --pages=$((1024*1024*4)) --queue_depth=1024 --page_size=4096 --num_blks=2097152 --gpu=4 --n_ctrls=2 --num_queues=128 --random=true --access_type=0
result:
Elapsed Time: 1.97415e+06 Number of Ops: 4194304 Data Size (bytes): 17179869184
Ops/sec: 2.12461e+06 Effective Bandwidth(GB/S): 8.10473
3. Next, increasing to 4 SSDs gives ~10 GB/s:
run: ./bin/nvm-blockbench --threads=$((1024*1024*4)) --blk_size=64 --reqs=1 --pages=$((1024*1024*4)) --queue_depth=1024 --page_size=4096 --num_blks=2097152 --gpu=4 --n_ctrls=4 --num_queues=128 --random=true --access_type=0
result:
Elapsed Time: 1.54689e+06 Number of Ops: 4194304 Data Size (bytes): 17179869184
Ops/sec: 2.71144e+06 Effective Bandwidth(GB/S): 10.3433
4. Unfortunately, increasing the number of SSDs further didn't help, and the bandwidth appears to be capped:
run: ./bin/nvm-blockbench --threads=$((1024*1024*4)) --blk_size=64 --reqs=1 --pages=$((1024*1024*4)) --queue_depth=1024 --page_size=4096 --num_blks=2097152 --gpu=4 --n_ctrls=6 --num_queues=128 --random=true --access_type=0
result:
Elapsed Time: 1.54337e+06 Number of Ops: 4194304 Data Size (bytes): 17179869184
Ops/sec: 2.71762e+06 Effective Bandwidth(GB/S): 10.3669
I tried changing page_size, reqs, threads, etc., but the bandwidth stayed at ~10 GB/s.
To rule out the SSDs, I used fio to read from multiple SSDs to the CPU at the same time, and the aggregate bandwidth reached ~30 GB/s.
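(For reference, the host-side check was roughly of the form sketched below; device paths, block size, and queue depth are illustrative, not the exact fio command used.)
fio --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=256 \
    --runtime=30 --time_based --group_reporting \
    --name=nvme0 --filename=/dev/nvme0n1 \
    --name=nvme1 --filename=/dev/nvme1n1 \
    --name=nvme2 --filename=/dev/nvme2n1
# Add one --name/--filename pair per SSD; with group_reporting the summary
# shows the aggregate read bandwidth across all drives.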
Do you have any ideas or solutions for this result? Thanks.
Machine Setup (please complete the following information):
OS: Ubuntu 20.04.6, Kernel 5.4.0-99-generic
NVIDIA Driver: 545.23.08, CUDA Versions: 12.3, GPU name: NVIDIA V100-PCIE-32GB
SSD used: Intel SSD D7-P5520 SERIES
I think the answer is in your own question. The V100 is a Gen3 device, and the maximum GPU ingress bandwidth on Gen3 is about 12 GB/s. Expecting 30 GB/s on a Gen3 link is unreasonable and unrealistic. In the CPU case you observe ~30 GB/s because you are consuming more than 8 PCIe slots, each capable of 12 GB/s of ingress bandwidth!
The interesting question here is really why the scaling to two SSDs does not hit 10 GB/s and instead gets capped at ~8 GB/s. I strongly suspect this is due to limitations in the root complex of the ASUS ESC8000-E11 CPU socket.
Lastly, issue #17 is completely different, as it is on a Gen5 system (two generations ahead), so there is no relationship between the two!
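(Rough back-of-envelope for the Gen3 x16 limit, assuming 8 GT/s per lane with 128b/130b encoding; the protocol-overhead figure is an estimate, not a measured value.)
# PCIe Gen3 x16 raw payload rate: 8 GT/s * 16 lanes * 128/130 encoding / 8 bits per byte
echo "scale=2; 8 * 16 * 128 / 130 / 8" | bc   # ~15.75 GB/s theoretical
# TLP/DLLP protocol overhead typically brings sustained ingress down to roughly
# 12-13 GB/s, which is consistent with the ~10 GB/s plateau observed here.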
Thanks for your answer; it looks like I made a very basic mistake.
I only considered the capabilities of the server's PCIe slots and ignored the PCIe generation of the GPU itself. Thanks again!
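(For anyone hitting the same wall, one quick way to confirm the link generation the GPU actually negotiated; the bus address 3b:00.0 is just a placeholder, use the address reported by nvidia-smi or lspci on your own system.)
sudo lspci -s 3b:00.0 -vv | grep -E 'LnkCap|LnkSta'
# LnkCap shows the maximum supported speed/width (8GT/s x16 corresponds to Gen3);
# LnkSta shows what was actually negotiated with the slot.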