BaM bandwidth stops increasing when the number of NVMe drives exceeds 7 #17
Wow, there are two pieces of awesome news here. We have not tested the Hopper generation or a Gen5 CPU yet, and we are super excited to see a benchmark and bring-up working out of the box, so thanks for the first piece of awesome news! We are also delighted to see linear scaling up to 5 SSDs. Agreed, it is lower than expected and stops scaling, but these are the first Gen5 platform results we are aware of, so thanks for the second piece of awesome news.

We faced similar trends when we moved from Gen3 to Gen4, and we want to help you debug this issue. We likely will not get access to a Gen5 platform immediately, so can we schedule a call to discuss what can be done (I believe you know my email address)? We have a bunch of theories, and the only way to determine what may be going wrong is to validate each of them. The IOMMU is definitely one of the likely culprits, but we also need to understand the PCIe topology and the capabilities of the Gen5 root complex. We have previously faced issues where the CPU was misconfigured for such high throughput, and we need to rule that out. There is a bit of debugging to be done for this Gen5 platform, and we want to help! Lastly, can you try the following:

I'm curious to see if latency is an issue here.
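A minimal sketch of the kind of IOMMU and PCIe-topology checks discussed above, assuming standard Linux tooling (`lspci`, `dmesg`) and that the drives enumerate as NVMe-class PCIe endpoints; the exact commands the maintainers had in mind may differ:

```bash
# Check whether an IOMMU is active (kernel command line and boot messages).
cat /proc/cmdline | grep -i iommu
sudo dmesg | grep -iE 'DMAR|IOMMU'

# Dump the PCIe topology as a tree to see how GPUs, switches, and SSDs
# hang off the root complex.
lspci -tv

# For every NVMe-class endpoint, compare the advertised link capability
# (LnkCap) with the negotiated status (LnkSta); a downtrained link
# (lower speed or width) caps per-drive throughput.
for dev in $(lspci -d ::0108 | awk '{print $1}'); do
    echo "== $dev =="
    sudo lspci -vv -s "$dev" | grep -E 'LnkCap:|LnkSta:'
done
```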
Hi @msharmavikram ,
Thanks very much for lending me a hand with this issue. Yes, I know your email address, and I'm happy to schedule a call once more information is available and things are clearer.
I didn't enable the IOMMU on my host; there is no output when I run `cat /proc/cmdline | grep iommu`. I have also attached the PCIe topology collected with `lspci -tv` and `lspci -vv`; please refer to the "lspci -tv" and "lspci -vv" attachments and to the output below:
Will look forward to your email. Meanwhile, can you try one more command and increase the number of SSDs from 1 to 8? (The one below is for 8 SSDs.)
Hi @msharmavikram , here is the log for 1 to 8 SSDs; the result is similar to what I summarized before. https://raw.githubusercontent.com/LiangZhou9527/some_stuff/main/1-8.log Please note that the line "in Controller::Controller, path = /dev/libnvm0" is for debugging only and does not affect the performance results.
I believe these are Intel SSDs; at least that is how it looks. What are the max IOPS for 4 KB and 512 B accesses? The issue seems to come from the IOMMU/PCIe switch or the CPU, and we want to determine whether the bottleneck is bandwidth or IOPS. Let's try the 1 to 8 SSD configurations with page_size=512 instead of 4 KB and see what that shows. (Reach out by email, as we might need additional support from the vendors here: Broadcom, Intel.)
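A minimal sketch of that 512 B run, assuming the same `nvm-block-bench` invocation as in the original report with only `--page_size` and `--n_ctrls` changed (other parameters may also need tuning for small pages):

```bash
# Same flags as the original report, but with 512 B pages to stress IOPS
# rather than bandwidth; repeat for --n_ctrls from 1 to 8.
./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 \
    --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 \
    --num_queues=128 --random=true -S 1 --n_ctrls=8
```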
Hi there,
I'm running benchmarks on a machine configured with several H800 GPUs and 8 NVMe drives dedicated to BaM.
The GPU is attached via PCIe 5.0 x16 and each NVMe drive via PCIe 4.0 x4, so in theory the GPU's maximum bandwidth is around 60 GB/s and each drive's maximum bandwidth is around 7.5 GB/s (eight drives in aggregate therefore roughly match the GPU link).
But according to my testing with `nvm-block-bench`, the results are not as expected. I summarize them here: https://raw.githubusercontent.com/LiangZhou9527/some_stuff/8b48038465858846f864e43cef6d0e6df787a2c2/BaM%20bandwidth%20and%20the%20number%20of%20NVMe.png
In the picture you can see that the bandwidth with 6 and 7 drives is almost the same, but when the number of drives reaches 8, the bandwidth drops significantly.
Any thoughts on what is happening here?
By the way, I didn't enable the IOMMU on my machine. The benchmark command line is below; I executed it 8 times, each time with a different --n_ctrls value (1, 2, ..., 8):
./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=4096 --num_blks=2097152 --gpu=0 --num_queues=128 --random=true -S 1 --n_ctrls=1
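For reproducibility, the sweep can be scripted; a minimal sketch using the same binary and flags as above (the loop and log-file names are just an illustration, not from the original report):

```bash
# Hypothetical sweep over the controller count, 1 through 8, logging each run.
for n in $(seq 1 8); do
    ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 \
        --queue_depth=1024 --page_size=4096 --num_blks=2097152 --gpu=0 \
        --num_queues=128 --random=true -S 1 --n_ctrls="$n" \
        | tee "bam_bench_n_ctrls_${n}.log"
done
```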