Hey,

We have seen big improvements in our B-Tree and BRIN index workloads since switching to the Samsung PM1735 series of PCIe 4.0 x8 NVMe drives, and we are now deciding whether to move everything onto them in a RAID configuration, or to simply use them exclusively as index/temporary/materialisation tablespaces so as to avoid the RAID capacity overhead (a rough sketch of the latter option follows the spec list below):
Capacity: 6.4 TB
Sequential Read (128 KB): 8000 MB/s
Sequential Write (128 KB): 3800 MB/s
Random Read (4 KB): 1500K IOPS
Random Write (4 KB): 250K IOPS
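To make the second option concrete, here is a minimal sketch of roughly what I have in mind; the mount point, tablespace name and index name are placeholders, not our actual objects:

```sql
-- Rough sketch of the "dedicated NVMe tablespace" option; the mount point,
-- tablespace name and index name below are placeholders.
CREATE TABLESPACE nvme_fast LOCATION '/mnt/pm1735/pgdata';

-- Move (rebuild) a hot index onto the NVMe tablespace.
ALTER INDEX orders_created_at_brin SET TABLESPACE nvme_fast;

-- Route temp files (sorts, hash spills, materialisation) to the same drives.
ALTER SYSTEM SET temp_tablespaces = 'nvme_fast';
SELECT pg_reload_conf();
```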
As you can see, at 64 Gbit/s of sequential read bandwidth the pipe is within striking distance of a single channel of DDR4 ECC memory.
Perhaps this makes a statement about the whole "NVMe revolution" business, though it doesn't yet go as far as to rival the purpose-built optical NICs used in datacenter and NUMA hyperscale arrangements. At any rate, this is how I discovered PG-Strom: by casually exploring options to further leverage NVMe storage in bandwidth-first scenarios. In the server rack I'm working with at the moment, we also have an AMD Instinct MI50, a datacenter-grade GPU that wouldn't otherwise be interesting had it not carried 32 GB of high-bandwidth HBM2 memory at 1024 GB/s. We had initially intended to use this card for Llama 2 inference, but eventually decided against it, as our IBM POWER9 system wasn't as good a fit as a purpose-built x86 system with top-of-the-line NVIDIA cards.
We were supposed to get rid of the GPU earlier this week.
However, after discovering your work on peer-to-peer GPU<->NVMe access, I decided to postpone that until further investigation. NVIDIA GPUDirect is the technology that enables this capability, as I understand it. According to AMD's documentation, the GPUDirect RDMA API is not supported in HIP, which is unfortunate considering that HIP otherwise provides a fairly complete set of drop-in bindings to CUDA, but frankly it was also to be expected. So the implementation, should it be possible at all, obviously wouldn't be as easy as rewriting the headers and substituting hipcc for nvcc. That said, I'm fairly sure I have seen peer-to-peer code in the amdgpu kernel driver, and I would expect it to work because the driver supports multi-GPU configurations without an NVLink equivalent, i.e. directly over PCIe.
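For the multi-GPU leg at least, the HIP runtime can be asked directly whether peer access is expected to work. Below is a throwaway probe I'd start from; it only covers GPU-to-GPU peer access, not GPU-to-NVMe DMA, which as far as I understand would have to go through the kernel's PCI p2pdma side rather than HIP:

```cpp
// Hypothetical probe (not PG-Strom code): asks the HIP runtime whether peer
// access is possible between each pair of GPUs over PCIe.
// Build with: hipcc p2p_probe.cpp -o p2p_probe
#include <cstdio>
#include <cstdlib>
#include <hip/hip_runtime.h>

static void check(hipError_t err, const char *what) {
    if (err != hipSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, hipGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    int n = 0;
    check(hipGetDeviceCount(&n), "hipGetDeviceCount");
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int can = 0;
            check(hipDeviceCanAccessPeer(&can, src, dst), "hipDeviceCanAccessPeer");
            std::printf("GPU %d -> GPU %d: peer access %s\n",
                        src, dst, can ? "possible" : "not possible");
            if (can) {
                // Enable the mapping from src's context; flags must be 0.
                check(hipSetDevice(src), "hipSetDevice");
                hipError_t e = hipDeviceEnablePeerAccess(dst, 0);
                if (e != hipSuccess && e != hipErrorPeerAccessAlreadyEnabled)
                    std::fprintf(stderr, "hipDeviceEnablePeerAccess: %s\n",
                                 hipGetErrorString(e));
            }
        }
    }
    return 0;
}
```

With only the single MI50 in our box this proves nothing by itself, but on a dual-card system it would at least show whether the runtime (and the IOMMU setup) lets PCIe peer traffic through; the GPU<->NVMe leg is a separate question entirely.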
Do you have any idea how hard this is likely to be, if it's possible at all, on a server mainboard with the hardware IOMMU fully enabled? If it were to work, we could make a solid case for perhaps petascale Postgres, dedicating anywhere from 32 to 256 GB of HBM2 memory to indexing and scanning directly over NVMe without ever consuming CPU time or incurring unnecessary copies to RAM. I haven't done the measurements, but I would expect various hash-join and point-in-polygon intersection type workloads to perform much better in hybrid HBM/NVMe tablespaces.
The idea excites me very much, considering that the MI50 remains an exceptionally affordable (950 USD) route to 32+ GB of HBM2.