- This optimization is based on MXNet 0.10
- This optimization was done on a Sugon cluster (10 nodes) running CentOS Linux.
- This optimization requires an InfiniBand card and the mlx4 library (`-lmlx4`).
- The `ps-lite-rdma-final/` source code is different from the `report/final-submit/` source code, but both of them run smoothly.
- The differences between these two code bases are:
  - `ps-lite-rdma-final` was written entirely by Lin Zhiqi (the owner of this repository); the `final-submit` source code was written by Song Xiaoniu.
  - The major difference between the two is the basic model of RDMA QPs and CQs:
    - `ps-lite-rdma-final` uses one send CQ shared by all QPs (a shared CQ, not an SRQ!). Each QP has its own recv CQ.
    - `final-submit` uses the model in which each QP has its own send CQ and recv CQ.
  - Features that `ps-lite-rdma-final` has but `final-submit` doesn't have:
    - Parallel memcpy (by releasing the locks in the RDMA send operation early)
    - Multi-post-recv-request (multiple recv requests are posted repeatedly at the end of connection setup, which gives higher performance in the n-workers-to-1-server case)
- The two code bases have similar performance. Because `final-submit` has more sample test results, we used that version for the final report.
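
The two QP/CQ layouts described above can be illustrated with a small sketch. This is not code from either source tree: `Cq`, `Qp`, and the three functions are hypothetical stand-ins that only model which completion queue each queue pair points at. In real verbs code the wiring would be done through the `send_cq` and `recv_cq` fields of `ibv_qp_init_attr`.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for a completion queue (ibv_cq in libibverbs).
struct Cq {};

// Hypothetical stand-in for a queue pair: it only records its two CQs.
struct Qp {
  std::shared_ptr<Cq> send_cq;
  std::shared_ptr<Cq> recv_cq;
};

// Layout described for ps-lite-rdma-final:
// one send CQ shared by all QPs, one private recv CQ per QP.
std::vector<Qp> MakeSharedSendCqLayout(std::size_t num_qps) {
  auto shared_send = std::make_shared<Cq>();
  std::vector<Qp> qps;
  for (std::size_t i = 0; i < num_qps; ++i) {
    qps.push_back(Qp{shared_send, std::make_shared<Cq>()});
  }
  return qps;
}

// Layout described for final-submit:
// every QP has its own send CQ and its own recv CQ.
std::vector<Qp> MakePerQpCqLayout(std::size_t num_qps) {
  std::vector<Qp> qps;
  for (std::size_t i = 0; i < num_qps; ++i) {
    qps.push_back(Qp{std::make_shared<Cq>(), std::make_shared<Cq>()});
  }
  return qps;
}

// Counts how many distinct CQ objects a set of QPs refers to.
std::size_t CountDistinctCqs(const std::vector<Qp>& qps) {
  std::set<const Cq*> seen;
  for (const Qp& qp : qps) {
    seen.insert(qp.send_cq.get());
    seen.insert(qp.recv_cq.get());
  }
  return seen.size();
}
```

For n QPs, the shared-send layout refers to n + 1 completion queues, while the per-QP layout refers to 2n. With a shared send CQ, all send completions arrive in one queue, so a single poller can in principle drain them for every connection.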