prov/efa: differentiate unresponsive receiver errors following rdma-core #10410

Merged
merged 1 commit into from
Sep 26, 2024


jiaxiyan
Contributor

Add a new vendor error code, EFA_IO_COMP_STATUS_LOCAL_ERROR_UNREACH_REMOTE, from rdma-core to indicate that the remote is unreachable.
Add a new EFA provider error code, UNESTABLISHED_RECV_UNRESP, to distinguish the unresponsive-receiver error in which the peer is reachable by the EFA device but libfabric failed to complete a handshake.
Add a unit test for EFA_IO_COMP_STATUS_LOCAL_ERROR_UNREACH_REMOTE.
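
For readers skimming the description, the distinction can be sketched roughly as below. This is an illustration only, not the code in this PR: every `sketch_`/`SKETCH_` identifier, the numeric values, the `handshake_done` flag, and the "established" counterpart code are assumptions made for the example. Only the names `UNREACH_REMOTE` and `UNESTABLISHED_RECV_UNRESP` come from the description above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for device completion statuses.
 * The values are NOT the real rdma-core values. */
enum sketch_vendor_status {
    SKETCH_STATUS_OK = 0,
    SKETCH_STATUS_UNRESP_REMOTE,   /* assumed: remote reachable but not responding */
    SKETCH_STATUS_UNREACH_REMOTE   /* new in this PR: remote is unreachable */
};

/* Hypothetical stand-ins for provider-level error codes. */
enum sketch_prov_err {
    SKETCH_ERR_NONE = 0,
    SKETCH_ERR_UNREACH_REMOTE,            /* device could not reach the peer */
    SKETCH_ERR_ESTABLISHED_RECV_UNRESP,   /* assumed: handshake had completed */
    SKETCH_ERR_UNESTABLISHED_RECV_UNRESP  /* reachable, but no handshake yet */
};

/* Map a device status to a provider error. The handshake_done flag is an
 * assumption about how the provider tracks peer state; it is only consulted
 * for the "unresponsive" status, where it separates the two cases. */
static enum sketch_prov_err
sketch_classify(enum sketch_vendor_status status, bool handshake_done)
{
    switch (status) {
    case SKETCH_STATUS_OK:
        return SKETCH_ERR_NONE;
    case SKETCH_STATUS_UNREACH_REMOTE:
        return SKETCH_ERR_UNREACH_REMOTE;
    case SKETCH_STATUS_UNRESP_REMOTE:
        return handshake_done ? SKETCH_ERR_ESTABLISHED_RECV_UNRESP
                              : SKETCH_ERR_UNESTABLISHED_RECV_UNRESP;
    }
    return SKETCH_ERR_NONE;
}
```

Under these assumptions, `sketch_classify(SKETCH_STATUS_UNRESP_REMOTE, false)` yields the "unestablished" code, while an unreachable status maps to its own code regardless of handshake state. A unit test along the lines of the one mentioned above could assert exactly those mappings.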

@jiaxiyan jiaxiyan requested a review from a team September 24, 2024 18:49
Contributor

@shijin-aws shijin-aws left a comment

LGTM, only some nits

prov/efa/src/efa_strerror.c (resolved)
prov/efa/src/rdm/efa_rdm_cq.c (outdated, resolved)
@shijin-aws
Contributor

bot:aws:retest

Add a new vendor error code EFA_IO_COMP_STATUS_LOCAL_ERROR_UNREACH_REMOTE
from rdma core to indicate the remote is unreachable.
Add a new EFA provider error code UNESTABLISHED_RECV_UNRESP to distinguish
unresponsive receiver error when the peer is reachable by the EFA device
but libfabric failed to complete a handshake.
Add unit test for EFA_IO_COMP_STATUS_LOCAL_ERROR_UNREACH_REMOTE.

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
@shijin-aws shijin-aws merged commit 5573b3f into ofiwg:main Sep 26, 2024
19 of 20 checks passed
2 participants