prov/util: Revert ofi_genlock changes completely #9442
Conversation
I'm not convinced the genlock implementation is the problem here. The biggest thing I see in this set of changes is that you change the ep lock to a mutex instead of optimizing for the no-op. I could definitely see that causing a regression. I think the ep_list_locks are the only ones that need to be a mutex (until we revisit the connection management issue Sean brought up). Perhaps that change got added by mistake in the ep_list_lock patch.
Not related to the code itself but when you revert a commit, remember to:
- Sign all your revert commits so the DCO check passes
- Add a commit message to each one explaining why you are reverting
	ret = ofi_genlock_init(&ep->lock,
			       ep->domain->threading != FI_THREAD_SAFE ?
			       OFI_LOCK_NOOP : OFI_LOCK_MUTEX);
I may be misreading this (it could have been an issue with the original commit), but the commit was, I think, supposed to target the ep_list_locks. This is the ep->lock itself, which I think is a separate issue. I agree with changing the ep lock to a noop/mutex check, but I don't think it belongs in this patch.
I think it is clearer if you look at the files changed tab instead of commit by commit.
prov/util/src/util_av.c
Outdated
	ofi_genlock_init(&av->ep_list_lock,
			 domain->threading == FI_THREAD_DOMAIN ?
			 OFI_LOCK_NOOP : OFI_LOCK_MUTEX);
The original commit says it was fixing the API per Sean's concern about the ep_list_lock needing to be thread safe regardless of the threading level. I don't think reverting that fix in the av and cq is the proper answer here.
I think it is clearer if you look at the files changed tab instead of commit by commit.
@@ -52,15 +52,15 @@ static const char *rxd_cq_strerror(struct fid_cq *cq_fid, int prov_errno,

 	cq = container_of(cq_fid, struct rxd_cq, util_cq.cq_fid);

-	ofi_genlock_lock(&cq->util_cq.ep_list_lock);
+	ofi_mutex_lock(&cq->util_cq.ep_list_lock);
The genlock vs mutex implementation is exactly the same. Switching back to the mutex lock shouldn't impact performance. I'm not convinced this is the source of the regression you're seeing.
I question that the test results are reproducible enough to determine a 20ns difference. The mutex calls go direct to pthread calls, the genlock calls access those indirectly via a function pointer. The point of genlock is that it allows disabling the lock at runtime, which isn't possible calling mutex directly.
Please reproduce the testing on your end. In the Slack channel, I linked my run command for the reproducer (and it was also done on an Intel Skylake instance).
From my test results:
Bisect 2: 84a6bde5e -> main (for SHM)

84a6bde5e: .46
99f87709c: .46
<Seth's OFI genlock changes in here>
318b3f49f: .45
55f7fd1ad: .45
6b62f34ea: .45
90baec9d3: .45
803af8801: .45
64b31bea3: .48
967f91458: .48
52a4364d9: .48
422241636: .48
main: .48
And running with this branch on main brings us from .48 to .46.
@@ -60,7 +60,7 @@ static int rxd_cntr_wait(struct fid_cntr *cntr_fid, uint64_t threshold, int time
 		return -FI_ETIMEDOUT;

 	ep_retry = -1;
-	ofi_genlock_lock(&cntr->ep_list_lock);
+	ofi_mutex_lock(&cntr->ep_list_lock);
Same thing here: they are the same thing, and I don't think this should affect performance. We should take a closer look at it.
They are not the same thing, and they do impact performance.
Before this patch set, main was always using ofi_mutex locks (perf .46us). The first patch set changed a number of ofi_mutex locks to ofi_genlock and decided whether or not to use mutex locking based on the domain lock level (perf .45us: less locking makes us faster). The second patch set switched the ofi_genlock to always default to an ofi_mutex lock because @shefty realized we were violating a threading case for msg endpoints in the API (perf .48us: bad). This PR blows both patch set 1 and patch set 2 away. It makes us default to ofi_mutex, and it brings us from the current 0.48us to 0.46us, because an ofi_genlock hard-coded to ofi_mutex adds non-negligible overhead compared to calling ofi_mutex directly.
I do not like this patch set, although I think it is necessary. I think ofi_genlocks are better and should be used. I noticed that 1.19.x SHM is faster than main SHM, and this patch set is the fastest/easiest way to fix part of that problem. I would like to be able to use ofi_genlock->ofi_mutex without a performance hit, and I think finding the root cause of the issue would be great.
@a-szegel What I'm trying to say is that there was an issue in the original commit that is misleading you about what the actual issue is. In the commit where you change the ep_list_locks to always be a mutex, you also forced the ep lock to always be a mutex. The ep_list_lock is the only one that "has" to be a mutex, because of the control/bind issue that could be problematic. The util ep can still have the no-op optimization.
@shefty told me that I needed to revert everything in #9377. Before any of my changes, it was always a mutex lock; after my changes, it is an ofi_genlock that is always a mutex lock, with a regression. Yes, making the mutex lock "not a mutex" fixes the regression, but the regression is also fixed by going back to a plain ofi_mutex lock. This points to an issue in genlock.
@a-szegel I think I'm lost with your logic. For me, the regression is not fixed by changing the genlock mutex to using a mutex directly. And before your changes, the util_ep lock was not always a mutex: it was optimized for the threading model, and your changes made it always a mutex. Your revert of the initial set of changes was incorrect and introduced a new change in the util_ep.
Completely reverting the ofi_genlock changes from main saves us 20ns in fi_rdm_tagged_pingpong for the SHM provider.
instance: aws c5n.18xlarge
os: al2
client: fi_rdm_tagged_pingpong --pin-core 1 -S 1 -s 127.0.0.1 -p shm 127.0.0.1
server: fi_rdm_tagged_pingpong --pin-core 0 -S 1 -s 127.0.0.1 -p shm
main @ 8db8594 (has genlocks that are always mutex): .48us
git revert 64b31be: .45us (my changes to make us have genlocks with no mutex)
git revert 96f35fa 3670360 1c4e8ea 0d39856: .46us (ofi_mutex directly)