-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-14535 pool: start/stop ds_pool_child individually #13347
Conversation
Bug-tracker data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/1/execution/node/582/log |
da2ee07
to
afec482
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/2/execution/node/390/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/2/execution/node/313/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/2/execution/node/393/log |
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/2/execution/node/325/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/2/execution/node/355/log |
afec482
to
601c8cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13347/3/execution/node/1130/log |
Reorganize ds_pool_child start/stop code to make it able to be started/stopped individually: - Introduced four states for ds_pool_child: NEW, STARTING, STARTED and STOPPING. - ds_pool_child is added/removed to/from cache through the collective call of pool_child_add/delete_one(), now the cache doesn't hold the ds_pool_child reference anymore. - Introduced ds_pool_child_start/stop() for callers to start/stop ds_pool_child individually, ds_pool_child_state() to query the state of ds_pool_child. - Adjusted ds_pool_child_lookup() & ds_pool_child_get() a bit; Required-githooks: true Signed-off-by: Niu Yawei <yawei.niu@intel.com>
601c8cb
to
8bfc948
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
src/pool/srv_util.c
Outdated
@@ -973,7 +973,6 @@ check_pool_targets(uuid_t pool_id, int *tgt_ids, int tgt_cnt, bool reint, | |||
} | |||
|
|||
ABT_rwlock_unlock(pool->sp_lock); | |||
ds_pool_child_put(pool_child); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ABT_rwlock_rdlock
call on line 937 may cause the current ULT to yield. So, it's not easy to why not holding a pool_child
reference is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, it's better to call ds_pool_child_get() to hold reference. Can I fix it in the follow-on PR (along with the two spelling issues.)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me.
src/pool/srv_target.c
Outdated
return child; | ||
} | ||
} | ||
return NULL; | ||
} | ||
|
||
struct ds_pool_child * | ||
ds_pool_child_get(struct ds_pool_child *child) | ||
ds_pool_child_get(const uuid_t uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a static pool_child_lookup_noref
(or ds_pool_child_lookup_noref
if you need an external one) and keeping the semantics of ds_pool_child_lookup
unchanged (except for checking the state)? And we could remove ds_pool_child_get
if necessary.
That seems to have less impact on other modules, and also more consistent with other "lookup/get/put" APIs we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good to me. I'll fix it.
Address review comments from LiWei. Fix spelling errors. Required-githooks: true Signed-off-by: Niu Yawei <yawei.niu@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
dss_module_fini_metrics(DAOS_TGT_TAG, child->spc_metrics); | ||
ABT_eventual_set(child->spc_ref_eventual, | ||
(void *)&child->spc_ref, | ||
if (child->spc_ref == 0 && *child->spc_state == POOL_CHILD_STOPPING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems when the spc_ref drop to zero, the spc_state must be STOPPING?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pool child cache doesn't hold pool child reference now, so the pool child reference could be dropped to zero even if it's still in cache (in start mode).
) Reorganize ds_pool_child start/stop code to make it able to be started/stopped individually: - Introduced four states for ds_pool_child: NEW, STARTING, STARTED and STOPPING. - ds_pool_child is added/removed to/from cache through the collective call of pool_child_add/delete_one(), now the cache doesn't hold the ds_pool_child reference anymore. - Introduced ds_pool_child_start/stop() for callers to start/stop ds_pool_child individually, ds_pool_child_state() to query the state of ds_pool_child. - Removed ds_pool_child_get(); Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: shiying <yshi1210@gmail.com>
Reorganize ds_pool_child start/stop code to make it able to be started/stopped individually: - Introduced four states for ds_pool_child: NEW, STARTING, STARTED and STOPPING. - ds_pool_child is added/removed to/from cache through the collective call of pool_child_add/delete_one(), now the cache doesn't hold the ds_pool_child reference anymore. - Introduced ds_pool_child_start/stop() for callers to start/stop ds_pool_child individually, ds_pool_child_state() to query the state of ds_pool_child. - Removed ds_pool_child_get(); Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Reorganize ds_pool_child start/stop code to make it able to be started/stopped individually: - Introduced four states for ds_pool_child: NEW, STARTING, STARTED and STOPPING. - ds_pool_child is added/removed to/from cache through the collective call of pool_child_add/delete_one(), now the cache doesn't hold the ds_pool_child reference anymore. - Introduced ds_pool_child_start/stop() for callers to start/stop ds_pool_child individually, ds_pool_child_state() to query the state of ds_pool_child. - Removed ds_pool_child_get(); Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Reorganize ds_pool_child start/stop code to make it able to be
started/stopped individually:
and STOPPING.
call of pool_child_add/delete_one(), now the cache doesn't hold the
ds_pool_child reference anymore.
ds_pool_child individually, ds_pool_child_state() to query the state
of ds_pool_child.
Required-githooks: true
Signed-off-by: Niu Yawei yawei.niu@intel.com
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: