-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(#507) Change platform_buffer_create()/destroy() interfaces to platform_buffer_init() / deinit() methods. #508
Conversation
✅ Deploy Preview for splinterdb canceled.
|
@@ -428,6 +428,9 @@ $(BINDIR)/$(UNITDIR)/task_system_test: $(UTIL_SYS) | |||
$(OBJDIR)/$(FUNCTIONAL_TESTSDIR)/test_async.o \ | |||
$(LIBDIR)/libsplinterdb.so | |||
|
|||
$(BINDIR)/$(UNITDIR)/platform_apis_test: $(UTIL_SYS) \ | |||
$(PLATFORM_SYS) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor new unit-test to exercise the changed APIs.
src/platform_linux/platform.c
Outdated
platform_heap_handle UNUSED_PARAM(heap_handle), | ||
platform_module_id UNUSED_PARAM(module_id)) | ||
platform_buffer_create(size_t length, | ||
platform_heap_id heap_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This arg sets the stage to allow diff heap-ID handles in future.
(As discussed previously, for the in-flight prototype dev code to support use of shared memory for allocation, I am reusing this heap_id
as the handle to select the shared-segment ID to allocate from. The change I'm doing here sets up the stage to move to that model, if we ever decide to productize that implementation.
Even if we choose to not go that route, this change by itself is not problematic, and should continue to work correctly.
)
src/platform_linux/platform.c
Outdated
buffer_handle *bh = TYPED_MALLOC(platform_get_heap_id(), bh); | ||
buffer_handle *bh = TYPED_MALLOC(heap_id, bh); | ||
if (bh == NULL) { | ||
return bh; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor bug-fix; otherwise old L99 will seg-fault.
src/platform_linux/platform.c
Outdated
ret = munmap(bh->addr, bh->length); | ||
if (ret) { | ||
return CONST_STATUS(errno); | ||
} | ||
|
||
platform_free(platform_get_heap_id(), bh); | ||
platform_free(heap_id, *bh_in); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interface changed (for the same reasons as for the create()
method) so that we can use the right heap_id
while freeing memory. Otherwise, default platform_get_heap_id()
may be NULL, and we may end up not freeing the memory with the proper heap-ID.
305839a
to
4de46da
Compare
if (platform_use_hugetlb) { | ||
flags |= MAP_HUGETLB; | ||
} | ||
int prot = PROT_READ | PROT_WRITE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocks of code are indented to the left now as we've lost L 66 if()
condition. Function will return early on L73 if bh == NULL
.
src/platform_linux/platform.c
Outdated
platform_status | ||
platform_buffer_destroy(buffer_handle *bh) | ||
platform_buffer_destroy(platform_heap_id heap_id, buffer_handle **bh_in) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the interface to receive address of buffer_handle *
, so that it can be NULL'ed out on exit.
(This is a safer way to avoid callers accidentally accessing bh_in
after this call.
src/rc_allocator.c
Outdated
@@ -388,7 +388,7 @@ rc_allocator_init(rc_allocator *al, | |||
void | |||
rc_allocator_deinit(rc_allocator *al) | |||
{ | |||
platform_buffer_destroy(al->bh); | |||
platform_buffer_destroy(al->heap_id, &al->bh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example of this usage. al->bh
will be NULL'ed out upon return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
4de46da
to
c23b30b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to propose a different approach:
Change these from create
and destroy
functions to init
and deinit
functions.
Then change all users to declare buffer_handle
s instead of buffer_handle *
fields in their structures.
I believe this will solve the problem you are trying to solve.
I discussed this with Alex and he also thinks this would actually be better than the existing code.
c23b30b
to
b7a2b22
Compare
HI @rtjohnso - Based on your suggestion to rework this approach, I made the required changes. Uploaded a new commit, here. (I will squash the two commits and will clean up the commit messages once you approve this work, and before I integrate this to It has certainly simplified the interface and code and solves the dependency on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome.
There are two main things I'm requesting:
- Clean up the error handling in
clockcache_init
(see comments). - Audit the code you've changed to see if any additional
heap_id
orheap_handle
parameters or structure fields can be eliminated. I think I flagged them all in comments, but it would be good to double-check.
Thank you!
if (!cc->rc_bh) { | ||
|
||
rc = platform_buffer_init(&cc->rc_bh, refcount_size); | ||
if (!SUCCESS(rc)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling needs to be tightened up.
If either platform_buffer_init
fails, then we goto alloc_error
, which does clockcache_deinit
. And clockcache_deinit
unconditionally calls platform_buffer_deinit
on both buffers. And platform_buffer_deinit
unconditionally calls munmap
on the buffer_handle
's addr
. But, depending on where we fail, none, one, or both of the buffers may have not been inited, so calling munmap
could cause an error.
There are two ways to fix this, I think:
- Modify
clockcache_deinit
to check whether the corresponding pointer isNULL
before it callsplatform_buffer_deinit
. So, it would checkcc->data
before callingplatform_buffer_deinit(&cc->bh)
and it would checkcc->refcount
before callingplatform_buffer_deinit(&cc->rc_bh)
. - Modify
clockcache_init
to have different error handling targets and do the cleanup on its own, i.e. havegoto alloc_failure1;
,goto alloc_failure2;
, etc.
My preference is Option 1, but either is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rtjohnso - There are quite a few callers to platform_buffer_deinit()
, and technically, one has to comb through all instances to make sure that we are not calling this wrongly.
A 3rd approach would be to push the checks to inside of platform_buffer_deinit()
.
It will try to do the munmap()
only if bh->addr
is non-NULL. This will solve the error handling issue from clockcache_deinit()
and also protect all other instances where platform_buffer_deinit()
is being called.
What are your thoughts on this approach (before I implement it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And, ooops, ... sorry, I introduced this error handling issue when implementing the change to introduce init() / deinit()
interface, without looking too hard at error handling. Sorry!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rtjohnso : Re: your comment that "Before the buffer_handle is inited, addr may be garbage, so you cannot safely deinit based on the value of addr."
In the two places we are talking about, both rc_allocator_init()
and clockcache_init()
do a ZERO_CONTENTS(al)
and ZERO_CONTENTS(cc)
.
So, certainly for the error path going through clockcache_deinit(), we should be sure that
cc->bh->addrwill be NULL even before
init()` happened.
Unless you can spot something else I am overlooking, seems like pushing the NULL-check to inside platform_buffer_deinit()
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this in PR meeting. Rework this as suggested by @robj. clockcache_deinit()
will only call platform_buffer_deinit()
only if corresponding pointers are non-NULL.
Remove the fix as applied in the latter function.
Before the buffer_handle is inited, addr may be garbage, so you cannot
safely deinit based on the value of addr.
Best,
Rob
Message ID: ***@***.***>
… |
Actually I think it was broken in different ways before, so no worries!
Thank you fixing it all up.
…On Sun, Jan 8, 2023 at 9:01 PM gapisback ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/clockcache.c
<#508 (comment)>:
> @@ -1841,11 +1843,12 @@ clockcache_init(clockcache *cc, // OUT
/* Entry per-thread ref counts */
size_t refcount_size = cc->cfg->page_capacity * CC_RC_WIDTH * sizeof(uint8);
- cc->rc_bh = platform_buffer_create(refcount_size, cc->heap_handle, mid);
- if (!cc->rc_bh) {
+
+ rc = platform_buffer_init(&cc->rc_bh, refcount_size);
+ if (!SUCCESS(rc)) {
(And, ooops, ... sorry, I introduced this error handling issue when
implementing the change to introduce init() / deinit() interface, without
looking too hard at error handling. Sorry!)
—
Reply to this email directly, view it on GitHub
<#508 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA45FRK6LX3MSDTAVNGS4YTWROLT7ANCNFSM6AAAAAATGFT2QU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
That's pretty un-idiomatic of how `_init` functions usually work and makes
some weird usage rules.
What if I want to use a buffer_handle somewhere else? What if I want one
on the stack? Now I have to know that I need to zero it before I can init
it? Very error prone.
…On Sun, Jan 8, 2023 at 9:44 PM gapisback ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/clockcache.c
<#508 (comment)>:
> @@ -1841,11 +1843,12 @@ clockcache_init(clockcache *cc, // OUT
/* Entry per-thread ref counts */
size_t refcount_size = cc->cfg->page_capacity * CC_RC_WIDTH * sizeof(uint8);
- cc->rc_bh = platform_buffer_create(refcount_size, cc->heap_handle, mid);
- if (!cc->rc_bh) {
+
+ rc = platform_buffer_init(&cc->rc_bh, refcount_size);
+ if (!SUCCESS(rc)) {
@rtjohnso <https://github.com/rtjohnso> : Re: your comment that "*Before
the buffer_handle is inited, addr may be garbage, so you cannot safely
deinit based on the value of addr.*"
In the two places we are talking about, both rc_allocator_init() and
clockcache_init() do a ZERO_CONTENTS(al) and ZERO_CONTENTS(cc).
So, certainly for the error path going through clockcache_deinit(), we
should be sure that cc->bh->addrwill be NULL even beforeinit()` happened.
Unless you can spot something else I am overlooking, seems like pushing
the NULL-check to inside platform_buffer_deinit() should work.
—
Reply to this email directly, view it on GitHub
<#508 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA45FRLHOW4SBR2QRAI4NGLWROQR5ANCNFSM6AAAAAATGFT2QU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@rtjohnso - I implemented much of the cleanup to rip out |
This part, Rob - "What if I want one on the stack? Now I have to know that I need to zero it before I can initit? Very error prone." .... I'm not so sure, it's that error prone. I have been trained and have gotten used to a conventional way of coding where every on-stack structure is always I think requiring that on-stack variables, structures are always zero'ed out before use is a stronger discipline to have / enforce. I can implement it in the ways you are suggesting, but I think then there is more work to chase down all other places where We should probably try to talk this through to resolve the right approach to adopt. |
if (cc->logfile) { | ||
clockcache_log(0, 0, "deinit %s\n", ""); | ||
#if defined(CC_LOG) || defined(ADDR_TRACING) | ||
platform_close_log_file(cc->logfile); | ||
#endif | ||
} | ||
|
||
platform_buffer_deinit(&cc->rc_bh); | ||
if (cc->lookup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not bother to arrange these backout codes in reverse
order of how they are done in init()
method as each backout fragment is protected under an if()
check.
So, although, not done in "pure backout form", this code ordering should work.
src/clockcache.c
Outdated
platform_buffer_deinit(&cc->bh); | ||
platform_buffer_deinit(&cc->rc_bh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not bracket this with a corresponding if (!NULL)
check on the member field as I am relying on the bh
struct being zero'ed out on declaration.
Current code always conforms to this ... Let's talk this point through ...
Hi, @rtjohnso & @ajhconway -- I am in the middle of amending the code to correctly handle backout error issues, and to purge The error handling corrections are partially implemented; see this commit. I paused as this needs some unit-testing and I have provided a sample of such a unit-test here. We need better I think there are few open items with this work that are best discussed in-person. Let's try to cover these items on Tue's PR meeting. |
@@ -351,7 +351,7 @@ rc_allocator_init(rc_allocator *al, | |||
return rc; | |||
} | |||
// To ensure alignment always allocate in multiples of page size. | |||
uint32 buffer_size = cfg->extent_capacity * sizeof(uint8); | |||
uint64 buffer_size = cfg->extent_capacity * sizeof(uint8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bugfix. buffer_size
declared as uint32 could overflow, to 0, for very large extent_capacity
. (Discovered this while trying to build negative test cases in newly added, incomplete, clockcache_test.c
unit test.
@@ -82,12 +82,13 @@ platform_buffer_init(buffer_handle *bh, size_t length) | |||
} | |||
|
|||
if (platform_use_mlock) { | |||
int rc = mlock(bh->addr, length); | |||
if (rc != 0) { | |||
int mlock_rv = mlock(bh->addr, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename existing rc
which shadows the new instance added at the top of this function (in an earlier commit of this PR cycle).
platform_error_log("mlock (%lu bytes) failed with error: %s\n", | ||
length, | ||
strerror(errno)); | ||
munmap(bh->addr, length); | ||
rc = CONST_STATUS(errno); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested by Rob ... this is is better.
@@ -174,6 +172,7 @@ CTEST_TEARDOWN(btree_stress) | |||
clockcache_deinit(&data->cc); | |||
rc_allocator_deinit(&data->al); | |||
task_system_destroy(data->hid, &data->ts); | |||
platform_heap_destroy(&data->hh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing in initial code; not an issue right now ... (Not sure why ASAN-builds are not tripping up ...)
src/platform_linux/platform.c
Outdated
if (ret) { | ||
return CONST_STATUS(errno); | ||
} | ||
if (bh->addr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discuss with @rtjohnso -- is this acceptable to you now?
bh->addr = NULL; | ||
bh->addr = NULL; | ||
} | ||
bh->length = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and L124 are added so that we can validate the state of resulting output bh
in unit-tests.
uint64 num_inserts, | ||
uint64 correctness_check_frequency, | ||
task_system *state, | ||
platform_heap_handle hh, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting this arg caused clang-format to throw a fit.
FILE *dev_null = fopen("/dev/null", "w"); | ||
ASSERT_NOT_NULL(dev_null); | ||
platform_set_log_streams(dev_null, dev_null); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block will be simplified with just call to one function when PR #514 checks-in to /main
.
size_t length = (1024 * KiB * GiB); | ||
|
||
// This is expected to fail as length is very large | ||
rc = platform_buffer_init(&bh, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From code inspection and manual testing, the length set on L90 is sufficiently big-enough to cause mmap()
to fail w/ an error.
Should this approach be tightened further? (I think this is sufficient.)
4289306
to
b588554
Compare
b588554
to
f422a16
Compare
@rtjohnso - As per our discussion in the PR review meeting, I reworked the flow in This latest commit (squashing 2 prev related-commits) should address the last remaining issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! That cleaned up a lot of stuff.
@rtjohnso - About your very last comment, I'm only adding the extra check that in case we are unable to allocate memory to This was also (already) being done in
|
1e928ed
to
bbe8063
Compare
…interfaces. This commit reworks the buffer_handle{} interfaces to now become platform_buffer_init() and platform_buffer_deinit(). Structures that need a buffer_handle{}, declare a nested sub-struct, which will go through this init / deinit interface to allocate / free memory using existing mmap() interfaces. This removes the need for an input 'heap_id / heap_handle' arg to allocate and free memory. This change does not functionally change anything in these methods. Added small unit-test, platform_apis_test, to exercise these changes. Cleanup structures and fns that used to take 'heap_handle *' which now become unused with this rework. Tighten up backout / error handling in clockcache_init() and deinit() code-flow. Co-authored by Rob Johnson, who reworked the entire interfaces as implemented above, to remove the dependency on 'hid' argument.
bbe8063
to
3f57698
Compare
This commit reworks the interfaces of
platform_buffer_create()
andplatform_buffer_destroy()
to now becomeplatform_buffer_init()
andplatform_buffer_deinit()
interfaces.This removes a dependency on
platform_heap_id
orplatform_heap_handle
arg.This change does not functionally change anything in these methods. Also, this (indirectly) fixes a minor bug in the create function to deal with the (very rare) case of failing to allocate memory from the heap for
buffer_handle *bh
. Added small unit-test, platform_apis_test, to exercise these changes.NOTE: This is some common infra-code built in a prototype dev branch for supporting different memory handles.
I'm pulling this piece out on its own for integration to /main, so that this ground work can be laid for integrating future dev-efforts (w/ little less pain).