
Support free-fragment recycling in shared-segment. Add fingerprint object management. #569

Open · wants to merge 2 commits into base: main

Conversation

@gapisback (Collaborator) commented Apr 14, 2023

The main change with this commit is the support for free-fragment lists and recycling of small fragments from shared memory. This was a main limitation of the support added in previous commits.

Another driving factor for implementing free-fragment list support was that previous multi-user concurrent insert performance benchmarking was not functional beyond a point. We would frequently run into shmem Out-Of-Memory (OOMs), even with shmem sizes > 8 GiB (which worked in a prior dev/perf-test cycle).

Design Overview

The main design changes to manage small-fragments are as follows:

Managing memory allocation / free using platform_memfrag{} fragments

  • Allocation and free of memory is dealt with in terms of "memory fragments", a small structure that holds the memory->{addr, size}. All memory requests (as is being done previously) are aligned to
    the cacheline.

    • Allocation: All clients of memory allocation have to "hand-in" an opaque platform_memfrag{} handle, which will be returned populated with the memory address, and more importantly, the size-of-the-fragment that was used to satisfy the memory request.

    • Free: Clients now have to safely keep a handle to this returned platform_memfrag{}, and hand it back to the free() method. free() will rely entirely on the size specified in this input fragment handle. The free'd memory fragment will be returned to the corresponding free-list bucket, if the fragment's size is one of the small set of free-fragment sizes being tracked.

  • Upon free(), the freed-fragment is tracked in a few free-lists bucketed by size of the freed-fragment. For now, we support 4 buckets, size <= 64, <= 128, <= 256 & <= 512 bytes. (These sizes are sufficient
    for current benchmarking requirements.)

    A free'd fragment is hung off of the corresponding list, threading the free-fragments using the fragment's memory itself.

  • New struct free_frag_hdr{} provides the threading structure. It tracks the current fragment's size and a free_frag_next pointer. The 'size' provided to the free() call is recorded as the free'd fragment's size.

  • Subsequently, a new alloc() request is 1st satisfied by searching the free-list corresponding to the memory request.

For example, a request from a client for 150 bytes will be rounded-up to a cacheline boundary,
i.e. 192 bytes. The free-list for bucket 256 bytes will be searched to find the 1st free-fragment of the right size. If no free fragment is found in the target list, we then allocate a new fragment. The returned fragment will have a size of 256 (for an original request of 150 bytes).
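
The rounding and bucket-selection logic just described can be sketched as a small stand-alone example. (Helper names like round_to_cacheline() and free_list_bucket() are illustrative, not the actual SplinterDB functions.)

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64

/* Round a request up to the next cacheline boundary. */
static size_t
round_to_cacheline(size_t nbytes)
{
   return (nbytes + CACHELINE_SIZE - 1) & ~(size_t)(CACHELINE_SIZE - 1);
}

/* Map an aligned size to one of the 4 free-list buckets (<= 64, 128,
 * 256, 512 bytes); returns -1 for fragments too large to be tracked. */
static int
free_list_bucket(size_t aligned_size)
{
   static const size_t buckets[] = {64, 128, 256, 512};
   for (int i = 0; i < 4; i++) {
      if (aligned_size <= buckets[i]) {
         return i;
      }
   }
   return -1;
}
```

With these helpers, a 150-byte request rounds up to 192 bytes and maps to the 256-byte bucket (index 2), matching the example above.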

  • An immediate consequence of this approach is that there is a small, but significant, change in the allocation, free APIs; i.e. TYPED_MALLOC(), TYPED_ARRAY_MALLOC() and TYPED_FLEXIBLE_STRUCT_MALLOC(), and their 'Z' equivalents, which return 0'ed out memory.

  • All existing clients of the various TYPED*() memory allocation calls have been updated to declare an on-stack platform_memfrag{} handle, which is passed back to platform_free().

  • In some places memory is allocated to initialize sub-systems and then torn down during deinit(). In a few places existing structures are extended to track an additional 'size' field. The size of the memory fragment allocated during init() is recorded here, and then used to invoke platform_free() as part of the deinit() method.

    • An example is clockcache_init() where this kind of work to record the 'size' of the fragment is done and passed-down to clockcache_deinit(), where the memory fragment is then freed with the right 'size'.

    This pattern is now to be seen in many such init()/deinit() methods of different sub-systems; e.g. pcq_alloc(), pcq_free(), ...

  • Copious debug and platform asserts have been added in the shmem alloc/free methods to cross-check, to some extent, for illegal calls.
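
The caller-side handshake described in these bullets can be illustrated with a minimal stand-alone sketch. (platform_memfrag here mirrors the {addr, size} pair described above; demo_alloc()/demo_free() are hypothetical stand-ins for the real TYPED_MALLOC()/platform_free() macros.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the {addr, size} pair carried by platform_memfrag{}. */
typedef struct platform_memfrag {
   void  *addr;
   size_t size;
} platform_memfrag;

/* Allocate: populate the caller's memfrag with the address and the
 * (cacheline-rounded) size actually used to satisfy the request. */
static void *
demo_alloc(platform_memfrag *mf, size_t nbytes)
{
   size_t aligned = (nbytes + 63) & ~(size_t)63;
   mf->addr = aligned_alloc(64, aligned);
   mf->size = aligned; /* Caller must retain this to free() correctly. */
   return mf->addr;
}

/* Free: rely entirely on the size recorded in the handed-back memfrag.
 * (The real code would return the fragment to a sized free-list.) */
static void
demo_free(platform_memfrag *mf)
{
   free(mf->addr);
   mf->addr = NULL;
   mf->size = 0;
}
```

The key point of the pattern: the size used at allocation time travels back to free() inside the handle, instead of being re-derived from the pointer's type.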

Cautionary Note

If the 'ptr' handed to platform_free() is not of type platform_memfrag{} *, it is treated as a generic <struct> *, and its sizeof() will be used as the 'size' of the fragment to free.

This works in most cases, except for some lapsed cases where, when allocating a structure, the allocator ended up selecting a "larger" fragment that just happened to be available in the free-list. The consequence is that we might end up free'ing a larger fragment to a smaller-sized free-list. Or, even if we do free it to the right-sized bucket, we still end up marking the free-fragment's size as smaller than what it really is. Over time, this may add up to a small memory leak, but it hasn't been found to be crippling in current runs. (There is definitely no issue here with over-writing memory due to incorrect sizes.)

Fingerprint Object Management

Managing memory for fingerprint arrays was particularly problematic.

This was the case even in a previous commit, before the introduction of the memfrag{} approach. Managing fingerprint memory was found to be especially cantankerous due to the way filter-building and compaction tasks are queued and asynchronously processed by some other thread / process.

The requirements from the new interfaces are handled as follows:

  • Added a new fingerprint{} object, struct fp_hdr{}, which embeds a platform_memfrag{} at its head. A few other short fields are added for tracking fingerprint memory-management gyrations.

  • Various accessor methods are added to manage memory for fingerprint arrays through this object.
    E.g.,

    • fingerprint_init() - Allocates the required fingerprint memory for 'ntuples'.
    • fingerprint_deinit() - Dismantles the object and frees its memory.
    • fingerprint_start() - Returns the start of the fingerprint array's memory.
    • fingerprint_nth() - Returns the n'th element of the fingerprint array.

Packaging the handling of fingerprint array through this object and its interfaces helped greatly to stabilize the memory histrionics.
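
As a rough illustration of the accessor pattern (this is not the actual fp_hdr{} layout; the real object embeds a platform_memfrag{} plus extra tracking fields):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for fp_hdr{}: holds the array's memory fragment. */
typedef struct fp_hdr {
   uint32_t *addr; /* fingerprint array memory */
   size_t    size; /* bytes allocated for the array */
} fp_hdr;

/* Allocate fingerprint memory for 'ntuples' hashes. */
static void
fingerprint_init(fp_hdr *fp, size_t ntuples)
{
   fp->size = ntuples * sizeof(uint32_t);
   fp->addr = calloc(ntuples, sizeof(uint32_t));
}

/* OOM must be detected via is_empty(), not by the client holding a
 * NULL array pointer. */
static int
fingerprint_is_empty(const fp_hdr *fp)
{
   return (fp->addr == NULL);
}

static uint32_t *
fingerprint_start(fp_hdr *fp)
{
   return fp->addr;
}

static uint32_t *
fingerprint_nth(fp_hdr *fp, size_t n)
{
   return (fp->addr + n);
}

/* Dismantle the object and free its memory. */
static void
fingerprint_deinit(fp_hdr *fp)
{
   free(fp->addr);
   fp->addr = NULL;
   fp->size = 0;
}
```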

  • When SplinterDB is closed, the shared-memory dismantling routine will tag any large fragments that are still found "in-use". This is percolated all the way back to splinterdb_close(), unmount() and to platform_heap_destroy() as a failure $rc. Tests will fail if they have left some un-freed large fragments.
    (A similar approach was considered to book-keep all small fragments used/freed, but due to some rounding errors it cannot be a reliable check at this time, so it hasn't been done.)

Test changes

Miscellaneous

  • Elaborate and illustrative tracing added to track memory mgmt done for fingerprint arrays, especially when they are bounced around queued / re-queued tasks. (Was a very problematic debugging issue.)

  • Extended tests to exercise core memory allocation / free APIs, fingerprint object mgmt, and writable_buffer interfaces:

    • platform_apis_test:
    • splinter_shmem_test.c: Adds specific test-cases to verify that free-list mgmt is happening correctly.
  • Enhanced various diagnostics, asserts, tracing

  • Improved memory usage stats gathering and reporting

  • Added hooks to cross-check multiple-frees of fragments, and testing hooks to verify if a free'd fragment is relocated to the right free-list

netlify bot commented Apr 14, 2023

Deploy Preview for splinterdb canceled.

Latest commit: 9281c83
Latest deploy log: https://app.netlify.com/sites/splinterdb/deploys/65bb050c366a1500096d0cad

@@ -179,7 +179,7 @@ splinterdb_open(splinterdb_config *cfg, splinterdb **kvs);
// Close a splinterdb
//
// This will flush all data to disk and release all resources
void
int
Collaborator Author:

To percolate errors found by shm-destroy, if large fragments that were not freed are still found hanging around.

Contributor:

Good change. Can you add a comment defining the meaning of the return value? e.g.

"returns 0 on success, non-zero otherwise."

Or

"returns

  • 0 on success,
  • a positive integer when all data has been persisted but not all resources were able to be released, and
  • a negative number to indicate that not all data was able to be persisted and the database was unable to shut down safely."

@@ -380,7 +380,6 @@ void PACKEDARRAY_JOIN(__PackedArray_unpack_, PACKEDARRAY_IMPL_BITS_PER_ITEM)(con
#include "poison.h"

#define PACKEDARRAY_MALLOC(size) platform_malloc(size)
#define PACKEDARRAY_FREE(p) platform_free(p)
Collaborator Author:

Unused interface.

platform_assert(req->num_tuples < req->max_tuples);
req->fingerprint_arr[req->num_tuples] =
platform_assert(btree_pack_can_fit_tuple(req));
fingerprint_start(&req->fingerprint)[req->num_tuples] =
Collaborator Author:

Here's where you will start to see the use of fingerprint object and its accessor / interfaces.

uint32 *fingerprint_arr; // IN/OUT: hashes of the keys in the tree
hash_fn hash; // hash function used for calculating filter_hash
unsigned int seed; // seed used for calculating filter_hash
fp_hdr fingerprint; // IN/OUT: hashes of the keys in the tree
Collaborator Author:

The in-place char * array is now replaced by the fingerprint object, which carries inside of it platform_memfrag{} handle to track allocate memory fragment's size and to free it reliably.

req->fingerprint_arr =
TYPED_ARRAY_ZALLOC(hid, req->fingerprint_arr, max_tuples);

fingerprint_init(&req->fingerprint, hid, max_tuples); // Allocates memory
Collaborator Author:

Inline memory allocation on old L345 is, henceforth, replaced by init()'ing the fingerprint object ... And so on ...

"Unable to allocate memory for %lu tuples",
max_tuples);
if (!req->fingerprint_arr) {
if (fingerprint_is_empty(&req->fingerprint)) {
Collaborator Author:

You can no longer check for NULL array ptr, to detect OOM. You must consult the is_empty() method to figure out if there is memory or not.

if (req->fingerprint_arr) {
platform_free(hid, req->fingerprint_arr);
if (!fingerprint_is_empty(&req->fingerprint)) {
fingerprint_deinit(hid, &req->fingerprint);
Collaborator Author:

deinit() will free memory.

if (!cc->lookup) {
goto alloc_error;
}
cc->lookup_size = memfrag_size(&memfrag_cc_lookup);
Collaborator Author (@gapisback, Apr 14, 2023):

Here's the first instance of a pair of init / deinit calls, which now need to communicate the size of memory fragment allocated by init().

Like it's done on this line, a few common structures now gain a new size field to track the memory fragment's size. These structures are of the kind that are allocated / init'ed in one function, and much later the deinit() method is called from a separate function.

src/clockcache.c Outdated
if (cc->lookup) {
platform_free(cc->heap_id, cc->lookup);
memfrag_init_size(mf, cc->lookup, cc->lookup_size);
platform_free(cc->heap_id, mf);
Collaborator Author:

free() needs to be told the fragment's size correctly. This is obtained from the size field stashed away when init() was done.

src/clockcache.c Outdated
}
if (cc->entry) {
platform_free(cc->heap_id, cc->entry);
memfrag_init_size(mf, cc->entry, cc->entry_size);
platform_free(cc->heap_id, mf);
Collaborator Author:

Same pattern of changes continues. This will appear in many more instances ...

src/memtable.c Outdated
platform_memfrag memfrag_ctxt;
platform_memfrag *mf = &memfrag_ctxt;
memfrag_init_size(mf, ctxt, ctxt->mt_ctxt_size);
platform_free(hid, mf);
Collaborator Author:

NOTE: This does look like verbose multi-line repeat code.

@rtjohnso - I did consider whether to add a packaged macro, say, memfrag_init(), to which you supply the addr / init. Inside the body, we can declare a hidden structures and platform_memfrag *mf, do the setup, and pass-it as the 2nd arg.

Can be done ... probably .. did not try it too hard. Wanted to get this into review, and then I expect anyway to get comments on this approach.

We can re-discuss the coding impact this approach has ... and re-visit during review.

@@ -287,7 +291,11 @@ io_handle_deinit(laio_handle *io)
}
platform_assert(status == 0);

platform_free(io->heap_id, io->req);
platform_memfrag memfrag = {.addr = io->req, .size = io->req_size};
Collaborator Author:

NOTE to myself: This should go away. Currently, memfrag_init_size() is a #define and this struct is exposed in platform.h. Rework this so the fields are hidden, and only memfrag_init_size() is exposed to client code.

This will prevent such naked assignments.

@@ -84,13 +84,16 @@ platform_heap_create(platform_module_id UNUSED_PARAM(module_id),
return STATUS_OK;
}

void
platform_status
Collaborator Author (@gapisback, Apr 14, 2023):

Percolating errors upstream from platform_shmdestroy; see L91 below.

platform_histo_handle hh;
hh = TYPED_MANUAL_MALLOC(
hh = TYPED_ARRAY_MALLOC(
Collaborator Author:

Equivalent calls.

Contributor:

Actually, I believe the correct macro for this situation is TYPED_FLEXIBLE_STRUCT_ZALLOC.

({ \
debug_assert((n) >= sizeof(*(v))); \
(typeof(v))platform_aligned_malloc(hid, \
PLATFORM_CACHELINE_SIZE, \
(n), \
(mf), \
Collaborator Author:

Allocation now receives and returns a platform_memfrag{} *, so the macro's call signature changes.

@@ -368,13 +371,13 @@ extern platform_heap_id Heap_id;
({ \
debug_assert((n) >= sizeof(*(v))); \
(typeof(v))platform_aligned_malloc( \
hid, (a), (n), STRINGIFY(v), __func__, __FILE__, __LINE__); \
hid, (a), (n), NULL, STRINGIFY(v), __func__, __FILE__, __LINE__); \
Collaborator Author:

For most of the consumers, this is good enough. I could have changed this to require all callers to also declare an on-stack platform_memfrag{}, but that would be more code changes.

The one 'minor' issue with this is that we might incorrectly free a smaller-sized fragment. But that's not a huge loss, so I went with the current solution.

"Attempt to free a NULL ptr from '%s', line=%d", \
__func__, \
__LINE__); \
if (IS_MEM_FRAG(p)) { \
Collaborator Author:

This change is key ... and needs understanding. Please review carefully.

const size_t _reqd = \
(_size + platform_alignment(PLATFORM_CACHELINE_SIZE, _size)); \
platform_free_mem((hid), (p), _reqd, STRINGIFY(p)); \
(p) = NULL; \
Collaborator Author:

@rtjohnso - This line and L907 below are what make it necessary for clients calling free() to do two things:

   platform_memfrag memfrag;
   platform_memfrag *mf;
   // ... do the initialization ...
   platform_free(mf);

I would have liked to skip the mf and simply pass-in &memfrag to free(), but there is a compiler error.

I think this can be fixed with some rework ... but I ran out of energy. Let's review if this can be improved.

platform_do_realloc(const platform_heap_id heap_id,
const size_t oldsize,
void *ptr, // IN
size_t *newsize, // IN/OUT
Collaborator Author:

Reallocation now returns the *newsize, so clients like writable buffer resize can record the new fragment's size in its buffer_capacity field.

This, then, allows writable_buffer_deinit() to correctly supply the newly realloc'ed fragment's size to free.
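
A sketch of this IN/OUT contract, with illustrative names (the cacheline round-up and the stdlib realloc() call stand in for the real shared-memory path):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reallocate: on success, *newsize is updated to the size of the
 * fragment actually handed out, so callers (e.g. writable_buffer)
 * can record it and later free with the correct size. */
static void *
demo_do_realloc(size_t oldsize, void *ptr, size_t *newsize)
{
   size_t aligned = (*newsize + 63) & ~(size_t)63; /* cacheline round-up */
   void  *retptr  = realloc(ptr, aligned);
   if (retptr != NULL) {
      *newsize = aligned;
   }
   (void)oldsize; /* The real allocator uses this to recycle the old fragment. */
   return retptr;
}
```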

void *retptr = (heap_id ? splinter_shm_alloc(heap_id, required, objname)
: aligned_alloc(alignment, required));
void *retptr = NULL;
if (heap_id == PROCESS_PRIVATE_HEAP_ID) {
Collaborator Author:

Clarified the usage semantics of heap_id; NULL means process-private heap ID, so we go through the old malloc()-style code-flow.

int frag_allocated_to_pid; // Allocated to this OS-pid
int frag_freed_by_pid; // OS-pid that freed this fragment
threadid frag_freed_by_tid; // Splinter thread-ID that freed this
int frag_line;
Collaborator Author:

No change; indented and aligned fields for readability.

Collaborator Author:

Updated: Nothing changed ... only indentation changes ...

# define SHM_LARGE_FRAG_SIZE (90 * KiB)
#else
# define SHM_LARGE_FRAG_SIZE (38 * KiB)
#endif // SPLINTER_DEBUG
Collaborator Author:

The difference in this limit was causing unit-tests to fail in debug builds and pass in release builds.

This artifact is leftover from my poc-dev days ... separate limits per build should no longer be needed.

Have stabilized on 32K as the lower limit for large fragments in both builds. Moved to shmem.h.

typedef struct free_frag_hdr {
struct free_frag_hdr *free_frag_next;
size_t free_frag_size;
} free_frag_hdr;
Collaborator Author:

Used to chain free'd-fragments that are returned to the free-list. This tiny struct lives at the head of each free fragment.

Min frag-size is 64 bytes, so we have room.
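
The threading pattern can be shown with a small stand-alone sketch (free_frag_hdr matches the struct above; free_list_push()/free_list_pop() are illustrative names for the list manipulation done inside shmem.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct free_frag_hdr {
   struct free_frag_hdr *free_frag_next;
   size_t                free_frag_size;
} free_frag_hdr;

/* Thread a freed fragment onto the head of its free-list, using the
 * fragment's own memory to hold the header. */
static void
free_list_push(free_frag_hdr **head, void *ptr, size_t size)
{
   free_frag_hdr *frag = (free_frag_hdr *)ptr;
   frag->free_frag_next = *head;
   frag->free_frag_size = size;
   *head                = frag;
}

/* Detach and return the first fragment on the list, or NULL if empty. */
static void *
free_list_pop(free_frag_hdr **head, size_t *size_out)
{
   free_frag_hdr *frag = *head;
   if (frag == NULL) {
      return NULL;
   }
   *head     = frag->free_frag_next;
   *size_out = frag->free_frag_size;
   return frag;
}
```

Pops come off the head, so the list behaves as a LIFO stack of recycled fragments.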

* can print these after shared segment has been destroyed.
* ------------------------------------------------------------------------
*/
typedef struct shminfo_usage_stats {
Collaborator Author:

Consolidated all metrics / usage-stats into this common struct.

It is updated in-place, nested in the shmem control block.

It will also be used to return metrics when shared memory is being dismantled.

Works much better this way!!

@@ -198,13 +309,13 @@ platform_shm_hip(platform_heap_id hid)
static inline void
shm_lock_mem_frags(shmem_info *shminfo)
{
platform_spin_lock(&shminfo->shm_mem_frags_lock);
platform_mutex_lock(&shminfo->shm_mem_frags_mutex);
Collaborator Author:

A better solution ... otherwise, for some cases of the new large_inserts_stress_test, we were simply burning up 100% CPU.

In all workloads, the pthread-semaphore is less than 5%, sometimes even 1-2%, as seen in perf top.

platform_save_usage_stats(shminfo_usage_stats *usage, shmem_info *shminfo)
{
*usage = shminfo->usage;
usage->large_frags_found_in_use = platform_trace_large_frags(shminfo);
Collaborator Author:

Much simpler than the line-by-line copy that the old code from L303 onwards was doing.

} else {
// Try to satisfy small memory fragments based on requested size, from
// cached list of free-fragments.
retptr = platform_shm_find_frag(shminfo, size, objname, func, file, line);
Collaborator Author:

New find method to locate small free fragments, via the free-list for the requested size.

{
((free_frag_hdr *)ptr)->free_frag_next = *here;
((free_frag_hdr *)ptr)->free_frag_size = size;
*here = ptr;
Collaborator Author:

Inserting the free'd fragment at the head of its free-list.

gapisback added a commit that referenced this pull request Dec 10, 2023
Upcoming PR #569 is overhauling large-inserts stress test.
To simplify examining the diffs of this test case as part
of that review, this commit is renaming the test file
to large_inserts_stress_test.c, with appropriate changes
to the build Makefile and test files, to pickup new file.
gapisback added a commit that referenced this pull request Jan 24, 2024
Upcoming PR #569 is bringing-in support for handling small
fragments. This commit renames existing variables, field names
and a few function names that deal with large-fragment support
to consistently use 'large' in the name. This clears the way
in the namespace for upcoming small-fragment changes.

Some examples:
- struct shm_frag_info -> struct shm_large_frag_info
- E.g., shm_frag_addr -> frag_addr, shm_frag_size -> frag_size ...
- shm_frag_info shm_mem_frags[] -> shm_large_frag_info shm_large_frags[]
- shm_num_frags_tracked -> shm_nlarge_frags_tracked
- platform_shm_find_free() -> platform_shm_find_large()

... No other code-/logic-changes are done with this commit.
gapisback added a commit that referenced this pull request Jan 24, 2024
This commit refactors shared memory usage stats fields to
drive-off shminfo_usage_stats{} struct entirely. Add
platform_save_usage_stats(), used by platform_shm_print_usage().

This refactoring paves the way for upcoming PR #569 which
is adding more memory-usage stats fields.
@gapisback force-pushed the agurajada/shmem-free-list-mgmt-Rev branch from 6c24747 to 33a95f2 on January 25, 2024 06:45
@gapisback (Collaborator Author):

@rtjohnso - The final part-3 shared memory support change-set is now ready for review.

The suggested order in which to review these diffs is:

  1. src/platform_linux/platform.h
  2. src/platform_linux/platform_inline.h
  3. src/platform_linux/platform_types.h
  4. src/util.h, src/util.c
  5. src/platform_linux/shmem.h
  6. src/platform_linux/shmem.c
  7. src/trunk.h, src/trunk.c
  8. src/routing_filter.c
  9. Then the rest of the files.
  10. Good luck!

@rtjohnso (Contributor):

I think the current memfrag interface is leaky and not general.

I think the interface should look like this:

platform_status
platform_alloc(memfrag *mf, // OUT
               int size);

platform_status
platform_realloc(memfrag *mf, // IN/OUT
                 int newsize);

platform_status
platform_free(memfrag *mf); // IN

void *
memfrag_get_pointer(memfrag *mf);

(Note that details, like the exact names of the functions or the memfrag datatype are not too important in this example.)

The point is that the rest of the code should treat memfrags as opaque objects. In the current code, the rest of the code goes around pulling out fields and saving them for later use. It means that internal details of the current allocator implementation are being leaked all over the rest of the code. This will make it difficult to change the allocator implementation down the road.

As for names, I would advocate renaming memfrag to memory_allocation.

@gapisback (Collaborator Author):

Hi, @rtjohnso --

Thanks for your initial approach on reworking the interfaces.

I'm happy to take this further, but I feel this round-trip discussion will become long and meandering. And this review panel UI exchange is not ideally suited for that kind of interaction.

I want to avoid re-doing the implementation till we've settled on and agreed to the new interfaces. Every bit of code rework requires massively editing the change-set and re-stabilizing - an effort I would like to avoid doing multiple times.

How about I start a new thread under Discussions tab, with your initial proposal? And, will give you my responses, rebuttal. I suspect we will have to go back-and-forth a few times before settling on the final interfaces.

(As a team, we haven't used the Discussions tab feature internally. As I am beginning my transition to fully out-of-VMware, it may be a good opportunity to engage using this GitHub feature, so it continues even when I'm a fully 'O-Sourced' engineer.)

…ject mgmt

The main change with this commit is the support for free-fragment lists
and recycling of small fragments from shared memory. This was a main
limitation of the support added in previous commits. Another driving
factor for implementing some free-list support was that previous
multi-user concurrent insert performance benchmarking was not functional
beyond a point, and we'd frequently run into shmem Out-Of-Memory (OOMs),
even with shmem sizes > 8 GiB (which worked in a prior dev/perf-test cycle).

The main design changes to manage small-fragments are follows:

Managing memory allocation / free using platform_memfrag{} fragments:

- Allocation and free of memory is dealt with in terms of "memory
  fragments", a small structure that holds the memory->{addr, size}.
  All memory requests (as is being done previously) are aligned to
  the cacheline.

  - Allocation: All clients of memory allocation have to "hand-in"
    an opaque platform_memfrag{} handle, which will be returned populated
    with the memory address, and more importantly, the size-of-the-fragment
    that was used to satisfy the memory request.

  - Free: Clients now have to safely keep a handle to this returned
    platform_memfrag{}, and hand it back to the free() method.
    free() will rely "totally" on the size specified in this input
    fragment handle supplied. And the free'd memory fragment will
    be returned to the corresponding free-list bucket.

- Upon free(), the freed-fragment is tracked in a few free-lists
  bucketed by size of the freed-fragment. For now, we support 4 buckets,
  size <= 64, <= 128, <= 256 & <= 512 bytes. (These sizes are sufficient
  for current benchmarking requirements.) A free'd fragment is hung off
  of the corresponding list, threading the free-fragments using
  the fragment's memory itself. New struct free_frag_hdr{} provides the
  threading structure. It tracks the current fragment's size and
  free_frag_next pointer. The 'size' provided to the free() call is
  is recorded as the free'd fragment's size.

- Subsequently, a new alloc() request is 1st satisfied by searching the
  free-list corresponding to the memory request. For example, a request
  from a client for 150 bytes will be rounded up to a cacheline boundary,
  i.e., 192 bytes. The free-list for the 256-byte bucket will be searched
  to find the 1st free-fragment of the right size. If no free fragment
  is found in the target list, we then allocate a new fragment.
  The returned fragment will have a size of 256 (for an original request
  of 150 bytes).
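The bucketing and threading scheme above can be sketched in plain C. This is a minimal, single-threaded sketch: `free_frag_hdr` mirrors the struct named above, but `bucket_for()`, `recycle_free_frag()` and `try_alloc_from_free_list()` are illustrative names, not the actual SplinterDB functions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64

/* Hypothetical mirror of free_frag_hdr{}: the first bytes of a free'd
 * fragment itself hold its size and the next-pointer for the list. */
typedef struct free_frag_hdr {
   size_t                free_frag_size;
   struct free_frag_hdr *free_frag_next;
} free_frag_hdr;

/* The 4 buckets described above: <= 64, <= 128, <= 256 & <= 512 bytes. */
static const size_t bucket_sizes[] = {64, 128, 256, 512};
#define NBUCKETS (sizeof(bucket_sizes) / sizeof(bucket_sizes[0]))
static free_frag_hdr *free_lists[NBUCKETS];

/* Round a request up to the cacheline boundary, then map it to a bucket
 * index; returns -1 if the fragment is too large to be recycled here. */
static int
bucket_for(size_t nbytes)
{
   size_t aligned = (nbytes + CACHELINE - 1) & ~(size_t)(CACHELINE - 1);
   for (size_t i = 0; i < NBUCKETS; i++) {
      if (aligned <= bucket_sizes[i]) {
         return (int)i;
      }
   }
   return -1;
}

/* free()-side: thread the fragment onto the head of its bucket's list. */
static void
recycle_free_frag(void *addr, size_t size)
{
   int b = bucket_for(size);
   if (b < 0) {
      return; // not tracked; large fragments are handled separately
   }
   free_frag_hdr *hdr  = (free_frag_hdr *)addr;
   hdr->free_frag_size = size;
   hdr->free_frag_next = free_lists[b];
   free_lists[b]       = hdr;
}

/* alloc()-side: try to satisfy the request from the matching free-list. */
static void *
try_alloc_from_free_list(size_t nbytes, size_t *frag_size_out)
{
   int b = bucket_for(nbytes);
   if (b < 0 || free_lists[b] == NULL) {
      return NULL; // caller falls back to carving a new fragment
   }
   free_frag_hdr *hdr = free_lists[b];
   free_lists[b]      = hdr->free_frag_next;
   *frag_size_out     = hdr->free_frag_size;
   return hdr;
}
```

Per the example in the text, a 150-byte request rounds up to 192 bytes and is served from the 256-byte bucket, with the returned fragment's size reported as 256.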

- An immediate consequence of this approach is a small, but significant,
  change to the allocation / free APIs; i.e., TYPED_MALLOC(),
  TYPED_ARRAY_MALLOC() and TYPED_FLEXIBLE_STRUCT_MALLOC(), and their 'Z'
  equivalents, which return zeroed-out memory.

- All existing clients of the various TYPED*() memory allocation
  calls have been updated to declare an on-stack platform_memfrag{}
  handle, which is passed back to platform_free().

  - In some places memory is allocated to initialize sub-systems and
    then torn down during deinit(). In a few places existing structures
    are extended to track an additional 'size' field. The size of the
    memory fragment allocated during init() is recorded here, and then
    used to invoke platform_free() as part of the deinit() method.
    An example is clockcache_init(), which records the 'size' of the
    fragment and passes it down to clockcache_deinit(), where the memory
    fragment is then freed with the right 'size'. This pattern now appears
    in many such init()/deinit() method pairs across different sub-systems;
    e.g., pcq_alloc(), pcq_free(), ...
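The calling convention above can be sketched as follows. This is a malloc-backed stand-in: `platform_alloc_frag()` / `platform_free_frag()` are illustrative names (not the real platform APIs), and the `pcq` struct only mimics the size-tracking pattern described for pcq_alloc()/pcq_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of platform_memfrag{}: the opaque handle clients
 * declare on-stack, returned populated by the allocator. */
typedef struct platform_memfrag {
   void  *addr;
   size_t size;
} platform_memfrag;

/* Stand-in allocator following the contract above: alloc fills in the
 * fragment; free relies entirely on the size recorded in it. */
static void *
platform_alloc_frag(platform_memfrag *mf, size_t nbytes)
{
   mf->addr = malloc(nbytes);
   mf->size = nbytes; // a real allocator may round this up to a bucket size
   return mf->addr;
}

static void
platform_free_frag(platform_memfrag *mf)
{
   free(mf->addr);
   mf->addr = NULL;
   mf->size = 0;
}

/* The init()/deinit() pattern: the sub-system records the fragment's
 * size at init-time so deinit can hand the same size back to free. */
typedef struct pcq {
   size_t alloc_size; // size of the fragment this struct lives in
   int    nelems;
} pcq;

static pcq *
pcq_alloc(int nelems)
{
   platform_memfrag mf;
   pcq *q = platform_alloc_frag(&mf, sizeof(pcq));
   q->alloc_size = mf.size; // remembered for pcq_free()
   q->nelems     = nelems;
   return q;
}

static void
pcq_free(pcq *q)
{
   platform_memfrag mf = {.addr = q, .size = q->alloc_size};
   platform_free_frag(&mf);
}
```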

- Cautionary Note:

  If the 'ptr' handed to platform_free() is not of type platform_memfrag *,
  it is treated as a generic <struct> *, and its sizeof() will be used
  as the 'size' of the fragment to free. This works in most cases, except
  for some lapsed cases where, at allocation time, the allocator
  ended up selecting a "larger" fragment that just happened to be
  available in the free-list. The consequence is that we might end up
  freeing a larger fragment to a smaller-sized free-list. Or, even if
  we do free it to the right-sized bucket, we still end up marking the
  free-fragment's size as smaller than what it really is. Over time, this
  may add up to a small memory leak, but it hasn't been found to be
  crippling in current runs. (There is definitely no issue here with
  over-writing memory due to incorrect sizes.)

- Copious debug and platform asserts have been added in the shmem alloc/free
  methods to cross-check, to some extent, against illegal calls.

Fingerprint Object Management:

  Managing memory for fingerprint arrays was particularly problematic.
  This was the case even in a previous commit, before the introduction
  of the memfrag{} approach. Managing fingerprint memory was found to
  be especially cantankerous due to the way filter-building and compaction
  tasks are queued and asynchronously processed by some other
  thread / process.

  The requirements from the new interfaces are handled as follows:

   - Added a new fingerprint{} object, struct fp_hdr{}, which embeds
     a platform_memfrag{} at its head. A few other short fields are
     added to track fingerprint memory-mgmt gyrations.

   - Various accessor methods are added to manage memory for fingerprint
     arrays through this object. E.g.,

     - fingerprint_init()   - Allocates the fingerprint array required for 'ntuples'.
     - fingerprint_deinit() - Dismantles the object and frees its memory.
     - fingerprint_start()  - Returns the start of the fingerprint array's memory.
     - fingerprint_nth()    - Returns the n'th element of the fingerprint array.

  Packaging the handling of fingerprint arrays through this object and
  its interfaces helped greatly to stabilize the memory histrionics.
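A minimal sketch of how such a fingerprint object might hang together. The field layout and function bodies here are assumptions for illustration; only the accessor names come from the list above, and SplinterDB's real fp_hdr{} wraps its shmem allocator rather than calloc/free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of struct fp_hdr{}: a memfrag at its head, plus
 * bookkeeping for the number of tuples the array covers. */
typedef struct fp_hdr {
   struct {
      void  *addr;
      size_t size;
   } mf; // embedded platform_memfrag{}
   size_t ntuples;
} fp_hdr;

/* fingerprint_init(): allocate the array for 'ntuples' 32-bit fingerprints. */
static uint32_t *
fingerprint_init(fp_hdr *fp, size_t ntuples)
{
   fp->mf.addr = calloc(ntuples, sizeof(uint32_t));
   fp->mf.size = ntuples * sizeof(uint32_t);
   fp->ntuples = ntuples;
   return (uint32_t *)fp->mf.addr;
}

/* fingerprint_start(): start of the fingerprint array's memory. */
static uint32_t *
fingerprint_start(fp_hdr *fp)
{
   return (uint32_t *)fp->mf.addr;
}

/* fingerprint_nth(): address of the n'th element of the array. */
static uint32_t *
fingerprint_nth(fp_hdr *fp, size_t n)
{
   assert(n < fp->ntuples);
   return fingerprint_start(fp) + n;
}

/* fingerprint_deinit(): free the memory and dismantle the object. */
static void
fingerprint_deinit(fp_hdr *fp)
{
   free(fp->mf.addr);
   memset(fp, 0, sizeof(*fp));
}
```

Routing every access through these accessors is what lets the object track who currently owns the array's memory when filter-build and compaction tasks hand it around.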

- When SplinterDB is closed, the shared memory dismantling routine will tag
  any large fragments that are still found "in-use". This is percolated
  all the way back to splinterdb_close(), unmount() and
  platform_heap_destroy() as a failure $rc. Tests will fail if they have
  left some un-freed large fragments. (A similar approach was considered to
  book-keep all small fragments used/freed, but due to some rounding
  errors, it cannot be a reliable check at this time, so it hasn't been done.)

Test changes / Miscellaneous:
 - Elaborate and illustrative tracing added to track memory mgmt done
   for fingerprint arrays, especially when they are bounced around
   queued / re-queued tasks. (Was a very problematic debugging issue.)
 - Extended tests to exercise core memory allocation / free APIs, and
   to exercise fingerprint object mgmt, and writable_buffer interfaces:
    - platform_apis_test:
    - splinter_shmem_test.c: Adds specific test-cases to verify that
        free-list mgmt is happening correctly.
 - Enhanced various diagnostics, asserts, tracing
 - Improved memory usage stats gathering and reporting
 - Added hooks to cross-check multiple frees of fragments, and testing
   hooks to verify that a free'd fragment is relocated to the right free-list
 - Added a diagram for large free-fragment tracking.

This commit reworks the interfaces along the lines discussed in this
discussion thread: #615

void *
platform_alloc(memfrag *mf, int size, ...);

void *
platform_realloc(memfrag *mf, int newsize);

void
platform_free(memfrag *mf); // IN

Currently, the return type of `platform_free` remains void.
Changing it to platform_status would mean plumbing return-handling
through all callers. Also, the C library call `free()` is defined as
returning `void`, so changing `platform_free` to return platform_status
would be a bit inconsistent.
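A caller-side sketch of this proposed three-call API. The bodies below are malloc-backed stand-ins (a real implementation carves fragments from the shared segment), and `size_t` is used for sizes where the proposal sketch above wrote `int size, ...`.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct memfrag {
   void  *addr;
   size_t size;
} memfrag;

/* platform_alloc(): populate the caller's memfrag and return the memory. */
void *
platform_alloc(memfrag *mf, size_t size)
{
   mf->addr = malloc(size);
   mf->size = size;
   return mf->addr;
}

/* platform_realloc(): grow/shrink the fragment, updating the same handle. */
void *
platform_realloc(memfrag *mf, size_t newsize)
{
   mf->addr = realloc(mf->addr, newsize);
   mf->size = newsize;
   return mf->addr;
}

/* platform_free(): free by fragment, then scrub the handle so a
 * double-free is easy to assert on. */
void
platform_free(memfrag *mf)
{
   free(mf->addr);
   mf->addr = NULL;
   mf->size = 0;
}
```

Usage follows the shape discussed in the thread: the same on-stack memfrag travels through alloc, optional realloc, and free.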
@gapisback gapisback force-pushed the agurajada/shmem-free-list-mgmt-Rev branch from 3588f3e to 9281c83 Compare February 1, 2024 02:42
@gapisback (Collaborator, Author)

@rtjohnso - My CI-stabilization jobs have succeeded. I have squashed all changes arising from our proposal discussion thread into this one single commit and have refreshed this change-set.

You can restart your review on this amended change-set. (I expect CI jobs will succeed, as they did in the stabilization PR #616.)

@gapisback (Collaborator, Author) commented Feb 1, 2024

@rtjohnso : Fyi -- I want to log this one ASAN instability that the most recent round of CI jobs ran into, as I am not going to remember all this later.

Here is the state of affairs and results of my investigations.

  1. CI job no. 109 (main-pr-asan) failed with this error:
 build/release-asan/bin/driver_test splinter_test --perf --use-shmem --max-async-inflight 0 --num-insert-threads 4 --num-lookup-threads 4 --num-range-lookup-threads 0 --tree-size-gib 2 --cache-capacity-mib 512

build/release-asan/bin/driver_test: splinterdb_build_version 9281c83f

Dispatch test splinter_test

Attempt to create shared segment of size 8589934592 bytes.

Created shared memory of size 8589934592 bytes (8 GiB), shmid=8617984.

Completed setup of shared memory of size 8589934592 bytes (8 GiB), shmaddr=0x7f6924570000, shmid=8617984, available memory = 8589894272 bytes (~7.99 GiB).

filter-index-size: 256 is too small, setting to 512

Running splinter_test with 1 caches

splinter_test: SplinterDB performance test started with 1 tables

splinter_perf_inserts() starting num_insert_threads=4, num_threads=4, num_inserts=27185152 (~27 million) ...

Thread 2 inserting  37% complete for table 0 ... =================================================================

==2666==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f68fba16f80 at pc 0x7f6b276fef50 bp 0x7f69044aa080 sp 0x7f69044a9828

READ of size 589 at 0x7f68fba16f80 thread T1

Thread 2 inserting  42% complete for table 0 ... OS-pid=2666, OS-tid=2669, Thread-ID=3, Assertion failed at src/trunk.c:2213:trunk_get_new_bundle(): "(node->hdr->end_bundle != node->hdr->start_bundle)". No available bundles in trunk node. page disk_addr=1513291776, end_bundle=3, start_bundle=3

./test.sh: line 115:  2666 Aborted                 "$@"

make: *** [Makefile:558: run-tests] Error 134
  2. Upon re-run, this ASAN job (run no. 109.1) succeeded.

  3. I attempted to manually re-run this specific test multiple times on my Nimbus VM, but could not reproduce the ASAN heap-buffer-overflow error. I ran the exact test with different combinations (one run; 4 concurrent runs with the exact same params; 4 concurrent executions with thread counts increasing up to --num-insert-threads 8 --num-lookup-threads 8; and similar stress load on the VM), but could not repro the problem outside CI.

The last variation I tried in these manual repro attempts was 4 concurrent invocations of this test (logging it here so I can refer to it later):

./driver_test splinter_test --perf --use-shmem --max-async-inflight 0 --num-insert-threads 8 --num-lookup-threads 8 --num-range-lookup-threads 0 --tree-size-gib 2 --cache-capacity-mib 512

The VM has 16 vCPUs, so I figured that by running with 8 insert-threads and 4 concurrent instances, we'd load the CPU high enough to tickle any bugs out. But the ASAN problem did not recur in these manual repro attempts.


NOTE: In the original CI failure it is hard to tell exactly, but it seems like thread ID 2 ran into the ASAN memory overflow, and soon after, thread ID 3 ran into this assertion a few lines later:

OS-pid=2666, OS-tid=2669, Thread-ID=3,  Assertion failed at src/trunk.c:2213:trunk_get_new_bundle(): "(node->hdr->end_bundle != node->hdr->start_bundle)". No available bundles in trunk node. page disk_addr=1513291776, end_bundle=3, start_bundle=3

You may recall that I had reported issue #474 some time ago for this trunk bundle mgmt assertion.

I suspect that there is something lurking there that popped up in the CI-run.

I cannot explain whether this assertion tripping is caused by the ASAN heap-buffer-overflow error, or if the two are even related. Unfortunately, I could not repro the ASAN issue outside CI, so I have to give up on this investigation for now.

The rest of the test runs are stable, and this ASAN-job did succeed on a re-run. I have re-reviewed the code-diffs applied recently and could not find anything obviously broken. For now, I will have to conclude that the changes are fine except there may be some hidden instability popping up, possibly triggered by issue #474 mentioned earlier.

@rtjohnso (Contributor) left a comment

I've mostly just gone through the headers in the platform code, plus the fingerprint array api.

I assume once we get these nailed down, then most of the changes in the rest of the code will be relatively straightforward updates to the new apis.

Or is there anything else major?

Let's get the new apis sorted and then I can review the whole PR.

@@ -179,7 +179,7 @@ splinterdb_open(splinterdb_config *cfg, splinterdb **kvs);
// Close a splinterdb
//
// This will flush all data to disk and release all resources
-void
+int

Good change. Can you add a comment defining the meaning of the return value? e.g.

"returns 0 on success, non-zero otherwise."

Or

"returns

  • 0 on success,
  • a positive integer when all data has been persisted but not all resources were able to be released, and
  • a negative number to indicate that not all data was able to be persisted and the database was unable to shut down safely."

@@ -3167,8 +3173,8 @@ btree_pack_loop(btree_pack_req *req, // IN/OUT
log_trace_key(tuple_key, "btree_pack_loop (bottom)");

if (req->hash) {
-platform_assert(req->num_tuples < req->max_tuples);
-req->fingerprint_arr[req->num_tuples] =
+platform_assert(btree_pack_can_fit_tuple(req));

Nice

platform_histo_handle hh;
-hh = TYPED_MANUAL_MALLOC(
+hh = TYPED_ARRAY_MALLOC(

Actually, I believe the correct macro for this situation is TYPED_FLEXIBLE_STRUCT_ZALLOC.

#define TYPED_ALIGNED_ZALLOC(hid, a, v, n) \

#define TYPED_ALIGNED_MALLOC(hid, a, v, n) \
TYPED_ALIGNED_MALLOC_MF(&memfrag_##v, hid, a, v, n)

I hate this hard-coding of this naming convention, and I think it doesn't solve an important problem.

Let me propose something that might be more useful.

#define TYPED_ALIGNED_MALLOC_AUTO(hid, a, v, n)                             \
   platform_memfrag v_##memfrag __attribute__((cleanup(platform_free)));    \
   TYPED_ALIGNED_MALLOC_MF(&v_##memfrag, hid, a, v, n)

This allocates memory that will automatically get freed when the function exits the scope of the allocation.

Furthermore, it relieves the user of the responsibility to declare a memfrag at all.
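For reference, here is a minimal, self-contained demo of the GCC/Clang `__attribute__((cleanup))` mechanism that this proposed macro relies on. The names (`memfrag`, `memfrag_cleanup`) are illustrative stand-ins, not SplinterDB code; the point is only that the cleanup function runs automatically when the variable leaves scope.

```c
#include <assert.h>
#include <stdlib.h>

static int n_freed = 0; // counts cleanup invocations, for demonstration

typedef struct memfrag {
   void  *addr;
   size_t size;
} memfrag;

/* The cleanup function receives a pointer to the annotated variable. */
static void
memfrag_cleanup(memfrag *mf)
{
   free(mf->addr);
   mf->addr = NULL;
   n_freed++;
}

static void
use_scoped_alloc(void)
{
   memfrag mf __attribute__((cleanup(memfrag_cleanup))) = {
      .addr = malloc(64), .size = 64};
   assert(mf.addr != NULL);
   /* ... use mf.addr ... */
} // memfrag_cleanup(&mf) runs automatically here
```

Note this is a GCC/Clang extension rather than standard C, which may matter for portability of the platform layer.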

* Utility macro to test if an argument to platform_free() is a
* platform_memfrag *.
*/
#define IS_MEM_FRAG(x) \

This will go away, right?

__FILE__, \
__LINE__); \
_mf->addr = NULL; \
_mf->size = 0; \
} while (0)

// Convenience function to free something volatile
static inline void
platform_free_volatile_from_heap(platform_heap_id heap_id,
volatile void *ptr,

This will change to accept a memfrag, right?

@@ -40,10 +40,11 @@ platform_checksum_is_equal(checksum128 left, checksum128 right)
static void
platform_free_from_heap(platform_heap_id UNUSED_PARAM(heap_id),
void *ptr,
const size_t size,

This will be modified to take a memfrag, right?

platform_free_volatile_from_heap( \
id, (p), STRINGIFY(p), __func__, __FILE__, __LINE__); \
(p) = NULL; \
debug_assert(((p) != NULL), \

Once you change platform_free_volatile_from_heap to take a memfrag, you won't need all this type checking.

void *retptr = NULL;
if (heap_id == PROCESS_PRIVATE_HEAP_ID) {
retptr = aligned_alloc(alignment, required);
if (memfrag) {

Get rid of the if. Always require a memfrag parameter.

static inline void
platform_free_from_heap(platform_heap_id heap_id,
void *ptr,
const size_t size,

Take a memfrag instead.

@gapisback (Collaborator, Author)

@rtjohnso - I've gone through your review comments quickly. Most of them are easily implementable. I will get to it.

I've mostly just gone through the headers in the platform code, plus the fingerprint array api.

I am curious about your review of the fingerprint array API rework. Did you not find any issues with it? I was bracing myself for lots of comments, as this area is fragile and the rework is a bit tricky. If you think this array API is acceptable, that will save me a bunch of rework rounds.

Let's get the new apis sorted and then I can review the whole PR.

Let me apply the changes requested and then re-test. (CI-re-test stabilization will be a nightmare starting tomorrow.)

Once I go over all the changes, I will be better able to answer this question of yours:

Or is there anything else major?

... for which the answer now is, I don't think so, off-hand.

@rtjohnso (Contributor) commented Feb 2, 2024

I left a few comments on the fingerprint array code already.

I haven't done a full evaluation. It seemed more complex than I expected, but I see that it is trying to make explicit some of the complex sharing that goes on with the fingerprint arrays, which is a goal I like. I will want to do a more thorough review of how it is used to understand how it all fits together.

@rtjohnso (Contributor) commented Feb 2, 2024

I spoke with Alex today about the overall design, and he really doesn't like how the whole concept of memfrags puts a burden on the rest of the code.

So let's do the following. Whenever the shm code allocates memory, it allocates one extra cache line in front, and stores the memfrag on that cacheline. Later, during a free, you use pointer arithmetic to find the memfrag for that pointer.
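A sketch of that suggestion. This is a heap-backed stand-in (`shm_alloc`/`shm_free` and the exact header layout are assumptions, and a real implementation would carve from the shared segment and return the fragment to its free-list): the allocator prepends one cacheline holding the memfrag, so free() recovers it by pointer arithmetic and callers never see memfrags at all.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE 64

typedef struct memfrag {
   void  *addr; // start of the full fragment, including the header line
   size_t size; // full fragment size, header included
} memfrag;

/* Allocate one extra cacheline in front and stash the memfrag there. */
static void *
shm_alloc(size_t nbytes)
{
   size_t total   = nbytes + CACHELINE;
   size_t rounded = (total + CACHELINE - 1) & ~(size_t)(CACHELINE - 1);
   void  *raw     = aligned_alloc(CACHELINE, rounded); // shm-carve stand-in
   memfrag *mf    = (memfrag *)raw;
   mf->addr       = raw;
   mf->size       = total;
   return (char *)raw + CACHELINE; // caller sees memory past the header
}

/* Walk back one cacheline to find the memfrag stashed at alloc time. */
static size_t
shm_free(void *ptr)
{
   memfrag *mf   = (memfrag *)((char *)ptr - CACHELINE);
   size_t   size = mf->size;
   free(mf->addr); // real code would return the fragment to its free-list
   return size;
}
```

The cost is one cacheline of overhead per allocation; the benefit is that `platform_free(ptr)` needs no size argument and no caller-side bookkeeping.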

deukyeon added a commit that referenced this pull request Jul 30, 2024
commit 369bf55
Author: Rob Johnson <rob@robjohnson.io>
Date:   Wed May 22 16:53:01 2024 -0700

    Robj/onetrust scriptid (#627)

    update to Broadcom's OneTrust script ID

commit 655988f
Author: Rob Johnson <rob@robjohnson.io>
Date:   Wed May 22 13:16:55 2024 -0700

    Robj/deukyeon deadlock (#626)

    fix several compact_bundle bugs when index node has split

commit 05654ab
Author: Rob Johnson <rob@robjohnson.io>
Date:   Sat May 18 17:11:11 2024 -0700

    Robj/print node (#625)

    cleanup trunk printing and add node ids

commit 6a4afab
Author: Rob Johnson <rob@robjohnson.io>
Date:   Thu May 16 13:30:23 2024 -0700

    Robj/fallocate blockdev fix (#623)

    don't fallocate on block devices

commit ac426e4
Author: Rob Johnson <rob@robjohnson.io>
Date:   Thu May 16 10:43:48 2024 -0700

    Robj/leaf split fix (#624)

    Fix bug where a leaf split that resulted in a single leaf would cause an assertion failure in a subsequent compaction.

    The cause of the bug was that the compaction code detected node splits by checking whether the upper-bound key of the node had changed since the compaction request was created.  However, in a leaf split into 1 leaf, the upper bound key doesn't change, but the leaf does get rebundled, meaning that the bundle in the compaction request is no longer live.  This meant that the compaction thought it was seeing a dead bundle in a node that had not split.  This should only occur if the bundle gets flushed from a parent to a child between the enqueuing of the compaction and its execution.  But that can only occur if the node is not a leaf.  Hence the compaction code asserted that the node was not a leaf in this case.

    To fix the bug, we improve the way node splits are detected. Every node has a unique id in its header.  IDs change whenever a node is split, which makes detecting splits trivial.

commit a46705b
Author: Rob Johnson <rob@robjohnson.io>
Date:   Wed Apr 17 14:10:54 2024 -0700

    Fix O_DIRECT and multi-threaded io_context issues (#621)

    * enable --set-O_DIRECT flag in large_inserts_stress_test
    * workaround linux aio oddity: with O_DIRECT, io_getevents with a NULL timeout may still return 0 even though there are in-flight IOs.
    * switch to per-process io contexts

commit 9359c9a
Author: Rob Johnson <rob@robjohnson.io>
Date:   Thu Feb 29 08:38:51 2024 -0800

    Robj/badge (#619)

    add badge to README.md

commit 77f8fc9
Author: Rob Johnson <rob@robjohnson.io>
Date:   Wed Feb 28 01:19:01 2024 -0800

    Set up github actions CI (#617)

    * fix some memory leaks
    * fix some build-system bugs and add a memory-safety assert
    * remove databases during tests to reduce disk usage
    * Create run-tests.yml

commit 2eacefa
Author: Aditya Gurajada <gapisback@gmail.com>
Date:   Fri Jan 26 15:36:52 2024 -0800

    (#599) Replace calls to getpid() with platform'ized platform_getpid()

    This commit cleans up the `#include <unistd.h>` from several .c
    files that needed access to getpid(). Instead, the code is changed
    to invoke platform_getpid(), now defined in platform_inline.h

commit 23f5e4f
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Wed Jan 24 05:55:16 2024 -0800

    Refactor to save / print shmem usage stats using common struct.

    This commit refactors shared memory usage stats fields to
    drive-off shminfo_usage_stats{} struct entirely. Add
    platform_save_usage_stats(), used by platform_shm_print_usage().

    This refactoring paves the way for upcoming PR #569 which
    is adding more memory-usage stats fields.

commit 4c0a225
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Sun Dec 10 11:49:54 2023 -0800

    shmem.c: Rename variables relating to large-fragment handling.

    Upcoming PR #569 is bringing-in support for handling small
    fragments. This commit renames existing variables, field names
    and a few function names that deal with large-fragment support
    to consistently use 'large' in the name. This clears the way
    in the namespace for code changes coming from small-fragment
    changes.

    Some examples:
    - struct shm_frag_info -> struct shm_large_frag_info
    - E.g., shm_frag_addr -> frag_addr, shm_frag_size -> frag_size ...
    - shm_frag_info shm_mem_frags[] -> shm_large_frag_info shm_large_frags[]
    - shm_num_frags_tracked -> shm_nlarge_frags_tracked
    - platform_shm_find_free() -> platform_shm_find_large()

    NOTE: No other code-/logic-changes are done with this commit.

commit 634df21
Author: Rob Johnson <rob@robjohnson.io>
Date:   Wed Jan 24 04:24:36 2024 -0800

    Robj/fallocate (#601)

    catch fallocate failure

commit ef74125
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Sat Dec 9 22:30:15 2023 -0800

    (#604) Cache page-size in local variables to reduce multiple lookups.

    In debug-build test runs, profiling shows interfaces
    like clockcache_page_size(), clockcache_config_page_size()
    bubbling up to the top of 'perf top' output. This commit
    replaces multiple calls to lookup functions that retrieve
    the page-size by caching the page-size once per function
    where it's used multiple times.

    The affected interfaces are: btree_page_size(),
    clockcache_page_size(), cache_page_size(),
    cache_config_page_size(), trunk_page_size() and a few
    similar ones.

    These changes add up to saving few seconds of test-execution
    (out of few mins of run-time) in debug-build mode, esp for
    BTree-related tests.

commit 3e259ca
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Sun Dec 10 07:58:21 2023 -0800

    test.sh: Enable running by named-function w/o shared memory.

    This commit adds minor improvements / bug-fixes to test.sh:

    - The capability to run individual test-function(s) by name
      was not working without the "--use-shmem" flag. Rework the
      parameter parsing in all test-functions so that we can now
      invoke this driver script as:

        $ INCLUDE_SLOW_TESTS=true ./test.sh run_slower_unit_tests

      This will run individually named test-functions with
      default memory configuration.

    - run_other_driver_tests() had a bug where tests run by this
      function were not honoring '--use-shmem' arg. Fix this so
      that cache_test, log_test, filter_test can now also be run
      with "--use-shmem" enabled.

    - Introduce Use_shmem global to parse-out --use-shmem arg.
      Rework minion test-functions to drive off of global variable.

    - Update elapsed-time tracking to separately track the test
      execution run-times w/o and w/ shared memory configured.

commit c340a0e
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Sun Dec 10 06:43:01 2023 -0800

    Rename large_inserts_bugs_stress_test.c -> large_inserts_stress_test.c

    Upcoming PR #569 is overhauling large-inserts stress test.
    To simplify examining the diffs of this test case as part
    of that review, this commit is renaming the test file
    to large_inserts_stress_test.c . Make appropriate changes
    to the build Makefile and test files, to pickup new file.

commit ae4636f
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Thu Dec 7 15:08:05 2023 -0800

    CI: Bump timeout from 3h to 4h. shmem-tests cause debug test runs to take longer.

    After addition of new large_inserts_stress_test, being done as part of
    PR #569 (free/memory mgmt support for shared memory), CI-debug jobs
    are timing out at current timeout=3h.

    Bump this timeout limit to 4h, to see if test-jobs complete.

commit c169d5e
Author: Gabe Rosenhouse <grosenhouse@vmware.com>
Date:   Mon Nov 20 15:51:24 2023 -0800

    Another fix for an old CI problem (#602)

    This field is documented as optional and defaulting to "latest" [0]
    but for some reason, setting it explicitly seems to matter [1].

    [0]: https://github.com/concourse/registry-image-resource#source-configuration
    [1]: https://vmware.slack.com/archives/CEUC18KQA/p1689754476227109?thread_ts=1689753436.440569&cid=CEUC18KQA

commit c64f68d
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Mon Oct 10 15:22:07 2022 -0700

    Support for multi-process execution, with processes using shared memory

    This commit extends core shared memory support to now allow for
    a multi-process execution model, where multiple processes can now
    attach to Splinter shared memory. Core thread-specific concurrency
    primitives are modified, slightly, to now also support a
    multi-process execution model.

    - This commit sets up the stage to support fork()'ed or other OS-processes
      running with --use-shmem option, where each process will [in future]
      masquerade as a Splinter thread. A core change needed to move to that
      execution model is to support thread-specific IO-context structures.
      Otherwise, if an/other OS-process tries to do IO using AIO-context
      established by the main thread (i.e. by the process that started up
      SplinterDB), we will immediately run into hard IO-system call errors.

    This commit:
       - Performs an io_setup() for each thread / process
       - Manages this AIO-context tightly bound to Splinter's thread context
       - Does required book-keeping to keep this IO-context state kosher
         in conjunction with thread registration / de-registration.
       - Updates existing io_apis_test to deal with thread-specific IO
         context handles.

       An alternative could be to localize this change-in-behaviour (of setting
       up thread-specific IO-context structs) only when the process-model of
       execution comes around. That execution model requires configuring
       SplinterDB with shared-memory support. But, just by looking at
       --use-shmem (or corresponding config setting), we cannot be sure that
       the process-model will be used or if we are just re-running rest of
       the test suites with shared-segment enabled. So, without trying to
       further complicate this choice-making, with this commit we will always
       set up thread-specific AIO-context structures.

    Collection of lower-level changes to move to this execution model:

     - platform_buffer_init(), which mmap()s memory for the buffer cache,
       will now use MAP_SHARED (v/s MAP_PRIVATE).
       The issue is that some parts of structures, e.g. buffer cache, are
       allocated using mmap(). The flags for this were MAP_PRIVATE, which
       means this memory is only accessible to the main process that set up
       Splinter. All child threads work on a COW-version of this mapped
       memory.
       So the changes done by the child process to the BTree in the buffer
       cache are not visible to the parent process.

     - Convert synchronization primitives to be shared across processes.

       This commit reworks core synchronization APIs to use interfaces
       that allow the sync-hook across child processes. This affects:

         - platform_mutex_init()
         - platform_spinlock_init()
         - platform_condvar_init()
         - platform_semaphore_init()
         - Add corresponding API-exerciser unit-tests for sanity coverage

     - Now that we have thread-specific IO-context setup, as part of
       thread register / deregister, we now also do io_register_thread(),
       io_deregister_thread(). This is basically book-keeping state of
       the thread w.r.t IO setup & context.

    Testing changes added:

      - Support --fork-child to test execution options. Some new tests
        will honor this argument, and will exercise activity using
        a forked-process execution model.

      - New test splinterdb_forked_child_test added: This covers the
        cases to show that IO errors could be repro'ed when running Splinter
        activity from a forked child process. Many other cases are added
        to this framework to exercise different cases of forked process
        doing SplinterDB activity. Much code/dev stabilization was achieved
        through this single new test.

      - Add case test_seq_key_seq_values_inserts_forked to large_inserts_stress
        test.

      - Existing functional io_apis_test to run with --fork-child option,
        thereby creating the scenario(s) of forked processes exercising
        the basic IO APIs.

      - Add new & extended tests to test.sh, for extended coverage using
        shared-memory and multi-process execution.

      - Add support for --wait-for-gdb and wait_for_gdb_hook() function.

        To debug forked child processes, add support for new command-line
        flag: --wait-for-gdb . And add a looping function where we can
        set a breakpoint, wait_for_gdb_hook(). Use this facility in
        splinterdb_forked_child_test.c, which has helped debug errors
        seen while running test_multiple_forked_process_doing_IOs().

    Changes arising from review comments: Mostly cleanup:

     - splinterdb.c: Redefine testing-accessor methods to return correct
       <data type> *, rather than void *

     - platform.c: Clean-up error handling in platform_condvar_init(),
       using goto labels. Add missing pthread_mutex_destroy() in one case.

     - platform_shmcreate() will now return heap-ID as start address of
       allocated shared segment. Adjust platform_heap_id_to_shmaddr()
       appropriately.

     - rename tests/splinterdb_test_apis.h -> src/splinterdb_tests_private.h

     - Rename test config 'num-forked-processes' -> 'num-processes'.
       Adjust tests accordingly.

commit 42799b1
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Mon Sep 26 09:45:25 2022 -0700

    Core changes to support running Splinter with allocated shared memory.

    Support to run SplinterDB with shared memory configured for most
    memory allocation is an -EXPERIMENTAL- feature added with this commit.

    This commit brings in basic support to create a shared memory segment and
    to redirect all memory allocation primitives to shared memory. Currently,
    we only support a simplistic memory mgmt; i.e. only-allocs, and a very
    simplistic handling of free() of the very last memory piece allocated.
    With shared segments of 1-2 GiB we can run all functional and unit tests.

    The high-points of the changes are:

    - External configuration: splinterdb_config{} gains a few new visible
      fields to configure and troubleshoot shared memory configuration.
       - Boolean: use_shmem: Default is OFF
       - size_t : shmem_size:

    - The main driving change is the re-deployment of platform_heap_id 'hid'
      arg that appears in all memory-related interfaces. If Splinter is
      configured for shared memory use, 'hid' will be an opaque handle to
      the shared segment. Most memory allocation will be redirected to new
      shmem-based alloc() / free() interfaces.

    - Formalize usages of PROCESS_PRIVATE_HEAP_ID: A small number of clients
      that wish to repeatedly allocate large chunks of memory tend to cause
      OOMs. The memory allocated by these clients is not shared across threads
      / processes. For such usages, introduce PROCESS_PRIVATE_HEAP_ID as an
      alias to NULL, defaulting to allocating memory from the heap.

    - Rework platform_get_heap_id() to correctly return the handle to
      shared memory. (Otherwise, it would return NULL by default.)

    - BTree pack allocates a large fingerprint array. This also causes large
      tests to run into OOMs. For threaded execution, it's OK if the memory
      for this array is allocated from the heap. But for multi-process
      execution, when one process (thread) allocates this fingerprint
      array, another thread may pick up the task to compact a bundle and
      will try to free this memory.

      So, this memory has to come from shared memory. To cope with such
      repeated allocations of large chunks of memory to build fingerprints,
      the shmem module supports a small scheme for recycling such freed
      large-memory chunks.

      This technique is also applied to recycle memory allocated for
      iterators. They tend to be biggish, so they can also cause shmem-OOMs.
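A minimal sketch of such a recycling scheme, using hypothetical names (frag_acquire, frag_release) and a fixed-size tracking array; this is not SplinterDB's actual shmem code, just the shape of the idea: freed chunks are parked on a small list and handed back to later requests instead of carving new memory:

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED_FRAGS 8

typedef struct {
   void  *addr;
   size_t size; /* 0 => slot is unused */
} free_frag;

static free_frag frag_list[MAX_TRACKED_FRAGS];

/* Record a freed fragment for reuse; really release it if tracking is full. */
static void
frag_release(void *addr, size_t size)
{
   for (int i = 0; i < MAX_TRACKED_FRAGS; i++) {
      if (frag_list[i].size == 0) {
         frag_list[i].addr = addr;
         frag_list[i].size = size;
         return;
      }
   }
   free(addr);
}

/* Try to satisfy an allocation from the recycled list before allocating. */
static void *
frag_acquire(size_t size)
{
   for (int i = 0; i < MAX_TRACKED_FRAGS; i++) {
      if (frag_list[i].size >= size) {
         void *addr        = frag_list[i].addr;
         frag_list[i].size = 0;
         return addr;
      }
   }
   return malloc(size);
}
```

Note the design point the commit relies on: the caller must report the fragment's size back at free time, which is exactly what the platform_memfrag{} handle in this PR carries.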

    - All existing functional and unit-tests have been enhanced to now
      support "--use-shmem" argument. This will create Splinter with
      shared memory configured, and tests are run in this mode.

      This change brings quite good coverage from existing testing to
      this new feature.

       - New test: large_inserts_bugs_stress_test -- added to cover the
         primary use-case of concurrent insert performance benchmarking
         (which this feature is driving in a prior integration effort).

       - test.sh enhanced to run different classes of test with the
         "--use-shmem" option.

    - Diagnostics & Troubleshooting:

       - Shmem-based alloc/free interfaces extended to print name of object
         and other call-site info, to better pinpoint source code-flow
         leading to memory issues.

       - Add shared memory usage metrics, including for large-fragment
         handling. Report a summary line of these metrics, and print
         stats, when Splinter is shut down.

       - Add various utility diagnostic helper methods to validate that
         addresses within shared memory are valid. Unit-tests and some asserts
         use these.

    - minor #include cleanups

    Changes arising through review cycle and stabilization v/s /main:

    - In test.sh/run_slower_unit_tests(), re-enable execution of
      large_inserts_bugs_stress_test, but bracketed under "set +e" / "set -e"
      settings. If this test fails in CI (as it randomly does), this SET
      toggling will hopefully allow the rest of the script to still run,
      so the CI job does not fail immediately.
      (Some deeper stabilization is needed for these test cases.)

    - Purged the heap_handle * in shmem.h/.c module and through the rest
      of the Splinter code. Only heap-ID is a valid handle anymore.

    - Fix race condition bug in platform_shm_alloc()

    - Added a micro-optimization to recycle the last-allocated fragment
      being freed.

    - Add config_parse_use_shmem() as parsing interface to see if
      "--use-shmem" was supplied. Apply to many unit-/functional-tests.

    Rework unit-tests to use config_parse_use_shmem() to support --use-shmem parsing.

    Re-enable large_inserts_bugs_stress_test execution.

commit 2fb4d7c
Author: Deukyeon Hwang <deukyeon@users.noreply.github.com>
Date:   Tue Aug 15 17:19:03 2023 -0700

    Fix the compile error on platform_open_log_file() (#596)

commit 4679bb7
Author: Rob Johnson <rob@robjohnson.io>
Date:   Tue Aug 15 16:15:54 2023 -0700

    remove btree rough count stuff, since it is unused (#594)

commit 3bf7023
Author: Rob Johnson <rob@robjohnson.io>
Date:   Wed Jul 26 18:53:26 2023 -0700

    Bidirectional Iterators (#588)

    * tweak iterator api to make it easier to add bidirectionality

    * debugging btree reverse iteration

    * reduce time we hold locks during btree_split_child_leaf

    * further refine the locking in btree node splits and fix reverse iterator bug

    * btree iterator init at key other than min and add btree_iterator_seek

    * splinterdb_iterator_prev implemented and working

    * clang formatting

    * improve the trunk iterator logic

    * corrections for pull request

    * more pull request fixes

    * assert fix

    * more pull request feedback

    * iterator stress test, bug fixes, formatting

    * final bit of pr feedback

    * formatting

    ---------

    Co-authored-by: Evan West <evan.ts.west@gmail.com>

commit 950df20
Author: Rob Johnson <rob@robjohnson.io>
Date:   Tue Jul 25 16:27:33 2023 -0700

    allow merge callbacks to be NULL (#577)

    In that case, splinterdb_update() is not supported.

commit a7547cd
Author: deukyeon <deukyeon@users.noreply.github.com>
Date:   Tue Jul 25 15:15:05 2023 -0700

    (#580) Cleanup some bool stuff (#584)

    * Add the header file for _Bool

    * converting bool to bool32

    * move stdbool include

    ---------

    Co-authored-by: Rob Johnson <rob@robjohnson.io>

commit a3e9469
Author: deukyeon <deukyeon@users.noreply.github.com>
Date:   Mon Jul 24 18:02:50 2023 -0700

    Set the addresses of log for super block if it is. (#582)

    Co-authored-by: Alex Conway <aconway@vmware.com>

commit 3cec342
Author: Gabe Rosenhouse <grosenhouse@vmware.com>
Date:   Mon Jul 24 17:12:20 2023 -0700

    CI fix: use new version of registry-image resource (#593)

commit d2e8369
Author: Evan West <evan.ts.west@gmail.com>
Date:   Thu Jul 13 20:22:03 2023 +0000

    fix formatting in trunk.c

commit fad27b5
Author: Evan West <evan.ts.west@gmail.com>
Date:   Fri Jul 7 18:38:55 2023 +0000

    remove local_max_key and fix filter assertion

commit b6dafdf
Author: Gabe Rosenhouse <grosenhouse@vmware.com>
Date:   Mon Jun 26 11:13:59 2023 -0700

    CI: switch PR resource to maintained one (#587)

commit 6a2348c
Author: Rob Johnson <rob@robjohnson.io>
Date:   Sun Apr 30 13:04:41 2023 -0700

    Robj/memtable race fix (#574)

    * Memtable Generation Bugfix

    Fixes a bug where memtable_maybe_rotate_and_get_insert_lock would
    speculatively increment the memtable generation even when the next
    memtable was not yet ready. This would cause concurrent lookup threads
    to attempt to access that memtable, resulting in errors.

    This fix requires the insert threads to wait until the next memtable is
    ready before finalizing the current one.
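The fix described above amounts to the following shape, sketched here with C11 atomics and invented names (memtable_ctx, advance_generation); real code would block on a condvar or use backoff rather than a bare spin:

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
   atomic_uint generation; /* which memtable lookups should use */
   atomic_bool next_ready; /* set by the thread that built the next one */
} memtable_ctx;

/* Advance the generation only once the next memtable is ready, so a
 * concurrent lookup can never land on a half-initialized memtable. */
static void
advance_generation(memtable_ctx *ctx)
{
   while (!atomic_load(&ctx->next_ready)) {
      /* wait; real code would back off or block instead of spinning */
   }
   atomic_store(&ctx->next_ready, false);
   atomic_fetch_add(&ctx->generation, 1);
}
```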

    * abstract memtable and trunk root-addr locking apis

    ---------

    Co-authored-by: Alex Conway <aconway@vmware.com>

commit 8c639a0
Author: Rob Johnson <rob@robjohnson.io>
Date:   Mon Apr 24 21:47:01 2023 -0700

    fix next_req node-split bug in trunk (#575)

commit 1e8f790
Author: deukyeon <deukyeon@users.noreply.github.com>
Date:   Mon Apr 24 19:36:07 2023 -0700

    (#546) Fix the segmentation fault after splinterdb_stats_reset() (#547)

    Previously, when trunk_stats_reset() was called by
    splinterdb_stats_reset(), the entire statistics of a trunk, including
    the histogram handles, were reset to zero.

commit fa990cf
Author: Alex Conway <aconway@vmware.com>
Date:   Tue Dec 20 21:19:00 2022 +0000

    Copy-on-Write Trunk

    This changeset implements copy-on-write for trunk nodes, which includes
    several high-level changes. This PR still needs to be rebased onto main,
    but the purpose is to discuss high- and low-level design decisions.

    Changes in this PR:

    Trunk root lock. A distributed RW lock is used to access/change the
    current root.

    Flush from root. Flushes proceed from the root and cascade immediately
    rather than being triggered at the beginning of trunk_compact_bundle.

    Copy-on-write. Trunk nodes cannot be modified directly; instead they are
    changed via a copy-on-write of the root-to-node path together with a
    change of the root node.

    Garbage Collection for unlinked branches and filters. After a
    copy-on-write, the nodes on the old path will be unreferenced. This PR
    does not GC the trunk nodes themselves, but it includes a GC path to
    dereference the replaced branches and filters.

    platform_batch_rwlock. Replaces distributed locks using dummy cache
    pages with a batched distributed RW lock implementation in
    platform.[ch].

commit b5a283b
Author: Gabe Rosenhouse <grosenhouse@vmware.com>
Date:   Sat Apr 22 20:10:34 2023 -0700

    Update CONTRIBUTING.md to describe ok-to-test PR label (#573)

commit d991bac
Author: Gabe Rosenhouse <grosenhouse@vmware.com>
Date:   Thu Apr 20 14:25:56 2023 -0700

    CI requires an "ok-to-test" label before running PRs (#572)

commit 77ab353
Author: gapisback <89874928+gapisback@users.noreply.github.com>
Date:   Thu Apr 20 14:12:23 2023 -0700

    CI: Bump timeout from 2h to 3h. shmem-tests cause debug test runs to take longer. (#566)

    In-flight stabilization of shared memory support in Splinter is bringing
    along many more additional tests. We are effectively running most of the
    existing tests twice: once without and once with shared memory
    configured. Debug-build test runs are timing out at 2 hours. Bump the
    timeout to 3 hours; once stabilized, we can look into dropping this
    back to 2h.

commit b2245ac
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Wed Apr 5 16:29:13 2023 -0700

    Fix bug in output formatted by size_to_str() helper.

    The fractional portion of the value formatted by size_to_str() was
    incorrect: we were losing the scale, so values that should have been
    reported as "xx.07" were reported as "xx.7".
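This class of bug is a classic zero-padding mistake: printing a 2-digit fraction with a plain "%llu" drops the leading zero, turning "4.07" into "4.7". A minimal sketch with an illustrative formatter (size_to_gib_str is invented here, not the library's size_to_str()):

```c
#include <stdio.h>
#include <string.h>

#define GiB (1ULL << 30)

/* Format size as "<whole>.<frac> GiB" with a zero-padded 2-digit
 * fraction; "%02llu" is what preserves the scale. */
static void
size_to_gib_str(char *buf, size_t buflen, unsigned long long bytes)
{
   unsigned long long whole = bytes / GiB;
   unsigned long long frac  = ((bytes % GiB) * 100) / GiB; /* 2 digits */
   snprintf(buf, buflen, "%llu.%02llu GiB", whole, frac);
}
```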

commit 8a04854
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Mon Mar 20 15:26:29 2023 -0700

    (#548) Use _Bool for boolean fields in external config struct.

    In SplinterDB's public splinterdb_config{} struct, we had a few fields
    defined as 'bool', which is typedef'ed to int32 on our side. This
    creates compatibility problems when linking this library with other
    software which may have defined 'bool' as a 1-byte field. (Offsets of
    fields in the splinterdb_config{} struct following the first field
    defined as 'bool' can change across dot-oh releases.) This commit
    slightly adjusts the typedefs of boolean fields in external structs to
    now use _Bool. This should reduce the risk of such incompatibilities.

    Change the return type of methods in public_util.h to _Bool.

    Relocate the 'typedef int32 bool' to the private
    platform_linux/platform.h so it's used only on the Splinter side.

    Cleaned up a few instances around use of the bool type for code hygiene:

     - Minor adjustment to routing_filter_is_value_found() returning bool.
     - Replace stray uses of 0/1 for boolean values with FALSE/TRUE.
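The ABI hazard motivating this change can be demonstrated with two illustrative struct layouts (these are invented fields, not the real splinterdb_config{}): with a 4-byte bool, consecutive boolean fields land at offsets 0 and 4, while an application compiling the same header with the 1-byte _Bool sees them at 0 and 1, shifting every following field:

```c
#include <stdint.h>
#include <stddef.h>

typedef int32_t bool32; /* the library-internal 4-byte boolean */

/* Public config as seen with the old 4-byte bool typedef... */
struct cfg_b32 {
   bool32  use_shmem;
   bool32  use_stats;
   int32_t io_flags; /* lands at offset 8 */
};

/* ...and as a C99/C11 application using the 1-byte _Bool sees it. */
struct cfg_b1 {
   _Bool   use_shmem;
   _Bool   use_stats;
   int32_t io_flags; /* lands at offset 4: the two sides disagree */
};
```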

commit f3c92ef
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Tue Apr 4 15:51:05 2023 -0700

    (#561) Fix bug in routing_filter_prefetch(), causing assertion to trip.

    This commit fixes a simple arithmetic error in routing_filter_prefetch()
    while computing next page's address. The bug results in a debug-assert
    in clockcache_get_internal(), or an unending hang in clockcache_get()
    code-flow using release binary.

    A new test case test_issue_458_mini_destroy_unused_debug_assert has been
    added which reproduces the problem. However, this case still runs into
    another failure (being tracked separately), so this case is currently
    being skipped.

commit 89f09b3
Author: Gabriel Rosenhouse <grosenhouse@vmware.com>
Date:   Wed Mar 22 20:22:05 2023 -0700

    CI: use gcc for ASAN jobs

commit 9037ebe
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Wed Mar 22 10:54:35 2023 -0700

    (#554) Fixes to get a couple of tests running cleanly in ASAN-builds

    This commit fixes minor errors in 2 tests (io_apis_test, filter_test)
    to get them running cleanly in ASAN-builds.

commit ea7203a
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Wed Mar 22 14:25:29 2023 -0700

    (#554) Enhance test.sh to run a sub-set of tests named by their driving function.

    This commit adds a "test.sh <fn-name>" interface, where the batch of
    tests executed by the named driving function can be run independently,
    without having to go through full execution of all tests. This helps
    developers shorten their fix-dev-test cycle, especially when validating
    quick fixes for long-running tests, like ASAN / MSAN builds.

commit a5c821c
Author: Gabriel Rosenhouse <grosenhouse@vmware.com>
Date:   Tue Mar 21 11:00:01 2023 -0700

    CI: temporarily cover the shmem branch

    revert this once it merges

commit 5dd7535
Author: Gabriel Rosenhouse <grosenhouse@vmware.com>
Date:   Tue Mar 21 10:54:58 2023 -0700

    CI: run msan and asan tests on all PRs

commit 98f5ca1
Author: Gabriel Rosenhouse <grosenhouse@vmware.com>
Date:   Tue Mar 21 10:59:29 2023 -0700

    CI: fixup for multi-branch work

commit 6be0461
Author: Gabriel Rosenhouse <grosenhouse@vmware.com>
Date:   Mon Mar 20 15:16:00 2023 -0700

    CI: refactor config to enable coverage of multiple branches

commit d9fcc40
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Mon Jan 9 17:55:52 2023 -0800

    Identify Memtable v/s Branch page types via BTree-print routines.

    This commit extends BTree-print routines to also report the page
    type, whether it's a branch or a memtable BTree. As the structures and
    print methods are shared between two objects, this extra information
    will help in diagnostics. Trunk nodes are likewise identified.
    Extend btree_print_tree() to receive page_type arg.
    Minor fix in trunk_print_pivots() to align outputs for pivot key's string.

commit 2ea30f4
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Fri Jan 6 12:54:11 2023 -0800

    (#500) Move hook-related global vars to task_system{} struct.

    This commit removes the dependency of task system structures on global
    variables declared in task.c . The hook-related variables are now
    moved as members of the task_system{} struct. This removes accessing
    potentially stale values when task-system is destroyed and re-created.
    Also, TASK_MAX_HOOKS is now decreased from 8 to 4. This change largely
    has no functional impact, and is mostly a test-stabilization fix.

commit 4b6e0b1
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Mon Feb 6 17:18:13 2023 -0800

    Add initial support for message logging levels, used in unit-tests

    This commit does some clean-up and normalizes the behaviour of
    interfaces to control outputs from C-unit tests. The main goal
    is to reduce the voluminous output generated by a few unit-tests that
    exercise print-diagnostic code (which otherwise crashes browsers
    when viewing test-run outputs in CI). An additional benefit of this
    rework is that we now have a way to run unit-tests to see output
    generated at different verbosity levels.

    - By default, unit-test execution remains silent and only error messages
      will be printed. ctests' main() takes care of setting this up.
    - set_log_streams_for_tests() becomes the single-interface that unit
      test code has to invoke, when needed to change the test output's
      verbosity level.
    - Small collection of MSG_LEVEL_ levels added to ctest.h

    Test execution examples: Run with env-var to see diff outputs:
      VERBOSE=0 (or unset env-var): Default; silent output
      VERBOSE=3 : See error messages
      VERBOSE=6 : See info and error messages
      VERBOSE=7 : See all messages; mainly intended to collect debug output

commit 1f09113
Author: gapisback <89874928+gapisback@users.noreply.github.com>
Date:   Thu Jan 19 13:42:32 2023 -0800

    Fix-up indentation of multi-line comments to conform to coding standards. (#535)

    This commit fixes up several comments in btree.c to conform to the
    style we have followed elsewhere for multi-line comments. No code
    logic changes are done with this commit.

commit caeaeb1
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Fri Dec 23 10:26:27 2022 -0800

    (#513) Add set_log_streams_for_*() fns to manage unit-test outputs.

    This commit refactors chunks of code, duplicated across different
    unit-test sources to manage output file handles, into a common function,
    set_log_streams_for_tests(), defined in the new file
    unit/unit_tests_common.c. Tests that check error-raising behaviour
    will now need to call set_log_streams_for_error_tests() to manage
    output streams.

    Minor correction to TEST_DB_NAME; change it to conform to the r.e. defined
    in .gitignore to suppress listing this in 'git status' output.

commit 7e85a29
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Thu Dec 22 10:44:04 2022 -0800

    (#511) Add fns to print 'size' as human-readable string w/ unit-specifiers

    This commit adds a couple of utility functions to snprintf(), into an
    output buffer, a 'size' value as a human-readable string with
    unit-specifiers.

    - size_to_str() - Convert 'size' to a string in an output buffer
    - size_to_fmtstr() - Same as above, using user-specified format-string.
       Useful to generate output enclosed in, e.g., '(%s)'.
    - Add size_str(), size_fmstr() caller-macros to simplify calling these
      formatting functions. These macros declare on-stack buffers used to
      format the output string. size_str() provided by Rob Johnson, to greatly
      simplify the usage.

    Add utility bytes-to-units conversion macros, and unit tests to exercise
    these interfaces. Apply these utility fns in a couple of stats-printing
    and BTree print methods, to display size values with human-friendly
    unit specifiers.

commit 84484df
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Fri Jan 6 10:47:16 2023 -0800

    (#499) Minor cleanup of INVALID_TID, and MAX_THREADS in task-system

    This commit applies some minor clean-up to task system as a follow-on
    to the larger rework done under PR #497. Consistently use STATUS_BUSY,
    as a way to report when all concurrent threads are in-use. Minor
    changes are done to task system unit-test code & cleanup of comments.

commit 60c7910
Author: Aditya Gurajada <agurajada@vmware.com>
Date:   Wed Dec 21 16:32:08 2022 -0800

    (#507) Rework of platform_buffer_create()/destroy() to init/deinit() interfaces.

    This commit reworks the buffer_handle{} interfaces to now become
    platform_buffer_init() and platform_buffer_deinit().

    Structures that need a buffer_handle{} declare a nested sub-struct,
    which goes through this init / deinit interface to allocate / free
    memory using the existing mmap() interfaces. This removes the need for
    an input 'heap_id / heap_handle' arg to allocate and free memory.
    This change does not functionally change anything in these methods.
    Added a small unit-test, platform_apis_test, to exercise these changes.
    Cleaned up structures and fns that used to take 'heap_handle *' and
    now become unused with this rework. Tightened up backout / error
    handling in clockcache_init() and deinit() code-flow.

    Co-authored by Rob Johnson, who reworked the entire interfaces as
    implemented above, to remove the dependency on 'hid' argument.
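A hedged sketch of the init/deinit shape described above, with illustrative names (buffer_init/buffer_deinit rather than the real platform_buffer_* functions): the handle is a nested sub-struct owned by its parent, and init/deinit wrap mmap()/munmap() directly, which is why no heap-ID argument is needed:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

typedef struct {
   void  *addr;
   size_t length;
} buffer_handle;

/* Allocate backing memory via anonymous mmap(); returns 0 on success. */
static int
buffer_init(buffer_handle *bh, size_t length)
{
   bh->addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (bh->addr == MAP_FAILED) {
      bh->addr = NULL;
      return -1;
   }
   bh->length = length;
   return 0;
}

/* Release the mapping and scrub the handle so reuse is detectable. */
static int
buffer_deinit(buffer_handle *bh)
{
   int rc     = munmap(bh->addr, bh->length);
   bh->addr   = NULL;
   bh->length = 0;
   return rc;
}
```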