
sys_heap: a new/simpler/faster memory allocator #17628

Closed
wants to merge 2 commits

Conversation

andyross (Contributor):

This is the new heap implementation and test rig I've been talking about. See commit logs and comments for design notes.

Specifically note that the heap code itself is dirt simple (mostly, once you get your head around the field accessor API), and that more than half the added code here is in the validation and test rigs. @andrewboie will be happy to see that coverage on the new files is effectively 100% (there's some cleanup to do with some of the validation lines, mostly "return false;", which obviously don't fail and don't get hit).

It passes everywhere right now except for riscv32 and nios2 (no idea yet, haven't had a chance to look).

This is just the new code. I still need to finish up wrapper APIs for sys/k_mem_pool before this can go in. I was hoping to have that done today, but it might stretch another day. And even then we'll likely leave both implementations in place and make them switchable via kconfig for a version.

On top of those, I should also turn the existing stress rig into a proper benchmark and add that so we can track performance with a real usage simulator.

Andy Ross added 2 commits July 17, 2019 13:39
The existing mem_pool implementation has been an endless source of
frustration.  It's had alignment bugs, it's had racy behavior.  It's
never been particularly fast.  It's outrageously complicated to
configure statically.  And while its fragmentation resistance and
overhead on small blocks is good, its space efficiency has always
been very poor due to the four-way buddy scheme.

This patch introduces sys_heap.  It's a more or less conventional
segregated fit allocator with power-of-two buckets.  It doesn't expose
its level structure to the user at all, simply taking an arbitrarily
aligned pointer to memory.  It stores all metadata inside the heap
region.  It allocates and frees by simple pointer and not block ID.
Static initialization is trivial, and runtime initialization is only a
few cycles to format and add one block to a list header.

It has excellent space efficiency.  Chunks can be split arbitrarily in
8 byte units.  Overhead is only four bytes per allocated chunk (eight
bytes for heaps >256kB or on 64 bit systems), plus a log2-sized array
of 2-word bucket headers.  There are no coarse alignment restrictions
on blocks; they can be split and merged (in units of 8 bytes)
arbitrarily.
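The four-byte figure can be pictured as one packed header word per chunk. The layout below is a speculative reconstruction from this description only; the field names and widths are assumptions, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Speculative 4-byte chunk header for the small (<=256kB) case.
 * Sizes are counted in 8-byte CHUNK_UNITs, so 15 bits cover
 * 2^15 * 8 = 256kB, which is where the "eight bytes for heaps
 * >256kB" threshold above comes from.  Free chunks store their
 * free-list prev/next links in the (unused) chunk body, so
 * allocated chunks pay only for this one header word. */
struct chunk_hdr {
	uint32_t size      : 15;	/* this chunk's size, in 8-byte units */
	uint32_t used      : 1;		/* allocation state */
	uint32_t left_size : 16;	/* left neighbor's size, for merging */
};
```

On larger heaps or 64-bit targets the same fields would simply widen to a second word, matching the stated eight-byte overhead.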

It has good fragmentation resistance.  Freed blocks are always
immediately merged with adjacent free blocks.  Allocations are
attempted from a sample of the smallest bucket that might fit, falling
back rapidly to the smallest block guaranteed to fit.  Split memory
remaining in the chunk is always returned immediately to the heap for
other allocation.

It has excellent performance with firmly bounded runtime.  All
operations are constant time (though there is a search of the smallest
bucket that has a compile-time-configurable upper bound, setting this
to extreme values results in an effectively linear search of the
list), objectively fast (about a hundred instructions) and amenable to
locked operation.  No more need for fragile lock relaxation trickery.

It also contains an extensive validation and stress test framework,
something that was sorely lacking in the previous implementation.

Note that sys_heap is not a compatible API with sys_mem_pool and
k_mem_pool.  Wrappers for those (now-) legacy APIs appear later in
this patch series.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Use the white box validation and test rig added as part of the
sys_heap work.  Add a layer that puts hashed cookies into the blocks
to detect corruption, check the validity state after every operation,
and enumerate a few different usage patterns:

+ Small heap, "real world" allocation where the heap is about half
full and most allocations succeed.

+ Small heap, "fragmentation runaway" scenario where most allocations
start failing, but the heap must remain consistent.

+ Big heap.  We can't test this with the same exhaustive coverage
(many re/allocations for every byte of storage) for performance
reasons, but we do what we can.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
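The hashed-cookie layer described in this commit message might look roughly like the sketch below. The hash constant and function name are assumptions for illustration, not the test rig's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative hashed-cookie scheme: each block is stamped with a value
 * derived from its own address, so a header that gets overwritten, or a
 * cookie that gets copied to another block, no longer matches when it
 * is re-checked against the address it lives at. */
static uint32_t block_cookie(const void *block)
{
	uintptr_t a = (uintptr_t)block;

	return (uint32_t)(a * 2654435761u);	/* multiplicative hash */
}
```

Checking the stored cookie against `block_cookie()` of the block's address after every operation is what lets the test rig detect corruption immediately rather than at the next crash.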
@andyross andyross requested review from pdunaj and npitre July 17, 2019 21:02
@andyross andyross added the DNM (Do Not Merge: this PR should not be merged) label Jul 17, 2019
*/

/* Note: the init_mem/bytes fields are for the static initializer to
* have somewhere to put the arguments. The actual heap metadata at
andyross (PR author):

Hm... forgot to remove this bit. Ignore this comment and those fields. I yanked the static initializer from this version entirely because the semantics of SYS_MEM_POOL_DEFINE are a little too weird to wrap anyway.

@zephyrbot zephyrbot added the area: API (Changes to public APIs) and area: Tests (Issues related to a particular existing or missing test) labels Jul 17, 2019
@zephyrbot (Collaborator):

Found the following issues, please fix and resubmit:

checkpatch issues

-:133: WARNING:TYPO_SPELLING: 'successfull' may be misspelled - perhaps 'successful'?
#133: FILE: include/sys/sys_heap.h:127:
+ * Results, including counts of frees and successfull/unsuccessful

-:301: WARNING:LONG_LINE: line over 80 characters
#301: FILE: lib/os/heap-validate.c:86:
+		for(c = c0; c != 0 && (n == 0 || c != c0); n++, c = free_next(h, c)) {

-:301: ERROR:SPACING: space required before the open parenthesis '('
#301: FILE: lib/os/heap-validate.c:86:
+		for(c = c0; c != 0 && (n == 0 || c != c0); n++, c = free_next(h, c)) {

-:353: ERROR:CODE_INDENT: code indent should use tabs where possible
#353: FILE: lib/os/heap-validate.c:138:
+ ^I * pass caught all the blocks and that they now show UNUSED.$

-:353: WARNING:SPACE_BEFORE_TAB: please, no space before tabs
#353: FILE: lib/os/heap-validate.c:138:
+ ^I * pass caught all the blocks and that they now show UNUSED.$

-:364: ERROR:SPACING: space required before the open parenthesis '('
#364: FILE: lib/os/heap-validate.c:149:
+		for(c = c0; n == 0 || c != c0; n++, c = free_next(h, c)) {

-:404: ERROR:CODE_INDENT: code indent should use tabs where possible
#404: FILE: lib/os/heap-validate.c:189:
+        static u64_t state = 123456789; /* seed */$

-:404: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#404: FILE: lib/os/heap-validate.c:189:
+        static u64_t state = 123456789; /* seed */$

-:406: ERROR:CODE_INDENT: code indent should use tabs where possible
#406: FILE: lib/os/heap-validate.c:191:
+        state = state * 2862933555777941757UL + 3037000493UL;$

-:406: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#406: FILE: lib/os/heap-validate.c:191:
+        state = state * 2862933555777941757UL + 3037000493UL;$

-:408: ERROR:CODE_INDENT: code indent should use tabs where possible
#408: FILE: lib/os/heap-validate.c:193:
+        return (u32_t)(state >> 32);$

-:408: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#408: FILE: lib/os/heap-validate.c:193:
+        return (u32_t)(state >> 32);$

-:650: ERROR:SPACING: space required before the open parenthesis '('
#650: FILE: lib/os/heap.c:122:
+	if(!last_chunk(h, c) && !used(h, right_chunk(h, c))) {

-:662: ERROR:SPACING: space required before the open parenthesis '('
#662: FILE: lib/os/heap.c:134:
+	if(c != h->chunk0 && !used(h, left_chunk(h, c))) {

-:712: WARNING:UNNECESSARY_ELSE: else is not generally useful after a break or return
#712: FILE: lib/os/heap.c:184:
+			return split_alloc(h, bi, sz);
+		} else {

-:821: WARNING:NEW_TYPEDEFS: do not add new typedefs
#821: FILE: lib/os/heap.h:48:
+typedef size_t chunkid_t;

-:859: ERROR:CODE_INDENT: code indent should use tabs where possible
#859: FILE: lib/os/heap.h:86:
+                             enum chunk_fields f, chunkid_t val)$

-:859: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#859: FILE: lib/os/heap.h:86:
+                             enum chunk_fields f, chunkid_t val)$

-:1126: WARNING:LONG_LINE: line over 80 characters
#1126: FILE: tests/lib/heap/src/main.c:161:
+	TC_PRINT("Testing maximally fragmented (%d byte) heap\n", SMALL_HEAP_SZ);

- total: 9 errors, 10 warnings, 1120 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

NOTE: Whitespace errors detected.
      You may wish to use scripts/cleanpatch or scripts/cleanfile

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.



@npitre (Collaborator) commented Jul 18, 2019 via email.

@npitre (Collaborator) commented Jul 18, 2019 via email.

@andyross (PR author):

Yeah, there's some space for optimization for sure. The "buf" and "buckets" pointers are still in the struct because it started out as a separate struct and moved into the heap memory late in the process.

The precomputed size_mask was definitely a win when I looked at it originally, though that may have changed. I think the bigger problem there is that the bit is on the wrong field: the LEFT_SIZE field is used much less frequently and should be the one polluted with extra ops. (The reason is that they both had high bits at one point in an abandoned misdesign.)

Most of the rest of the notes make sense. Will address when I get back to this. Though the first priority here is to get the mem_pool wrapper work up in some form (it's looking more complicated than I thought to make it switchable between backends).

(And honestly, the timer work in #17155 has a much better chance of landing for 2.0 and I really need to be doing that first. It was only the fear of endlessly codereviewing mem_pool bugs that scared me into getting this into code so soon...)

@npitre (Collaborator) commented Jul 18, 2019 via email.

@npitre (Collaborator) commented Jul 18, 2019 via email.

@pizi-nordic (Collaborator) left a comment:

I apologize, but at the moment I have no chance to do a detailed code review. Instead, I would like to point out some high-level aspects:

  • We are starting to see SoCs with data caches in Zephyr. Since we are doing a new allocator, we should include this aspect in the design. The minimum requirement here is to allocate chunks which are aligned to the cache line size (usually 16, 32, or 64 bytes) both in address and size. Ideally such configuration should be done per heap, as this is necessary only for memory which will later be accessed through DMA.
  • IMHO we should create some debugging infrastructure which allows users to detect common errors related to memory allocation. I suggest the following:
    • We should check that the pointer being freed is a pointer returned by sys_heap_alloc() and belongs to the same heap (this should be easy to do using pointer arithmetic, checking the alignment and the data structure describing the given chunk).
    • We should detect double-free scenarios.
    • We should add head and tail guard areas to check for access beyond the allocated region.
      [ For those not familiar with the concept: the memory below and above the allocated chunk is filled with a distinctive pattern (like 0xDEADC0DE) which is checked periodically and/or during free. If the pattern is altered, something wrote out of bounds. If the pattern propagated somewhere and caused a crash, you know it originated from a read beyond the allocated area. ]
    • We should support data trashing to detect use-after-free scenarios.
      [ For those not familiar with the concept: the freed memory is filled with a distinctive pattern (different from the one used in the guards). If you see the pattern somewhere, you will know its origin. Also, the pattern is verified periodically and/or during allocation, so a write to a freed region can be detected. ]
    • Ideally we should be able to trace the allocation point of each chunk (file, line, function). Such information, together with a CLI interface for heap state inspection, would be a great tool for fighting memory leaks. Moreover, a CLI interface dumping the state of the heap could also be helpful during development.
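The guard and trashing (poisoning) schemes proposed above can be sketched as follows. All names and patterns here are illustrative assumptions, and a real version would be wired into the allocator's alloc/free paths rather than called by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_PATTERN	0xDEADC0DEu	/* head/tail guard word */
#define POISON_BYTE	0xA5		/* fill for freed memory */

/* Stamp guard words just below and above the user region at alloc time. */
static void guard_arm(uint32_t *head, uint32_t *tail)
{
	*head = GUARD_PATTERN;
	*tail = GUARD_PATTERN;
}

/* Verify on free (or periodically): a changed guard means an
 * out-of-bounds write. */
static int guard_intact(const uint32_t *head, const uint32_t *tail)
{
	return *head == GUARD_PATTERN && *tail == GUARD_PATTERN;
}

/* Fill freed memory with the poison pattern ("data trashing"). */
static void poison_region(void *p, size_t bytes)
{
	memset(p, POISON_BYTE, bytes);
}

/* Check the pattern before reallocating: a changed byte means a
 * write-after-free. */
static int poison_intact(const uint8_t *p, size_t bytes)
{
	for (size_t i = 0; i < bytes; i++) {
		if (p[i] != POISON_BYTE) {
			return 0;
		}
	}
	return 1;
}
```

The two patterns are deliberately distinct so that, when one of them turns up in a crash dump, you can tell an out-of-bounds read from a use-after-free.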

*
* @param h Heap from which to allocate
* @param bytes Number of bytes requested
* @return Pointer to memory the caller can now use
Collaborator:

IMHO it would be good to add information about alignment of allocated memory.

Collaborator:

A normal malloc returns a pointer that is "suitably aligned for any kind of variable". I think we should too, otherwise there will be problems.

* @param free Callback to perform a free of a pointer returned from
* @a alloc. Passes back the @a arg parameter as a
* context handle.
* @param arg Context handle to pass back to the callbacks
Collaborator:

I would change arg to context.

@npitre (Collaborator) commented Jul 19, 2019 via email.

bool emptybit = (h->avail_buckets & (1 << bidx)) == 0;
bool emptylist = b->next == 0;
bool emptycount = b->list_size == 0;
bool empties_match = emptybit == emptylist && emptybit == emptycount;
Contributor:

Parentheses are not needed here, but are very good to have. They state your true intentions instead of leaving future readers to guess them.

@pdunaj (Collaborator) left a comment:

First comments. The biggest problem may be with locking. The former scheme assumed the pool maintains the lock, so it can unlock IRQs while in loops. Now we would have to lock the entire operation at the k_malloc level.

* sys_heap_alloc such that it can be used for other purposes. The
* caller must not use the memory region after entry to this function.
*
* @note The sys_heap implementation is not internally synchronized.
Collaborator:

That would mean the lock should be taken by k_malloc.

* running one and corrupting it. YMMV.
*/

static size_t max_chunkid(struct z_heap *h)
Collaborator:

const struct z_heap *h when h is read-only?

Collaborator:

also in other places

return ret;
}

static void free_list_remove(struct z_heap *h, int bidx,
Collaborator:

size_t bidx (or unsigned int), as this cannot be negative

b->list_size--;

if (b->list_size == 0) {
h->avail_buckets &= ~(1 << bidx);
Collaborator:

BIT(bidx) instead of (1 << bidx)
or better
WRITE_BIT(h->avail_buckets, bidx, 0);

Collaborator:

also in other places

{
int b = bucket_idx(h, size(h, c));

if (h->buckets[b].list_size++ == 0) {
Collaborator:

It works, but mixing an increment with other operators in an expression is very bug-prone.

{
/* Must fit in a 32 bit count of u64's */
#if __SIZEOF_SIZE_T__ > 4
CHECK(bytes < 0x800000000ULL);
Collaborator:

bytes <= UINT32_MAX, bytes <= INT32_MAX?

Collaborator:

why 0x800000000ULL?
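The constant in the quoted check lines up with the comment above it ("Must fit in a 32 bit count of u64's"): 2^32 eight-byte units is 2^35 = 0x800000000 bytes. A quick arithmetic sketch, illustrative only and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* The largest size a 32-bit count of 8-byte (u64) units can describe. */
#define MAX_U64_COUNT	(((uint64_t)UINT32_MAX) + 1u)	/* 2^32 counts */
#define MAX_HEAP_BYTES	(MAX_U64_COUNT * 8u)		/* 2^35 bytes  */
```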

* Note: this cannot catch every possible error, but if it returns
* true then the heap is in a consistent state and can correctly
* handle any sys_heap_alloc() request and free any live pointer
* returned from a previou allocation.
Contributor:

previou

* failures and a very fragmented heap.
* @param result Struct into which to store test results.
*/
void sys_heap_stress(void *(*alloc)(void *arg, size_t bytes),
Contributor:

I wonder if this is a good place to have that API; after all, it is an API for testing (same with sys_heap_validate). What about having a sys_heap_validate.h or something similar, so as not to spoil this header?


#define CHUNK_UNIT 8

enum chunk_fields { SIZE_AND_USED, LEFT_SIZE, FREE_PREV, FREE_NEXT };
Contributor:

some explanation here would be good.

return sizeof(size_t) > 4 || h->len > 0x7fff;
}

static inline size_t chunk_field(struct z_heap *h, chunkid_t c,
Contributor:

chunk_field_get?

}
}

static inline void chunk_set(struct z_heap *h, chunkid_t c,
Contributor:

chunk_field_set?

}
}

static void free_list_add(struct z_heap *h, chunkid_t c)
Contributor:

The function explanation is missing. From what it is doing, I would rather call it add_to_bucket.

return ret;
}

static void free_list_remove(struct z_heap *h, int bidx,
Contributor:

Comment missing; also, I would call it remove_from_bucket.

return (chunk_field(h, c, SIZE_AND_USED) & ~h->size_mask) != 0;
}

static ALWAYS_INLINE chunkid_t size(struct z_heap *h, chunkid_t c)
Contributor:

I would like to see setters and getters for all those fields. Right now it is inconsistent: there are getters, but fields are set using chunk_set, which is a bit confusing in the heap.c code.

free_list_add(h, c);
}

void *sys_heap_alloc(struct sys_heap *heap, size_t bytes)
Contributor:

I wonder if it would make sense to have loops as an argument instead of a Kconfig option. It would allow wrappers like an alloc_from_isr with a lower value, and an alloc which favors lower fragmentation over performance.

*/
int loops = MIN(b->list_size, CONFIG_SYS_HEAP_ALLOC_LOOPS);

for (int i = 0; i < loops; i++) {
Contributor:

Actually, I don't fully understand why this loop is needed. Since you have the avail_buckets mask, you can just pick the first bucket which has elements available and whose chunk size is equal to or bigger than the requested size. It seems the code below this for loop is doing exactly that.

OK, now I see it: a bucket contains chunks with sizes between 2^n and 2^(n+1), so there is a chance of finding a proper candidate in the smaller bucket. In that case maybe there should be an option to tell the function to check all chunks in the smaller bucket before going to the bigger one.
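A minimal sketch of the two-level lookup under discussion, assuming bucket n holds free chunks of size in [2^n, 2^(n+1)) units: walk a bounded number of entries in the maybe-fit bucket (the CONFIG_SYS_HEAP_ALLOC_LOOPS idea), then fall back via the availability bitmask to the smallest guaranteed-fit bucket. Function names and details are illustrative, not the PR's identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest bucket that *might* fit a request of `chunks` 8-byte units:
 * floor(log2(chunks)), since bucket n spans [2^n, 2^(n+1)). */
static int bucket_index(uint32_t chunks)
{
	return 31 - __builtin_clz(chunks);	/* GCC/Clang builtin */
}

/* The bitmask of non-empty buckets makes the guaranteed-fit fallback a
 * single find-first-set, keeping the whole path constant time even when
 * the bounded walk of the maybe-fit bucket gives up. */
static int next_nonempty_bucket(uint32_t avail_buckets, int from)
{
	uint32_t masked = avail_buckets & ~((1u << from) - 1u);

	return (masked != 0u) ? __builtin_ctz(masked) : -1;
}
```

Checking all chunks in the smaller bucket first, as suggested above, would trade that constant-time bound for lower fragmentation.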

@carlescufi (Member):

@andyross can we resurrect this PR and get it in for 2.1?

@carlescufi carlescufi added the dev-review (To be discussed in dev-review meeting) label Sep 17, 2019
@galak galak added this to the v2.1.0 milestone Sep 19, 2019
@galak (Collaborator) commented Sep 19, 2019:

dev-review:

  • Question is how to progress Zephyr APIs w/regard to this and mempool.
    • do we have both and deprecate mempool
  • Requires more thought on @andyross side w/regards to how mempool is plumbed through other Zephyr APIs.
  • Will try to find a plan for 2.1. @andyross will look at it once timer stuff is dealt with.

@npitre (Collaborator) commented Sep 22, 2019 via email.

@galak galak removed the dev-review (To be discussed in dev-review meeting) label Sep 26, 2019
@carlescufi (Member):

I have some cycles to spare at the moment. So if @andyross agrees I'd be happy to work on this and bring it forward.

We discussed this in a recent meeting and the idea was to try and move it forward as soon as possible, so +1 from me.

@npitre (Collaborator) commented Sep 30, 2019 via email.

@andyross (PR author):

I should be able to get back on this within a week or two, but no complaints if @npitre wants to play with it. I have some early work on an alloc_aligned implementation that doesn't require changes to the heap internals (it just finds a bigger block and fragments it). Also we need to add some latency tests to track ISR-friendly performance.

And the portability side is the hardest. I see three paths:

  1. Just use this as a new API and keep the mempool one around for existing users until we decide to deprecate it.

  2. Port only the IPC side of the APIs (the struct k_mem_block abstraction) to sit on top of this for the benefit of our (somewhat oddball) APIs that use it.

  3. Provide a full implementation (or as full as possible) of sys_mem_pool and k_mem_pool using this as a backend. Not 100% doable because the old code has some tested promises on alignment and packing behavior that aren't true for generic heaps.

@npitre (Collaborator) commented Sep 30, 2019 via email.

@andyross (PR author):

IIRC there are tests that do things like check that sequential allocations from an empty heap are indeed split the way it expects from the original implementation. I don't personally think those are behaviors worth preserving, but we need to find them and call them out explicitly when removing.

@andyross (PR author):

This got too stale. Resubmitted in #23941

@andyross andyross closed this Mar 30, 2020
Labels: area: API (Changes to public APIs), area: Tests (Issues related to a particular existing or missing test), DNM (This PR should not be merged / Do Not Merge)
10 participants