sys_heap: a new/simpler/faster memory allocator #17628
Conversation
The existing mem_pool implementation has been an endless source of frustration. It's had alignment bugs, it's had racy behavior. It's never been particularly fast. It's outrageously complicated to configure statically. And while its fragmentation resistance and overhead on small blocks is good, its space efficiency has always been very poor due to the four-way buddy scheme.

This patch introduces sys_heap. It's a more or less conventional segregated fit allocator with power-of-two buckets. It doesn't expose its level structure to the user at all, simply taking an arbitrarily aligned pointer to memory. It stores all metadata inside the heap region. It allocates and frees by simple pointer and not block ID. Static initialization is trivial, and runtime initialization is only a few cycles to format and add one block to a list header.

It has excellent space efficiency. Chunks can be split arbitrarily in 8 byte units. Overhead is only four bytes per allocated chunk (eight bytes for heaps >256kb or on 64 bit systems), plus a log2-sized array of 2-word bucket headers. No coarse alignment restrictions on blocks; they can be split and merged (in units of 8 bytes) arbitrarily.

It has good fragmentation resistance. Freed blocks are always immediately merged with adjacent free blocks. Allocations are attempted from a sample of the smallest bucket that might fit, falling back rapidly to the smallest block guaranteed to fit. Split memory remaining in the chunk is always returned immediately to the heap for other allocation.

It has excellent performance with firmly bounded runtime. All operations are constant time (though there is a search of the smallest bucket that has a compile-time-configurable upper bound; setting this to extreme values results in an effectively linear search of the list), objectively fast (about a hundred instructions) and amenable to locked operation. No more need for fragile lock relaxation trickery.

It also contains an extensive validation and stress test framework, something that was sorely lacking in the previous implementation.

Note that sys_heap is not a compatible API with sys_mem_pool and k_mem_pool. Wrappers for those (now-)legacy APIs appear later in this patch series.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
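For readers skimming the design, here is a small standalone sketch (illustrative only, not the patch's code) of the two ideas the log describes: 8-byte chunk units with a small per-chunk header, and power-of-two buckets indexed by the log2 of the chunk count:

```
#include <stdint.h>
#include <stddef.h>

#define CHUNK_UNIT 8	/* chunks are split and merged in 8-byte units */

/* Bytes to chunk count, including the 4-byte header used on small
 * (32-bit, <256kb) heaps.
 */
static inline size_t bytes_to_chunks(size_t bytes)
{
	return (bytes + 4 + CHUNK_UNIT - 1) / CHUNK_UNIT;
}

/* Bucket i holds free chunks whose size in chunk units lies in
 * [2^i, 2^(i+1)), so the bucket index is just the log2 of the size.
 * (chunks must be >= 1 here.)
 */
static inline int bucket_index(size_t chunks)
{
	return 31 - __builtin_clz((uint32_t)chunks);
}
```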
Use the white box validation and test rig added as part of the sys_heap work. Add a layer that puts hashed cookies into the blocks to detect corruption, check the validity state after every operation, and enumerate a few different usage patterns:

+ Small heap, "real world" allocation where the heap is about half full and most allocations succeed.
+ Small heap, "fragmentation runaway" scenario where most allocations start failing, but the heap must remain consistent.
+ Big heap. We can't test this with the same exhaustive coverage (many re/allocations for every byte of storage) for performance reasons, but we do what we can.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
/* Note: the init_mem/bytes fields are for the static initializer to
 * have somewhere to put the arguments. The actual heap metadata at
Hm... forgot to remove this bit. Ignore this comment and those fields. I yanked the static initializer from this version entirely because the semantics of SYS_MEM_POOL_DEFINE are a little too weird to wrap anyway.
Found the following issues, please fix and resubmit: checkpatch issues
OK. I have a truckload of comments but overall I really like it.
Let's see how I can convey that to github.
Review part 1.
lib/os/heap.h:
```
struct z_heap {
	u64_t *buf;
	struct z_heap_bucket *buckets;
	u32_t len;
	u32_t size_mask;
	u32_t chunk0;
	u32_t avail_buckets;
};
```
The buf field is always equivalent to the address of this structure
instance and is redundant. You could do:
```
static inline u64_t *heap_buf(struct z_heap *h) { return (u64_t *)h; }
```
and spare one pointer.
Also `struct z_heap_bucket *buckets` could be moved last and turned into
a zero-length array which would save another pointer. There is no need
to chunk-align it.
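A minimal sketch of that layout (types as in the quoted header; whether `buckets` can really become a trailing flexible array depends on how the metadata is placed in the buffer):

```
struct z_heap {
	u32_t len;
	u32_t chunk0;
	u32_t avail_buckets;
	struct z_heap_bucket buckets[];	/* sized at init time, follows the struct */
};

static inline u64_t *heap_buf(struct z_heap *h)
{
	/* the metadata lives at the start of the heap buffer itself */
	return (u64_t *)h;
}
```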
You may also get rid of size_mask. I suggest that you do:
```
static inline chunkid_t used(struct z_heap *h, chunkid_t c)
{
	return (chunk_field(h, c, SIZE_AND_USED) & 1) != 0;
}

static ALWAYS_INLINE chunkid_t size(struct z_heap *h, chunkid_t c)
{
	return chunk_field(h, c, SIZE_AND_USED) >> 1;
}

static inline void chunk_set_used(struct z_heap *h, chunkid_t c,
				  bool used)
{
	chunk_set(h, c, SIZE_AND_USED,
		  (size(h, c) << 1) | (used ? 1 : 0));
}
```
This will dispense with size_mask and generate simpler code. The
`ALWAYS_INLINE` might then become unnecessary.
Then, to avoid mistakes, the setting of the size should use an accessor
too.
Also, `used()` should return a bool.
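For the size setter, something like this might do (a sketch reusing the accessors above, not the patch's actual API):

```
static inline void chunk_set_size(struct z_heap *h, chunkid_t c, size_t sz)
{
	/* keep the used flag in bit 0 and store the size above it */
	chunk_set(h, c, SIZE_AND_USED, (sz << 1) | (used(h, c) ? 1 : 0));
}
```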
```
static inline bool big_heap(struct z_heap *h)
{
return sizeof(size_t) > 4 || h->len > 0x7fff;
}
```
Here `sizeof(size_t) > 4` is wrong. That should be `sizeof(void *) > 4`
as there is no guarantee that size_t is always 64-bit wide on 64-bit
targets. Would also be a good idea to comment that this is used to
enforce 64-bit alignment on returned memory.
As an optimization, it might be a good idea to add a Kconfig option for
allowing heaps larger than 0x7fff on 32-bit targets (with a default to
y). When unset, `big_heap()` would always return false and the
conditional code in `chunk_field()` and `chunk_set()` would get
optimized away at compile time. No need to make such Kconfig prompt
visible on 64-bit as `big_heap()` is always true in that case.
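A hedged sketch of what the gated version could look like (the Kconfig symbol name is invented for illustration):

```
static inline bool big_heap(struct z_heap *h)
{
	if (!IS_ENABLED(CONFIG_SYS_HEAP_BIG_HEAPS)) {	/* hypothetical option */
		return sizeof(void *) > 4;
	}
	return sizeof(void *) > 4 || h->len > 0x7fff;
}
```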
I'd suggest renaming `chunk_field()` to `chunk_get()` so as to be analogous
to `chunk_set()`.
Maybe `size()` and `left_size()` should return `size_t` not `chunkid_t`.
In `chunksz()` I'd suggest using `ROUND_UP(bytes, CHUNK_UNIT)` or even
use that directly in the code in place of `chunksz()` to make the code
more obvious.
About `bucket_idx()`: there is a `u32_count_leading_zero()` provided by
math_extras_impl.h that probably ought to be used instead of
`__builtin_clz()`. Same deal for the `__builtin_ctz()` used elsewhere.
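For instance, `bucket_idx()` might become something like the following. I'm assuming the helper is spelled `u32_count_leading_zeros()` and is exposed via `sys/math_extras.h`, so check the actual header before copying:

```
#include <sys/math_extras.h>

static inline int bucket_idx(struct z_heap *h, size_t sz)
{
	/* log2 of the chunk size: index of the highest set bit */
	return 31 - u32_count_leading_zeros((u32_t)sz);
}
```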
That's it for now. Part 2 to come later.
Yeah, there's some space for optimization for sure. The "buf" and "buckets" pointers are in the struct still because it started out as a separate struct and moved into the heap memory late in the process.

With the precomputed size_mask, that was definitely a win when I looked at it originally, though that may have changed. I think the bigger problem there is that the bit is on the wrong field; the LEFT_SIZE field is used much more infrequently and should be the one polluted with extra ops. (The reason is that they both had high bits at one point in an abandoned misdesign.)

Most of the rest of the notes make sense. Will address when I get back to this. Though the first priority here is to get the mem_pool wrapper work (in some form -- it's looking more complicated than I thought to make it switchable between backends) up.

(And honestly, the timer work in #17155 has a much better chance of landing for 2.0 and I really need to be doing that first. It was only the fear of endlessly codereviewing mem_pool bugs that scared me into getting this into code so soon...)
Review part 2.
lib/os/heap.c:
```
static void *chunk_mem(struct z_heap *h, chunkid_t c)
{
	u8_t *ret = ((u8_t *)&h->buf[c]) + chunk_header_bytes(h);
	CHECK(!(((size_t)ret) & (big_heap(h) ? 7 : 3)));
	return ret;
}
```
That `size_t` should be `uintptr_t`.
```
static void free_list_remove(struct z_heap *h, int bidx,
			     chunkid_t c)
{
	struct z_heap_bucket *b = &h->buckets[bidx];

	CHECK(!used(h, c));
	CHECK(b->next != 0);
	CHECK(b->list_size > 0);
	CHECK((((h->avail_buckets & (1 << bidx)) == 0)
	       == (h->buckets[bidx].next == 0)));
```
The 4th check looks wrong. The `h->buckets[bidx].next` is the same as
`b->next` which was just tested not to be 0. So that should probably be
just `CHECK(h->avail_buckets & (1 << bidx));`.
In `free_list_add()` the last check is:
```
CHECK(h->avail_buckets & (1 << bucket_idx(h, size(h, c))));
```
The `bucket_idx()` is quite costly on CPUs that lack a CLZ instruction.
Would be better to simply use `b` here not to make validation checks
more costly than necessary.
Also, `b` is used as a `struct z_heap_bucket *` in other functions, so
that would make the code clearer if `b` was renamed to `bi` or `bidx` here.
```
static ALWAYS_INLINE bool last_chunk(struct z_heap *h, chunkid_t c)
{
return (c + size(h, c)) == h->len;
}
```
I'd suggest adding a `CHECK(c + size(h, c) <= h->len)` here.
In `split_alloc()` there is:
```
if (rem >= (big_heap(h) ? 2 : 1)) {
```
That's the second time those magic 2 and 1 values are used, this time
with no justification. Would be best to abstract that into some
`min_chunksize()` or the like.
The open coded `c != h->chunk0` could be abstracted into some
`!first_chunk(h, c)` for symmetry.
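Sketched out, those helpers might look like this (names as proposed above, bodies inferred from the quoted snippets):

```
/* Smallest legal chunk, in CHUNK_UNIT units: big heaps need two header
 * words, small heaps one.
 */
static inline size_t min_chunksize(struct z_heap *h)
{
	return big_heap(h) ? 2 : 1;
}

static inline bool first_chunk(struct z_heap *h, chunkid_t c)
{
	return c == h->chunk0;
}
```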
In `sys_heap_alloc()`:
```
	for (int i = 0; i < loops; i++) {
		CHECK(b->next != 0);
		if (size(h, b->next) >= sz) {
			return split_alloc(h, bi, sz);
		} else {
			b->next = free_next(h, b->next);
		}
	}
```
It might be a good idea to remember where the loop started and break
early when there are fewer chunks than the loop count. And perhaps
remember the best fit as well given the loop count is quite small.
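One possible shape for that (purely a sketch; `split_alloc_chunk()` is a hypothetical variant of `split_alloc()` that takes the chosen chunk explicitly):

```
	chunkid_t start = b->next, best = 0;
	bool found = false;

	for (int i = 0; i < loops; i++) {
		chunkid_t c = b->next;

		/* remember the tightest fit seen so far */
		if (size(h, c) >= sz && (!found || size(h, c) < size(h, best))) {
			best = c;
			found = true;
		}
		b->next = free_next(h, c);
		if (b->next == start) {
			break;	/* fewer chunks in the list than the loop bound */
		}
	}
	if (found) {
		return split_alloc_chunk(h, bi, best, sz);
	}
```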
```
void sys_heap_init(struct sys_heap *heap, void *mem, size_t bytes)
{
	/* Must fit in a 32 bit count of u64's */
#if __SIZEOF_SIZE_T__ > 4
	CHECK(bytes < 0x800000000ULL);
#endif
```
Might be best to use `if (sizeof(size_t) > 4)` here to be more
"standard" code. The compiler will optimize it away on 32-bit targets
anyway, but it'll validate the syntax in all cases.
The rest of the function has potential for simplifications following
comments in my part 1.
On Thu, 18 Jul 2019, Andy Ross wrote:
With the precomputed size_mask, that was definitely a win when I
looked at it originally, though that may have changed. I think the
bigger problem there is that the bit is on the wrong field, the
LEFT_SIZE field is used much more infrequently and should be the one
polluted with extra ops. (The reason is that they both had high bits
at one point in an abandoned misdesign).
Agreed on using LEFT_SIZE. However it is more efficient if bits are to
be found at a constant offset which is achieved by moving the flag bit
down.
Most of the rest of the notes make sense. Will address when I get
back to this. Though the first priority here is to get the mem_pool
wrapper work (in some form -- it's looking more complicated than I
thought to make it switchable between backends) up.
What about something like:
```
#define K_MEM_HEAP_DEFINE(name, sz) \
char __aligned(CHUNK_UNIT) _heap_buf_##name[ROUND_UP(sz, CHUNK_UNIT)]; \
struct z_heap name ALIAS_OF(_heap_buf_##name); \
Z_STRUCT_SECTION_ITERABLE(k_heap_init, _heap_init_##name) = { \
.mem = _heap_buf_##name, \
.size = ROUND_UP(sz, CHUNK_UNIT), \
}
#define K_MEM_POOL_DEFINE(name, minsz, maxsz, nmax, align) \
__K_MEM_POOL_HEAP_DEFINE(name, ROUND_UP(maxsz, CHUNK_UNIT) * (nmax))
#define __K_MEM_POOL_HEAP_DEFINE(name, size) \
K_MEM_HEAP_DEFINE(name, \
(size) + sizeof(struct z_heap) + HEAP_BUCKET_SIZE(size))
#define HEAP_BUCKET_SIZE(size) \
(const_ilog2((8 + (size)) / CHUNK_UNIT) * sizeof(struct z_heap_bucket))
#define const_ilog2(x) \
( \
(x) < 2 ? 0 : \
(x) < 4 ? 1 : \
(x) < 8 ? 2 : \
[...]
```
Then you only need to initialize the heap at run time.
(And honestly, the timer work in #17155 has a much better chance of
landing for 2.0 and I really need to be doing that first. It was only
the fear of endlessly codereviewing mem_pool bugs that scared me into
getting this into code so soon...)
Hey, I reviewed and identified all the remaining bugs in that code for
you. So you shouldn't worry that much. ;-)
I apologize, but at the moment I have no chance to do detailed code review. Instead, I would like to point to some high-level aspects:
- We are starting to see SoCs with data caches in Zephyr. Since we are doing a new allocator, we should include this aspect in the design. The minimum requirement here is to allocate chunks which are aligned to the cache line size (usually 16, 32, or 64 bytes) both in address and size. Ideally such configuration should be done per heap, as this is necessary only for memory which will be later accessed through DMA.
- IMHO we should create some debugging infrastructure which allows users to detect common errors related to memory allocation. I suggest the following:
  - We should check if the pointer being freed is the pointer returned by sys_heap_alloc() and belongs to the same heap (this should be easy to do using pointer arithmetic, checking the alignment and the data structure describing the given chunk).
  - We should detect double-free scenarios.
  - We should add head and tail guard areas to check for access beyond the allocated region; see the sketch after this list. [ For those not familiar with the concept: the memory below and above an allocated chunk is filled with a distinctive pattern (like 0xDEADC0DE) which is checked periodically and/or during free. If the pattern is altered, you wrote out of bounds. If the pattern propagated somewhere and caused a crash, you know it originates from a read beyond the allocated area. ]
  - We should support data trashing to detect use-after-free scenarios. [ For those not familiar with the concept: the freed memory is filled with a distinctive pattern (different from the one used in guards). If you see the pattern somewhere, you will know its origin. The pattern is also verified periodically and/or during allocation, so a write to the freed region can be detected. ]
  - Ideally we should be able to trace the allocation point of each chunk (file, line, function). Such information, together with a CLI interface for heap state inspection, would be a great tool in fighting memory leaks. Moreover, the CLI interface dumping the state of the heap could also be helpful during development.
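As a rough illustration of the guard/poison bullets above (patterns, sizes, and function names are all made up, not a proposed API):

```
#define GUARD_PATTERN  0xDEADC0DEu
#define FREED_PATTERN  0xFEEDFACEu
#define GUARD_WORDS    2

/* Fill the words just below and above the user region at allocation time. */
static void guard_fill(u32_t *lo_guard, u32_t *hi_guard)
{
	for (int i = 0; i < GUARD_WORDS; i++) {
		lo_guard[i] = GUARD_PATTERN;
		hi_guard[i] = GUARD_PATTERN;
	}
}

/* Verify the guards at free time (or periodically); false means an overrun. */
static bool guard_check(const u32_t *lo_guard, const u32_t *hi_guard)
{
	for (int i = 0; i < GUARD_WORDS; i++) {
		if (lo_guard[i] != GUARD_PATTERN || hi_guard[i] != GUARD_PATTERN) {
			return false;
		}
	}
	return true;
}

/* Trash freed memory so use-after-free reads and writes become detectable. */
static void poison_freed(void *mem, size_t bytes)
{
	u32_t *p = mem;

	for (size_t i = 0; i < bytes / sizeof(u32_t); i++) {
		p[i] = FREED_PATTERN;
	}
}
```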
 *
 * @param h Heap from which to allocate
 * @param bytes Number of bytes requested
 * @return Pointer to memory the caller can now use
IMHO it would be good to add information about alignment of allocated memory.
normal malloc returns a pointer that is "suitably aligned for any kind of variable". I think we should too, otherwise there will be problems
 * @param free Callback to perform a free of a pointer returned from
 *             @a alloc. Passes back the @a arg parameter as a
 *             context handle.
 * @param arg Context handle to pass back to the callbacks
I would change `arg` to `context`.
On Fri, 19 Jul 2019, Piotr Zięcik wrote:
- We are starting to see SoCs with data caches in Zephyr. Since we are doing a new allocator, we should include this aspect in the design. The minimum requirement here is to allocate chunks which are aligned to the cache line size (usually 16, 32, or 64 bytes) both in address and size. Ideally such configuration should be done per heap, as this is necessary only for memory which will be later accessed through DMA.
That'll be very easy to do. Right now this is hardcoded to CHUNK_UNIT
but that can be made into a variable later.
The debugging infrastructure you suggest can also be added on top of
this design without major issues. Guard patterns can be factored in by
`chunk_header_bytes()` etc. But as there is no such facility in the
existing allocator at the moment anyway, I'd suggest this be done
separately.
	bool emptybit = (h->avail_buckets & (1 << bidx)) == 0;
	bool emptylist = b->next == 0;
	bool emptycount = b->list_size == 0;
	bool empties_match = emptybit == emptylist && emptybit == emptycount;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parentheses are not needed here, but they are very good to have. They state your true intentions rather than leaving them to be guessed in the future.
First comments. The biggest problem may be with locking. The former scheme assumed the pool maintains the lock, so it can unlock IRQs while in loops. Now we would have to lock the entire operation at the k_malloc level.
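For illustration, locking the whole operation at the k_malloc level could be as simple as the following sketch (the wrapper name is hypothetical, and the real k_malloc also has to keep its own header bookkeeping):

```
static struct k_spinlock heap_lock;

void *k_malloc_on_heap(struct sys_heap *h, size_t bytes)
{
	k_spinlock_key_t key = k_spin_lock(&heap_lock);
	void *mem = sys_heap_alloc(h, bytes);

	k_spin_unlock(&heap_lock, key);
	return mem;
}
```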
 * sys_heap_alloc such that it can be used for other purposes. The
 * caller must not use the memory region after entry to this function.
 *
 * @note The sys_heap implementation is not internally synchronized.
That would mean the lock should be taken by k_malloc.
 * running one and corrupting it. YMMV.
 */
static size_t max_chunkid(struct z_heap *h) |
`const struct z_heap *h` when `h` is read-only?
also in other places
	return ret;
}
static void free_list_remove(struct z_heap *h, int bidx, |
`size_t bidx` (or `unsigned int`) as this cannot be negative.
	b->list_size--;
	if (b->list_size == 0) {
		h->avail_buckets &= ~(1 << bidx);
`BIT(bidx)` instead of `(1 << bidx)`, or better, `WRITE_BIT(h->avail_buckets, bidx, 0);`
also in other places
{
	int b = bucket_idx(h, size(h, c));
if (h->buckets[b].list_size++ == 0) { |
It works but mixing increment with other operators in an expression is very bug prone.
{
	/* Must fit in a 32 bit count of u64's */
#if __SIZEOF_SIZE_T__ > 4
	CHECK(bytes < 0x800000000ULL);
`bytes <= UINT32_MAX`, `bytes <= INT32_MAX`?
why `0x800000000ULL`?
 * Note: this cannot catch every possible error, but if it returns
 * true then the heap is in a consistent state and can correctly
 * handle any sys_heap_alloc() request and free any live pointer
 * returned from a previou allocation.
previou
 * failures and a very fragmented heap.
 * @param result Struct into which to store test results.
 */
void sys_heap_stress(void *(*alloc)(void *arg, size_t bytes),
I wonder if this is a good place to have that API; after all, it is an API for testing (same with `sys_heap_validate`). What about having `sys_heap_validate.h` or something similar and not spoiling this header?
#define CHUNK_UNIT 8

enum chunk_fields { SIZE_AND_USED, LEFT_SIZE, FREE_PREV, FREE_NEXT };
some explanation here would be good.
	return sizeof(size_t) > 4 || h->len > 0x7fff;
}

static inline size_t chunk_field(struct z_heap *h, chunkid_t c,
`chunk_field_get`?
	}
}

static inline void chunk_set(struct z_heap *h, chunkid_t c,
`chunk_field_set`?
	}
}

static void free_list_add(struct z_heap *h, chunkid_t c)
Function explanation is missing. From what it is doing I would rather call it `add_to_bucket`.
	return ret;
}

static void free_list_remove(struct z_heap *h, int bidx,
Comment missing; also, I would call it `remove_from_bucket`.
	return (chunk_field(h, c, SIZE_AND_USED) & ~h->size_mask) != 0;
}

static ALWAYS_INLINE chunkid_t size(struct z_heap *h, chunkid_t c)
I would like to see setters and getters for all those fields. Right now it is inconsistent: there are getters, but fields are set using `chunk_set`, which is a bit confusing in the heap.c code.
	free_list_add(h, c);
}

void *sys_heap_alloc(struct sys_heap *heap, size_t bytes)
I wonder if it would make sense to have `loops` as an argument instead of a Kconfig option. It would allow the use of wrappers like `alloc_from_isr` with a lower value and `alloc` which favors lower fragmentation over performance.
 */
	int loops = MIN(b->list_size, CONFIG_SYS_HEAP_ALLOC_LOOPS);

	for (int i = 0; i < loops; i++) {
Actually, I don't fully understand why this loop is needed. Since you have the `avail_buckets` mask, you only need to pick the first bucket which has elements available and a chunk size equal to or bigger than the requested size. It seems that the code below this for loop is doing exactly that.
OK, now I see it: a bucket contains chunks with sizes between 2^n and 2^(n+1), so there is a chance to find a proper candidate. In that case maybe there should be an option to tell the function to check all chunks in the smaller bucket before going to the bigger bucket.
@andyross can we resurrect this PR and get it in for 2.1?
I have some cycles to spare at the moment. So if @andyross agrees I'd be
happy to work on this and bring it forward.
We discussed this in a recent meeting and the idea was to try and move it forward as soon as possible, so +1 from me.
On Mon, 30 Sep 2019, Carles Cufí wrote:
We discussed this in a recent meeting and the idea was to try and move it forward as soon as possible, so +1 from me.
I went ahead with it. I'll submit patches this week.
I should be able to get back on this within a week or two, but no complaints if @npitre wants to play with it. I have some early work on an alloc_aligned implementation that doesn't require changes to the heap internals (just finds a bigger block and fragments it). Also we need to add some latency tests to track ISR-friendly performance. And the portability side is the hardest. I see three paths:
On Mon, 30 Sep 2019, Andy Ross wrote:
I have some early work on an an alloc_aligned implementation that
doesn't require changes to the heap internals (just finds a bigger
block and fragments it).
Good. That's also what I had in mind but didn't get around to
implementing it.
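For reference, the naive way to get aligned allocations on top of an unmodified allocator is to over-allocate and stash the original pointer; the chunk-splitting approach described above avoids wasting the slack, but needs access to the heap internals. A sketch of the naive version (function names are illustrative, and `align` must be a power of two):

```
void *heap_alloc_aligned(struct sys_heap *h, size_t align, size_t bytes)
{
	/* over-allocate: room for the alignment slack plus one stashed pointer */
	void *base = sys_heap_alloc(h, bytes + align + sizeof(void *));

	if (base == NULL) {
		return NULL;
	}

	uintptr_t addr = (uintptr_t)base + sizeof(void *);

	addr = (addr + align - 1) & ~(align - 1);
	((void **)addr)[-1] = base;	/* remember the real allocation */
	return (void *)addr;
}

void heap_free_aligned(struct sys_heap *h, void *mem)
{
	if (mem != NULL) {
		sys_heap_free(h, ((void **)mem)[-1]);
	}
}
```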
And the portability side is the hardest. I see three paths:
1. Just use this as a new API and keep the mempool one around for
existing users until we decide to deprecate it.
Long (or maybe not even so long) term this is certainly the way to go if
the new allocator proves to be superior.
2. Port only the IPC side of the APIs (the struct k_mem_block
abstraction) to sit on top of this for the benefit of our (somewhat
oddball) APIs that use it.
3. Provide a full implementation (or as full as possible) of
sys_mem_pool and k_mem_pool using this as a backend. Not 100%
doable because the old code has some tested promises on alignment
and packing behavior that aren't true for generic heaps.
I implemented 2 and 3 at this point, although I still have to test it.
Alignment is simply ignored for now. I still have to look if some user
actually cares. When alignment is important then the slab
allocator is usually more suitable.
IIRC there are tests that do things like check that sequential allocations from an empty heap are indeed split the way it expects from the original implementation. I don't personally think those are behaviors worth preserving, but we need to find them and call them out explicitly when removing.
This got too stale. Resubmitted in #23941
This is the new heap implementation and test rig I've been talking about. See commit logs and comments for design notes.
Specifically note that the heap code itself is dirt simple (mostly, once you get your head around the field accessor API), and that more than half the added code here is in the validation and test rigs. @andrewboie will be happy to see that coverage on the new files is effectively 100% (there's some cleanup to do with some of the validation lines, mostly "return false;", which obviously don't fail and don't get hit).
It passes everywhere right now except for riscv32 and nios2 (no idea yet, haven't had a chance to look).
This is just the new code. I still need to finish up wrapper APIs for sys/k_mem_pool before this can go in. I was hoping to have that done today, but it might stretch another day. And even then we'll likely leave both implementations in place and make them switchable via kconfig for a version.
On top of those, I should also turn the existing stress rig into a proper benchmark and add that so we can track performance with a real usage simulator.