-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy tls #64
Lazy tls #64
Conversation
Copying an idea from mimalloc, initialise the TLS variable to a global allocator that doesn't own any memory and then lazily check when we hit a slow path (which we always do when using the global allocator, because it doesn't own any memory) if we are the global allocator and replace it. There is a slight complication compared to mimalloc's version of this idea. Snmalloc collects outgoing messages and it's possible for the first operation in a thread to be a free of memory allocated by a different thread. We address this by initialising the queues with a size value indicating that they are full and then do the lazy check when about to insert a message that would make a queue full. This will then trigger lazy creation of an allocator. Global initialisation doesn't work for the fake allocator, so skip most of its constructor.
By caching the result of the first call to ThreadAlloc::get(), we were always hitting a code path that should be hit once per thread in normal operation.
We were passing an argument less than 4K to the MAP_ALIGNED macro, which caused an undefined shift. The compiler helpfully propagated the undef values back to earlier in the code and gave us some exciting nonsense.
Most compilers are happy if you say always-inline but they can't. GCC will complain. Here, we have two mutually recursive functions that are marked as always inline. In an optimised build, one is inlined into the other and then becomes a tail-recursive function that should inline the tail call. Inlining the tail call can be done by simply jumping to the start of the function and so everything is fine. In a debug build, the second transform doesn't happen and so we're left with a call to an always-inline function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
size_t sz = sizeclass_to_size(sizeclass); | ||
if ((remote.size + sz) < REMOTE_CACHE) | ||
{ | ||
stats().remote_free(sizeclass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add assert that we are not fake at this point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps put something similar in other places where we don't expect to see the fake allocator.
This is needed because in some configurations the constructor for the global placeholder is not called before the first allocation (i.e. when other globals call the allocator in their constructor) and so we ended up following a null pointer.
Fixes an issue where the global placeholder allocator was being released.
Lazily initialise the TLS pointer on slow paths, so that we can avoid a check that it's initialised on the fast paths. We now statically set the TLS pointer to the address of a global allocator, which does not own any memory. When we try to allocate from it, we hit a slow path and there we check if it's actually the global allocator (in which case we allocate a real allocator and store a pointer to it in TLS).
This trims one branch from the fast path in
malloc
.