Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy tls #64

Merged
merged 11 commits into from
Jul 8, 2019
Merged

Lazy tls #64

merged 11 commits into from
Jul 8, 2019

Conversation

davidchisnall
Copy link
Collaborator

Lazily initialise the TLS pointer on slow paths, so that we can avoid a check that it's initialised on the fast paths. We now statically set the TLS pointer to the address of a global allocator, which does not own any memory. When we try to allocate from it, we hit a slow path and there we check if it's actually the global allocator (in which case we allocate a real allocator and store a pointer to it in TLS).

This trims one branch from the fast path in malloc.

Copying an idea from mimalloc, initialise the TLS variable to a global
allocator that doesn't own any memory and then lazily check when we hit
a slow path (which we always do when using the global allocator, because
it doesn't own any memory) if we are the global allocator and replace
it.

There is a slight complication compared to mimalloc's version of this
idea.  Snmalloc collects outgoing messages and it's possible for the
first operation in a thread to be a free of memory allocated by a
different thread.  We address this by initialising the queues with a
size value indicating that they are full and then do the lazy check when
about to insert a message that would make a queue full.  This will then
trigger lazy creation of an allocator.

Global initialisation doesn't work for the fake allocator, so skip most
of its constructor.
By caching the result of the first call to ThreadAlloc::get(), we were
always hitting a code path that should be hit once per thread in normal
operation.
We were passing an argument less than 4K to the MAP_ALIGNED macro, which
caused an undefined shift.  The compiler helpfully propagated the undef
values back to earlier in the code and gave us some exciting nonsense.
sylvanc
sylvanc previously approved these changes Jul 5, 2019
Most compilers are happy if you say always-inline but they can't.  GCC
will complain.  Here, we have two mutually recursive functions that are
marked as always inline.  In an optimised build, one is inlined into the
other and then becomes a tail-recursive function that should inline the
tail call.  Inlining the tail call can be done by simply jumping to the
start of the function and so everything is fine.  In a debug build, the
second transform doesn't happen and so we're left with a call to an
always-inline function.
mjp41
mjp41 previously approved these changes Jul 5, 2019
Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

size_t sz = sizeclass_to_size(sizeclass);
if ((remote.size + sz) < REMOTE_CACHE)
{
stats().remote_free(sizeclass);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add assert that we are not fake at this point!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps put something similar in other places where we don't expect to see the fake allocator.

@davidchisnall davidchisnall dismissed stale reviews from mjp41 and sylvanc via 50695d0 July 5, 2019 10:41
sylvanc
sylvanc previously approved these changes Jul 5, 2019
This is needed because in some configurations the constructor for the
global placeholder is not called before the first allocation (i.e. when
other globals call the allocator in their constructor) and so we ended
up following a null pointer.
mjp41
mjp41 previously approved these changes Jul 5, 2019
Fixes an issue where the global placeholder allocator was being
released.
@davidchisnall davidchisnall merged commit 97bfa68 into master Jul 8, 2019
@mjp41 mjp41 mentioned this pull request Jul 9, 2019
@mjp41 mjp41 deleted the lazy_tls branch July 9, 2019 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants