Merge to master for 1.4.2
mjansson authored Apr 25, 2021
2 parents 21c3f8b + 7990f8e commit b55d218
Showing 13 changed files with 576 additions and 241 deletions.
2 changes: 1 addition & 1 deletion BENCHMARKS.md
@@ -1,5 +1,5 @@
# Benchmarks
Contained in a parallell repository is a benchmark utility that performs interleaved allocations (both aligned to 8 or 16 bytes, and unaligned) and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.
Contained in a parallel repository is a benchmark utility that performs interleaved allocations (both aligned to 8 or 16 bytes, and unaligned) and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.

https://github.com/mjansson/rpmalloc-benchmark

19 changes: 19 additions & 0 deletions CHANGELOG
@@ -1,3 +1,22 @@
1.4.2

Fixed an issue where calling _exit might hang the main thread cleanup in rpmalloc if another
worker thread was terminated while holding exclusive access to the global cache.

Improved caches to prioritize main spans in a chunk to avoid leaving main spans mapped due to
remaining subspans in caches.

Improve cache reuse by allowing large blocks to use caches from slightly larger cache classes.

Fixed an issue where thread heap statistics would go out of sync when a free span was deferred
to another thread heap

API breaking change - added flag to rpmalloc_thread_finalize to avoid releasing thread caches.
Pass nonzero value to retain old behaviour of releasing thread caches to global cache.

Add option to config to set a custom error callback for assert failures (if ENABLE_ASSERT)


1.4.1

Dual license as both released to public domain or under MIT license
8 changes: 4 additions & 4 deletions README.md
@@ -33,7 +33,7 @@ Configuration of the thread and global caches can be important depending on your

# Required functions

Before calling any other function in the API, you __MUST__ call the initization function, either __rpmalloc_initialize__ or __pmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.
Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __pmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.

Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case.

@@ -104,7 +104,7 @@ The allocator is based on a fixed but configurable page alignment (defaults to 6

Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes.

Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have a the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.
Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.
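
The size class fitting described above can be illustrated with a little arithmetic. This is a minimal sketch assuming the default 64KiB span configuration; the constant and function names (`block_size_for`, `round_up`) are hypothetical and not part of rpmalloc, and large blocks are deliberately left out because their rounding depends on span header details not stated here.

```python
# Illustrative only: models the small/medium size class boundaries from the
# README text, not rpmalloc's actual implementation.
SMALL_GRANULARITY = 16     # bytes between small size classes
SMALL_LIMIT = 1024         # 64 classes * 16 bytes
MEDIUM_GRANULARITY = 512   # bytes between medium size classes
MEDIUM_LIMIT = 32256       # 1024 + 61 classes * 512 bytes


def round_up(size, granularity):
    """Round size up to the next multiple of granularity."""
    return ((size + granularity - 1) // granularity) * granularity


def block_size_for(size):
    """Return the block size that a request of `size` bytes is fitted to."""
    if size < 1:
        raise ValueError("size must be positive in this sketch")
    if size <= SMALL_LIMIT:
        return round_up(size, SMALL_GRANULARITY)
    if size <= MEDIUM_LIMIT:
        # Assumption: medium classes step up from the small limit in 512 byte increments.
        return SMALL_LIMIT + round_up(size - SMALL_LIMIT, MEDIUM_GRANULARITY)
    raise ValueError("large blocks are not modelled in this sketch")


print(block_size_for(36))  # 48, matching the README example
```

Under these assumptions a request just above the small limit, such as 1025 bytes, would land in the first medium class of 1536 bytes.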

Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans.

@@ -113,7 +113,7 @@ Each span for a small and medium size class keeps track of how many blocks are a
Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans.

# Memory mapping
By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes.
By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes.

The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size.
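
The span size constraints stated above are easy to check before initialization. The following is a rough sketch of such a check; `is_valid_span_size` and `required_alignment` are hypothetical helper names used for illustration and do not exist in rpmalloc.

```python
def is_power_of_two(value):
    """True if value is a positive power of two."""
    return value > 0 and (value & (value - 1)) == 0


def is_valid_span_size(span_size, page_size):
    """Check the README constraints: power of two, within [4096, 262144],
    and a multiple or divisor of the memory page size."""
    if not is_power_of_two(span_size):
        return False
    if not 4096 <= span_size <= 262144:
        return False
    return span_size % page_size == 0 or page_size % span_size == 0


def required_alignment(span_size, page_size):
    """Alignment a custom map function must honour: the larger of the two sizes."""
    return max(span_size, page_size)


# Default 64KiB spans on a system with 4KiB pages.
print(is_valid_span_size(65536, 4096))   # True
print(required_alignment(65536, 4096))   # 65536
```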

@@ -128,7 +128,7 @@ Super spans (spans a multiple > 1 of the span size) can be subdivided into small

A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted tha super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size).

If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting invididual pages and the total super span byte size for finally releasing the entire super span memory range.
If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range.

# Memory fragmentation
There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. This is due to the fact that the memory pages allocated for each size class is split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class.
12 changes: 9 additions & 3 deletions build/ninja/clang.py
@@ -38,7 +38,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
self.cxxcmd = '$toolchain$cxx -MMD -MT $out -MF $out.d $includepaths $moreincludepaths $cxxflags $carchflags $cconfigflags $cmoreflags $cxxenvflags -c $in -o $out'
self.ccdeps = 'gcc'
self.ccdepfile = '$out.d'
self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crsD $ararchflags $arflags $arenvflags $out $in'
self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crs $ararchflags $arflags $arenvflags $out $in'
if self.target.is_windows():
self.linkcmd = '$toolchain$link $libpaths $configlibpaths $linkflags $linkarchflags $linkconfigflags $linkenvflags /debug /nologo /subsystem:console /dynamicbase /nxcompat /manifest /manifestuac:\"level=\'asInvoker\' uiAccess=\'false\'\" /tlbid:1 /pdb:$pdbpath /out:$out $in $libs $archlibs $oslibs $frameworks'
self.dllcmd = self.linkcmd + ' /dll'
@@ -52,7 +52,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
'-fno-trapping-math', '-ffast-math']
self.cwarnflags = ['-W', '-Werror', '-pedantic', '-Wall', '-Weverything',
'-Wno-c++98-compat', '-Wno-padded', '-Wno-documentation-unknown-command',
'-Wno-implicit-fallthrough', '-Wno-static-in-inline', '-Wno-reserved-id-macro']
'-Wno-implicit-fallthrough', '-Wno-static-in-inline', '-Wno-reserved-id-macro', '-Wno-disabled-macro-expansion']
self.cmoreflags = []
self.mflags = []
self.arflags = []
@@ -76,8 +76,14 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
self.oslibs += ['m']
if self.target.is_linux() or self.target.is_raspberrypi():
self.oslibs += ['dl']
if self.target.is_raspberrypi():
self.linkflags += ['-latomic']
if self.target.is_bsd():
self.oslibs += ['execinfo']
if self.target.is_haiku():
self.cflags += ['-D_GNU_SOURCE=1']
self.linkflags += ['-lpthread']
self.oslibs += ['m']
if not self.target.is_windows():
self.linkflags += ['-fomit-frame-pointer']

@@ -391,7 +397,7 @@ def make_linkconfigflags(self, config, targettype, variables):
if targettype == 'sharedlib':
flags += ['-shared', '-fPIC']
if config != 'debug':
if targettype == 'bin' or targettype == 'sharedlib':
if (targettype == 'bin' or targettype == 'sharedlib') and self.use_lto():
flags += ['-flto']
return flags
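
The hunk above makes `-flto` opt-in rather than automatic for non-debug builds. A standalone sketch of the new behaviour follows, assuming the toggle is passed in as a plain boolean instead of being read through `self.use_lto()`; the free function `link_config_flags` is a hypothetical stand-in for the method in the diff.

```python
def link_config_flags(config, targettype, use_lto):
    """Mirror the gating shown in the diff: -flto is only added for
    non-debug binaries and shared libraries when LTO was requested."""
    flags = []
    if targettype == 'sharedlib':
        flags += ['-shared', '-fPIC']
    if config != 'debug':
        if (targettype == 'bin' or targettype == 'sharedlib') and use_lto:
            flags += ['-flto']
    return flags


print(link_config_flags('release', 'bin', use_lto=False))  # []
print(link_config_flags('release', 'bin', use_lto=True))   # ['-flto']
```

Before this change the equivalent of `use_lto` was effectively always true for release binaries and shared libraries.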

7 changes: 6 additions & 1 deletion build/ninja/gcc.py
@@ -24,7 +24,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
self.cxxcmd = '$toolchain$cxx -MMD -MT $out -MF $out.d $includepaths $moreincludepaths $cxxflags $carchflags $cconfigflags $cmoreflags $cxxenvflags -c $in -o $out'
self.ccdeps = 'gcc'
self.ccdepfile = '$out.d'
self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crsD $ararchflags $arflags $arenvflags $out $in'
self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crs $ararchflags $arflags $arenvflags $out $in'
self.linkcmd = '$toolchain$link $libpaths $configlibpaths $linkflags $linkarchflags $linkconfigflags $linkenvflags -o $out $in $libs $archlibs $oslibs'

#Base flags
@@ -54,8 +54,13 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
self.linkflags += ['-pthread']
if self.target.is_linux() or self.target.is_raspberrypi():
self.oslibs += ['dl']
if self.target.is_raspberrypi():
self.linkflags += ['-latomic']
if self.target.is_bsd():
self.oslibs += ['execinfo']
if self.target.is_haiku():
self.cflags += ['-D_GNU_SOURCE=1']
self.linkflags += ['-lpthread']

self.includepaths = self.prefix_includepaths((includepaths or []) + ['.'])

5 changes: 5 additions & 0 deletions build/ninja/generator.py
@@ -49,6 +49,9 @@ def __init__(self, project, includepaths = [], dependlibs = [], libpaths = [], v
parser.add_argument('--updatebuild', action='store_true',
help = 'Update submodule build scripts',
default = '')
parser.add_argument('--lto', action='store_true',
help = 'Build with Link Time Optimization',
default = False)
options = parser.parse_args()

self.project = project
@@ -91,6 +94,8 @@ def __init__(self, project, includepaths = [], dependlibs = [], libpaths = [], v
variables['monolithic'] = True
if options.coverage:
variables['coverage'] = True
if options.lto:
variables['lto'] = True
if self.subninja != '':
variables['internal_deps'] = True
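
The two generator.py hunks above add a `--lto` switch and forward it as a build variable. A reduced sketch of that flow is shown below, assuming only the `--coverage` and `--lto` options; `parse_build_options` is a hypothetical helper introduced here, and the real generator handles many more options inside its constructor.

```python
import argparse


def parse_build_options(argv):
    """Parse a reduced set of generator options and return the variables
    dictionary that would be handed on to the toolchain."""
    parser = argparse.ArgumentParser(description='Ninja build generator (sketch)')
    parser.add_argument('--coverage', action='store_true',
                        help='Build with code coverage',
                        default=False)
    parser.add_argument('--lto', action='store_true',
                        help='Build with Link Time Optimization',
                        default=False)
    options = parser.parse_args(argv)

    variables = {}
    if options.coverage:
        variables['coverage'] = True
    if options.lto:
        variables['lto'] = True
    return variables


print(parse_build_options(['--lto']))  # {'lto': True}
```

Because the flag uses `store_true` with a `False` default, builds that omit `--lto` never set the variable.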

9 changes: 7 additions & 2 deletions build/ninja/platform.py
@@ -5,7 +5,7 @@
import sys

def supported_platforms():
return [ 'windows', 'linux', 'macos', 'bsd', 'ios', 'android', 'raspberrypi', 'tizen', 'sunos' ]
return [ 'windows', 'linux', 'macos', 'bsd', 'ios', 'android', 'raspberrypi', 'tizen', 'sunos', 'haiku' ]

class Platform(object):
def __init__(self, platform):
@@ -20,7 +20,7 @@ def __init__(self, platform):
self.platform = 'macos'
elif self.platform.startswith('win'):
self.platform = 'windows'
elif 'bsd' in self.platform:
elif 'bsd' in self.platform or self.platform.startswith('dragonfly'):
self.platform = 'bsd'
elif self.platform.startswith('ios'):
self.platform = 'ios'
@@ -32,6 +32,8 @@ def __init__(self, platform):
self.platform = 'tizen'
elif self.platform.startswith('sunos'):
self.platform = 'sunos'
elif self.platform.startswith('haiku'):
self.platform = 'haiku'

def platform(self):
return self.platform
@@ -63,5 +65,8 @@ def is_tizen(self):
def is_sunos(self):
return self.platform == 'sunos'

def is_haiku(self):
return self.platform == 'haiku'

def get(self):
return self.platform
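
The platform.py changes extend name normalization to DragonFly BSD and Haiku. The sketch below covers only the branches visible in this diff; `normalize_platform` is a hypothetical free function, and the real `Platform.__init__` contains further branches (macOS, Linux, iOS, Android and others) that are not reproduced here.

```python
def normalize_platform(name):
    """Map a raw platform string to one of the canonical names, using only
    the checks that appear in the hunks above."""
    if name.startswith('win'):
        return 'windows'
    if 'bsd' in name or name.startswith('dragonfly'):
        return 'bsd'
    if name.startswith('sunos'):
        return 'sunos'
    if name.startswith('haiku'):
        return 'haiku'
    # Anything else is returned unchanged in this sketch.
    return name


print(normalize_platform('dragonfly5'))  # bsd
print(normalize_platform('haiku1'))      # haiku
```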
10 changes: 9 additions & 1 deletion build/ninja/toolchain.py
@@ -54,6 +54,7 @@ def __init__(self, host, target, toolchain):
#Set default values
self.build_monolithic = False
self.build_coverage = False
self.build_lto = False
self.support_lua = False
self.internal_deps = False
self.python = 'python'
@@ -132,7 +133,7 @@ def initialize_archs(self, archs):
def initialize_default_archs(self):
if self.target.is_windows():
self.archs = ['x86-64']
elif self.target.is_linux() or self.target.is_bsd() or self.target.is_sunos():
elif self.target.is_linux() or self.target.is_bsd() or self.target.is_sunos() or self.target.is_haiku():
localarch = subprocess.check_output(['uname', '-m']).decode().strip()
if localarch == 'x86_64' or localarch == 'amd64':
self.archs = ['x86-64']
@@ -208,6 +209,8 @@ def parse_default_variables(self, variables):
self.build_monolithic = get_boolean_flag(val)
elif key == 'coverage':
self.build_coverage = get_boolean_flag(val)
elif key == 'lto':
self.build_lto = get_boolean_flag(val)
elif key == 'support_lua':
self.support_lua = get_boolean_flag(val)
elif key == 'internal_deps':
@@ -234,6 +237,8 @@ def parse_prefs(self, prefs):
self.build_monolithic = get_boolean_flag(prefs['monolithic'])
if 'coverage' in prefs:
self.build_coverage = get_boolean_flag( prefs['coverage'] )
if 'lto' in prefs:
self.build_lto = get_boolean_flag( prefs['lto'] )
if 'support_lua' in prefs:
self.support_lua = get_boolean_flag(prefs['support_lua'])
if 'python' in prefs:
@@ -258,6 +263,9 @@ def is_monolithic(self):
def use_coverage(self):
return self.build_coverage

def use_lto(self):
return self.build_lto

def write_variables(self, writer):
writer.variable('buildpath', self.buildpath)
writer.variable('target', self.target.platform)
36 changes: 33 additions & 3 deletions rpmalloc/malloc.c
@@ -292,26 +292,55 @@ DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved) {
else if (reason == DLL_THREAD_ATTACH)
rpmalloc_thread_initialize();
else if (reason == DLL_THREAD_DETACH)
rpmalloc_thread_finalize();
rpmalloc_thread_finalize(1);
return TRUE;
}

//end BUILD_DYNAMIC_LINK
#else

extern void
_global_rpmalloc_init(void) {
rpmalloc_set_main_thread();
rpmalloc_initialize();
}

#if defined(__clang__) || defined(__GNUC__)

static void __attribute__((constructor))
initializer(void) {
_global_rpmalloc_init();
}

#elif defined(_MSC_VER)

#pragma section(".CRT$XIB",read)
__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = _global_rpmalloc_init;
#pragma comment(linker, "/include:_rpmalloc_module_init")

#endif

//end !BUILD_DYNAMIC_LINK
#endif

#else

#include <pthread.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

extern void
rpmalloc_set_main_thread(void);

static pthread_key_t destructor_key;

static void
thread_destructor(void*);

static void __attribute__((constructor))
initializer(void) {
rpmalloc_set_main_thread();
rpmalloc_initialize();
pthread_key_create(&destructor_key, thread_destructor);
}
@@ -340,7 +369,7 @@ thread_starter(void* argptr) {
static void
thread_destructor(void* value) {
(void)sizeof(value);
rpmalloc_thread_finalize();
rpmalloc_thread_finalize(1);
}

#ifdef __APPLE__
Expand Down Expand Up @@ -368,7 +397,8 @@ pthread_create(pthread_t* thread,
const pthread_attr_t* attr,
void* (*start_routine)(void*),
void* arg) {
#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__APPLE__) || defined(__HAIKU__)
#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || \
defined(__APPLE__) || defined(__HAIKU__)
char fname[] = "pthread_create";
#else
char fname[] = "_pthread_create";