Skip to content

Commit

Permalink
Parallel DWARF parsing and improved parallel code parsing (dyninst#651)
Browse files Browse the repository at this point in the history
* changes for parallelizing symtabapi

* Tweaks to use OpenMP (still one last reducer left)

* Mark a false race in Type.C.
    std::call_once has a barrier at the end, so anything that happens inside is visible to anything that happens after.

* Replace the Cilk reducer with an OpenMP reduction

* Remove some dead code, and move the reducer into the source file.

* Mark the last race so far, move VG macros to a separate file.

* Annotate the two core locks, negating a lot of the race reports
    Implementation is not great though, I moved mcs_init to be a real function call.
    There might be a better way to do this, will investigate at some point.

* Very awkwardly annotate a parallel hashmap by key. Consider implementing better magic.

* Wrap the callback in `omp critical`, to make it parallel-safe no matter what happens.

* Swap out the vector for a TBB concurrent_queue.

* Remove some unnessesary annotations, using a proper OpenMP handles them.

* Revert the core lock annotations, they don't actually really work.

* Macro-replace the locks with boost equivalents. Revert this commit later once the testing is complete, or refactor it fully out.

* Replace the 1-entry cache with a vector that expands with the number of threads.

* Reuse the DwarfWalker between loop iterations, to save some work

* Use the standard allocator, so that Valgrind can track it properly

* Silence some warnings

* Replace the write with a CAS, letting Valgrind mostly ignore it.

* Strengthen the condition, its good enough for now.

* Re-annotate the hash map similar to an RW lock (which it technically is)

* Nearly fully fix the function-static annotations, using a very small constructor.

* Add libc++ annotations to vgannotations, and shift includes around to make them work

* Actually do the annotations right. I learned things today.

* Tell Helgrind to ignore a few more things

* Disable checking on everything

* A few tweaks to annotation

* Add h-b arcs to the pfq rwlock

* Apparently vectors didn't work like I thought they did. Whoops.

* Elfutils is now more thread-safe, and fix the annotations to keep DRD quiet.

* Make vgannotations.h local to Dyninst, and do something C++ for the lazy inits.

* Make the custom locks first-class C++ types, and compatible with C++17's syntax.

* Replace the lock implementations with more reasonable alternatives.

* Move the annotation to handle the case where std::pair does the write

* Unify all the TBB types under a Dyninst-tagged namespace, to permit refactor

Also fix a few whitespace errors near affected lines.

* Move concurrent_hash_map annotations into the unified class.

* Shift the thread-local stuff into a template class

* Wiggle the parallel loops and fine-grain the locking for a performance boost.

* Use Boost's call_once to try and be a bit more portable

* Put some parallelism into DwarfFrameParser. Fixes blue42u#18 and blue42u#19

* Use atomics for the reference counting. Fixes blue42u#4.

* Add an extra lock to the StringTable, and use it to mediate access to the internals. Fixes blue42u#20.

* Add a lock for inlines management, fixes blue42u#22.

* Add a lock to the function frame vectors, fixes blue42u#21.

* Replace a map+lock with a proper parallel hashmap for performance.

* Replace a mutex'd multi_index_container with a series of concurrent_hash_maps.

* Remove the comments that were left over from Cilkscreen race detection.

...I meant to do this a long time ago...

* Fix a number of minor issues, and one possibly important typo.

* Use RAII-style classes when handling the locks.

* Add a few defines for when Valgrind annotations are turned off.

* Parallelize some stuff in Object, makes the loading of files a little faster.

* Swap out some hash_maps for their concurrent forms, more easy parallelism.

* Actually do the parsing properly, and add a lock where it was needed.

* Adding more parallelism.

* Get rid of unnecessary serial code in finalization and add parallelism for hints initialization

* Removed a lock, and attempts to fix the resulting issues that arose afterwards.

* Tweaks to remove the phase-based approach

* 1. Fix missing parsed edges caused by early resuming frames
2. Always choosing the alphabetically smallest name for a function if there are multiple

* Delete swap_busy

* 1. Handle ud2 instruction, which will raise an undefined opcode exception. Therefore, control flow should not fall through
2. When deleting a bogus function, the reference counts of the blocks in the function should be decremented

* 1. Rewrite createAndRecordFrame to allow concurrent frame creations.
2. Simplify the use of frame status: BAD_LOOKUP means frame does not exists
   and a caller should only create a new frame when the result is BAD_LOOKUP

* 1. Resume functions as soon as a function finds a ret instruction
2. Rewrite parts of the tail call correction in finalization

* Estimate parsing task size by function symbol size
and launch large tasks first

* Delete omp critical in parisng of a fram

* Get rid of unnecessary assert

* Small changes to fix some maybe-races

* Munge the annotations for c_hash_map a little, and hotfix a possible race.

* Silence a number of warnings. Identations a mess but its quieter.

* Rewrite the c_hash_map to expose the accessors as a rwlock.

* Replace lock for delayed frames with concurrent hash map

* Parallelize SymtabCodeSource::init_hints

* Parallelize CodeObject::process_hints

* Fix an infinite loop in the parsing finalization stages.

* 1. Fix parallelization for CodeObject::process_hint(): a local variable declared is moved from outside loop to inside loop
2. Fix debug print crash in Parser.C
3. Some code cleanup

* ParseAPI now initialize hints in parallel. So, if the user of ParseAPI overloads the function for creating
ParseAPI::Function, that code should be thread-safe.

Change the DynCFGFactor in dyninstAPI to use mutex to be thread-safe.

* Lasily preparing range data for functions and blocks.

* Add block ranges and clean some dead code

* Add a CMake flag to enable Valgrind annotations

* Use dyn_c_vector, and disable the parallelism for ELF stuff for now.

* Rough additions of locks in various places to make things work.

* Properly initialize the Module in all constructors.

* Fix compilation

* Get rid unnecessary boost::lock_guard and use entry lookup in loop tree construction

* Use static AArch64 decoder tables. Fixes dyninst#630.

Other improvements include faster compile time (for affected files),
~1s less load time, ~7.8MiB smaller binary, and ~200KiB more memory usage.

Performance effects not yet tested.

* Should not delete unused ParseAPI::Function during parsing because CFGFactory will do delete all created functions in its destructor.

* Use exchange instead of store to keep Valgrind happy.

* Several fixes for analyzing .a files

1. Rewrite the OverlappingParseData to correct handle overlapping code regions.
2. In .o files, code starts at address 0, so address 0 can be a valid address.
   So, change indirect call target from address 0 to max address to represent indiret calls
3. Use CodeSource to check address validity, which would allow cross CodeRegions valid address;
   but use CodeRegion for raw code bytes, because using CodeSource may return code bytes
   from other regions that are overlapping.

* Fixes for gap parsing, which uses a different interface to call Parser

* Fix range data related to gap parsing

* Fix loop tree callee construction on Power

* Fix function removing in parsing finalization

* Fix crahses of symtabAPI tests on power

* Fix deadlock in constructing analysis graph for jump table analysis.

* Type refcount refactor, part 1: s/Type*/boost::shared_ptr<Type>/g

* Purge Type::refCount and all related code bits. The shared_ptr's handle it now.

Also add proper SFINAE on that one template, since now typeScalar and Type
have the same size (thank GCC's automatic bitfield construction).

* Add the backwards compatbility layer

* Fixup for a minor issue that should have popped up sooner

* CFGFactory class does not need to inherit boost_guard

* Fix compilation on ARM, and fix part of the backwards-compat layer.

* Adjust cmake file to new elfutils

* libdyninstAPI_RT.so should not link against libgomp, which would
cause crash at program startup time due to calling into uninitialized
rewritten libc.

* Cleanup OpenMP handling in build system

* Fix compilation when USE_OpenMP is set to OFF

* Fix a debug print crash

* Fix infinite recursion caused by missing stack unwind debug info

* Disable installing trampolines in instrumentation

* Stack walk should always have an increasing SP on x86

* Several fixes for parallel code parsing

1. Add a jump table finalization step. The assumption here is that different jump tables
   do not share entries. So, if one jump table runs into another one, we know that the
   entries that are overlapping with the next table are out-of-bound.
2. Remove edges and blocks for created by out-of-bound jump table entries
3. Handle problems of failing to resolve jump tables caused by out-of-bound entries from
   other jump table entries.

Co-authored-by: Jonathon Anderson <17242663+blue42u@users.noreply.github.com>
Co-authored-by: Tim Haines <thaines.astro@gmail.com>
  • Loading branch information
3 people committed Jan 13, 2020
1 parent 0dffe9f commit d233ae7
Show file tree
Hide file tree
Showing 102 changed files with 3,445 additions and 3,426 deletions.
16 changes: 14 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,10 @@ include(LibIberty)

include(shared)

if(USE_OpenMP)
find_package(OpenMP REQUIRED)
endif()

# PLATFORM is set by including 'shared'
if(PLATFORM MATCHES "bgq")
message(WARNING "Support for Bluegene/Q has been deprecated and will be removed in a future release")
Expand Down Expand Up @@ -61,12 +65,20 @@ foreach (dir ${HEADER_DIRS})
include_directories ( ${DYNINST_ROOT}/${dir}/h )
endforeach()

set(DYNINST_RT_CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")
set(DYNINST_RT_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")

set(ADD_VALGRIND_ANNOTATIONS OFF CACHE BOOL "Enable annotations for Valgrind analysis")
if(ADD_VALGRIND_ANNOTATIONS)
find_package(Valgrind REQUIRED)
include_directories(${Valgrind_INCLUDE_DIRS})
add_definitions(-DENABLE_VG_ANNOTATIONS)
endif()

include_directories (
${DYNINST_ROOT}
${DYNINST_ROOT}/external
${CILKSCREEN_INCLUDES}
${TBB_INCLUDE_DIRS}
${TBB_INCLUDE_DIRS}
)

# Component time
Expand Down
3 changes: 2 additions & 1 deletion cmake/ElfUtils.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ if(NOT UNIX)
endif()

# Minimum acceptable version of elfutils
set(_min_version 0.173)
set(_min_version 0.178)
set(ElfUtils_MIN_VERSION ${_min_version}
CACHE STRING "Minimum acceptable elfutils version")
if(${ElfUtils_MIN_VERSION} VERSION_LESS ${_min_version})
Expand Down Expand Up @@ -117,6 +117,7 @@ else()
<SOURCE_DIR>/configure
--enable-install-elfh
--prefix=${CMAKE_INSTALL_PREFIX}
--disable-debuginfod
BUILD_COMMAND make install
INSTALL_COMMAND ""
)
Expand Down
45 changes: 45 additions & 0 deletions cmake/Modules/FindValgrind.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
#========================================================================================
# FindValgrind.cmake
#
# Find Valgrind include dirs
#
# ----------------------------------------
#
# Use this module by invoking find_package with the form::
#
# find_package(Valgrind
# [REQUIRED] # Fail with error if Valgrind headers are not found
# )
#
# This module reads hints about search locations from variables::
#
# Valgrind_ROOT_DIR - Base directory the of Valgrind installation
# Valgrind_INCLUDEDIR - Hint directory that contains the Valgrind headers files
#
# and saves search results persistently in CMake cache entries::
#
# Valgrind_FOUND - True if headers were found
# Valgrind_INCLUDE_DIRS - Valgrind include directories
#
#========================================================================================

include(DyninstSystemPaths)

find_path(Valgrind_INCLUDE_DIR
NAMES valgrind.h
HINTS ${Valgrind_ROOT_DIR}/include ${Valgrind_ROOT_DIR} ${Valgrind_INCLUDEDIR}
PATHS ${DYNINST_SYSTEM_INCLUDE_PATHS}
PATH_SUFFIXES valgrind
DOC "Valgrind include directory")

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Valgrind
FOUND_VAR
Valgrind_FOUND
REQUIRED_VARS
Valgrind_INCLUDE_DIR)

# Export cache variables
if(Valgrind_FOUND)
set(Valgrind_INCLUDE_DIRS ${Valgrind_INCLUDE_DIR})
endif()
8 changes: 5 additions & 3 deletions common/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,13 @@ else()
include_directories (
${PROJECT_SOURCE_DIR}/common/h # don't include common/src; anything from there can still collide with default includes.
# stupid Windows case-insensitive naming.
${CILKSCREEN_INCLUDES}
)
endif()


set (SRC_LIST
src/mcs-lock.C
src/pfq-rwlock.C
src/race-detector-annotations.C
src/concurrent.C
src/string-regex.C
src/Timer.C
src/Types.C
Expand Down Expand Up @@ -120,6 +118,10 @@ endif()
target_link_private_libraries(common ${Boost_LIBRARIES})
target_link_libraries(common PUBLIC ${TBB_LIBRARIES})

if(USE_OpenMP)
set_target_properties(common PROPERTIES COMPILE_FLAGS ${OpenMP_CXX_FLAGS} LINK_FLAGS ${OpenMP_CXX_FLAGS})
endif()

IF (USE_COTIRE)
cotire(common)
ENDIF()
Expand Down
74 changes: 26 additions & 48 deletions common/h/IBSTree-fast.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
#include <boost/mpl/inherit.hpp>
#include <iostream>

#include "pfq-rwlock.h"
#include "concurrent.h"

namespace Dyninst
{
Expand All @@ -51,7 +51,7 @@ namespace Dyninst
class IBSTree_fast {
private:
// reader-writer lock to coordinate concurrent operations
mutable pfq_rwlock_t rwlock;
mutable dyn_rwlock rwlock;

public:
typedef typename ITYPE::type interval_type;
Expand All @@ -68,25 +68,20 @@ namespace Dyninst

IBSTree_fast()
{
pfq_rwlock_init(rwlock);
}
~IBSTree_fast()
{
//std::cerr << "Fast interval tree had " << unique_intervals.size() << " unique intervals and " << overlapping_intervals.size() << " overlapping" << std::endl;
}
int size() const
{
pfq_rwlock_read_lock(rwlock);
int result = overlapping_intervals.size() + unique_intervals.size();
pfq_rwlock_read_unlock(rwlock);
return result;
dyn_rwlock::shared_lock l(rwlock);
return overlapping_intervals.size() + unique_intervals.size();
}
bool empty() const
{
pfq_rwlock_read_lock(rwlock);
bool result = unique_intervals.empty() && overlapping_intervals.empty();
pfq_rwlock_read_unlock(rwlock);
return result;
dyn_rwlock::shared_lock l(rwlock);
return unique_intervals.empty() && overlapping_intervals.empty();
}
void insert(ITYPE*);
void remove(ITYPE*);
Expand All @@ -97,11 +92,10 @@ namespace Dyninst
void clear();
friend std::ostream& operator<<(std::ostream& stream, const IBSTree_fast<ITYPE>& tree)
{
pfq_rwlock_read_lock(tree.rwlock);
dyn_rwlock::shared_lock l(tree.rwlock);
std::copy(tree.unique_intervals.begin(), tree.unique_intervals.end(),
std::ostream_iterator<typename Dyninst::IBSTree_fast<ITYPE>::interval_set::value_type>(stream, "\n"));
stream << tree.overlapping_intervals;
pfq_rwlock_read_unlock(tree.rwlock);
return stream;
}

Expand All @@ -110,10 +104,8 @@ namespace Dyninst
template <class ITYPE>
void IBSTree_fast<ITYPE>::insert(ITYPE* entry)
{
pfq_rwlock_node_t me;
pfq_rwlock_write_lock(rwlock, me);
dyn_rwlock::unique_lock l(rwlock);

do {
// find in overlapping first
std::set<ITYPE*> dummy;
if(overlapping_intervals.find(entry, dummy))
Expand All @@ -123,7 +115,7 @@ namespace Dyninst
typename interval_set::iterator lower =
unique_intervals.upper_bound(entry->low());
// lower.high first >= entry.low
if (lower != unique_intervals.end() && (**lower == *entry)) break;
if (lower != unique_intervals.end() && (**lower == *entry)) return;
typename interval_set::iterator upper = lower;
while(upper != unique_intervals.end() &&
(*upper)->low() <= entry->high())
Expand All @@ -141,47 +133,37 @@ namespace Dyninst
unique_intervals.insert(entry);
}
}
} while(0);

pfq_rwlock_write_unlock(rwlock, me);
}
template <class ITYPE>
void IBSTree_fast<ITYPE>::remove(ITYPE* entry)
{
pfq_rwlock_node_t me;
pfq_rwlock_write_lock(rwlock, me);
dyn_rwlock::unique_lock l(rwlock);

overlapping_intervals.remove(entry);
typename interval_set::iterator found = unique_intervals.find(entry->high());
if(found != unique_intervals.end() && *found == entry) unique_intervals.erase(found);
pfq_rwlock_write_unlock(rwlock, me);
}
template<class ITYPE>
int IBSTree_fast<ITYPE>::find(interval_type X, std::set<ITYPE*> &results) const
{
int count = 0;
pfq_rwlock_read_lock(rwlock);
do {
int num_old_results = results.size();

int num_overlapping = overlapping_intervals.find(X, results);
if(num_overlapping > 0) { count = num_overlapping; break; }

typename interval_set::const_iterator found_unique = unique_intervals.upper_bound(X);
if(found_unique != unique_intervals.end())
{
if((*found_unique)->low() > X) { count = 0; break; }
results.insert(*found_unique);
}
count = results.size() - num_old_results;
} while (0);
pfq_rwlock_read_unlock(rwlock);
return count;
dyn_rwlock::shared_lock l(rwlock);
int num_old_results = results.size();

int num_overlapping = overlapping_intervals.find(X, results);
if(num_overlapping > 0) return num_overlapping;

typename interval_set::const_iterator found_unique = unique_intervals.upper_bound(X);
if(found_unique != unique_intervals.end())
{
if((*found_unique)->low() > X) return 0;
results.insert(*found_unique);
}
return results.size() - num_old_results;
}
template <typename ITYPE>
int IBSTree_fast<ITYPE>::find(ITYPE* I, std::set<ITYPE*>&results) const
{
pfq_rwlock_read_lock(rwlock);
dyn_rwlock::shared_lock l(rwlock);
int num_old_results = results.size();
int num_overlapping = overlapping_intervals.find(I, results);
if(num_overlapping) return num_overlapping;
Expand All @@ -193,13 +175,12 @@ namespace Dyninst
++ub;
}
int result = results.size() - num_old_results;
pfq_rwlock_read_unlock(rwlock);
return result;
}
template<typename ITYPE>
void IBSTree_fast<ITYPE>::successor(interval_type X, std::set<ITYPE*>& results) const
{
pfq_rwlock_read_lock(rwlock);
dyn_rwlock::shared_lock l(rwlock);
ITYPE* overlapping_ub = overlapping_intervals.successor(X);

typename interval_set::const_iterator unique_ub = unique_intervals.upper_bound(X);
Expand All @@ -215,7 +196,6 @@ namespace Dyninst
{
results.insert(*unique_ub);
}
pfq_rwlock_read_unlock(rwlock);
}
template <typename ITYPE>
ITYPE* IBSTree_fast<ITYPE>::successor(interval_type X) const
Expand All @@ -229,11 +209,9 @@ namespace Dyninst
template <typename ITYPE>
void IBSTree_fast<ITYPE>::clear()
{
pfq_rwlock_node_t me;
pfq_rwlock_write_lock(rwlock, me);
dyn_rwlock::unique_lock l(rwlock);
overlapping_intervals.clear();
unique_intervals.clear();
pfq_rwlock_write_unlock(rwlock, me);
}

}
Expand Down
Loading

0 comments on commit d233ae7

Please sign in to comment.