Skip to content

fjtapia/sort_parallel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Boost Sort Parallel

PRELIMINARY

The Boost Sort Parallel Library, is part of the Boost Sort Library, created by Steven Ross.

DESCRIPTION

This library provide stable and not stable sorting algorithms, in single thread and parallel versions.

These algorithms do not use any other library or utility. To compile and run only need a C++11 compliant compiler.

The algorithms use a comparison object, in the same way than the sort algorithms of the standard library. If you don't define, by default is std::less object, which uses internally the operator < internally for comparisons.

The algorithms are exception safe, it means, the exceptions generated by the algorithms guarantee the integrity of the objects to sort , but not their relative order. If the exception is generated inside the objects (in the move or in the copy constructor.. ) the results can be unpredictable.

This library is include only. Don't need to link with any static or dynamic library. Don't have dependence of any other boost files, variables or libraries or any other external libraries. To use, only need a to include the files of the boost/sort/parallel folder, any more.

This table provide you a brief description of the sort algorithms of the library.

Algorithm Parallel Stable Additional memory Best, average, and worst case
sort no no Log N NLogN, NLogN, NLogN
stable_sort no yes N / 2 NLogN, NLogN, NLogN
parallel_sort yes no block_size* num_threads NLogN, NLogN, NLogN
parallel_stable_sort yes yes N / 2 NLogN, NLogN, NLogN
sample_sort yes yes N NLogN, NLogN, NLogN

The block_size is an internal parameter of the algorithm, which in order to achieve the highest speed, change according the size of the objects to sort according the next table. The strings use a block_size of 128.

object size (bytes) 1 - 15 16 - 31 32 - 63 64 - 127 128 - 255 256 - 511 512 -
block_size 4096 2048 1024 768 512 256 128

PRESENT PERSPECTIVE

In the parallel sorting algorithms, we can find two categories of algorithms.

SUBDIVISION ALGORITHMS

Filter the data and generate two or more parts. Each part obtained is filtered and divided by other thread, until the size of the data to sort is smaller than a predefined size, then it is sorted by a single thread. The algorithm most frecuently used in the filter and sorting is quicksort.

These algorithms are fast with a small number of threads, but with a great number of HW threads , show their lacks. Examples of this category are

  • Intel Threading Building Blocks (TBB)
  • Microsoft PPL Parallel Sort.

MERGING ALGORITHMS

Divide the data in parts, and each part is sorted by a thread. When the parts are sorted, must merge them to obtain the final results. The problem of these algorithms is they need additional memory to the merge, and usually with the same size than the data.

With a small number of threads, have similar speed than the subdivision algorithms, but with many threads they are much more faster . Examples of this category are :

  • GCC Parallel Sort (based on OpenMP)
  • Microsoft PPL Parallel Buffered Sort

NEW PARALLEL SORT ALGORITHM

This generate an undesirable duality. With a small number of threads use one algorithm, and with a big number use other. Due this, the SW designed for a small machine is inadequate for a big machine and vice versa. But the main problem, in the merging algorithms is the memory used, usually of the same size than the data.

This version have as novelty a new parallel_sort algorithm ( internally named Block Indirect), created for processors connected with shared memory.

It is a hybrid algorithm. With small number of threads, it is a subdivision algorithm, but with many threads is a merging algorithms, which need a small auxiliary memory ( block_size * num_threads).

The block_size is an internal parameter of the algorithm, which in order to achieve the highest speed, change according the size of the objects to sort according the next table. The strings use a block_size of 128.

object size (bytes) 1 - 15 16 - 31 32 - 63 64 - 127 128 - 255 256 - 511 512 -
block_size 4096 2048 1024 768 512 256 128

This algorithm permit to eliminate the duality. When run in a machine with small number of threads have the performance of TBB, and when run in a machine with many threads, the same code have the performance of GCC Parallel Sort, with the additional advantage of the small memory consumption.

The algorithm use as auxiliary memory a block_size elements for each thread. The worst case is when have very big elements and many threads. With big elements (512 bytes), and 12 threads, The memory measured was:

Algorithm Memory used in MB
GCC Parallel Sort (OpenMP) 1565 MB
Threading Building Blocks (TBB) 783 MB
Block Indirect Sort 812 MB

This new parallel_sort algorithm had been created and implemented specifically for this library by the author.

*In the documentation of the project, you can find all the information about the algorithm ( description, characteristics, how to program, and articles with the detailed description of the new algorithm) index.html

*In the project you have a small benchmarks. If you want run in your machine the full benchmarks with non free SW, you have all the code, intructions and procedures in (https://github.com/fjtapia/sort_parallel_benchmark)

Installation

* This library is **include only**. * Don't need to link with any static or dynamic library. * Don't have dependence of any other boost libraries or any other external libraries. * To use, only need include the files of the boost/sort/parallel folder. * This library had been compiled successfully with the next compilers
Compiler
GCC 4.8, 4.9 , 5.2
CLANG 3.6, 3.7, 3.8
Visual C++ 2013, 2015

Author and Copyright

This library is integrated in the Boost library, inside the boost::sort library, with the spreadsort algorithms designed and implemented by Steven Ross.

You can find in https://github.com/fjtapia/sort_parallel

This algorithm had been ideate, designed and implemented beginning from zero. After read hundreds of articles and books, I didn't find any similar. If someone knows something about this or something similar, please, say me.

Anyway, the important is not the author, is provide a fast, robust, and easy to use algorithm to the community of programmers.

Copyright 2016 Francisco Tapia (fjtapia@gmail.com) Distributed under the Boost Software License, Version 1.0. (See http://www.boost.org/LICENSE_1_0.txt)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages