Commit
Use 20000 as block size instead of the chunk size from traits::maximal_number_of_chunks(...).
Showing 1 changed file with 2 additions and 9 deletions.
aae9fb6
We should definitely take the traits into account. They are a means for the user to influence things, and I wouldn't want that to no longer be possible.
@hkaiser But the problem is that max_chunks, which is returned by traits::maximal_number_of_chunks(...), is very big, and it decreases performance very much. In this case, the chunk size from the existing traits is not suitable to use as the block size. I think the block size should not be influenced by the data count.
In the end we need to have a means for the user to control the chunk sizes used. How do you suggest we do that?
@hkaiser
One idea is to add something new to the executor parameter traits (hpx/hpx/parallel/executors/executor_parameter_traits.hpp, line 353 in aae9fb6). For example, add a function get_block_size(...) to struct executor_parameter_traits { ... }. The problem is that this new feature would only be used in parallel::partition, which seems bad because the executor parameter traits are a generic interface. (Strictly speaking, it would also be used by the parallel algorithms built on top of parallel::partition.) And users could confuse the chunk size with the block size.
Another idea is to add a parameter to the interface of parallel::partition. That deviates from the C++ standard interface, so it seems bad, but it would be a very simple and clear solution. However, because the block size would have to be propagated whenever other parallel algorithms use parallel::partition internally, it is not a good solution either.
@taeguk Is there at least one chunker which implements maximal_number_of_chunks? Looking at the traits implementation, you should get 4 * cores, which should not be big.
The concept of a "block size" is itself quite generic; many algorithms are "blocked", especially in linear algebra. I think it's perfectly fine to have both a chunk size and a block size, because they are intended to represent different concepts. As you noticed, the chunk size may depend on the number of elements, while for many problems the block size depends on the cache size and CPU architecture.
@mcopik Sorry, I meant not max_chunks but the chunk size. As you say, maximal_number_of_chunks returns a small number, and therefore the chunk size is very big, because the chunk size is data count / max_chunks.
In fact, there is get_chunk_size(...) in the executor parameter traits. If I use it with execution::par, I get (num_tasks + 4 * cores - 1) / (4 * cores) as the chunk size. Anyway, using the chunk size as the block size in parallel::partition yields bad performance. The reason is that with a big block size, the remaining_blocks that are left over after the sub-partitioning done by each thread are very large, so the sequential code has to run for a long time.
I don't think the concept of a "block size" is generic here. In parallel::partition, the cache size and CPU architecture are not the only considerations when determining the block size. There is more to it: as I said above, a big block size makes remaining_blocks bigger. The maximum total size of remaining_blocks is block size * cores, and remaining_blocks are processed sequentially, so a big block size can reduce parallelism. But a very small block size is also bad, because it causes poor cache utilization and excessively many block fetches. Therefore it is important to find an adequate block size. (If you want to know what remaining_blocks are, see hpx/hpx/parallel/algorithms/partition.hpp, lines 816 to 896 in aae9fb6.)
Anyway, for the above reasons, I don't think the "block size of parallel::partition" is a generic concept.
Also, maybe 'block size' and 'chunk size' are generally used with the same meaning, but the block size of parallel::partition differs from both, because it is determined with remaining_blocks in mind. (Its name may be confusing for that reason.) If we do add the block size of parallel::partition to the traits, a more explicit name might be better, such as 'partition_block_size', 'block_size_for_partition', 'partition_chunk_size', or 'chunk_size_for_partition'.
I find it very confusing to introduce both chunk_size and block_size. From the user's perspective these look very similar, even more so as (to the best of my knowledge) none of the algorithms would need both at the same time.