-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add arrange storage tactic #60155
Add arrange storage tactic #60155
Conversation
f53f97d
to
b195f32
Compare
paddle/cinn/common/integer_set.cc
Outdated
@@ -78,6 +86,14 @@ std::optional<bool> SymbolicExprAnalyzer::ProveNE(const ir::Expr& lhs, | |||
if (diff.is_constant()) { | |||
return diff.get_constant() != 0; | |||
} | |||
ir::Expr diff_lower_bound = LowerBound(diff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ProveEQ to implement ProveNE?
ProveEQ returns true <==> ProveNE returns false
ProveEQ returns false <==> ProveNE returns true
ProveEQ returns optional null <==> ProveNE return optional null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
paddle/cinn/common/integer_set.cc
Outdated
@@ -456,5 +472,35 @@ std::optional<bool> SingleIntervalIntSet::ProveSuperSet( | |||
return std::nullopt; | |||
} | |||
|
|||
ir::Expr EnhancedSimplify( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the name, it does something like replacing Mod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
VLOG(6) << "thread_load_range of [" << iter_values_of_load[i] << "] = [" | ||
<< thread_load_range.Min() << " : " << thread_load_range.Max() | ||
<< "]"; | ||
std::optional<bool> is_thread_sub_set = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logic strictly right?
Suppose the following kernel code snippet:
int N; // tensor length is N
tensor[threadIdx.x] = input[threadIdx.x] + 9; // threadIdx.x range is 0 ~ N-1
__syncthreads();
output[threadIdx.x] = tensor[(threadIdx.x + 1) % N]; // something like shift move
It seems the tensor store and load have same IntSet, but we cannot use local 1 in this case. It should be shared, you have to return CudaAxisType::kCudaThread in this case?
Although, my case may not happen in current CINN, so current logic is fine to me :-) , just a question here, this may not always mathmetically correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntSet Evaluate(Expr expr,
const std::unordered_map<ir::Var, ir::Var>& fixed,
const std::unordered_map<ir::Var, IntSet>& var_domain);
In this case, when we consider whether it is cross thread, threadIdx.x is fixed.
thread_store_range: [i : i] thread_load_range: [(threadIdx.x + 1) % N : (threadIdx.x + 1) % N]
thread_load_range is not a subset of thread_store_range, so it is cross thread.
when we consider whether it is cross block, BlockIdx.x/y/x is fixed, threadIdx.x will be replaced with min and max.
block_store_range: [0 : N - 1] block_load_range: [0 : N - 1]
block_load_range is a subset of block_store_range, so it is not cross block.
So we return CudaAxisType::kCudaThread in this case.
However, it poses a high requirement for simplifying symbolic expressions and IntSet, such as:
replace 0 and N - 1 in [(threadIdx.x + 1) % N : (threadIdx.x + 1) % N] ==> [0 : N]
ir::Expr load_block = load_and_block.second; | ||
std::optional<CudaAxisType> cross_type = | ||
AnalyzeCrossType(var2for_map, store, load, store_block, load_block); | ||
if (!cross_type.has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to me that the logic is:
If we don't know cross Block or cross Thread, set to local
Should we do a safer logic? Such as
If we know it won't cross Block or cross Thread, set to local
if we know cross Block, throw error
else set to shared
Note: the difference of two logic is the handling of unknown case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cross_type without a value indicates that it does not cross any threads or blocks.
In the AnalyzeCrossType, for safety reasons, if it is uncertain whether it crosses blocks, it will be considered as crossing blocks, and the same applies to cross threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* [CINN] Add arrange storage tactic * [CINN] Apply tactics in dy group scheduler * Polish codes
PR types
Others
PR changes
Others
Description
Pcard-74042