Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add arrange storage tactic #60155

Merged
merged 4 commits into from
Dec 25, 2023

Conversation

BiynXu
Copy link
Contributor

@BiynXu BiynXu commented Dec 19, 2023

PR types

Others

PR changes

Others

Description

Pcard-74042

@@ -78,6 +86,14 @@ std::optional<bool> SymbolicExprAnalyzer::ProveNE(const ir::Expr& lhs,
if (diff.is_constant()) {
return diff.get_constant() != 0;
}
ir::Expr diff_lower_bound = LowerBound(diff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ProveEQ to implement ProveNE?

ProveEQ returns true <==> ProveNE returns false
ProveEQ returns false <==> ProveNE returns true
ProveEQ returns optional null <==> ProveNE return optional null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified.

@@ -456,5 +472,35 @@ std::optional<bool> SingleIntervalIntSet::ProveSuperSet(
return std::nullopt;
}

ir::Expr EnhancedSimplify(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the name, it does something like replacing Mod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

VLOG(6) << "thread_load_range of [" << iter_values_of_load[i] << "] = ["
<< thread_load_range.Min() << " : " << thread_load_range.Max()
<< "]";
std::optional<bool> is_thread_sub_set =
Copy link
Member

@zhhsplendid zhhsplendid Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logic strictly right?

Suppose the following kernel code snippet:

int N; // tensor length is N
tensor[threadIdx.x] = input[threadIdx.x] + 9; // threadIdx.x range is 0 ~ N-1
__syncthreads();
output[threadIdx.x] = tensor[(threadIdx.x + 1) % N]; // something like shift move

It seems the tensor store and load have same IntSet, but we cannot use local 1 in this case. It should be shared, you have to return CudaAxisType::kCudaThread in this case?

Although, my case may not happen in current CINN, so current logic is fine to me :-) , just a question here, this may not always mathmetically correct?

Copy link
Contributor Author

@BiynXu BiynXu Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntSet Evaluate(Expr expr,
                const std::unordered_map<ir::Var, ir::Var>& fixed,
                const std::unordered_map<ir::Var, IntSet>& var_domain);

In this case, when we consider whether it is cross thread, threadIdx.x is fixed.
thread_store_range: [i : i] thread_load_range: [(threadIdx.x + 1) % N : (threadIdx.x + 1) % N]
thread_load_range is not a subset of thread_store_range, so it is cross thread.
when we consider whether it is cross block, BlockIdx.x/y/x is fixed, threadIdx.x will be replaced with min and max.
block_store_range: [0 : N - 1] block_load_range: [0 : N - 1]
block_load_range is a subset of block_store_range, so it is not cross block.
So we return CudaAxisType::kCudaThread in this case.
However, it poses a high requirement for simplifying symbolic expressions and IntSet, such as:
replace 0 and N - 1 in [(threadIdx.x + 1) % N : (threadIdx.x + 1) % N] ==> [0 : N]

ir::Expr load_block = load_and_block.second;
std::optional<CudaAxisType> cross_type =
AnalyzeCrossType(var2for_map, store, load, store_block, load_block);
if (!cross_type.has_value()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to me that the logic is:

If we don't know cross Block or cross Thread, set to local

Should we do a safer logic? Such as

If we know it won't cross Block or cross Thread, set to local
if we know cross Block, throw error
else set to shared

Note: the difference of two logic is the handling of unknown case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cross_type without a value indicates that it does not cross any threads or blocks.
In the AnalyzeCrossType, for safety reasons, if it is uncertain whether it crosses blocks, it will be considered as crossing blocks, and the same applies to cross threads.

Copy link
Member

@zhhsplendid zhhsplendid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Courtesy-Xs Courtesy-Xs merged commit 23af8cf into PaddlePaddle:develop Dec 25, 2023
29 checks passed
Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request Jan 7, 2024
* [CINN] Add arrange storage tactic

* [CINN] Apply tactics in dy group scheduler

* Polish codes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants