Add arrange storage tactic #60155

BiynXu · 2023-12-19T12:41:00Z

PR types

Others

PR changes

Others

Description

Pcard-74042

zhhsplendid · 2023-12-22T02:30:46Z

paddle/cinn/common/integer_set.cc

@@ -78,6 +86,14 @@ std::optional<bool> SymbolicExprAnalyzer::ProveNE(const ir::Expr& lhs,
  if (diff.is_constant()) {
    return diff.get_constant() != 0;
  }
+  ir::Expr diff_lower_bound = LowerBound(diff);


Can we implement ProveNE in terms of ProveEQ?

ProveEQ returns true <==> ProveNE returns false
ProveEQ returns false <==> ProveNE returns true
ProveEQ returns std::nullopt <==> ProveNE returns std::nullopt
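
The mapping above can be sketched as a three-valued negation. This is only an illustration and not the CINN implementation: `Interval`, `ProveEQSketch`, `ProveNESketch`, and `Negate` are hypothetical names, and the toy equality prover reasons over plain integer intervals instead of `ir::Expr`.

```cpp
#include <optional>

// Negate a three-valued proof result; "unknown" (std::nullopt) stays unknown.
std::optional<bool> Negate(std::optional<bool> proved) {
  if (!proved.has_value()) return std::nullopt;
  return !*proved;
}

// Hypothetical stand-in for a symbolic expression: the closed interval
// [lo, hi] of values the expression may take.
struct Interval {
  long lo;
  long hi;
};

// Toy equality prover (assumption: not the real ProveEQ).
std::optional<bool> ProveEQSketch(Interval lhs, Interval rhs) {
  // Both sides are the same single constant: equality is proven.
  if (lhs.lo == lhs.hi && rhs.lo == rhs.hi && lhs.lo == rhs.lo) return true;
  // The intervals are disjoint: equality is disproven.
  if (lhs.hi < rhs.lo || rhs.hi < lhs.lo) return false;
  // Overlapping intervals: nothing can be concluded.
  return std::nullopt;
}

// ProveNE expressed through ProveEQ, following the table above.
std::optional<bool> ProveNESketch(Interval lhs, Interval rhs) {
  return Negate(ProveEQSketch(lhs, rhs));
}
```

The delegation only loses nothing if the equality prover can return false in every case where inequality is provable, for example when a lower bound of the difference is known to be positive.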

zhhsplendid · 2023-12-22T02:32:10Z

paddle/cinn/common/integer_set.cc

@@ -456,5 +472,35 @@ std::optional<bool> SingleIntervalIntSet::ProveSuperSet(
  return std::nullopt;
 }

+ir::Expr EnhancedSimplify(


Please change the name; what this function actually does is something like replacing Mod.

zhhsplendid · 2023-12-22T03:10:29Z

paddle/cinn/ir/group_schedule/tactic/arrange_storage_tactic.cc

+    VLOG(6) << "thread_load_range of [" << iter_values_of_load[i] << "] = ["
+            << thread_load_range.Min() << " : " << thread_load_range.Max()
+            << "]";
+    std::optional<bool> is_thread_sub_set =


Is this logic strictly correct?

Consider the following kernel code snippet:

int N;  // tensor length is N
tensor[threadIdx.x] = input[threadIdx.x] + 9;  // threadIdx.x ranges over 0 ~ N-1
__syncthreads();
output[threadIdx.x] = tensor[(threadIdx.x + 1) % N];  // something like a shift move

The tensor store and load seem to have the same IntSet, but we cannot use local memory in this case. It should be shared memory, so you would have to return CudaAxisType::kCudaThread here?

That said, my case may not occur in current CINN, so the current logic is fine with me :-). It is just a question: might this not always be mathematically correct?

BiynXu · 2023-12-22 (reply)

IntSet Evaluate(Expr expr, const std::unordered_map<ir::Var, ir::Var>& fixed, const std::unordered_map<ir::Var, IntSet>& var_domain);

In this case, when we check whether the access crosses threads, threadIdx.x is fixed:
thread_store_range: [i : i]
thread_load_range: [(threadIdx.x + 1) % N : (threadIdx.x + 1) % N]
thread_load_range is not a subset of thread_store_range, so the access crosses threads.

When we check whether it crosses blocks, blockIdx.x/y/z is fixed and threadIdx.x is replaced with its min and max:
block_store_range: [0 : N - 1]
block_load_range: [0 : N - 1]
block_load_range is a subset of block_store_range, so the access does not cross blocks.

So we return CudaAxisType::kCudaThread in this case.
However, this places high demands on the simplification of symbolic expressions and IntSet, for example:
substituting 0 and N - 1 into [(threadIdx.x + 1) % N : (threadIdx.x + 1) % N] ==> [0 : N]
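
The two-level check described in this reply can be illustrated for the shift example with a concrete block size. This is a minimal sketch under stated assumptions: it enumerates thread indices instead of evaluating symbolic IntSets, and `Range`, `IsSubset`, `CrossKind`, `StoreIndex`, `LoadIndex`, and `AnalyzeShiftExample` are hypothetical names rather than CINN APIs.

```cpp
#include <algorithm>
#include <cassert>

// Closed integer range [min, max].
struct Range {
  int min;
  int max;
};

bool IsSubset(Range inner, Range outer) {
  return outer.min <= inner.min && inner.max <= outer.max;
}

// Hypothetical result type for this illustration only.
enum class CrossKind { kNone, kThread, kBlock };

// Index expressions of the example kernel:
//   tensor[threadIdx.x] = ...;  ... = tensor[(threadIdx.x + 1) % N];
int StoreIndex(int tid) { return tid; }
int LoadIndex(int tid, int n) { return (tid + 1) % n; }

CrossKind AnalyzeShiftExample(int n) {
  assert(n > 0);
  Range block_store{StoreIndex(0), StoreIndex(0)};
  Range block_load{LoadIndex(0, n), LoadIndex(0, n)};
  bool thread_load_in_store = true;
  for (int tid = 0; tid < n; ++tid) {
    const int s = StoreIndex(tid);
    const int l = LoadIndex(tid, n);
    // Thread level: threadIdx.x is fixed, so both ranges are single points.
    thread_load_in_store = thread_load_in_store && IsSubset({l, l}, {s, s});
    // Block level: threadIdx.x sweeps its whole domain, so ranges widen.
    block_store.min = std::min(block_store.min, s);
    block_store.max = std::max(block_store.max, s);
    block_load.min = std::min(block_load.min, l);
    block_load.max = std::max(block_load.max, l);
  }
  if (!IsSubset(block_load, block_store)) return CrossKind::kBlock;
  if (!thread_load_in_store) return CrossKind::kThread;
  return CrossKind::kNone;
}
```

With n = 4 the per-thread load index differs from the store index, so the sketch reports a cross-thread access even though the block-level ranges coincide. With n = 1 the single thread reads back its own element and nothing crosses.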

zhhsplendid · 2023-12-22T03:18:08Z

paddle/cinn/ir/group_schedule/tactic/arrange_storage_tactic.cc

+    ir::Expr load_block = load_and_block.second;
+    std::optional<CudaAxisType> cross_type =
+        AnalyzeCrossType(var2for_map, store, load, store_block, load_block);
+    if (!cross_type.has_value()) {


It seems to me that the logic is:

If we don't know whether it crosses blocks or threads, set to local

Should we use safer logic? For example:

If we know it won't cross blocks or threads, set to local
If we know it crosses blocks, throw an error
Otherwise, set to shared

Note: the difference between the two approaches is how the unknown case is handled.

BiynXu · 2023-12-22 (reply)

A cross_type without a value indicates that the access does not cross any threads or blocks.
In AnalyzeCrossType, for safety, if it is uncertain whether the access crosses blocks, it is treated as crossing blocks; the same applies to crossing threads.
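
The conservative handling of the unknown case described in this reply might look roughly like the sketch below. It is an assumption-laden illustration, not the code in arrange_storage_tactic.cc: `CrossTypeSketch` and `AnalyzeCrossTypeSketch` are hypothetical names, the `kCudaBlock` enumerator is assumed by analogy with `kCudaThread`, and the two `std::optional<bool>` inputs stand for the block-level and thread-level subset proofs.

```cpp
#include <optional>

// Hypothetical stand-in for CudaAxisType (kCudaBlock is an assumed name).
enum class CrossTypeSketch { kCudaBlock, kCudaThread };

// Each input is a three-valued proof result for "the load range is a subset
// of the store range": true, false, or std::nullopt when nothing was proven.
std::optional<CrossTypeSketch> AnalyzeCrossTypeSketch(
    std::optional<bool> block_load_is_subset,
    std::optional<bool> thread_load_is_subset) {
  // Anything other than a proven "true" is treated as crossing blocks.
  if (!block_load_is_subset.value_or(false)) {
    return CrossTypeSketch::kCudaBlock;
  }
  // The same conservative rule applies at the thread level.
  if (!thread_load_is_subset.value_or(false)) {
    return CrossTypeSketch::kCudaThread;
  }
  // Both subset relations were proven: the access crosses nothing, which is
  // the only case in which the result has no value.
  return std::nullopt;
}
```

Because an unproven result is folded into the crossing case, a missing value can only arise when both relations were actually proven, which is why the caller can treat it as safe for local storage.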

zhhsplendid

LGTM

* [CINN] Add arrange storage tactic
* [CINN] Apply tactics in dy group scheduler
* Polish codes

BiynXu added 3 commits December 19, 2023 11:02

[CINN] Add arrange storage tactic

d0da994

[CINN] Apply tactics in dy group scheduler

792c612

Merge branch 'develop' into add_arrange_storage_tactic

b195f32

BiynXu force-pushed the add_arrange_storage_tactic branch from f53f97d to b195f32 on December 21, 2023 12:08

zhhsplendid reviewed Dec 22, 2023

Polish codes

2ca9c92

zhhsplendid approved these changes Dec 22, 2023

Courtesy-Xs merged commit 23af8cf into PaddlePaddle:develop Dec 25, 2023
29 checks passed

Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request Jan 7, 2024

Add arrange storage tactic (PaddlePaddle#60155)

94bbd16

* [CINN] Add arrange storage tactic
* [CINN] Apply tactics in dy group scheduler
* Polish codes
