[Lang] MatrixNdarray refactor part9: Add scalarization for AllocaStmt #6168

Merged
jim19930609 merged 10 commits into taichi-dev:master on Sep 29, 2022

Conversation

jim19930609
Contributor

@jim19930609 jim19930609 commented Sep 27, 2022

Related issues: #5873, #5819

This PR implements "Part ④" of #5873.

[AllocaStmt scalarization]

Before:
  TensorType<4 x i32>* addr = AllocaStmt(TensorType<4 x i32>)

After:
  i32 addr0 = AllocaStmt(i32)
  i32 addr1 = AllocaStmt(i32)
  i32 addr2 = AllocaStmt(i32)
  i32 addr3 = AllocaStmt(i32)

  scalarized_local_tensor_map_[addr] = {addr0, addr1, addr2, addr3}
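
For concreteness, here is a rough C++ sketch of how the AllocaStmt case could look inside the Scalarize visitor. The member names (modifier_, scalarized_local_tensor_map_) and the exact constructor signatures are illustrative assumptions, not necessarily the code in this PR:

  // Assumption: the pass is a statement visitor holding a DelayedIRModifier
  // `modifier_` and a map from the original alloca to its scalar replacements.
  void visit(AllocaStmt *stmt) override {
    auto *tensor_type = stmt->ret_type->cast<TensorType>();
    if (!tensor_type)
      return;  // Scalar allocas are left untouched.

    auto primitive_type = tensor_type->get_element_type();
    std::vector<Stmt *> scalarized;
    for (int i = 0; i < tensor_type->get_num_elements(); i++) {
      // One scalar AllocaStmt per tensor element.
      auto scalar_alloca = Stmt::make<AllocaStmt>(primitive_type);
      scalarized.push_back(scalar_alloca.get());
      modifier_.insert_before(stmt, std::move(scalar_alloca));
    }
    // Record which scalar allocas replace the original TensorType alloca.
    scalarized_local_tensor_map_[stmt] = std::move(scalarized);
  }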

[Load AllocaStmt]

Before:
  TensorType<4 x i32> val = LoadStmt(TensorType<4 x i32>* alloca_src)

After:
  i32 val0 = LoadStmt(scalarized_local_tensor_map_[alloca_src][0])
  i32 val1 = LoadStmt(scalarized_local_tensor_map_[alloca_src][1])
  i32 val2 = LoadStmt(scalarized_local_tensor_map_[alloca_src][2])
  i32 val3 = LoadStmt(scalarized_local_tensor_map_[alloca_src][3])

  tmp = MatrixInitStmt(val0, val1, val2, val3)
  stmt->replace_all_usages_with(tmp)
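
Correspondingly, a hedged sketch of the load case might look like the following; the helper name, the LocalLoadStmt/MatrixInitStmt constructor signatures, and the modifier_ API are assumptions for illustration rather than the exact code in this PR:

  // `stmt` is the original TensorType load; `alloca_src` is its (already
  // scalarized) source alloca.
  void scalarize_load(Stmt *stmt, Stmt *alloca_src) {
    auto &scalar_allocas = scalarized_local_tensor_map_[alloca_src];

    std::vector<Stmt *> loaded_values;
    for (Stmt *scalar_alloca : scalar_allocas) {
      // Load each scalar alloca individually.
      auto load = Stmt::make<LocalLoadStmt>(scalar_alloca);
      loaded_values.push_back(load.get());
      modifier_.insert_before(stmt, std::move(load));
    }

    // Pack the scalar loads into a MatrixInitStmt and redirect all users of
    // the original TensorType load to it.
    auto matrix_init = Stmt::make<MatrixInitStmt>(loaded_values);
    stmt->replace_all_usages_with(matrix_init.get());
    modifier_.insert_before(stmt, std::move(matrix_init));
    modifier_.erase(stmt);
  }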

[Store to AllocaStmt]

Before:
  StoreStmt(TensorType<4 x i32>* alloca_dest_stmt, TensorType<4 x i32> val)

After:
  StoreStmt(i32* scalarized_local_tensor_map_[alloca_dest_stmt][0],
            i32 val->cast<MatrixInitStmt>()->val[0])
  StoreStmt(i32* scalarized_local_tensor_map_[alloca_dest_stmt][1],
            i32 val->cast<MatrixInitStmt>()->val[1])
  StoreStmt(i32* scalarized_local_tensor_map_[alloca_dest_stmt][2],
            i32 val->cast<MatrixInitStmt>()->val[2])
  StoreStmt(i32* scalarized_local_tensor_map_[alloca_dest_stmt][3],
            i32 val->cast<MatrixInitStmt>()->val[3])
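
The store case, in the same hedged style (the field holding MatrixInitStmt's operands is called `values` here; the real field name and constructor signatures may differ):

  // `stmt` is the original TensorType store of `val` into `alloca_dest_stmt`.
  void scalarize_store(Stmt *stmt, Stmt *alloca_dest_stmt, Stmt *val) {
    auto &scalar_allocas = scalarized_local_tensor_map_[alloca_dest_stmt];
    auto *matrix_init = val->cast<MatrixInitStmt>();

    for (std::size_t i = 0; i < scalar_allocas.size(); i++) {
      // Store the i-th scalar of the matrix into the i-th scalar alloca.
      auto store = Stmt::make<LocalStoreStmt>(scalar_allocas[i],
                                              matrix_init->values[i]);
      modifier_.insert_before(stmt, std::move(store));
    }
    modifier_.erase(stmt);  // The original TensorType store is no longer needed.
  }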

@netlify

netlify bot commented Sep 27, 2022

Deploy Preview for docsite-preview ready!

🔨 Latest commit: 431ff4c
🔍 Latest deploy log: https://app.netlify.com/sites/docsite-preview/deploys/63352275f0d76c000889f8a5
😎 Deploy Preview: https://deploy-preview-6168--docsite-preview.netlify.app

@jim19930609
Contributor Author

jim19930609 commented Sep 27, 2022

Going to enable more Python tests in follow-up PRs.

@AD1024
Contributor

AD1024 commented Sep 27, 2022

Thanks for implementing this! LGTM.
Are we leveraging the CFG pass to eliminate that MatrixInitStmt holding pointers to scalarized values?

@jim19930609
Contributor Author

Thanks for implementing this! LGTM. Are we leveraging the CFG pass to eliminate that MatrixInitStmt holding pointers to scalarized values?

I don't think MatrixInitStmt is holding any pointer-typed values? If you are referring to the redundant MatrixInitStmts inserted during scalarization, those are removed right after scalarization is done (there will be a MatrixInitCleanUp pass).

Contributor

@strongoier strongoier left a comment


Hmm.. I think having a case in the LowerMatrixPtr pass is much easier? You don't have to modify the Scalarize pass at all.

@jim19930609
Contributor Author

jim19930609 commented Sep 28, 2022

Hmm.. I think having a case in the LowerMatrixPtr pass is much easier? You don't have to modify the Scalarize pass at all.

But it is trying to scalarize AllocaStmt(TensorType) into AllocaStmt(scalar0), ..., so it fits well into the scalarization pass?

The other thing is that LowerMatrixPtr doesn't depend on whether "scalarized" is turned on or not, while scalarization for AllocaStmt should only be enabled when we decide to do so, so I feel it's better to decouple these two passes?

@strongoier
Contributor

strongoier commented Sep 28, 2022

But it is trying to scalarize AllocaStmt(TensorType) into AllocaStmt(scalar0), ..., so it fits well into the scalarization pass?

IIUC, Scalarization aims at faking a tensor operation with a bunch of scalar operations, while LowerMatrixPtr aims at eliminating intermediate matrix ptrs. I agree that AllocaStmt(TensorType) -> AllocaStmt(scalar0), ... should be put in Scalarization, but I think PtrOffsetStmt(AllocaStmt) would be better handled in LowerMatrixPtr. Thus, theoretically, we need an intermediate representation like MatrixOfPtrStmt.

However, considering that AllocaStmt(TensorType) -> AllocaStmt(scalar0), ... may not be needed in the end, and we haven't decided which parts are mandatory or optional, I think we don't need to be so strict with pass separation for now. The only suggestion here may be to simplify the implementation a bit - it seems that visit(PtrOffsetStmt) already covers the changes in visit(LoadStmt) / visit(StoreStmt) / ...?

@jim19930609
Contributor Author

But it is trying to scalarize AllocaStmt(TensorType) into AllocaStmt(scalar0), ..., so it fits well into the scalarization pass?

IIUC, Scalarization aims at faking a tensor operation with a bunch of scalar operations, while LowerMatrixPtr aims at eliminating intermediate matrix ptrs. I agree that AllocaStmt(TensorType) -> AllocaStmt(scalar0), ... should be put in Scalarization, but I think PtrOffsetStmt(AllocaStmt) would be better handled in LowerMatrixPtr. Thus, theoretically, we need an intermediate representation like MatrixOfPtrStmt.

However, considering that AllocaStmt(TensorType) -> AllocaStmt(scalar0), ... may not be needed in the end, and we haven't decided which parts are mandatory or optional, I think we don't need to be so strict with pass separation for now. The only suggestion here may be to simplify the implementation a bit - it seems that visit(PtrOffsetStmt) already covers the changes in visit(LoadStmt) / visit(StoreStmt) / ...?

Simplified irpass::scalarize() but still kept scalarization for AllocaStmt and PtrOffsetStmt as part of irpass::scalarize(). The barrier to splitting "scalarization for AllocaStmt" and "optimization for PtrOffsetStmt(AllocaStmt)" is that we need this scalarized_local_tensor_map_ to record the mapping from the original AllocaStmt to the scalarized ones, and transferring scalarized_local_tensor_map_ across passes would be ugly implementation-wise.

Contributor

@strongoier strongoier left a comment


Great!!

Review threads (resolved): taichi/transforms/scalarize.cpp, tests/cpp/transforms/scalarize_test.cpp
@jim19930609 jim19930609 merged commit 8be973b into taichi-dev:master Sep 29, 2022