[WIP] PFT rewrite-based do-concurrent parallelization #230
base: amd-trunk-dev
This is a proof of concept of a PFT rewrite-based approach to OpenMP-based parallelization of Fortran `do concurrent` loops. The main advantage of this approach over an MLIR pass-based one is that it should allow us to avoid re-implementing and sharing significant pieces of PFT to MLIR lowering between Flang lowering and the MLIR pass, potentially also making it much simpler to keep feature parity.

The current WIP replicates the PFT structure of an `!$omp parallel do` when encountering a `do concurrent` loop. It is still in very early stages and the resulting PFT cannot be lowered to MLIR yet, as it seems to be missing some symbol updates. However, it can already be tested:

```sh
$ cat test.f90
subroutine foo()
  implicit none
  integer :: i

  do concurrent(i=1:10)
  end do

  !$omp parallel do
  do i=1,10
  end do
end subroutine

$ flang-new -fc1 -fdebug-unparse -fopenmp test.f90
SUBROUTINE foo
 IMPLICIT NONE
 INTEGER i
!$OMP PARALLEL DO
 DO i=1_4,10_4
 END DO
!$OMP PARALLEL DO
 DO i=1_4,10_4
 END DO
END SUBROUTINE

$ flang-new -fc1 -fdebug-dump-parse-tree -fopenmp test.f90
Program -> ProgramUnit -> SubroutineSubprogram
| SubroutineStmt
| | Name = 'foo'
| SpecificationPart
| | ImplicitPart -> ImplicitPartStmt -> ImplicitStmt ->
| | DeclarationConstruct -> SpecificationConstruct -> TypeDeclarationStmt
| | | DeclarationTypeSpec -> IntrinsicTypeSpec -> IntegerTypeSpec ->
| | | EntityDecl
| | | | Name = 'i'
| ExecutionPart -> Block
| | ExecutionPartConstruct -> ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct
| | | OmpBeginLoopDirective
| | | | OmpLoopDirective -> llvm::omp::Directive = parallel do
| | | | OmpClauseList ->
| | | DoConstruct
| | | | NonLabelDoStmt
| | | | | LoopControl -> LoopBounds
| | | | | | Scalar -> Name = 'i'
| | | | | | Scalar -> Expr = '1_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | Scalar -> Expr = '10_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '10'
| | | | Block
| | | | EndDoStmt ->
| | ExecutionPartConstruct -> ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct
| | | OmpBeginLoopDirective
| | | | OmpLoopDirective -> llvm::omp::Directive = parallel do
| | | | OmpClauseList ->
| | | DoConstruct
| | | | NonLabelDoStmt
| | | | | LoopControl -> LoopBounds
| | | | | | Scalar -> Name = 'i'
| | | | | | Scalar -> Expr = '1_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | Scalar -> Expr = '10_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '10'
| | | | Block
| | | | EndDoStmt ->
| EndSubroutineStmt ->
```
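To illustrate the shape of the rewrite outside of Flang, here is a minimal, self-contained sketch on a toy tree. Everything in it is an assumption made for illustration: `DoLoop`, `OmpParallelDo`, `Construct`, `rewriteDoConcurrent`, and `unparse` are hypothetical names, not Flang's `parser::` classes or the code in this PR, and the toy ignores symbols, clauses, loop bodies, and kind suffixes such as `_4`.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Toy stand-ins for parse-tree nodes (hypothetical, not Flang's classes).
struct DoLoop {
  bool concurrent;   // true for `do concurrent(var=lower:upper)`
  std::string var;   // loop variable name
  int lower, upper;  // constant loop bounds
};

// Models an `!$omp parallel do` construct wrapping an ordinary do loop.
struct OmpParallelDo {
  DoLoop loop;
};

using Construct = std::variant<DoLoop, OmpParallelDo>;

// Replace every `do concurrent` in the block by an OmpParallelDo node that
// wraps an equivalent non-concurrent loop. Other constructs are untouched.
void rewriteDoConcurrent(std::vector<Construct> &block) {
  for (Construct &c : block) {
    if (auto *loop = std::get_if<DoLoop>(&c); loop && loop->concurrent) {
      DoLoop plain = *loop;  // copy before overwriting the variant
      plain.concurrent = false;
      c = OmpParallelDo{plain};
    }
  }
}

// Print a loop header and its (empty) body in Fortran-like syntax.
std::string unparseLoop(const DoLoop &l) {
  const std::string lo = std::to_string(l.lower);
  const std::string hi = std::to_string(l.upper);
  const std::string head =
      l.concurrent ? "DO CONCURRENT (" + l.var + "=" + lo + ":" + hi + ")"
                   : "DO " + l.var + "=" + lo + "," + hi;
  return head + "\nEND DO\n";
}

// Print one construct, emitting the directive line for OmpParallelDo nodes.
std::string unparse(const Construct &c) {
  if (const auto *omp = std::get_if<OmpParallelDo>(&c)) {
    return "!$OMP PARALLEL DO\n" + unparseLoop(omp->loop);
  }
  return unparseLoop(std::get<DoLoop>(c));
}
```

Applied to a block that mirrors `test.f90` (a `do concurrent` loop followed by a loop already under `!$omp parallel do`), the rewrite should leave two constructs that unparse identically, which is the effect visible in the `-fdebug-unparse` output above. The real rewrite additionally has to keep symbol information consistent, which is the part this sketch leaves out.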
Thanks Sergio for working on this. At this stage, this definitely looks simpler than the pass solution. The initial WIP for the pass (here: https://github.com/llvm/llvm-project/pull/77285/files) was similar to your proposal in terms of simplicity; if you remove all comments, tests, and boilerplate, you end up with a few lines of logic to do the actual conversion. However, I do understand that PFT rewriting is probably going to be much simpler than the pass when we map to

One important point I would like to make clear: the current issues we are facing now with

Additionally, the PFT rewriting approach is much simpler but, I think, quite limiting as well, for the following reasons:
Admittedly, the pass looks like a lot of code compared to the PFT rewriting at the current stage of the PR. However, much of that code is:
The pass has been validated on LBL's inference engine (which is quite a large codebase with annoying features):
I have to admit that I am biased though. The pass is one of my ugly babies that I have contributed since I joined the team. Therefore, adding Michael Klemm and Michael Kruse to chime in. Maybe they have further input. And it is a very nice discussion to have regardless of the result, so thanks for opening the WIP.
I share @ergawy's concerns here. DO CONCURRENT should regularly need program analysis, for instance regarding localization rules. Just adding a

For our first implementation that explicitly requires to be user-enabled using
I also agree that for a proper translation of