Optimize Jacobi update second step vertex #49

Merged (1 commit) on Oct 18, 2023

@balancap commented Oct 17, 2023

The second step of the Jacobi algorithm has an unusual "sparse" access
update pattern, which cannot be optimized simply with a `rpt` loop.

The loop is intrinsically limited by the number of load/store
operations. Nevertheless, by using the `aop` outer product intrinsic and
unrolling the loop by 2 steps, the bundling of operations in the loop can be
massively improved, leading to a 40% decrease in cycle count.
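
To illustrate the unrolling idea in portable terms, here is a minimal sketch. It assumes the second step applies a Givens rotation `(c, s)` to columns `p` and `q` of every row of a row-major matrix, which is the usual form of a Jacobi update but is not spelled out in this PR. The function names are hypothetical and nothing here uses IPU intrinsics; it only shows how grouping the loads and stores of two rows gives the compiler more independent operations to bundle.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: rotate columns p and q of a row-major n x n matrix,
// one row per iteration (two strided loads, two strided stores).
void rotate_columns_reference(std::vector<float>& a, std::size_t n,
                              std::size_t p, std::size_t q, float c, float s) {
  assert(a.size() == n * n && p < n && q < n && p != q);
  for (std::size_t i = 0; i < n; ++i) {
    float* row = a.data() + i * n;
    const float ap = row[p];
    const float aq = row[q];
    row[p] = c * ap - s * aq;
    row[q] = s * ap + c * aq;
  }
}

// Same update, unrolled over two rows: all four loads are issued before the
// arithmetic, so loads, multiplies and stores of different rows can overlap.
void rotate_columns_unrolled2(std::vector<float>& a, std::size_t n,
                              std::size_t p, std::size_t q, float c, float s) {
  assert(a.size() == n * n && p < n && q < n && p != q);
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) {
    float* row0 = a.data() + i * n;
    float* row1 = row0 + n;
    const float ap0 = row0[p], aq0 = row0[q];
    const float ap1 = row1[p], aq1 = row1[q];
    row0[p] = c * ap0 - s * aq0;
    row0[q] = s * ap0 + c * aq0;
    row1[p] = c * ap1 - s * aq1;
    row1[q] = s * ap1 + c * aq1;
  }
  if (i < n) {  // odd n: one row is left over after the unrolled loop
    float* row = a.data() + i * n;
    const float ap = row[p];
    const float aq = row[q];
    row[p] = c * ap - s * aq;
    row[q] = s * ap + c * aq;
  }
}

// Largest element-wise absolute difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
  assert(x.size() == y.size());
  float d = 0.0f;
  for (std::size_t k = 0; k < x.size(); ++k) {
    d = std::fmax(d, std::fabs(x[k] - y[k]));
  }
  return d;
}
```

Calling both functions on copies of the same matrix and checking `max_abs_diff` is a quick way to confirm the unrolled variant, including its leftover-row branch, matches the reference.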

Additionally, this PR adds a thin AMP C++ abstraction, allowing
a simple implementation shared between IPU hardware and the IPU model.
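
As a rough illustration of why an outer product fits this update, and of how a thin abstraction can keep one kernel for both targets, consider the sketch below. `OuterProductModel` is a hypothetical plain C++ stand-in for an accumulating 2x2 outer-product unit; the real intrinsic's operand layout and accumulator handling are not described in this PR and may differ. The kernel is templated on the unit type, so a hardware-backed type exposing the same two methods could be swapped in (again an assumption, not the PR's actual design).

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical model of an accumulating 2x2 outer-product unit:
// acc[2 * i + j] accumulates x[i] * y[j].
struct OuterProductModel {
  std::array<float, 4> acc{};

  void clear() { acc.fill(0.0f); }

  void accumulate(const std::array<float, 2>& x, const std::array<float, 2>& y) {
    for (int i = 0; i < 2; ++i) {
      for (int j = 0; j < 2; ++j) {
        acc[2 * i + j] += x[i] * y[j];
      }
    }
  }
};

// Rotate the (ap, aq) pairs of two rows with two accumulating outer products:
//   (ap0, ap1) x (c, s)  +  (aq0, aq1) x (-s, c)
// Row i of the accumulator then holds (c*ap_i - s*aq_i, s*ap_i + c*aq_i).
template <typename Unit>
std::array<float, 4> rotate_two_rows(Unit& unit, float ap0, float aq0,
                                     float ap1, float aq1, float c, float s) {
  const std::array<float, 2> ap{ap0, ap1};
  const std::array<float, 2> aq{aq0, aq1};
  const std::array<float, 2> cs{c, s};
  const std::array<float, 2> msc{-s, c};
  unit.clear();
  unit.accumulate(ap, cs);
  unit.accumulate(aq, msc);
  return unit.acc;  // {ap0', aq0', ap1', aq1'}
}
```

Under these assumptions, two rows are needed to fill the 2x2 accumulator, which is one plausible reason the outer product and the 2-step unrolling go together.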

@balancap marked this pull request as draft October 17, 2023 11:12
@balancap force-pushed the optimize-jacobi-update-second-step-vertex branch 3 times, most recently from dc61601 to 6ab4883 October 17, 2023 16:16
@balancap closed this Oct 17, 2023
@balancap reopened this Oct 17, 2023
@balancap marked this pull request as ready for review October 17, 2023 16:17
@balancap force-pushed the optimize-jacobi-update-second-step-vertex branch from 6ab4883 to 6551867 October 17, 2023 16:32
@balancap changed the title from "Optimize jacobi update second step vertex" to "Optimize Jacobi update second step vertex" Oct 17, 2023
@balancap force-pushed the optimize-jacobi-update-second-step-vertex branch from 6551867 to 1335dbb October 17, 2023 19:01
@balancap merged commit 2aea6a7 into main Oct 18, 2023
6 checks passed
@balancap deleted the optimize-jacobi-update-second-step-vertex branch October 18, 2023 09:32
balancap added a commit that referenced this pull request Oct 18, 2023
The recent improvement in PR #49 introduced a regression in Jacobi
`eigh`, raising an error when `size % 4 == 2`. This is due to the partial
loop unrolling in the second Jacobi update stage.

This PR fixes the issue by explicitly passing the offset and size
of the workload to the vertex.
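
A minimal sketch of the general pattern, not the actual vertex code: a worker receives an explicit `offset` and `size`, runs a main loop unrolled over 4 elements, and finishes with a tail loop so that sizes such as 6 (where `size % 4 == 2`) are fully covered. The function name and the scaling operation are hypothetical placeholders for the real update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker: scale v[offset, offset + size) in place.
// The range is passed explicitly rather than inferred from the buffer length.
void scale_range(std::vector<float>& v, std::size_t offset, std::size_t size,
                 float factor) {
  assert(offset + size <= v.size());
  float* data = v.data() + offset;
  std::size_t k = 0;
  // Main loop: 4 elements per iteration (two pairs, unrolled by 2).
  for (; k + 4 <= size; k += 4) {
    data[k + 0] *= factor;
    data[k + 1] *= factor;
    data[k + 2] *= factor;
    data[k + 3] *= factor;
  }
  // Tail loop: handles the 1 to 3 elements left when size % 4 != 0.
  for (; k < size; ++k) {
    data[k] *= factor;
  }
}
```

With `size = 6`, the main loop covers the first 4 elements and the tail covers the remaining 2, while elements outside `[offset, offset + size)` are left untouched.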
balancap added a commit that referenced this pull request Oct 19, 2023
The recent improvement in PR #49 introduced a regression in Jacobi
`eigh`, raising an error when `size % 4 == 2`. This is due to the partial
loop unrolling in the second Jacobi update stage.

This PR fixes the issue by explicitly passing the offset and size
of the workload to the vertex.