heat.pow() speed-up when exponent is int #1141

Merged · 6 commits merged into main from bug/789-pow-performance on Apr 24, 2023

Conversation

@ClaudiaComito (Contributor) commented Apr 18, 2023

Description

This PR exploits torch.pow(a, b)'s fast path when b is an integer. For a torch/heat performance comparison (on 1 process) of a few binary operations, see the table below.

The problem was identified by @coquelin77 in #789 and #793. This PR replaces #793, which had gotten quite stale and difficult to update. Thanks again, @coquelin77, for all the work.

Issue/s resolved: #789

Changes proposed:

  • In arithmetics.pow (rather than in the general _operations.__binary_op): introduce an early check for integer scalar exponents; when it applies, perform the torch operation immediately, without further checks, and return the result (see the sketch below).
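
A minimal sketch of the early-exit idea, written against plain torch tensors; the names pow_with_int_fast_path, local_tensor, and fallback are illustrative placeholders, not heat API:

```python
import torch

def pow_with_int_fast_path(local_tensor: torch.Tensor, exponent, fallback):
    # Early exit: a plain Python int (but not a bool) exponent can be handed
    # straight to torch.pow, which is significantly faster for integer exponents.
    if isinstance(exponent, int) and not isinstance(exponent, bool):
        return torch.pow(local_tensor, exponent)
    # Everything else (float scalars, tensors, DNDarrays) goes through the
    # general, slower element-wise machinery, represented here by `fallback`.
    return fallback(local_tensor, exponent)
```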

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Memory requirements

NA

Performance

a = randn(10000, 10000) (a float64 torch tensor or a float64 DNDarray, respectively)

| operation | torch, 1 proc | main, 1 proc | 789-pow (this PR), 1 proc |
| --------- | ------------- | ------------ | ------------------------- |
| a * a     | 0.14          | 0.14         | 0.14                      |
| a + a     | 0.14          | 0.14         | 0.14                      |
| a + 2     | 0.14          | 0.14         | 0.14                      |
| a / a     | 0.14          | 0.14         | 0.14                      |
| a ** a    | 1.16          | 1.16         | 1.16                      |
| a ** 2    | 0.14          | 1.3          | 0.14 ←                    |
| 2 ** a    | 0.56          | 0.56         | 0.56                      |
| a ** 2.5  | 1.03          | 1.03         | 1.03                      |
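
For reference, a rough single-process timing sketch for the torch column (a hypothetical best-of-three wall-clock measurement, not the benchmark script used for the table; absolute numbers depend on hardware):

```python
import time

import torch

a = torch.randn(10000, 10000, dtype=torch.float64)

def bench(label, fn, repeats=3):
    # report the best wall-clock time over a few repeats
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    print(f"{label:<10} {best:.2f} s")

bench("a * a", lambda: a * a)        # plain element-wise multiply, for scale
bench("a ** 2", lambda: a ** 2)      # int exponent: torch's fast path
bench("a ** 2.5", lambda: a ** 2.5)  # float exponent: general pow
bench("a ** a", lambda: a ** a)      # tensor exponent: general pow
```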

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Title of PR is suitable for corresponding CHANGELOG entry

Does this change modify the behaviour of other functions? If so, which?

no

@ClaudiaComito added the bug (Something isn't working), enhancement (New feature or request), and benchmarking labels on Apr 18, 2023
@ClaudiaComito added this to the 1.3.0 milestone on Apr 18, 2023

@github-actions (Contributor): Thank you for the PR!

@codecov bot commented Apr 18, 2023

Codecov Report

Merging #1141 (ed242bc) into main (eaa90d6) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1141      +/-   ##
==========================================
+ Coverage   91.77%   91.79%   +0.01%     
==========================================
  Files          72       72              
  Lines       10485    10497      +12     
==========================================
+ Hits         9623     9636      +13     
+ Misses        862      861       -1     
| Flag | Coverage Δ |
| ---- | ---------- |
| unit | 91.79% <100.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| -------------- | ---------- |
| heat/core/arithmetics.py | 99.11% <100.00%> (+0.51%) ⬆️ |

@mtar (Collaborator) commented Apr 19, 2023

  • Why are floats excluded? torch.pow supports them.
  • Such an early exit would be a good fit for binary_op itself. All calling functions would benefit from the speedup.

mtar previously approved these changes Apr 24, 2023
@ClaudiaComito (Contributor, Author) commented:

Thanks @mtar.

  • Why are floats excluded? torch.pow supports them.

Float exponents will be dealt with in the general __binary_op(); the torch.pow() optimization for integer exponents makes for a significant speedup, see the table above.

  • Such an early exit would be a good fit for `binary_op` itself. All calling functions would benefit from the speedup.

It turns out that many binary operations in torch require both inputs to be tensors, so we would need to wrap the ints into DNDarrays in those cases.
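
A small illustration of that constraint, assuming recent torch behaviour (torch.pow accepts a Python scalar operand, while e.g. torch.minimum raises a TypeError for one):

```python
import torch

t = torch.arange(4.0)

# torch.pow happily takes a Python scalar as the second operand ...
torch.pow(t, 2)  # tensor([0., 1., 4., 9.])

# ... but some binary ops insist on tensor operands:
try:
    torch.minimum(t, 2)
except TypeError:
    # the scalar has to be wrapped first; in heat that means constructing a
    # (trivial) DNDarray before entering the general __binary_op path
    torch.minimum(t, torch.tensor(2.0))
```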

@mtar (Collaborator) commented Apr 24, 2023

Fix broken link

@mtar merged commit 9f6b2eb into main on Apr 24, 2023
@mtar deleted the bug/789-pow-performance branch on Apr 24, 2023, 12:40

Labels: arithmetics, benchmarking, bug (Something isn't working), enhancement (New feature or request)
Projects: none yet
Successfully merging this pull request may close this issue: performance issues on a single MPI process
2 participants