Tracing to reduce compilation impact #16
@stavros11 great, that looks good. I imagine the tracing takes just a few milliseconds, right?
Codecov Report
@@             Coverage Diff             @@
##              main      #16      +/-   ##
===========================================
- Coverage   100.00%   99.65%   -0.35%
===========================================
  Files            9        9
  Lines          548      580      +32
===========================================
+ Hits           548      578      +30
- Misses           0        2       +2
import time
start_time = time.time()
from qibojit import custom_operators as op
total_time = time.time() - start_time
takes about 0.72 sec on this branch and 0.47 sec on main. So, strictly speaking, any speed-up we gain in dry run time we lose during import. With jit, compilation has to happen at some point during execution, so this time is probably unavoidable. With that in mind, I am not sure if implementing tracing is very useful after all. Let me know what you think.
From a technical perspective it doesn't matter, as you say. However, the jit behaviour may introduce some "inconsistency" between runs, so unaware users may quote wrong performance values. I believe this could be useful for GPU kernels too, even if the impact should be smaller.
I repeated the same benchmark on GPU and in this case I find that using tracing has no impact at all:
[Benchmark tables not preserved: H gate, X gate, Z gate, U1 gate, Variational, QFT]
Using tracing also increases the import time from 0.28 sec to 0.85 sec (on GPU), so based on all these numbers it is probably preferable not to do tracing, at least for GPU. I am rerunning these benchmarks now to confirm, because I am not sure if this behavior is to be expected.
@stavros11 if you agree, I believe this PR can be closed. The JIT overhead time is not a real game changer at this point.
It is true that the compilation impact is small, particularly for large qubit numbers. However, I believe it is still an issue to consider for small qubit numbers. For example, in the comparison with qiskit here, qibojit appears to have a constant dry run time of 0.15 sec for up to 20 qubits, while qiskit (and even qibotf) is orders of magnitude faster. This is quite important because a user who wants to execute the circuit only once will get the dry run performance instead of the simulation performance. That being said, I am not sure if the approach proposed in this PR will solve this issue.
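To make the distinction between dry run time and simulation time concrete, here is a minimal sketch of how the two could be measured separately. It uses only the standard library; `time_dry_and_repeated` and `fake_kernel` are hypothetical names, and `fake_kernel` is a plain-Python stand-in that simulates a one-off compilation cost, not a qibojit kernel.

```python
import time
from statistics import median


def time_dry_and_repeated(fn, *args, repeats=5):
    """Return (dry_run_seconds, median_repeated_seconds) for fn(*args)."""
    start = time.perf_counter()
    fn(*args)  # first call: includes any one-off cost such as JIT compilation
    dry_run = time.perf_counter() - start

    repeated = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)  # later calls: only the execution cost remains
        repeated.append(time.perf_counter() - start)
    return dry_run, median(repeated)


# Hypothetical stand-in for a jitted kernel: pays a one-off cost on first call.
_compiled = False


def fake_kernel(n):
    global _compiled
    if not _compiled:
        time.sleep(0.05)  # simulated compilation delay (assumed value)
        _compiled = True
    return sum(range(n))


dry, steady = time_dry_and_repeated(fake_kernel, 1000)
print(f"dry run: {dry:.4f}s, repeated (median): {steady:.6f}s")
```

Reporting both numbers side by side is one way to avoid quoting the dry run time as if it were the simulation time.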
Fixes #15. I included dummy calls to all kernels during backend creation, so that they are executed when the user does
import qibojit
and enables the corresponding backend. Here are some benchmarks on CPU:
[Benchmark tables not preserved: H gate, X gate, Z gate, U1 gate, Variational, QFT]
As expected, using tracing improves dry run performance for small circuits (up to 25 qubits), but it is pretty much useless for larger circuits. I still think it would be a useful feature to add, as many applications involve executing small circuits.
I will implement something similar for GPU and update with the corresponding benchmarks here.
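The idea of issuing dummy calls at backend creation can be illustrated with a small standard-library sketch. This is not the code in this PR: `LazyKernel`, `KERNELS`, `warm_up`, `apply_x` and `apply_z` are hypothetical names, and `LazyKernel` only simulates a compiler that does its work on the first call, which is how lazily jitted functions generally behave.

```python
import time


class LazyKernel:
    """Plain-Python stand-in for a JIT-compiled kernel (hypothetical)."""

    def __init__(self, fn, compile_seconds=0.02):
        self.fn = fn
        self.compile_seconds = compile_seconds  # assumed compilation delay
        self.compiled = False

    def __call__(self, *args):
        if not self.compiled:
            time.sleep(self.compile_seconds)  # simulated one-off compilation
            self.compiled = True
        return self.fn(*args)


def apply_x(state):
    # X on the least significant qubit: swap each pair of neighbouring amplitudes
    return [state[i ^ 1] for i in range(len(state))]


def apply_z(state):
    # Z on the least significant qubit: flip the sign of odd-index amplitudes
    return [-a if i & 1 else a for i, a in enumerate(state)]


KERNELS = {"apply_x": LazyKernel(apply_x), "apply_z": LazyKernel(apply_z)}


def warm_up(kernels):
    """Call every kernel once on a tiny dummy input so compilation happens now."""
    dummy_state = [1.0, 0.0]  # smallest valid input: a one-qubit state
    for kernel in kernels.values():
        kernel(dummy_state)


warm_up(KERNELS)  # would run at backend creation, i.e. during import

start = time.perf_counter()
result = KERNELS["apply_x"]([1.0, 0.0, 0.0, 0.0])
first_user_call = time.perf_counter() - start
print(result, f"{first_user_call:.6f}s")
```

The trade-off matches the numbers reported in this thread: the warm-up moves the one-off cost from the first user call to import time rather than removing it. With a real JIT compiler that specializes on argument types, the dummy inputs would also need to match the types used later, otherwise the first real call may compile again.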