Tracing to reduce compilation impact #16

stavros11 · 2021-07-06T13:25:24Z

Fixes #15. I added dummy calls to all kernels during backend creation, so that they are executed (and therefore JIT-compiled) when the user imports qibojit and enables the corresponding backend. Here are some benchmarks on CPU:

H gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.03883	0.00101	0.00012	0.00018
4	0.04029	0.00094	0.00015	0.00017
5	0.03869	0.00102	0.00017	0.00020
6	0.03887	0.00126	0.00020	0.00022
7	0.04002	0.00107	0.00025	0.00025
8	0.03970	0.00122	0.00025	0.00034
9	0.03897	0.00123	0.00027	0.00029
10	0.04122	0.00123	0.00031	0.00032
11	0.03903	0.00116	0.00033	0.00034
12	0.04511	0.00120	0.00037	0.00045
13	0.04036	0.00130	0.00050	0.00041
14	0.03932	0.00123	0.00043	0.00044
15	0.03951	0.00115	0.00050	0.00050
16	0.04073	0.00190	0.00060	0.00061
17	0.03985	0.00215	0.00076	0.00071
18	0.04072	0.00241	0.00111	0.00119
19	0.04227	0.00304	0.00176	0.00174
20	0.04662	0.00497	0.00272	0.00293
21	0.05479	0.00838	0.00843	0.00807
22	0.05543	0.01318	0.01361	0.01444
23	0.07568	0.02717	0.02684	0.02605
24	0.14193	0.09118	0.07672	0.08226
25	0.26921	0.22878	0.21630	0.22906
26	0.54258	0.48731	0.46803	0.48327
27	1.04616	0.99224	0.99331	0.99309
28	2.06012	2.03047	1.99716	1.98241
29	4.22786	4.15998	4.09199	4.10959
30	8.55268	8.53007	8.45776	8.45357

X gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.03906	0.00087	0.00012	0.00014
4	0.03890	0.00086	0.00016	0.00017
5	0.03908	0.00095	0.00018	0.00023
6	0.03861	0.00094	0.00020	0.00022
7	0.03835	0.00126	0.00024	0.00025
8	0.03899	0.00098	0.00026	0.00027
9	0.03889	0.00077	0.00028	0.00029
10	0.03898	0.00103	0.00030	0.00032
11	0.03935	0.00115	0.00031	0.00042
12	0.03890	0.00130	0.00036	0.00037
13	0.03929	0.00123	0.00040	0.00050
14	0.03911	0.00138	0.00043	0.00044
15	0.03947	0.00161	0.00047	0.00052
16	0.03960	0.00180	0.00052	0.00054
17	0.03951	0.00219	0.00072	0.00064
18	0.04126	0.00227	0.00115	0.00112
19	0.04216	0.00430	0.00150	0.00166
20	0.04686	0.00457	0.00217	0.00233
21	0.04941	0.00813	0.00813	0.00796
22	0.06655	0.01104	0.01187	0.01114
23	0.07404	0.02505	0.02061	0.02204
24	0.09690	0.08203	0.04291	0.07608
25	0.27174	0.22117	0.22328	0.21543
26	0.53316	0.47668	0.47367	0.46639
27	1.03854	0.97954	0.96412	0.96038
28	2.04613	2.00222	1.97958	1.97868
29	4.16524	4.12108	4.08635	4.08787
30	8.47480	8.44756	8.44690	8.42881

Z gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.03887	0.00095	0.00012	0.00015
4	0.03896	0.00111	0.00015	0.00018
5	0.03876	0.00112	0.00018	0.00020
6	0.03962	0.00098	0.00020	0.00021
7	0.03977	0.00092	0.00025	0.00025
8	0.03938	0.00109	0.00028	0.00028
9	0.03890	0.00189	0.00027	0.00028
10	0.04000	0.00107	0.00030	0.00032
11	0.04269	0.00113	0.00032	0.00034
12	0.03977	0.00112	0.00035	0.00038
13	0.03990	0.00125	0.00038	0.00054
14	0.03889	0.00121	0.00041	0.00043
15	0.03854	0.00131	0.00044	0.00045
16	0.04293	0.00162	0.00049	0.00062
17	0.03925	0.00212	0.00061	0.00060
18	0.04039	0.00285	0.00096	0.00099
19	0.04362	0.00309	0.00140	0.00137
20	0.04329	0.00515	0.00187	0.00201
21	0.04815	0.01269	0.00721	0.00730
22	0.05148	0.01005	0.01086	0.01072
23	0.06545	0.01974	0.01901	0.01917
24	0.11028	0.05267	0.05032	0.04857
25	0.17808	0.13269	0.12970	0.12903
26	0.34624	0.31691	0.28981	0.29540
27	0.66572	0.59874	0.60049	0.60252
28	1.27480	1.21778	1.21357	1.20480
29	2.57943	2.52021	2.46787	2.47242
30	5.09619	5.11328	5.07868	5.08505

U1 gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.03991	0.00531	0.00012	0.00012
4	0.03933	0.00512	0.00015	0.00014
5	0.03892	0.00506	0.00017	0.00017
6	0.03904	0.00563	0.00019	0.00020
7	0.03870	0.00519	0.00023	0.00023
8	0.04001	0.00523	0.00026	0.00025
9	0.03923	0.00612	0.00029	0.00034
10	0.04727	0.00524	0.00043	0.00030
11	0.03901	0.00544	0.00032	0.00033
12	0.03982	0.00529	0.00036	0.00036
13	0.04058	0.00534	0.00040	0.00039
14	0.04431	0.00545	0.00044	0.00043
15	0.03947	0.00646	0.00047	0.00056
16	0.03952	0.00571	0.00050	0.00053
17	0.03956	0.00630	0.00062	0.00060
18	0.04101	0.00613	0.00099	0.00096
19	0.04185	0.00628	0.00143	0.00140
20	0.04610	0.00929	0.00189	0.00196
21	0.04585	0.01174	0.00623	0.00745
22	0.05105	0.01568	0.01210	0.01122
23	0.07235	0.02469	0.01923	0.01878
24	0.09319	0.05991	0.03914	0.05181
25	0.17419	0.14351	0.12904	0.13499
26	0.33601	0.28957	0.29448	0.29540
27	0.66967	0.61394	0.60510	0.59597
28	1.27847	1.23286	1.21042	1.20997
29	2.57013	2.55760	2.48110	2.48676
30	5.18837	5.11972	5.08512	5.07799

Variational

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.32801	0.00139	0.00025	0.00027
4	0.04223	0.00156	0.00034	0.00043
5	0.04756	0.00169	0.00039	0.00050
6	0.04242	0.00182	0.00050	0.00051
7	0.04255	0.00181	0.00055	0.00055
8	0.04273	0.00182	0.00070	0.00066
9	0.04276	0.00214	0.00069	0.00078
10	0.04325	0.00271	0.00077	0.00098
11	0.04326	0.00281	0.00090	0.00106
12	0.05009	0.00286	0.00093	0.00120
13	0.04305	0.00324	0.00101	0.00127
14	0.04337	0.00287	0.00112	0.00115
15	0.04333	0.00342	0.00129	0.00160
16	0.04381	0.00331	0.00149	0.00150
17	0.04742	0.00422	0.00178	0.00185
18	0.04780	0.00565	0.00294	0.00318
19	0.05194	0.00692	0.00383	0.00384
20	0.05415	0.01029	0.00612	0.00621
21	0.06286	0.01822	0.01495	0.01462
22	0.07207	0.02555	0.02596	0.02631
23	0.10188	0.05336	0.05225	0.05164
24	0.26765	0.15188	0.17923	0.13180
25	0.53185	0.51606	0.50273	0.50413
26	1.12934	1.10084	1.09080	1.07534
27	2.31296	2.27511	2.18202	2.18210
28	4.67028	4.62523	4.56733	4.56500
29	9.39009	9.43142	9.28382	9.28754
30	19.46835	19.47182	19.38908	19.41174

QFT

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.34139	0.00609	0.00022	0.00035
4	0.04578	0.00690	0.00035	0.00043
5	0.04498	0.00635	0.00049	0.00049
6	0.04501	0.00651	0.00064	0.00066
7	0.04543	0.00654	0.00083	0.00085
8	0.04581	0.00671	0.00108	0.00110
9	0.04628	0.00823	0.00123	0.00161
10	0.04650	0.00726	0.00151	0.00155
11	0.04730	0.00797	0.00177	0.00185
12	0.04832	0.00805	0.00210	0.00214
13	0.04829	0.00836	0.00254	0.00244
14	0.04852	0.00907	0.00282	0.00280
15	0.05031	0.00975	0.00334	0.00322
16	0.05015	0.01051	0.00383	0.00384
17	0.06351	0.01182	0.00467	0.00452
18	0.05316	0.01525	0.00641	0.00723
19	0.05786	0.01739	0.00875	0.00849
20	0.06242	0.02166	0.01282	0.01265
21	0.07303	0.03035	0.02315	0.02308
22	0.09896	0.04934	0.03958	0.03902
23	0.15114	0.09357	0.08671	0.08417
24	0.30378	0.32358	0.21404	0.27303
25	1.06996	1.03187	0.95303	0.94591
26	2.65440	2.60621	2.60128	2.55157
27	5.59047	5.53308	5.50032	5.49488
28	11.66997	11.63067	11.55782	11.56300
29	24.48819	24.33579	24.25917	24.35532
30	51.28703	51.13320	51.03517	51.06885

As expected, tracing improves dry-run performance for small circuits, up to about 25 qubits, but makes practically no difference for larger circuits. I still think it would be a useful feature to add, since many applications involve executing small circuits.
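The mechanism used in this PR (calling every kernel once at backend creation so that compilation happens up front) can be illustrated with a pure-Python toy model. This is a sketch, not qibojit or numba code: lazy_jit, apply_x and warm_up are hypothetical names, and compilation is simulated with a 0.04 s sleep, roughly the CPU dry-run overhead in the tables above. Real numba dispatchers specialise on argument types (dtype, number of dimensions, layout), so in practice the dummy inputs must have the same types as real calls, otherwise the warm-up compiles a signature that is never used.

```python
import time
from functools import wraps


def lazy_jit(compile_cost=0.04):
    """Toy stand-in for a lazily compiling JIT decorator (not numba).

    The first call with a new tuple of argument types pays a simulated
    compilation cost; later calls with the same types reuse the cache.
    """
    def decorator(func):
        compiled = set()  # argument-type signatures already "compiled"

        @wraps(func)
        def wrapper(*args):
            signature = tuple(type(arg) for arg in args)
            if signature not in compiled:
                time.sleep(compile_cost)  # simulated compilation
                compiled.add(signature)
            return func(*args)

        wrapper.compiled = compiled
        return wrapper

    return decorator


@lazy_jit()
def apply_x(state, target):
    """Hypothetical X-gate kernel acting on a list-based state vector."""
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if not i & stride:
            # swap amplitudes of basis states that differ only in bit `target`
            new_state[i], new_state[i | stride] = state[i | stride], state[i]
    return new_state


def warm_up(kernels):
    """Eager 'tracing': call each kernel once on a tiny dummy input."""
    for kernel, dummy_args in kernels:
        kernel(*dummy_args)


# Backend creation: pay the compilation cost up front.
start = time.perf_counter()
warm_up([(apply_x, ([1.0, 0.0], 0))])
warm_up_time = time.perf_counter() - start

# First "real" execution no longer includes compilation.
start = time.perf_counter()
result = apply_x([1.0, 0.0, 0.0, 0.0], 1)
first_call_time = time.perf_counter() - start
print(f"warm-up {warm_up_time:.4f} s, first call {first_call_time:.6f} s")
```

Note that the total cost does not disappear: it moves from the first execution to backend creation, which is exactly the trade-off discussed in the rest of this thread.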

I will implement something similar for GPU and update with the corresponding benchmarks here.

scarrazza · 2021-07-06T13:29:31Z

@stavros11 great, that looks good. I imagine the tracing takes just a few milliseconds, right?

codecov · 2021-07-06T13:32:05Z

Codecov Report

Merging #16 (476caf0) into main (a4f1cf4) will decrease coverage by 0.34%.
The diff coverage is 94.11%.

@@             Coverage Diff             @@
##              main      #16      +/-   ##
===========================================
- Coverage   100.00%   99.65%   -0.35%     
===========================================
  Files            9        9              
  Lines          548      580      +32     
===========================================
+ Hits           548      578      +30     
- Misses           0        2       +2

Flag	Coverage Δ
unittests	`99.65% <94.11%> (-0.35%)`	⬇️


Impacted Files	Coverage Δ
src/qibojit/custom_operators/backends.py	`98.01% <94.11%> (-1.99%)`	⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a4f1cf4...476caf0.

stavros11 · 2021-07-06T13:37:58Z

> @stavros11 great, that looks good. I imagine the tracing takes just a few milliseconds, right?

import time
start_time = time.time()
from qibojit import custom_operators as op
total_time = time.time() - start_time

The import above takes about 0.72 s on this branch and 0.47 s on main, so strictly speaking any speed-up we gain in dry-run time is lost during import. I guess that with JIT the compilation has to happen at some point during execution, so this time is probably unavoidable. With that in mind, I am not sure implementing tracing is very useful after all. Let me know what you think.
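Timing an import inside an already running interpreter can under-report the cost if part of the package is already cached in sys.modules. A minimal sketch, using only the standard library, that times the import in a fresh interpreter for every measurement and keeps the best of a few runs; fresh_import_time is a hypothetical helper, and it assumes the imported module prints nothing to stdout on import.

```python
import subprocess
import sys


def fresh_import_time(module: str, repeats: int = 3) -> float:
    """Best-of-`repeats` wall time (s) of `import <module>` in a new interpreter.

    Each measurement runs in its own subprocess, so modules cached in
    sys.modules by an earlier import cannot hide the cost. Assumes the
    module writes nothing to stdout while being imported.
    """
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"import {module}\n"
        "print(time.perf_counter() - t0)\n"
    )
    timings = []
    for _ in range(repeats):
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,
        )
        timings.append(float(completed.stdout.strip()))
    return min(timings)


if __name__ == "__main__":
    # Stand-in module; on each branch one would pass "qibojit.custom_operators".
    print(f"json: {fresh_import_time('json'):.4f} s")
```

For a per-module breakdown, CPython 3.7+ also accepts python -X importtime -c "import qibojit", which writes cumulative and self import times for every module to stderr.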

scarrazza · 2021-07-06T13:54:50Z

From a technical perspective it doesn't matter as you say.

However, the JIT behaviour may introduce some "inconsistency" between runs, so unaware users may quote misleading performance values.
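One way to keep compilation from leaking into quoted numbers is to report the first (dry) run separately from warmed-up repetitions, as the tables in this thread do. A minimal sketch using only the standard library; benchmark, _compiled_kernel and toy_circuit are hypothetical names, the 0.05 s sleep stands in for one-off JIT compilation, and taking the median of the repetitions is one reasonable choice, not necessarily what the qibo benchmark scripts do.

```python
import statistics
import time
from functools import lru_cache
from typing import Callable


def benchmark(run: Callable[[], object], nreps: int = 5) -> dict:
    """Time the first call separately from warmed-up repetitions."""
    start = time.perf_counter()
    run()  # first call: may include JIT compilation or other one-off setup
    dry_run_time = time.perf_counter() - start

    times = []
    for _ in range(nreps):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return {
        "dry_run_time": dry_run_time,
        "simulation_time": statistics.median(times),
    }


@lru_cache(maxsize=None)
def _compiled_kernel():
    time.sleep(0.05)  # stand-in for one-off JIT compilation
    return lambda values: [2 * v for v in values]


def toy_circuit():
    return _compiled_kernel()(range(1000))


timings = benchmark(toy_circuit)
print(timings)
```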

I believe this could be useful for GPU kernels too, even if the impact should be smaller.

stavros11 · 2021-07-07T08:47:21Z

> I believe this could be useful for GPU kernels too, even if the impact should be smaller.

I repeated the same benchmark on GPU and in this case I find that using tracing has no impact at all:

H gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.00032	0.00032	0.00015	0.00015
4	0.00038	0.00037	0.00020	0.00022
5	0.00041	0.00042	0.00022	0.00023
6	0.00045	0.00045	0.00026	0.00027
7	0.00054	0.00054	0.00031	0.00030
8	0.00059	0.00059	0.00034	0.00034
9	0.00063	0.00065	0.00039	0.00040
10	0.00070	0.00069	0.00044	0.00045
11	0.00077	0.00079	0.00051	0.00051
12	0.00084	0.00085	0.00057	0.00056
13	0.00089	0.00088	0.00061	0.00059
14	0.00091	0.00095	0.00062	0.00065
15	0.00099	0.00100	0.00069	0.00069
16	0.00105	0.00102	0.00225	0.00072
17	0.00135	0.00130	0.00077	0.00078
18	0.06811	0.00150	0.00111	0.00093
19	0.00186	0.00184	0.00130	0.00127
20	0.00243	0.00242	0.00183	0.00183
21	0.00359	0.00356	0.00299	0.00299
22	0.00589	0.07354	0.00534	0.00546
23	0.01075	0.01076	0.01018	0.01018
24	0.02094	0.02096	0.02027	0.02026
25	0.04200	0.09192	0.04118	0.04118
26	0.14565	0.08565	0.08462	0.10184
27	0.17592	0.22881	0.17454	0.18790
28	0.36323	0.42722	0.37065	0.36052
29	0.75002	0.74971	0.74495	0.74499
30	1.61108	1.64170	1.54516	1.54028

X gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.00022	0.00022	0.00010	0.00010
4	0.00025	0.00024	0.00012	0.00012
5	0.00027	0.00027	0.00014	0.00013
6	0.00029	0.00030	0.00015	0.00016
7	0.00033	0.00034	0.00017	0.00017
8	0.00036	0.00037	0.00020	0.00019
9	0.00039	0.00039	0.00022	0.00021
10	0.00041	0.00041	0.00023	0.00024
11	0.00045	0.00044	0.00026	0.00027
12	0.00046	0.00047	0.00027	0.00028
13	0.00051	0.00049	0.00031	0.00031
14	0.00052	0.00052	0.00033	0.00033
15	0.00054	0.00054	0.00034	0.00034
16	0.00056	0.00056	0.00037	0.00287
17	0.04973	0.06510	0.00066	0.00063
18	0.04883	0.05023	0.00066	0.00046
19	0.04501	0.08884	0.00105	0.00100
20	0.05027	0.05102	0.00144	0.00151
21	0.00300	0.00301	0.00257	0.00258
22	0.07628	0.06503	0.00505	0.00503
23	0.04610	0.01025	0.00976	0.00977
24	0.02041	0.02040	0.01984	0.01984
25	0.04148	0.04145	0.04076	0.06163
26	0.08511	0.08509	0.09871	0.09724
27	0.17559	0.21436	0.17398	0.18694
28	0.36259	0.36267	0.37261	0.37340
29	0.74924	0.74927	0.74423	0.74425
30	1.62038	1.54776	1.54873	1.53825

Z gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.00023	0.00022	0.00010	0.00010
4	0.00025	0.00024	0.00012	0.00012
5	0.00026	0.00027	0.00014	0.00014
6	0.00031	0.00029	0.00016	0.00016
7	0.00034	0.00036	0.00018	0.00018
8	0.00037	0.00036	0.00020	0.00019
9	0.00040	0.00039	0.00022	0.00021
10	0.00041	0.00042	0.00023	0.00024
11	0.00043	0.00045	0.00025	0.00027
12	0.00047	0.00046	0.00028	0.00028
13	0.00050	0.00050	0.00030	0.00030
14	0.00052	0.00051	0.00032	0.00032
15	0.00053	0.00054	0.00034	0.00033
16	0.00055	0.00057	0.00204	0.00160
17	0.00082	0.00085	0.00038	0.00039
18	0.05699	0.00087	0.00042	0.00044
19	0.05473	0.00100	0.00078	0.00056
20	0.00141	0.06491	0.00095	0.00111
21	0.02668	0.07699	0.00190	0.00204
22	0.03810	0.03718	0.00324	0.00318
23	0.03168	0.00592	0.00552	0.00545
24	0.04689	0.01133	0.01081	0.01077
25	0.03534	0.02254	0.02186	0.02185
26	0.04573	0.04569	0.04480	0.04473
27	0.09364	0.09360	0.10605	0.10520
28	0.19234	0.19237	0.18972	0.18970
29	0.39619	0.39610	0.39488	0.39863
30	0.81637	0.89814	0.80699	0.80696

U1 gate

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.00045	0.00045	0.00015	0.00015
4	0.00054	0.00053	0.00018	0.00018
5	0.00063	0.00061	0.00022	0.00022
6	0.00070	0.00069	0.00026	0.00025
7	0.00077	0.00077	0.00029	0.00030
8	0.00086	0.00083	0.00031	0.00031
9	0.00093	0.00092	0.00037	0.00036
10	0.00099	0.00100	0.00039	0.00040
11	0.00108	0.00108	0.00044	0.00043
12	0.00121	0.00116	0.00049	0.00047
13	0.00123	0.00125	0.00050	0.00050
14	0.00133	0.00135	0.00058	0.00056
15	0.00198	0.00198	0.00057	0.00058
16	0.00203	0.00205	0.00062	0.00061
17	0.00236	0.00234	0.00064	0.00063
18	0.00246	0.04720	0.00070	0.00108
19	0.00262	0.00264	0.00086	0.00085
20	0.02501	0.00310	0.00135	0.00126
21	0.00372	0.00373	0.00190	0.00192
22	0.00498	0.00502	0.00315	0.00317
23	0.06422	0.00707	0.00586	0.00578
24	0.06328	0.01255	0.01114	0.01114
25	0.02387	0.07127	0.02224	0.02221
26	0.08621	0.09940	0.04512	0.04514
27	0.09502	0.09505	0.10559	0.10689
28	0.19394	0.19401	0.19025	0.19023
29	0.39809	0.39794	0.39649	0.39603
30	0.88143	0.81843	0.80871	0.80784

Variational

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.00100	0.00100	0.00039	0.00040
4	0.00137	0.00131	0.00059	0.00057
5	0.00154	0.00152	0.00064	0.00066
6	0.00193	0.00186	0.00084	0.00080
7	0.00209	0.00210	0.00090	0.00091
8	0.00243	0.00239	0.00109	0.00105
9	0.00262	0.00265	0.00115	0.00115
10	0.00308	0.00304	0.00139	0.00138
11	0.00329	0.00323	0.00154	0.00152
12	0.00366	0.00363	0.00172	0.00175
13	0.00395	0.00381	0.00180	0.00178
14	0.00421	0.00425	0.00199	0.00205
15	0.00437	0.00446	0.00203	0.00209
16	0.00478	0.00485	0.00473	0.00453
17	0.05603	0.00523	0.00238	0.00233
18	0.00586	0.03288	0.00280	0.00275
19	0.00678	0.00667	0.00351	0.00348
20	0.06724	0.06173	0.00492	0.00491
21	0.01088	0.01098	0.00742	0.00749
22	0.01638	0.06145	0.01281	0.01288
23	0.02729	0.08072	0.02359	0.02361
24	0.05042	0.05032	0.04655	0.04647
25	0.14438	0.13746	0.09276	0.09353
26	0.19565	0.19560	0.19112	0.19116
27	0.42325	0.39497	0.38977	0.39482
28	0.81766	0.83431	0.81105	0.81117
29	1.67043	1.67034	1.67399	1.66941
30	3.47159	3.54215	3.46718	3.45827

QFT

nqubits	dry_run_time_no_tracing	dry_run_time_tracing	simulation_time_no_tracing	simulation_time_tracing
3	0.00081	0.00081	0.00042	0.00043
4	0.00127	0.00126	0.00071	0.00070
5	0.00181	0.00181	0.00104	0.00104
6	0.00253	0.00244	0.00146	0.00142
7	0.00329	0.00322	0.00193	0.00190
8	0.00414	0.00414	0.00249	0.00243
9	0.00517	0.00515	0.00306	0.00304
10	0.00628	0.00635	0.00379	0.00379
11	0.00754	0.00762	0.00454	0.00460
12	0.00902	0.00887	0.00549	0.00541
13	0.01035	0.01053	0.00628	0.00642
14	0.01207	0.01203	0.00738	0.00740
15	0.01389	0.01372	0.00845	0.00845
16	0.01613	0.01575	0.00977	0.00957
17	0.01782	0.02762	0.01214	0.01446
18	0.01983	0.05703	0.01185	0.01210
19	0.02309	0.02315	0.01433	0.01415
20	0.02721	0.02746	0.01741	0.01752
21	0.09203	0.03476	0.02455	0.02453
22	0.04747	0.09694	0.03565	0.03586
23	0.07034	0.07130	0.14096	0.13147
24	0.15085	0.15888	0.10488	0.10479
25	0.21557	0.21597	0.20146	0.20168
26	0.42341	0.42902	0.40797	0.48844
27	0.91766	0.85822	0.84159	0.89879
28	1.78227	1.78222	1.76484	1.76423
29	3.72861	3.72819	3.73509	3.75041
30	7.90598	7.86165	7.84191	7.84056

Using tracing also increases the import time from 0.28 s to 0.85 s on GPU, so based on all these numbers it is probably preferable not to use tracing, at least on GPU. I am rerunning these benchmarks now to confirm, because I am not sure whether this behavior is expected.
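The decision reduces to a per-process balance: tracing helps only if the dry-run time it saves exceeds the extra import time it adds. A small sketch of that arithmetic, plugging in numbers quoted in this thread (import times of 0.47 s vs 0.72 s on CPU and 0.28 s vs 0.85 s on GPU, and the 10-qubit H-gate dry-run times from the CPU and GPU tables); tracing_net_gain is a hypothetical helper, and these figures cover a single-gate circuit only, so they illustrate the balance rather than settle it in general.

```python
def tracing_net_gain(dry_no_tracing, dry_tracing, import_no_tracing, import_tracing):
    """Seconds saved per fresh process by tracing (negative means a net loss)."""
    dry_run_saving = dry_no_tracing - dry_tracing
    extra_import_cost = import_tracing - import_no_tracing
    return dry_run_saving - extra_import_cost


# H gate, 10 qubits: dry-run columns of the CPU and GPU tables.
cpu_gain = tracing_net_gain(0.04122, 0.00123, 0.47, 0.72)
gpu_gain = tracing_net_gain(0.00070, 0.00069, 0.28, 0.85)
print(f"CPU: {cpu_gain:+.3f} s, GPU: {gpu_gain:+.3f} s")
```

Both values come out negative (about -0.21 s on CPU and -0.57 s on GPU), consistent with the observation that the import overhead outweighs the dry-run saving for a single small circuit.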

scarrazza · 2021-09-20T16:59:05Z

@stavros11 if you agree, I believe this PR can be closed; the JIT overhead time is not a real game changer at this point.
Do you agree?

stavros11 · 2021-09-21T11:54:32Z

> @stavros11 if you agree, I believe this PR can be closed; the JIT overhead time is not a real game changer at this point.
> Do you agree?

It is true that the compilation impact is small, particularly for large numbers of qubits; however, I believe it is still an issue to consider for small circuits. For example, in the comparison with qiskit here, qibojit appears to have a constant dry-run time of 0.15 s for up to 20 qubits, while qiskit (and even qibotf) is orders of magnitude faster. This matters because a user who executes a circuit only once gets the dry-run performance instead of the simulation performance.
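The point about single executions can be made quantitative: if a process runs the same circuit n times, the average cost per execution is (t_dry + (n - 1) t_sim) / n, so with n = 1 the user sees the full dry-run time and the compilation overhead only fades as n grows. A small sketch with the no-tracing QFT figures for 10 qubits from the CPU table (0.04650 s dry run, 0.00151 s simulation); mean_time_per_execution is a hypothetical helper.

```python
def mean_time_per_execution(dry_run_time, simulation_time, nruns):
    """Average wall time per execution when a circuit runs `nruns` times.

    Only the first execution pays the dry-run (compilation) cost.
    """
    if nruns < 1:
        raise ValueError("nruns must be at least 1")
    return (dry_run_time + (nruns - 1) * simulation_time) / nruns


for nruns in (1, 10, 100, 1000):
    avg = mean_time_per_execution(0.04650, 0.00151, nruns)
    print(f"{nruns:>5} runs: {avg:.5f} s per execution")
```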

That being said, I am not sure if the approach proposed in this PR will solve this issue.

stavros11 added 4 commits July 5, 2021 20:59

a29b0a3 Add op tracing during numba backend import
69626ad Fix probs
83e976e Merge branch 'main' into tracing
1a52d7e Update docstring

stavros11 requested a review from scarrazza July 6, 2021 13:25

8c46f4a Fix fSim tracing
476caf0 Tracing on GPU

scarrazza closed this Oct 27, 2021

scarrazza deleted the tracing branch February 11, 2022 17:43
