Benchmark external libraries #11

Merged
merged 322 commits into from
Mar 15, 2022

stavros11
Member

@stavros11 stavros11 commented Aug 9, 2021

Adds a template and script for benchmarking external quantum simulation libraries other than Qibo (fixes #10). We should cover at least the libraries included in the HyQuas benchmark paper. Here is a list of required libraries:

Python:

  • QCGPU
  • Qiskit
  • Qulacs
  • qsim (+ cuQuantum)
  • qibotf, tensorflow
  • projectq
  • hybridq NASA
  • cuQuantum?

These benchmarks can be executed using the new compare.py script and the library is selected using the --library flag.
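
As a rough illustration of how a script can select the library from a flag, here is a minimal sketch. The flag names `--nqubits`, `--circuit` and `--library` come from this thread. The defaults, the `SUPPORTED_LIBRARIES` tuple and the helper names are assumptions made for the example, and this is not the actual content of compare.py.

```python
import argparse

# Hypothetical list of choices; the names mirror libraries mentioned in this thread.
SUPPORTED_LIBRARIES = ("qibo", "qiskit", "qulacs")


def build_parser():
    """Create a parser exposing the three flags used in this thread."""
    parser = argparse.ArgumentParser(description="Benchmark one circuit on one library.")
    parser.add_argument("--nqubits", type=int, default=20, help="number of qubits")
    parser.add_argument("--circuit", type=str, default="qft", help="circuit name")
    parser.add_argument("--library", type=str, default="qibo",
                        choices=SUPPORTED_LIBRARIES, help="simulation library")
    return parser


def parse_args(argv=None):
    """Parse `argv` (or sys.argv when None) into a namespace."""
    return build_parser().parse_args(argv)


# Example: equivalent to passing the flags on the command line.
args = parse_args(["--nqubits", "30", "--circuit", "qft", "--library", "qiskit"])
print(f"Would benchmark {args.circuit} with {args.nqubits} qubits on {args.library}")
```

Restricting `--library` with `choices` makes an unsupported name fail at argument parsing rather than later during circuit construction.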

The supported libraries are defined under benchmarks/libraries and the goal is to support all circuits included in the Qibo benchmark for all libraries. This works by defining every circuit in OpenQASM and then building each library's circuit from it. This is straightforward for libraries that have built-in Qasm loaders, such as Qiskit and Qibo, while for the rest (e.g. Qulacs) I use the Qasm parser we have in Qibo, modified to add the gates from the corresponding library. All circuits we have here can be written in the Qasm format we support in Qibo, except perhaps QAOA, which contains some RZZ gates that are not built into Qibo.
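
The "parse Qasm once, then build each library's circuit" idea can be sketched as follows. This is a toy parser written for illustration, not the parser that ships with Qibo: it only handles a single register, simple gate lines and plain float parameters. The names `parse_qasm`, `build_circuit` and `TOY_GATES` are hypothetical, and the toy "library" represents gates as strings.

```python
import re

QASM = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
h q[0];
cx q[0],q[1];
rz(0.5) q[1];
"""

# name, optional "(params)", then the qubit arguments, terminated by ";"
GATE_LINE = re.compile(r"^(\w+)(?:\(([^)]*)\))?\s+(.+);$")
QUBIT = re.compile(r"\w+\[(\d+)\]")


def parse_qasm(text):
    """Return (nqubits, [(gate_name, qubit_indices, parameters), ...])."""
    nqubits, gates = 0, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("OPENQASM", "include", "//")):
            continue
        match = GATE_LINE.match(line)
        if match is None:
            raise ValueError(f"Cannot parse line: {line!r}")
        name, params, targets = match.groups()
        qubits = tuple(int(i) for i in QUBIT.findall(targets))
        if name == "qreg":
            nqubits += qubits[0]  # the bracketed number is the register size
            continue
        values = tuple(float(p) for p in params.split(",")) if params else ()
        gates.append((name, qubits, values))
    return nqubits, gates


def build_circuit(text, gate_map):
    """Translate parsed gates with a per-library map from Qasm names to constructors."""
    nqubits, gates = parse_qasm(text)
    return nqubits, [gate_map[name](*qubits, *values) for name, qubits, values in gates]


# Toy "library": each constructor just returns a descriptive string.
TOY_GATES = {
    "h": lambda q: f"H({q})",
    "cx": lambda c, t: f"CNOT({c},{t})",
    "rz": lambda q, theta: f"RZ({q},{theta})",
}

print(build_circuit(QASM, TOY_GATES))
```

Supporting another library then amounts to supplying a different `gate_map`, which is the part that differs between backends in this sketch.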

Next steps for this PR:

  • Write QAOA circuit in Qasm format that is successfully read by all libraries.
  • Add the above libraries.
  • Add GPU support for the libraries that support it (currently only CPU is implemented).
  • Add possibility to use different qibo backends (eg. instead of just --library qibo we should have --library qibojit, etc.)
  • Run benchmarks.

Note: I noticed that Qibo's U2 and U3 gates follow a different parameter convention from Qiskit and other libraries. For example, compare our docs with Qiskit's docs. This should not affect performance, which is what we mainly care about here, but it may confuse users who use these gates for other applications, as it will change results. The main issue is that parsing, for example, u3(0.1,0.2,0.3) q[0]; from Qasm creates one gate in Qibo and a different one in Qiskit (and others). I guess Qiskit should be the reference for such conventions, given that Qasm is developed by IBM.
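
To make the convention issue concrete, the sketch below compares two ways of turning the same three angles into a 2x2 matrix. The first is the matrix documented for Qiskit's U3 gate. The second is the product Rz(phi) Ry(theta) Rz(lambda), used here only as an example of an alternative convention. This thread does not state which matrix Qibo uses, so treating the second form as Qibo's would be an assumption. The function names are hypothetical.

```python
import cmath
import math


def u3_qiskit_style(theta, phi, lam):
    """Matrix as documented for Qiskit's U3 gate."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -cmath.exp(1j * lam) * s],
        [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c],
    ]


def u3_rz_ry_rz(theta, phi, lam):
    """Alternative convention (illustrative): the product Rz(phi) Ry(theta) Rz(lam)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [cmath.exp(-1j * (phi + lam) / 2) * c, -cmath.exp(-1j * (phi - lam) / 2) * s],
        [cmath.exp(1j * (phi - lam) / 2) * s, cmath.exp(1j * (phi + lam) / 2) * c],
    ]


def max_abs_difference(a, b):
    """Largest entrywise distance between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def equal_up_to_global_phase(a, b, tol=1e-12):
    """Check whether b == phase * a for some complex number `phase`."""
    for i in range(2):
        for j in range(2):
            if abs(a[i][j]) > tol:
                phase = b[i][j] / a[i][j]
                scaled = [[phase * x for x in row] for row in a]
                return max_abs_difference(scaled, b) < tol
    return max_abs_difference(a, b) < tol


a = u3_qiskit_style(0.1, 0.2, 0.3)
b = u3_rz_ry_rz(0.1, 0.2, 0.3)
print("entrywise difference:", max_abs_difference(a, b))
print("equal up to a global phase:", equal_up_to_global_phase(a, b))
```

In this particular pair the matrices differ entry by entry but agree up to a global phase, which is already enough to make a direct state vector comparison fail. A phase like this also stops being global once the gate is used as a controlled gate.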

@stavros11
Member Author

stavros11 commented Aug 10, 2021

Here are some numbers using the compare.py script for Qibo (default qibojit backend), qiskit and qulacs, all on qibo machine CPU. Note that unlike our first paper Qiskit is using all threads and performance is particularly good. I confirmed in several cases that the correct wavefunction is returned, so the simulation is not skipped. I am not sure if they do some kind of circuit simplification to achieve that performance.

@scarrazza you can confirm Qiskit's performance by running something simple, eg a QFT for 30 qubits: python compare.py --nqubits 30 --circuit qft --library qiskit. On qibo machine this takes 37sec with Qiskit, 50sec with Qibo and 80sec with Qulacs.

CPU - dry run times - qft
nqubits qibo qibotf qiskit qulacs
3 0.14609 0.01614 0.00141 0.00003
4 0.14219 0.02567 0.00114 0.00003
5 0.13952 0.01790 0.00106 0.00004
6 0.14063 0.01482 0.00114 0.00005
7 0.14224 0.03153 0.00129 0.00004
8 0.14368 0.02951 0.00160 0.00008
9 0.14293 0.02227 0.00157 0.00015
10 0.14489 0.02659 0.00196 0.00029
11 0.14266 0.02941 0.00227 0.00067
12 0.14414 0.02864 0.00305 0.00095
13 0.14532 0.01835 0.00476 0.04251
14 0.14601 0.02937 0.00911 0.08639
15 0.15146 0.02963 0.03047 0.05059
16 0.15435 0.03529 0.03393 0.05788
17 0.14970 0.03201 0.03692 0.08698
18 0.15598 0.03784 0.04524 0.04941
19 0.15866 0.03899 0.05674 0.16675
20 0.16004 0.04898 0.07397 0.10116
21 0.17945 0.06781 0.09618 0.16086
22 0.18606 0.09782 0.11774 0.17884
23 0.24259 0.19192 0.17108 0.30569
24 0.48633 0.41005 0.38000 0.62176
25 1.13068 0.95941 0.91085 1.52635
26 2.75894 2.47023 1.88619 3.97448
CPU - dry run times - variational
nqubits qibo qibotf qiskit qulacs
3 0.13821 0.02626 0.00127 0.00005
4 0.13915 0.01163 0.00100 0.00004
5 0.13660 0.02306 0.00105 0.00004
6 0.13939 0.01711 0.00124 0.00005
7 0.13899 0.01630 0.00120 0.00006
8 0.14083 0.04023 0.00121 0.00010
9 0.13952 0.03896 0.00158 0.00009
10 0.13938 0.02687 0.00136 0.00026
11 0.14355 0.01670 0.00151 0.00051
12 0.14013 0.02445 0.00168 0.00108
13 0.14623 0.01961 0.00210 0.06769
14 0.14328 0.03573 0.00346 0.05836
15 0.14111 0.01913 0.00764 0.08532
16 0.14549 0.02167 0.00842 0.03759
17 0.14837 0.02696 0.00839 0.07152
18 0.14382 0.02851 0.00863 0.08242
19 0.14968 0.02721 0.01136 0.06706
20 0.15033 0.03511 0.01158 0.09843
21 0.15262 0.04414 0.01776 0.11228
22 0.16749 0.05572 0.02055 0.09962
23 0.20121 0.12087 0.03393 0.13973
24 0.34273 0.26135 0.05600 0.30457
25 0.66060 0.56991 0.14690 0.74243
26 1.22694 1.18731 0.28349 1.44867
CPU - dry run times - bv
nqubits qibo qibotf qiskit qulacs
3 0.14355 0.02230 0.00104 0.00004
4 0.15346 0.05262 0.00108 0.00004
5 0.14235 0.01966 0.00124 0.00004
6 0.14811 0.02366 0.00121 0.00005
7 0.14192 0.02395 0.00132 0.00004
8 0.14303 0.03168 0.00112 0.00005
9 0.14264 0.02597 0.00123 0.00006
10 0.14080 0.03298 0.00126 0.00009
11 0.14738 0.01784 0.00157 0.00015
12 0.14324 0.02217 0.00163 0.00027
13 0.13582 0.02507 0.00253 0.08233
14 0.14438 0.02631 0.00298 0.08716
15 0.14555 0.01531 0.00741 0.05170
16 0.14405 0.02410 0.00792 0.04020
17 0.15019 0.02785 0.00842 0.06130
18 0.14511 0.02288 0.00869 0.04584
19 0.14929 0.02792 0.00971 0.04737
20 0.15472 0.02289 0.01119 0.04714
21 0.15934 0.04284 0.01612 0.07518
22 0.16652 0.04939 0.02037 0.08824
23 0.20038 0.09279 0.03760 0.12448
24 0.34668 0.29527 0.06302 0.28741
25 0.68856 0.60402 0.14585 0.74427
26 1.34578 1.25987 0.29990 1.56413
CPU - dry run times - supremacy
nqubits qibo qibotf qiskit qulacs
3 0.04730 0.00359 0.00142 0.00005
4 0.04759 0.00379 0.00136 0.00005
5 0.04874 0.01179 0.00153 0.00005
6 0.05135 0.00473 0.00161 0.00005
7 0.04738 0.00469 0.00159 0.00006
8 0.04834 0.00498 0.00171 0.00006
9 0.05123 0.00518 0.00171 0.00007
10 0.05125 0.00554 0.00181 0.00010
11 0.05401 0.00590 0.00174 0.00017
12 0.04822 0.00623 0.00215 0.00556
13 0.04880 0.00647 0.00416 0.00696
14 0.04886 0.00697 0.00378 0.01016
15 0.04925 0.00717 0.00946 0.02494
16 0.05146 0.00830 0.00878 0.02542
17 0.05158 0.00923 0.00940 0.02830
18 0.05266 0.01147 0.00920 0.03056
19 0.05991 0.01528 0.01024 0.01315
20 0.05999 0.02105 0.01199 0.05079
21 0.06598 0.03111 0.01515 0.03932
22 0.08216 0.04886 0.02412 0.05359
23 0.12164 0.09231 0.03420 0.10655
24 0.31780 0.26941 0.05419 0.23985
25 0.66525 0.65051 0.13104 0.81683
26 1.41856 1.44158 0.25266 1.71436
CPU - dry run times - bc
nqubits qibo qibotf qiskit qulacs
3 0.10865 0.03400 0.00193 0.00006
4 0.07797 0.05211 0.00279 0.00006
5 0.10672 0.04472 0.00358 0.00008
6 0.10741 0.09421 0.00513 0.00011
7 0.10422 0.06441 0.00638 0.00020
8 0.10134 0.09720 0.00830 0.00044
9 0.09091 0.13271 0.01143 0.00100
10 0.24327 0.13525 0.01711 0.00239
11 0.23828 0.14882 0.02787 0.00573
12 0.25085 0.17866 0.05522 0.08972
13 0.25957 0.19384 0.10197 0.07487
14 0.27024 0.21175 0.21329 0.12629
15 0.29742 0.25463 - 0.11639
16 0.32297 0.43239 - 0.14785
17 0.34809 0.49112 - 0.18162
18 0.41349 0.59651 - 0.22338
19 0.35504 0.61407 - 0.35328
20 0.55181 0.91910 - 0.53437
21 0.93586 1.49616 - 0.92757
22 1.45611 2.79792 - 1.78871
23 3.11519 4.29380 - 4.53765
24 8.05254 10.77202 - 12.88755
25 60.40125 59.76469 - 55.70321
26 131.39063 132.83400 - 128.20206
CPU - dry run times - qv
nqubits qibo qibotf qiskit qulacs
3 0.04844 0.00367 0.00135 0.00005
4 0.05273 0.00486 0.00156 0.00005
5 0.04937 0.01037 0.00159 0.00006
6 0.04929 0.00585 0.00156 0.00006
7 0.04943 0.00590 0.00160 0.00007
8 0.04949 0.00698 0.00195 0.00010
9 0.04952 0.00712 0.00175 0.00015
10 0.05000 0.00838 0.00195 0.00029
11 0.05059 0.00824 0.00206 0.00052
12 0.05063 0.01464 0.00241 0.00118
13 0.05086 0.00964 0.00307 0.02454
14 0.05115 0.01086 0.00467 0.01563
15 0.05211 0.01110 0.01296 0.01522
16 0.05456 0.01279 0.00841 0.02311
17 0.05384 0.01446 0.01317 0.01946
18 0.05632 0.01779 0.00892 0.02220
19 0.05945 0.02175 0.01019 0.03308
20 0.06883 0.02923 0.01302 0.04184
21 0.07827 0.04620 0.01464 0.05152
22 0.11751 0.08035 0.02167 0.09155
23 0.16683 0.15201 0.03156 0.18035
24 0.35416 0.42914 0.05478 0.40510
25 1.08719 1.02541 0.10284 1.15616
26 2.26871 2.31484 0.21652 2.63840
CPU - dry run times - hs
nqubits qibo qibotf qiskit qulacs
3 0.14078 0.01123 0.00131 0.00004
4 0.14038 0.05976 0.00112 0.00005
5 0.14218 0.02236 0.00114 0.00004
6 0.14158 0.02755 0.00119 0.00004
7 0.14324 0.04778 0.00146 0.00005
8 0.14234 0.02460 0.00129 0.00005
9 0.14140 0.02000 0.00131 0.00007
10 0.14476 0.03292 0.00154 0.00012
11 0.16313 0.02884 0.00175 0.00020
12 0.14204 0.02411 0.00263 0.00042
13 0.14248 0.01856 0.00286 0.04310
14 0.14593 0.02078 0.00446 0.05991
15 0.14506 0.01933 0.00704 0.07036
16 0.15695 0.02311 0.00738 0.06787
17 0.14624 0.02027 0.00799 0.06284
18 0.14966 0.02264 0.00815 0.06346
19 0.14848 0.03451 0.00890 0.05979
20 0.15762 0.04424 0.01076 0.06404
21 0.17340 0.05215 0.01456 0.06969
22 0.18809 0.08329 0.01836 0.12943
23 0.22851 0.14311 0.03769 0.13979
24 0.55507 0.45082 0.06076 0.38338
25 1.12732 1.10636 0.10539 1.24845
26 2.08358 2.06528 0.19700 2.34944
27 3.82514 3.91613 0.40058 4.42283
28 8.15916 8.23759 0.80033 9.46240
29 19.13270 19.33549 1.58801 22.02400
30 34.21462 34.81286 3.11735 39.77831
CPU - simulation times - qft
nqubits qibo qibotf qiskit qulacs
3 0.00023 0.00029 0.00048 0.00001
4 0.00035 0.00041 0.00057 0.00001
5 0.00049 0.00046 0.00058 0.00001
6 0.00066 0.00095 0.00068 0.00002
7 0.00083 0.00093 0.00072 0.00002
8 0.00106 0.00124 0.00093 0.00006
9 0.00126 0.00112 0.00099 0.00012
10 0.00149 0.00185 0.00120 0.00026
11 0.00177 0.00215 0.00165 0.00062
12 0.00208 0.00245 0.00253 0.00089
13 0.00243 0.00284 0.00425 0.00325
14 0.00289 0.00334 0.00803 0.00403
15 0.00326 0.00388 0.03220 0.00480
16 0.00393 0.00475 0.03673 0.00605
17 0.00463 0.00485 0.04235 0.00792
18 0.00613 0.00786 0.04848 0.01138
19 0.00840 0.01024 0.06044 0.01702
20 0.01361 0.01905 0.07265 0.02933
21 0.02258 0.03532 0.09714 0.06354
22 0.04033 0.06544 0.10747 0.11940
23 0.08698 0.14522 0.18945 0.24661
24 0.28090 0.32265 0.31718 0.58849
25 0.84842 0.87645 0.91841 1.49977
26 2.55392 2.24508 1.88219 4.00886
CPU - simulation times - variational
nqubits qibo qibotf qiskit qulacs
3 0.00025 0.00029 0.00049 0.00001
4 0.00036 0.00042 0.00053 0.00001
5 0.00042 0.00037 0.00055 0.00001
6 0.00049 0.00047 0.00059 0.00002
7 0.00057 0.00072 0.00063 0.00003
8 0.00065 0.00077 0.00067 0.00005
9 0.00069 0.00081 0.00071 0.00006
10 0.00079 0.00095 0.00080 0.00022
11 0.00085 0.00104 0.00092 0.00046
12 0.00095 0.00108 0.00118 0.00100
13 0.00103 0.00122 0.00155 0.00135
14 0.00118 0.00108 0.00256 0.00160
15 0.00125 0.00156 0.00529 0.00183
16 0.00155 0.00180 0.00540 0.00220
17 0.00187 0.00220 0.00607 0.00293
18 0.00292 0.00343 0.00675 0.00464
19 0.00425 0.00457 0.00821 0.00682
20 0.00654 0.00897 0.01051 0.01187
21 0.01081 0.01832 0.01775 0.03918
22 0.02311 0.03407 0.02429 0.06024
23 0.05224 0.08439 0.03916 0.10856
24 0.18787 0.24245 0.07135 0.24549
25 0.50435 0.53373 0.16767 0.69856
26 1.08907 1.17849 0.32472 1.47058
CPU - simulation times - bv
nqubits qibo qibotf qiskit qulacs
3 0.00026 0.00030 0.00048 0.00001
4 0.00032 0.00037 0.00052 0.00001
5 0.00040 0.00047 0.00054 0.00001
6 0.00049 0.00065 0.00057 0.00001
7 0.00057 0.00066 0.00059 0.00001
8 0.00064 0.00185 0.00062 0.00001
9 0.00071 0.00063 0.00067 0.00003
10 0.00077 0.00118 0.00072 0.00005
11 0.00226 0.00114 0.00083 0.00010
12 0.00092 0.00170 0.00107 0.00020
13 0.00107 0.00129 0.00151 0.00137
14 0.00115 0.00124 0.00237 0.00152
15 0.00131 0.00158 0.00469 0.00180
16 0.00151 0.00161 0.00517 0.00205
17 0.00192 0.00200 0.00569 0.00252
18 0.00457 0.00324 0.00656 0.00363
19 0.00407 0.00465 0.00766 0.00534
20 0.00638 0.00887 0.01019 0.00869
21 0.01297 0.01710 0.01697 0.02890
22 0.02309 0.03547 0.02538 0.04603
23 0.05003 0.08153 0.04015 0.08575
24 0.19126 0.26410 0.07391 0.22320
25 0.53018 0.56650 0.16424 0.72453
26 1.18478 1.28438 0.33763 1.56880
CPU - simulation times - supremacy
nqubits qibo qibotf qiskit qulacs
3 0.00031 0.00115 0.00052 0.00001
4 0.00038 0.00039 0.00054 0.00001
5 0.00046 0.00049 0.00057 0.00001
6 0.00055 0.00056 0.00061 0.00001
7 0.00065 0.00064 0.00063 0.00001
8 0.00071 0.00068 0.00067 0.00002
9 0.00079 0.00076 0.00076 0.00003
10 0.00085 0.00088 0.00081 0.00006
11 0.00095 0.00096 0.00093 0.00011
12 0.00104 0.00110 0.00122 0.00045
13 0.00116 0.00116 0.00287 0.00152
14 0.00131 0.00135 0.00306 0.00170
15 0.00146 0.00154 0.00470 0.00193
16 0.00176 0.00196 0.00484 0.00226
17 0.00215 0.00261 0.00500 0.00277
18 0.00340 0.00353 0.00623 0.00391
19 0.00481 0.00565 0.00725 0.00605
20 0.00783 0.00965 0.00933 0.00960
21 0.01568 0.01997 0.01557 0.02993
22 0.02940 0.03944 0.02405 0.04505
23 0.07014 0.08953 0.03592 0.09079
24 0.25279 0.24751 0.06739 0.23617
25 0.63385 0.68515 0.14553 0.83233
26 1.33916 1.41088 0.28359 1.73598
CPU - simulation times - bc
nqubits qibo qibotf qiskit qulacs
3 0.00203 0.00264 0.00115 0.00001
4 0.00514 0.00526 0.00260 0.00002
5 0.00815 0.00790 0.00264 0.00003
6 0.01170 0.01159 0.00389 0.00007
7 0.01666 0.01571 0.00527 0.00016
8 0.02274 0.01735 0.00737 0.00041
9 0.02237 0.02723 0.01975 0.00096
10 0.03307 0.03353 0.02533 0.00236
11 0.03635 0.03972 0.02647 0.00553
12 0.04149 0.04628 0.05496 0.04748
13 0.04948 0.05336 0.09865 0.07355
14 0.06095 0.05566 0.21272 0.08081
15 0.07952 0.06912 - 0.10210
16 0.10098 0.08495 - 0.12730
17 0.12224 0.12860 - 0.16115
18 0.16834 0.16485 - 0.22653
19 0.21687 0.26676 - 0.34079
20 0.45414 0.41300 - 0.56256
21 0.61478 0.79633 - 0.91847
22 1.15462 1.98153 - 1.77941
23 2.50292 3.33044 - 4.52949
24 5.60270 8.59538 - 12.64645
25 59.86970 57.75547 - 55.59625
26 130.59354 130.69623 - 128.17125
CPU - simulation times - qv
nqubits qibo qibotf qiskit qulacs
3 0.00032 0.00031 0.00053 0.00001
4 0.00092 0.00056 0.00063 0.00001
5 0.00059 0.00057 0.00065 0.00001
6 0.00086 0.00084 0.00073 0.00002
7 0.00086 0.00085 0.00073 0.00003
8 0.00117 0.00109 0.00084 0.00006
9 0.00113 0.00109 0.00086 0.00010
10 0.00139 0.00137 0.00110 0.00024
11 0.00139 0.00140 0.00112 0.00048
12 0.00167 0.00164 0.00159 0.00112
13 0.00175 0.00178 0.00296 0.00234
14 0.00211 0.00211 0.00379 0.00283
15 0.00246 0.00219 0.00478 0.00309
16 0.00305 0.00289 0.00459 0.00399
17 0.00352 0.00374 0.00544 0.00493
18 0.00542 0.00558 0.00647 0.00767
19 0.00805 0.00856 0.00724 0.01150
20 0.01417 0.01614 0.00917 0.01941
21 0.02470 0.03082 0.01379 0.04733
22 0.05245 0.06647 0.02163 0.08159
23 0.11709 0.14571 0.03694 0.15883
24 0.29440 0.40059 0.06753 0.40329
25 1.01247 1.07168 0.12397 1.16924
26 2.17764 2.24001 0.24122 2.65551
CPU - simulation times - hs
nqubits qibo qibotf qiskit qulacs
3 0.00037 0.00042 0.00058 0.00002
4 0.00059 0.00065 0.00062 0.00002
5 0.00064 0.00071 0.00064 0.00002
6 0.00081 0.00096 0.00071 0.00002
7 0.00093 0.00109 0.00076 0.00002
8 0.00110 0.00126 0.00081 0.00003
9 0.00113 0.00212 0.00085 0.00005
10 0.00143 0.00155 0.00101 0.00009
11 0.00145 0.00161 0.00111 0.00017
12 0.00166 0.00166 0.00161 0.00039
13 0.00180 0.00263 0.00223 0.00222
14 0.00213 0.00201 0.00433 0.00273
15 0.00230 0.00265 0.00340 0.00318
16 0.00260 0.00320 0.00354 0.00383
17 0.00309 0.00382 0.00362 0.00475
18 0.00585 0.00527 0.00491 0.00673
19 0.00896 0.00955 0.00653 0.01052
20 0.01241 0.01585 0.00946 0.01860
21 0.02027 0.02632 0.01529 0.03455
22 0.03742 0.05523 0.02241 0.05722
23 0.08511 0.11802 0.04046 0.11015
24 0.36213 0.35885 0.07321 0.32752
25 1.05356 1.05197 0.12127 1.20950
26 1.92000 1.97689 0.24272 2.33197
27 3.65685 3.83122 0.49052 4.45067
28 7.95587 8.27206 0.97463 9.56781
29 18.97606 19.55705 1.89644 22.21409
30 33.93688 35.07520 3.71810 40.22109

EDIT: Added qibotf times.
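
The tables above report "dry run" and "simulation" times separately. A minimal sketch of such a measurement is shown below, assuming that "dry run" means the first execution (which may include one-off costs such as compilation or cache warm-up) and that "simulation" means the average of later executions. The thread does not define these terms, so this reading is an assumption, and `benchmark` and `toy_simulation` are hypothetical names.

```python
import time


def benchmark(run, nreps=3):
    """Time the first call separately from the average of `nreps` later calls."""
    start = time.perf_counter()
    run()
    dry_run_time = time.perf_counter() - start

    repeated = []
    for _ in range(nreps):
        start = time.perf_counter()
        run()
        repeated.append(time.perf_counter() - start)
    return {"dry_run_time": dry_run_time, "simulation_time": sum(repeated) / nreps}


_cache = {}


def toy_simulation():
    """Stand-in workload with a one-off setup cost on its first call."""
    if "table" not in _cache:
        _cache["table"] = [i * i for i in range(200_000)]  # mimics compilation
    return sum(_cache["table"][:1000])


print(benchmark(toy_simulation))
```

Under this reading, a roughly constant dry run time at small qubit counts would point to a fixed start-up cost rather than to the simulation itself.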

@scarrazza
Member

scarrazza commented Aug 10, 2021

Thanks for these numbers, do you have similar numbers for qibotf? For some circuits like hs and qv the difference is too large, are you sure that qiskit is using CPU instead of GPU? What is the average total program execution time, maybe qiskit is precomputing objects during the circuit definition? Is the final state vector the same for all backends?

@scarrazza
Member

Btw, how many threads is qiskit using? It might be possible that this value is different from our default, e.g. limiting the number of threads might have an impact.

@scarrazza
Member

Btw2, is qiskit really using double precision? If I set qibo to single, I get numbers which are quite close to qiskit...

@stavros11
Member Author

Thanks for the response and the questions. Some quick answers:

Thanks for these numbers, do you have similar numbers for qibotf?

I added the possibility to use qibotf in the same script in the latest push; I will update the above tables once I have the numbers. I don't expect much difference from qibojit, and it certainly will not be much closer to Qiskit.

For some circuits like hs and qv the difference is too large, are you sure that qiskit is using CPU instead of GPU?

I haven't checked htop explicitly during all benchmarks but all the Qiskit runs I checked use CPU. I think Qiskit only uses GPU when the appropriate simulator is used. I also used export CUDA_VISIBLE_DEVICES="" before running all these benchmarks.

What is the average total program execution time, maybe qiskit is precomputing objects during the circuit definition?

The benchmark script logs the circuit creation time too, which in this case corresponds to transforming the OpenQASM circuit to the library circuit. Here are the numbers from the above benchmarks:

CPU - circuit creation times - qft
nqubits qibo qiskit qulacs
3 0.00137 0.03624 0.00042
4 0.00147 0.03691 0.00053
5 0.00164 0.03788 0.00065
6 0.00182 0.03867 0.00120
7 0.00205 0.03959 0.00077
8 0.00227 0.04098 0.00123
9 0.00253 0.04207 0.00141
10 0.00282 0.04366 0.00171
11 0.00316 0.04485 0.00196
12 0.00359 0.04626 0.00174
13 0.00403 0.04921 0.00277
14 0.00438 0.05247 0.00305
15 0.00489 0.05358 0.00344
16 0.00536 0.05514 0.00384
17 0.00582 0.05849 0.00419
18 0.00648 0.06071 0.00467
19 0.00711 0.06336 0.00377
20 0.00738 0.06713 0.00569
21 0.00814 0.06927 0.00483
22 0.00948 0.07347 0.00701
23 0.01007 0.07802 0.00769
24 0.01077 0.08100 0.00605
25 0.01166 0.08356 0.00656
26 0.01228 0.08735 0.00967
CPU - circuit creation times - variational
nqubits qibo qiskit qulacs
3 0.00125 0.03717 0.00034
4 0.00134 0.03736 0.00036
5 0.00139 0.03768 0.00048
6 0.00151 0.03789 0.00051
7 0.00156 0.03824 0.00054
8 0.00164 0.03890 0.00057
9 0.00169 0.03837 0.00041
10 0.00182 0.03879 0.00067
11 0.00184 0.03974 0.00070
12 0.00193 0.03997 0.00073
13 0.00200 0.04018 0.00086
14 0.00212 0.04096 0.00086
15 0.00219 0.04074 0.00084
16 0.00224 0.04129 0.00091
17 0.00233 0.04119 0.00067
18 0.00244 0.04113 0.00099
19 0.00247 0.04150 0.00074
20 0.00252 0.04252 0.00077
21 0.00257 0.04281 0.00105
22 0.00263 0.04340 0.00111
23 0.00266 0.04362 0.00112
24 0.00276 0.04406 0.00119
25 0.00288 0.04354 0.00124
26 0.00292 0.04503 0.00133
CPU - circuit creation times - bv
nqubits qibo qiskit qulacs
3 0.00112 0.03656 0.00029
4 0.00118 0.03729 0.00031
5 0.00130 0.03682 0.00034
6 0.00132 0.03758 0.00038
7 0.00135 0.03702 0.00039
8 0.00141 0.03897 0.00032
9 0.00146 0.03771 0.00046
10 0.00150 0.03846 0.00049
11 0.00159 0.03900 0.00055
12 0.00159 0.03907 0.00055
13 0.00165 0.03917 0.00062
14 0.00175 0.04003 0.00066
15 0.00179 0.03949 0.00063
16 0.00189 0.03946 0.00060
17 0.00199 0.03976 0.00068
18 0.00201 0.04076 0.00071
19 0.00210 0.04103 0.00079
20 0.00207 0.04059 0.00079
21 0.00219 0.04237 0.00082
22 0.00214 0.04115 0.00082
23 0.00225 0.04153 0.00087
24 0.00230 0.04144 0.00089
25 0.00243 0.04178 0.00101
26 0.00241 0.04325 0.00101
CPU - circuit creation times - supremacy
nqubits qibo qiskit qulacs
3 0.00229 0.03562 0.00081
4 0.00246 0.03607 0.00095
5 0.00255 0.03639 0.00097
6 0.00269 0.03752 0.00114
7 0.00279 0.03658 0.00122
8 0.00303 0.03744 0.00129
9 0.00319 0.03778 0.00139
10 0.00327 0.03864 0.00155
11 0.00349 0.03892 0.00160
12 0.00359 0.03839 0.00167
13 0.00376 0.03931 0.00177
14 0.00388 0.03961 0.00190
15 0.00397 0.03987 0.00193
16 0.00410 0.04024 0.00206
17 0.00416 0.04043 0.00217
18 0.00433 0.04082 0.00226
19 0.00481 0.04117 0.00236
20 0.00489 0.04243 0.00245
21 0.00500 0.04196 0.00250
22 0.00523 0.04239 0.00269
23 0.00535 0.04270 0.00277
24 0.00531 0.04284 0.00282
25 0.00557 0.04356 0.00283
26 0.00554 0.04380 0.00305
CPU - circuit creation times - bc
nqubits qibo qiskit qulacs
3 0.00698 0.04693 0.00583
4 0.01214 0.05820 0.00840
5 0.01964 0.07442 0.01379
6 0.02745 0.09376 0.01963
7 0.03747 0.11618 0.02688
8 0.05066 0.14329 0.03724
9 0.06499 0.17409 0.04517
10 0.08000 0.20550 0.05642
11 0.09488 0.34107 0.06893
12 0.11436 0.37615 0.08361
13 0.12946 0.42903 0.09443
14 0.15251 0.48116 0.11314
15 0.17660 - 0.12898
16 0.19943 - 0.14574
17 0.21900 - 0.16487
18 0.24287 - 0.18173
19 0.40476 - 0.19997
20 0.43427 - 0.22061
21 0.46144 - 0.24401
22 0.49091 - 0.26230
23 0.51685 - 0.28915
24 0.56229 - 0.31276
25 0.59079 - 0.34755
26 0.63578 - 0.36804
CPU - circuit creation times - qv
nqubits qibo qiskit qulacs
3 0.00550 0.26850 0.23649
4 0.00595 0.26594 0.23317
5 0.00617 0.27456 0.24185
6 0.00679 0.27547 0.24057
7 0.00673 0.27290 0.24221
8 0.00735 0.27700 0.24050
9 0.00735 0.27422 0.23694
10 0.00792 0.28032 0.23819
11 0.00779 0.28106 0.24027
12 0.00845 0.28351 0.23970
13 0.00860 0.28016 0.23401
14 0.00914 0.28502 0.23784
15 0.00905 0.28237 0.23578
16 0.00960 0.27850 0.24174
17 0.00963 0.28461 0.23565
18 0.01040 0.28696 0.23779
19 0.01007 0.28674 0.24116
20 0.01086 0.28600 0.23942
21 0.01082 0.28849 0.24169
22 0.01154 0.28992 0.24264
23 0.01143 0.29168 0.23564
24 0.01178 0.29128 0.23843
25 0.01205 0.28958 0.24142
26 0.01254 0.29300 0.24495
CPU - circuit creation times - hs
nqubits qibo qiskit qulacs
3 0.00103 0.03652 0.00031
4 0.00138 0.03680 0.00045
5 0.00137 0.03762 0.00045
6 0.00154 0.03868 0.00053
7 0.00162 0.03808 0.00057
8 0.00171 0.03890 0.00063
9 0.00173 0.03853 0.00064
10 0.00192 0.03964 0.00071
11 0.00198 0.03991 0.00070
12 0.00212 0.04040 0.00079
13 0.00212 0.04141 0.00081
14 0.00246 0.04184 0.00094
15 0.00240 0.04115 0.00092
16 0.00247 0.04177 0.00098
17 0.00243 0.04270 0.00095
18 0.00264 0.04398 0.00104
19 0.00276 0.04423 0.00112
20 0.00285 0.04403 0.00123
21 0.00291 0.04433 0.00117
22 0.00303 0.04482 0.00131
23 0.00290 0.04475 0.00128
24 0.00328 0.04597 0.00139
25 0.00346 0.04679 0.00145
26 0.00336 0.04689 0.00143
27 0.00326 0.04610 0.00137
28 0.00343 0.04656 0.00106
29 0.00376 0.04879 0.00172
30 0.00357 0.04819 0.00158

Indeed Qiskit has slightly higher creation times in all cases but still wins for the larger circuits, where simulation time dominates, when considering the sum of creation + execution.
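
The sum of creation + execution can be checked directly from the tables. The sketch below uses the hs rows for 3 and 30 qubits, copied from the circuit creation and simulation tables in this thread (values in seconds, as the timings quoted earlier suggest). The helper names are hypothetical.

```python
# Values copied from the "hs" rows of the tables in this thread.
creation = {
    3: {"qibo": 0.00103, "qiskit": 0.03652, "qulacs": 0.00031},
    30: {"qibo": 0.00357, "qiskit": 0.04819, "qulacs": 0.00158},
}
simulation = {
    3: {"qibo": 0.00037, "qiskit": 0.00058, "qulacs": 0.00002},
    30: {"qibo": 33.93688, "qiskit": 3.71810, "qulacs": 40.22109},
}


def total_times(nqubits):
    """Creation plus simulation time per library for one circuit size."""
    return {lib: creation[nqubits][lib] + simulation[nqubits][lib]
            for lib in creation[nqubits]}


def fastest(nqubits):
    """Library with the smallest total time for one circuit size."""
    totals = total_times(nqubits)
    return min(totals, key=totals.get)


for n in (3, 30):
    print(n, total_times(n), "fastest:", fastest(n))
```

At 30 qubits the creation overhead is negligible next to the simulation time, while at 3 qubits the fixed creation cost is the larger part of the total.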

Is the final state vector the same for all backends?

This is exactly what is tested in the new test_libraries.py for all circuits, except qv due to the U3 convention issue. I will try to do a check using the benchmark script too but from a quick look it seems that Qiskit returns the expected states, that's why I wrote that I don't think that something strange like skipping the simulation happens.
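
One way to compare final states from two libraries is sketched below. It is a generic illustration, not the logic of test_libraries.py: `fidelity` and `states_match` are hypothetical helpers, and the optional global-phase tolerance is an addition motivated by the U3 convention issue mentioned above.

```python
import math


def fidelity(state_a, state_b):
    """Return |<a|b>|^2 for two normalized state vectors of equal length."""
    if len(state_a) != len(state_b):
        raise ValueError("State vectors must have the same length.")
    overlap = sum(complex(a).conjugate() * complex(b)
                  for a, b in zip(state_a, state_b))
    return abs(overlap) ** 2


def states_match(state_a, state_b, atol=1e-10, ignore_global_phase=False):
    """Compare amplitudes entrywise, or via fidelity when a global phase is allowed."""
    if len(state_a) != len(state_b):
        return False
    if ignore_global_phase:
        return abs(fidelity(state_a, state_b) - 1.0) < atol
    return all(abs(complex(a) - complex(b)) < atol
               for a, b in zip(state_a, state_b))


bell = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
rotated = [1j * amplitude for amplitude in bell]  # same state, global phase i

print(states_match(bell, rotated))
print(states_match(bell, rotated, ignore_global_phase=True))
```

The entrywise check is the stricter of the two, so it is the natural default when the libraries are expected to follow identical gate conventions.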

Btw, how many threads is qiskit using? It might be possible that this value is different from our default, e.g. limiting the number of threads might have an impact.

Qiskit and Qulacs use all available threads while Qibo uses half of them. This may cause some of the difference but I don't think it explains the whole difference. In past Qibo benchmarks using all threads had minimal change in performance.

Btw2, is qiskit really using double precision? If I set qibo to single, I get numbers which are quite close to qiskit...

I am not sure exactly what happens during simulation, but if I check result.dtype on the state returned by Qiskit I get complex128. Also, according to their docs, double precision is used by default.
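
For context on why precision matters here, the following arithmetic sketch computes the memory taken by a full state vector. It relies only on the fact that complex128 uses 16 bytes per amplitude and complex64 uses 8. Reading this as the reason for the single versus double precision timing gap is an assumption, and the helper names are hypothetical.

```python
BYTES_PER_AMPLITUDE = {"complex64": 8, "complex128": 16}


def statevector_bytes(nqubits, dtype="complex128"):
    """Memory needed to store all 2**nqubits amplitudes."""
    return (2 ** nqubits) * BYTES_PER_AMPLITUDE[dtype]


def to_gib(nbytes):
    """Convert bytes to GiB."""
    return nbytes / 2 ** 30


for dtype in ("complex64", "complex128"):
    print(dtype, to_gib(statevector_bytes(30, dtype)), "GiB at 30 qubits")
```

Halving the bytes per amplitude halves the data each gate has to stream through, so a large speed-up from single precision would be plausible if the simulation is memory-bound.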

@scarrazza
Member

@stavros11 thanks for the comments. I have tested and indeed qiskit is 2x faster when using single precision. Starting from the QFT, if I keep only the first layer of H gates, qiskit is 1s faster than qibo. At this point we should revisit each gate: if the single gates have similar performance, then I agree that some extra parallelization is performed by qiskit.

@scarrazza
Member

In particular, if I yield just 1 Hadamard, the qibo performance is better than qiskit; however, as soon as I include 5 Hadamards, one per qubit, the qiskit performance is better, so this sounds like circuit fusion/block parallelization.

@scarrazza
Member

Following their docs I think this latest version of qiskit:

  • uses openmp for parallel evaluation
  • enables openmp if nqubits >= 14
  • uses fusion by default

@scarrazza
Member

Last comment about that, if I set self.simulator.set_options(fusion_enable=False) and use all threads in qibojit, I get almost the same performance for qiskit and qibo. So, it is the fusion that accelerates the computation.
We should look into that and check if qibo can support it.
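
To illustrate what gate fusion means in general, here is a minimal sketch that greedily groups consecutive gates into blocks acting on a bounded number of qubits. A simulator could then multiply each block into one matrix and apply it in a single pass over the state vector. This is not Qiskit's fusion algorithm and not a proposal for Qibo's implementation. The `fuse` function and its `max_fused_qubits` parameter are hypothetical.

```python
def fuse(gates, max_fused_qubits=2):
    """Greedily group consecutive gates into blocks on at most `max_fused_qubits` qubits.

    `gates` is a list of (name, qubits) tuples in application order. Each
    returned block is a (qubits, [gate, ...]) pair; gate order is preserved,
    so applying the blocks in sequence is equivalent to the original circuit.
    """
    blocks = []
    current_qubits, current_gates = set(), []
    for name, qubits in gates:
        merged = current_qubits | set(qubits)
        if current_gates and len(merged) > max_fused_qubits:
            # The block is full: close it and start a new one with this gate.
            blocks.append((tuple(sorted(current_qubits)), current_gates))
            current_qubits, current_gates = set(qubits), [(name, qubits)]
        else:
            current_qubits = merged
            current_gates.append((name, qubits))
    if current_gates:
        blocks.append((tuple(sorted(current_qubits)), current_gates))
    return blocks


hadamards = [("h", (q,)) for q in range(5)]
print(len(fuse(hadamards, max_fused_qubits=5)), "block(s) with up to 5 qubits per block")
print(len(fuse(hadamards, max_fused_qubits=2)), "block(s) with up to 2 qubits per block")
```

With five Hadamards, one per qubit, a block limit of 5 yields a single fused block instead of five separate passes over the state vector, which would be consistent with the single-gate versus five-gate behaviour described above.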

@stavros11
Member Author

Last comment about that, if I set self.simulator.set_options(fusion_enable=False) and use all threads in qibojit, I get almost the same performance for qiskit and qibo. So, it is the fusion that accelerates the computation.

I have been doing the benchmark using the same option and can confirm that performance is the same with Qibo. Here are the results for all circuits:

CPU - dry run times - qft
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.14609 0.01614 0.00141 0.00102
4 0.14219 0.02567 0.00114 0.00119
5 0.13952 0.01790 0.00106 0.00145
6 0.14063 0.01482 0.00114 0.00112
7 0.14224 0.03153 0.00129 0.00120
8 0.14368 0.02951 0.00160 0.00174
9 0.14293 0.02227 0.00157 0.00200
10 0.14489 0.02659 0.00196 0.00174
11 0.14266 0.02941 0.00227 0.00217
12 0.14414 0.02864 0.00305 0.00342
13 0.14532 0.01835 0.00476 0.00468
14 0.14601 0.02937 0.00911 0.00867
15 0.15146 0.02963 0.03047 0.06162
16 0.15435 0.03529 0.03393 0.07151
17 0.14970 0.03201 0.03692 0.08129
18 0.15598 0.03784 0.04524 0.09627
19 0.15866 0.03899 0.05674 0.11165
20 0.16004 0.04898 0.07397 0.13880
21 0.17945 0.06781 0.09618 0.19589
22 0.18606 0.09782 0.11774 0.28979
23 0.24259 0.19192 0.17108 0.47256
24 0.48633 0.41005 0.38000 0.55978
25 1.13068 0.95941 0.91085 1.18741
26 2.75894 2.47023 1.88619 2.91471
CPU - dry run times - variational
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.13821 0.02626 0.00127 0.00105
4 0.13915 0.01163 0.00100 0.00107
5 0.13660 0.02306 0.00105 0.00141
6 0.13939 0.01711 0.00124 0.00113
7 0.13899 0.01630 0.00120 0.00104
8 0.14083 0.04023 0.00121 0.00124
9 0.13952 0.03896 0.00158 0.00112
10 0.13938 0.02687 0.00136 0.00162
11 0.14355 0.01670 0.00151 0.00172
12 0.14013 0.02445 0.00168 0.00158
13 0.14623 0.01961 0.00210 0.00263
14 0.14328 0.03573 0.00346 0.00309
15 0.14111 0.01913 0.00764 0.01736
16 0.14549 0.02167 0.00842 0.01963
17 0.14837 0.02696 0.00839 0.02126
18 0.14382 0.02851 0.00863 0.02218
19 0.14968 0.02721 0.01136 0.02605
20 0.15033 0.03511 0.01158 0.03093
21 0.15262 0.04414 0.01776 0.03962
22 0.16749 0.05572 0.02055 0.06225
23 0.20121 0.12087 0.03393 0.08359
24 0.34273 0.26135 0.05600 0.16211
25 0.66060 0.56991 0.14690 0.53463
26 1.22694 1.18731 0.28349 1.13857
CPU - dry run times - bv
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.14355 0.02230 0.00104 0.00150
4 0.15346 0.05262 0.00108 0.00115
5 0.14235 0.01966 0.00124 0.00107
6 0.14811 0.02366 0.00121 0.00126
7 0.14192 0.02395 0.00132 0.00110
8 0.14303 0.03168 0.00112 0.00125
9 0.14264 0.02597 0.00123 0.00119
10 0.14080 0.03298 0.00126 0.00116
11 0.14738 0.01784 0.00157 0.00143
12 0.14324 0.02217 0.00163 0.00159
13 0.13582 0.02507 0.00253 0.00220
14 0.14438 0.02631 0.00298 0.00297
15 0.14555 0.01531 0.00741 0.02394
16 0.14405 0.02410 0.00792 0.01852
17 0.15019 0.02785 0.00842 0.02635
18 0.14511 0.02288 0.00869 0.02385
19 0.14929 0.02792 0.00971 0.02737
20 0.15472 0.02289 0.01119 0.03358
21 0.15934 0.04284 0.01612 0.05133
22 0.16652 0.04939 0.02037 0.07103
23 0.20038 0.09279 0.03760 0.11483
24 0.34668 0.29527 0.06302 0.26456
25 0.68856 0.60402 0.14585 0.60601
26 1.34578 1.25987 0.29990 1.24007
CPU - dry run times - supremacy
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.04730 0.00359 0.00142 0.00146
4 0.04759 0.00379 0.00136 0.00155
5 0.04874 0.01179 0.00153 0.00147
6 0.05135 0.00473 0.00161 0.00147
7 0.04738 0.00469 0.00159 0.00144
8 0.04834 0.00498 0.00171 0.00145
9 0.05123 0.00518 0.00171 0.00184
10 0.05125 0.00554 0.00181 0.00193
11 0.05401 0.00590 0.00174 0.00210
12 0.04822 0.00623 0.00215 0.00205
13 0.04880 0.00647 0.00416 0.00276
14 0.04886 0.00697 0.00378 0.00384
15 0.04925 0.00717 0.00946 0.03849
16 0.05146 0.00830 0.00878 0.03031
17 0.05158 0.00923 0.00940 0.03290
18 0.05266 0.01147 0.00920 0.04191
19 0.05991 0.01528 0.01024 0.03712
20 0.05999 0.02105 0.01199 0.04890
21 0.06598 0.03111 0.01515 0.05854
22 0.08216 0.04886 0.02412 0.09568
23 0.12164 0.09231 0.03420 0.13533
24 0.31780 0.26941 0.05419 0.20371
25 0.66525 0.65051 0.13104 0.67836
26 1.41856 1.44158 0.25266 1.41567
CPU - dry run times - bc
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.10865 0.03400 0.00193 0.00226
4 0.07797 0.05211 0.00279 0.00257
5 0.10672 0.04472 0.00358 0.00380
6 0.10741 0.09421 0.00513 0.00488
7 0.10422 0.06441 0.00638 0.00627
8 0.10134 0.09720 0.00830 0.00862
9 0.09091 0.13271 0.01143 0.01162
10 0.24327 0.13525 0.01711 0.12332
11 0.23828 0.14882 0.02787 0.02777
12 0.25085 0.17866 0.05522 0.05113
13 0.25957 0.19384 0.10197 0.10160
14 0.27024 0.21175 0.21329 0.21323
CPU - dry run times - qv
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.04844 0.00367 0.00135 0.00139
4 0.05273 0.00486 0.00156 0.00162
5 0.04937 0.01037 0.00159 0.00157
6 0.04929 0.00585 0.00156 0.00171
7 0.04943 0.00590 0.00160 0.00164
8 0.04949 0.00698 0.00195 0.00177
9 0.04952 0.00712 0.00175 0.00172
10 0.05000 0.00838 0.00195 0.00174
11 0.05059 0.00824 0.00206 0.00187
12 0.05063 0.01464 0.00241 0.00268
13 0.05086 0.00964 0.00307 0.00298
14 0.05115 0.01086 0.00467 0.00471
15 0.05211 0.01110 0.01296 0.03167
16 0.05456 0.01279 0.00841 0.03815
17 0.05384 0.01446 0.01317 0.03695
18 0.05632 0.01779 0.00892 0.04353
19 0.05945 0.02175 0.01019 0.04842
20 0.06883 0.02923 0.01302 0.06003
21 0.07827 0.04620 0.01464 0.07892
22 0.11751 0.08035 0.02167 0.12473
23 0.16683 0.15201 0.03156 0.16395
24 0.35416 0.42914 0.05478 0.31862
25 1.08719 1.02541 0.10284 1.02304
26 2.26871 2.31484 0.21652 2.32828
CPU - dry run times - hs
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.14078 0.01123 0.00131 0.00109
4 0.14038 0.05976 0.00112 0.00127
5 0.14218 0.02236 0.00114 0.00101
6 0.14158 0.02755 0.00119 0.00106
7 0.14324 0.04778 0.00146 0.00129
8 0.14234 0.02460 0.00129 0.00144
9 0.14140 0.02000 0.00131 0.00133
10 0.14476 0.03292 0.00154 0.00148
11 0.16313 0.02884 0.00175 0.00151
12 0.14204 0.02411 0.00263 0.00245
13 0.14248 0.01856 0.00286 0.00274
14 0.14593 0.02078 0.00446 0.00501
15 0.14506 0.01933 0.00704 0.03207
16 0.15695 0.02311 0.00738 0.03810
17 0.14624 0.02027 0.00799 0.03172
18 0.14966 0.02264 0.00815 0.03970
19 0.14848 0.03451 0.00890 0.04505
20 0.15762 0.04424 0.01076 0.05183
21 0.17340 0.05215 0.01456 0.06366
22 0.18809 0.08329 0.01836 0.10526
23 0.22851 0.14311 0.03769 0.12050
24 0.55507 0.45082 0.06076 0.26734
25 1.12732 1.10636 0.10539 1.07681
26 2.08358 2.06528 0.19700 2.01730
27 3.82514 3.91613 0.40058 3.81916
28 8.15916 8.23759 0.80033 8.24329
29 19.13270 19.33549 1.58801 19.64304
30 34.21462 34.81286 3.11735 35.19634
CPU - simulation times - qft
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00023 0.00029 0.00048 0.00048
4 0.00035 0.00041 0.00057 0.00053
5 0.00049 0.00046 0.00058 0.00061
6 0.00066 0.00095 0.00068 0.00068
7 0.00083 0.00093 0.00072 0.00072
8 0.00106 0.00124 0.00093 0.00085
9 0.00126 0.00112 0.00099 0.00097
10 0.00149 0.00185 0.00120 0.00122
11 0.00177 0.00215 0.00165 0.00163
12 0.00208 0.00245 0.00253 0.00252
13 0.00243 0.00284 0.00425 0.00432
14 0.00289 0.00334 0.00803 0.00793
15 0.00326 0.00388 0.03220 0.06548
16 0.00393 0.00475 0.03673 0.07671
17 0.00463 0.00485 0.04235 0.08364
18 0.00613 0.00786 0.04848 0.09972
19 0.00840 0.01024 0.06044 0.11577
20 0.01361 0.01905 0.07265 0.13619
21 0.02258 0.03532 0.09714 0.18999
22 0.04033 0.06544 0.10747 0.26075
23 0.08698 0.14522 0.18945 0.29765
24 0.28090 0.32265 0.31718 0.51357
25 0.84842 0.87645 0.91841 1.27589
26 2.55392 2.24508 1.88219 2.88646
CPU - simulation times - variational
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00025 0.00029 0.00049 0.00051
4 0.00036 0.00042 0.00053 0.00053
5 0.00042 0.00037 0.00055 0.00054
6 0.00049 0.00047 0.00059 0.00059
7 0.00057 0.00072 0.00063 0.00074
8 0.00065 0.00077 0.00067 0.00068
9 0.00069 0.00081 0.00071 0.00070
10 0.00079 0.00095 0.00080 0.00080
11 0.00085 0.00104 0.00092 0.00088
12 0.00095 0.00108 0.00118 0.00117
13 0.00103 0.00122 0.00155 0.00158
14 0.00118 0.00108 0.00256 0.00247
15 0.00125 0.00156 0.00529 0.01913
16 0.00155 0.00180 0.00540 0.02132
17 0.00187 0.00220 0.00607 0.02221
18 0.00292 0.00343 0.00675 0.02411
19 0.00425 0.00457 0.00821 0.02657
20 0.00654 0.00897 0.01051 0.03224
21 0.01081 0.01832 0.01775 0.04365
22 0.02311 0.03407 0.02429 0.06475
23 0.05224 0.08439 0.03916 0.07442
24 0.18787 0.24245 0.07135 0.16866
25 0.50435 0.53373 0.16767 0.54643
26 1.08907 1.17849 0.32472 1.16364
CPU - simulation times - bv
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00026 0.00030 0.00048 0.00050
4 0.00032 0.00037 0.00052 0.00051
5 0.00040 0.00047 0.00054 0.00052
6 0.00049 0.00065 0.00057 0.00057
7 0.00057 0.00066 0.00059 0.00061
8 0.00064 0.00185 0.00062 0.00064
9 0.00071 0.00063 0.00067 0.00067
10 0.00077 0.00118 0.00072 0.00074
11 0.00226 0.00114 0.00083 0.00089
12 0.00092 0.00170 0.00107 0.00108
13 0.00107 0.00129 0.00151 0.00148
14 0.00115 0.00124 0.00237 0.00235
15 0.00131 0.00158 0.00469 0.01967
16 0.00151 0.00161 0.00517 0.02066
17 0.00192 0.00200 0.00569 0.02271
18 0.00457 0.00324 0.00656 0.02499
19 0.00407 0.00465 0.00766 0.02845
20 0.00638 0.00887 0.01019 0.03321
21 0.01297 0.01710 0.01697 0.04855
22 0.02309 0.03547 0.02538 0.07497
23 0.05003 0.08153 0.04015 0.11676
24 0.19126 0.26410 0.07391 0.26712
25 0.53018 0.56650 0.16424 0.62133
26 1.18478 1.28438 0.33763 1.29223
CPU - simulation times - supremacy
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00031 0.00115 0.00052 0.00051
4 0.00038 0.00039 0.00054 0.00054
5 0.00046 0.00049 0.00057 0.00057
6 0.00055 0.00056 0.00061 0.00061
7 0.00065 0.00064 0.00063 0.00063
8 0.00071 0.00068 0.00067 0.00105
9 0.00079 0.00076 0.00076 0.00072
10 0.00085 0.00088 0.00081 0.00081
11 0.00095 0.00096 0.00093 0.00101
12 0.00104 0.00110 0.00122 0.00127
13 0.00116 0.00116 0.00287 0.00175
14 0.00131 0.00135 0.00306 0.00308
15 0.00146 0.00154 0.00470 0.02359
16 0.00176 0.00196 0.00484 0.02600
17 0.00215 0.00261 0.00500 0.02858
18 0.00340 0.00353 0.00623 0.03187
19 0.00481 0.00565 0.00725 0.03511
20 0.00783 0.00965 0.00933 0.04261
21 0.01568 0.01997 0.01557 0.05924
22 0.02940 0.03944 0.02405 0.07921
23 0.07014 0.08953 0.03592 0.09555
24 0.25279 0.24751 0.06739 0.20516
25 0.63385 0.68515 0.14553 0.69297
26 1.33916 1.41088 0.28359 1.43705
CPU - simulation times - bc
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00203 0.00264 0.00115 0.00145
4 0.00514 0.00526 0.00260 0.00179
5 0.00815 0.00790 0.00264 0.00332
6 0.01170 0.01159 0.00389 0.00381
7 0.01666 0.01571 0.00527 0.00522
8 0.02274 0.01735 0.00737 0.01796
9 0.02237 0.02723 0.01975 0.01998
10 0.03307 0.03353 0.02533 0.02082
11 0.03635 0.03972 0.02647 0.02681
12 0.04149 0.04628 0.05496 0.04923
13 0.04948 0.05336 0.09865 0.11620
14 0.06095 0.05566 0.21272 0.21620
CPU - simulation times - qv
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00032 0.00031 0.00053 0.00053
4 0.00092 0.00056 0.00063 0.00066
5 0.00059 0.00057 0.00065 0.00067
6 0.00086 0.00084 0.00073 0.00072
7 0.00086 0.00085 0.00073 0.00079
8 0.00117 0.00109 0.00084 0.00083
9 0.00113 0.00109 0.00086 0.00086
10 0.00139 0.00137 0.00110 0.00102
11 0.00139 0.00140 0.00112 0.00113
12 0.00167 0.00164 0.00159 0.00167
13 0.00175 0.00178 0.00296 0.00214
14 0.00211 0.00211 0.00379 0.00388
15 0.00246 0.00219 0.00478 0.03039
16 0.00305 0.00289 0.00459 0.03687
17 0.00352 0.00374 0.00544 0.03616
18 0.00542 0.00558 0.00647 0.04268
19 0.00805 0.00856 0.00724 0.04680
20 0.01417 0.01614 0.00917 0.05700
21 0.02470 0.03082 0.01379 0.07853
22 0.05245 0.06647 0.02163 0.11175
23 0.11709 0.14571 0.03694 0.14163
24 0.29440 0.40059 0.06753 0.37110
25 1.01247 1.07168 0.12397 1.03992
26 2.17764 2.24001 0.24122 2.36563
CPU - simulation times - hs
nqubits qibojit qibotf qiskit qiskit-nofusion
3 0.00037 0.00042 0.00058 0.00057
4 0.00059 0.00065 0.00062 0.00063
5 0.00064 0.00071 0.00064 0.00065
6 0.00081 0.00096 0.00071 0.00071
7 0.00093 0.00109 0.00076 0.00073
8 0.00110 0.00126 0.00081 0.00081
9 0.00113 0.00212 0.00085 0.00085
10 0.00143 0.00155 0.00101 0.00100
11 0.00145 0.00161 0.00111 0.00114
12 0.00166 0.00166 0.00161 0.00159
13 0.00180 0.00263 0.00223 0.00230
14 0.00213 0.00201 0.00433 0.00395
15 0.00230 0.00265 0.00340 0.03291
16 0.00260 0.00320 0.00354 0.03397
17 0.00309 0.00382 0.00362 0.03420
18 0.00585 0.00527 0.00491 0.04197
19 0.00896 0.00955 0.00653 0.04875
20 0.01241 0.01585 0.00946 0.05614
21 0.02027 0.02632 0.01529 0.07261
22 0.03742 0.05523 0.02241 0.10516
23 0.08511 0.11802 0.04046 0.11220
24 0.36213 0.35885 0.07321 0.27944
25 1.05356 1.05197 0.12127 1.08479
26 1.92000 1.97689 0.24272 2.05238
27 3.65685 3.83122 0.49052 3.88389
28 7.95587 8.27206 0.97463 8.39834
29 18.97606 19.55705 1.89644 19.96811
30 33.93688 35.07520 3.71810 35.70113

So the results are broadly similar, with the exception of dry run times for small qubit numbers. I am not sure whether this can be improved by disabling parallelization for nqubits < 14, as Qiskit does by default.

We should look into that and check if qibo can support it.

I agree we should revisit gate fusion in Qibo, and if performance improves this much for the most common circuits we could consider making it the default, with some cut-off in the number of qubits. We should open an issue about that in Qibo.
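A minimal sketch of the thread cut-off idea mentioned above. The helper name `choose_nthreads`, its parameters and the default threshold of 14 qubits (the value quoted for Qiskit in this thread) are assumptions for illustration and are not part of Qibo:

```python
import os


def choose_nthreads(nqubits, threshold=14, max_threads=None):
    """Return 1 thread below the qubit threshold, otherwise all allowed threads.

    Hypothetical helper: small circuits run single-threaded to avoid
    parallelization overhead, larger ones use ``max_threads``.
    """
    if max_threads is None:
        max_threads = os.cpu_count() or 1
    if nqubits < threshold:
        return 1
    return max_threads


if __name__ == "__main__":
    for n in (3, 13, 14, 26):
        print(n, choose_nthreads(n, max_threads=8))
```

The returned value could then be handed to whatever mechanism the backend offers for setting the thread count before execution.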

@stavros11
Member Author

stavros11 commented Aug 12, 2021

By the way, I added the option to use Qiskit without fusion in the benchmark script (via --library qiskit-nofusion) and also GPU support (--library qiskit-gpu and --library qulacs-gpu). I noticed that when using Qiskit GPU the final state returned is wrong: tests do not pass, and if I print the final state from the benchmark it is different from the other backends (including Qiskit CPU).

I'm not yet sure if this is a bug with Qiskit or a problem in our code but will investigate it further (just noting it in case you try to run something in the meantime).

@scarrazza
Member

@stavros11 thank you very much for these numbers and confirmation. I agree concerning fusion and the possibility to set threads automatically, as you have posted in the issue. I will try the new GPU implementations tomorrow.

@scarrazza
Member

@stavros11 2 points:

  • which tests are failing for you with qiskit-gpu?
  • the qiskit-gpu performance does not change with the fusion_enable flag, does this happen also for you?

@stavros11
Member Author

stavros11 commented Aug 16, 2021

Quick response before I take off for Abu Dhabi:

  • which tests are failing for you with qiskit-gpu?

I was checking this thoroughly yesterday and, interestingly, the problem exists only on my local machine. I tried both DGX and qibo machine and qiskit-gpu works well there. On my machine I get errors even when using simple qiskit circuits, without all the benchmark code we have here. I’ll give a simple script later. I’m not sure if it is related to the CUDA version or if something is wrong in my configuration. I followed the same installation procedure everywhere (just pip install qiskit-aer-gpu).

So I believe the code here is okay to try GPU benchmarks as it is. We just need to expand by adding QCGPU and HyQuas.

  • the qiskit-gpu performance does not change with the fusion_enable flag, does this happen also for you?

I haven’t checked how fusion affects GPU yet.

@scarrazza
Member

@stavros11, tests are passing on my pc however, if I print the result during dry run and simulation run (like a manual --transfer with nrep=1) I get:

  • for qibojit CPU/GPU and qiskit CPU I get sensible results:
    [0.00012207+0.j 0.00012207+0.j 0.00012207+0.j ... 0.00012207+0.j 0.00012207+0.j 0.00012207+0.j]
    [benchmarks|INFO|2021-08-16 21:43:11]: dry_run_transfer_time: 0.0006430149078369141
    [0.00012207+0.j 0.00012207+0.j 0.00012207+0.j ... 0.00012207+0.j 0.00012207+0.j 0.00012207+0.j]
    
  • however for qiskit-gpu, the performance is quite strange (~4x faster than qibojit), and I get these wrong prints:
    [1.+0.j 0.+0.j 0.+0.j ... 0.+0.j 0.+0.j 0.+0.j] 
    [benchmarks|INFO|2021-08-16 21:42:39]: dry_run_transfer_time: 0.0003075599670410156
    [1.    +0.j 0.0625+0.j 0.0625+0.j ... 0.    +0.j 0.    +0.j 0.    +0.j]
    

Does this happen for you?

@stavros11
Member Author

@stavros11, tests are passing on my pc

Note that the tests that are uploaded on GitHub do not test the GPU backends. In order to test these you have to include "qiskit-gpu" and "qulacs-gpu" in the LIBRARIES list in conftest.py.
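For illustration, one way the GPU backends could be made opt-in in a `conftest.py`-style module. This is only a sketch: the CPU library names, the `BENCHMARK_TEST_GPU` environment variable and the `get_libraries` helper are assumptions, not the actual contents of the repository's `conftest.py`:

```python
import os

# Assumed CPU library names, for illustration only.
CPU_LIBRARIES = ["qibo", "qiskit", "qulacs"]
# GPU backends named in the comment above.
GPU_LIBRARIES = ["qiskit-gpu", "qulacs-gpu"]


def get_libraries(include_gpu=None):
    """Return the libraries to test, adding the GPU backends on request.

    If ``include_gpu`` is None, the (hypothetical) environment variable
    BENCHMARK_TEST_GPU decides: "1" enables the GPU backends.
    """
    if include_gpu is None:
        include_gpu = os.environ.get("BENCHMARK_TEST_GPU", "0") == "1"
    libraries = list(CPU_LIBRARIES)
    if include_gpu:
        libraries.extend(GPU_LIBRARIES)
    return libraries


LIBRARIES = get_libraries()
```

Keeping the GPU entries behind a switch would let the tests on machines without a GPU keep passing while still allowing an explicit GPU run.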

  • for qibojit CPU/GPU and qiskit CPU I get sensible results:
    [0.00012207+0.j 0.00012207+0.j 0.00012207+0.j ... 0.00012207+0.j 0.00012207+0.j 0.00012207+0.j]
    [benchmarks|INFO|2021-08-16 21:43:11]: dry_run_transfer_time: 0.0006430149078369141
    [0.00012207+0.j 0.00012207+0.j 0.00012207+0.j ... 0.00012207+0.j 0.00012207+0.j 0.00012207+0.j]
    
  • however for qiskit-gpu, the performance is quite strange (~4x faster than qibojit), and I get these wrong prints:
    [1.+0.j 0.+0.j 0.+0.j ... 0.+0.j 0.+0.j 0.+0.j] 
    [benchmarks|INFO|2021-08-16 21:42:39]: dry_run_transfer_time: 0.0003075599670410156
    [1.    +0.j 0.0625+0.j 0.0625+0.j ... 0.    +0.j 0.    +0.j 0.    +0.j]
    

Does this happen for you?

Yes, I observe some strange behavior from qiskit-gpu on all machines. If I add "qiskit-gpu" to the tests, they fail on my machine but pass on Qibomachine. However, when I print the state during the benchmark as in your example, I get wrong results on all machines. Also, the final state changes if I run the same script more than once, even though there is nothing random involved.

Here is a simple script that reproduces these issues:

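import qiskit
from qiskit.providers.aer import StatevectorSimulator


def main(nqubits, nreps, gpu, transpile):
    """Simulate one Hadamard per qubit ``nreps`` times and print each final state."""
    for _ in range(nreps):
        # Build the same circuit on every repetition: one H gate per qubit.
        circuit = qiskit.QuantumCircuit(nqubits)
        for i in range(nqubits):
            circuit.h(i)

        # Select the GPU statevector method or the default (CPU) one.
        if gpu:
            simulator = StatevectorSimulator(method="statevector_gpu")
        else:
            simulator = StatevectorSimulator()

        if transpile:
            circuit = qiskit.transpile(circuit, simulator)

        print("nqubits:", nqubits)
        print("nreps:", nreps)
        print("gpu:", gpu)
        print("transpile:", transpile)
        # The printed state should be identical in every repetition.
        result = simulator.run(circuit).result()
        print(result.get_statevector(circuit))
        print()


if __name__ == "__main__":
    # Illustrative values only; the problem is reported for gpu=True and nreps > 1.
    main(nqubits=10, nreps=2, gpu=True, transpile=True)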

@scarrazza, if you try to run this with gpu = True and nreps > 1 it is very likely that you will get different states between repetitions even though the same circuit is simulated. If you run the same script more than once you may also get different states in each run. Currently on qibo machine the problem appears only when nqubits >= 10, however on my local machine I get it even for two qubits.
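Since the script above applies one Hadamard to every qubit, the correct final state is the uniform superposition with every amplitude equal to 2^(-nqubits/2), so the output can be checked automatically instead of by eye. A minimal sketch, assuming the returned statevector can be iterated as a sequence of complex numbers; the helper names are hypothetical:

```python
def uniform_amplitude(nqubits):
    """Amplitude of each basis state after applying H to every one of nqubits qubits."""
    return 2.0 ** (-nqubits / 2)


def is_uniform_superposition(state, nqubits, atol=1e-7):
    """Check that ``state`` has 2**nqubits entries, all equal to the expected amplitude."""
    expected = uniform_amplitude(nqubits)
    if len(state) != 2 ** nqubits:
        return False
    return all(abs(complex(amp) - expected) <= atol for amp in state)


def repetitions_agree(states, atol=1e-7):
    """Check that every state in ``states`` matches the first one elementwise."""
    reference = states[0]
    for state in states[1:]:
        if len(state) != len(reference):
            return False
        if any(abs(complex(a) - complex(b)) > atol for a, b in zip(state, reference)):
            return False
    return True


if __name__ == "__main__":
    good = [0.5 + 0j] * 4        # two qubits, H on each
    bad = [1 + 0j, 0j, 0j, 0j]   # state left at |00>
    print(is_uniform_superposition(good, 2))
    print(is_uniform_superposition(bad, 2))
    print(repetitions_agree([good, good]))
    print(repetitions_agree([good, bad]))
```

For 26 qubits the expected amplitude is 2^-13 ≈ 0.00012207, which would be consistent with the values printed for the CPU backends earlier in the thread if that run used 26 qubits.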

@scarrazza
Member

@stavros11 I confirm all your points. I was monitoring the GPU usage on different systems while running pytest and I realized that only in the qibomachine it doesn't seem to use any GPU % during tests, so maybe it is falling back to CPU (I think qiskit provides some get_device method to check if the backend is using CPU or GPU).

Did you try the qft using qiskit.*.library.QFT directly?

@stavros11
Member Author

Did you try the qft using qiskit.*.library.QFT directly?

If I replace the circuit creation with qiskit.circuit.library.QFT in the above script the problem remains for GPU. Note that for the built-in QFT I have to use the transpile option otherwise I get a different error when attempting to get the statevector on both CPU and GPU:

qiskit.exceptions.QiskitError: 'Data for experiment "QFT" could not be found.'

@scarrazza
Member

@stavros11 I just monitored the pytest performance on test_libraries for 5, 10, 15 and 26 qubits. Tests are failing for 15 and 26; for these tests I see high GPU usage and low CPU usage, whereas for <= 10 qubits the CPU usage is very high and the GPU usage is low. So I assume they have some fallback mechanism which selects the appropriate hardware.

As discussed today, I suggest completing the other libraries listed in the first post and making a final decision afterwards.

@scarrazza
Member

@stavros11 concerning qiskit, I have opened this issue Qiskit/qiskit-aer#1319, and they have proposed a fix in this PR Qiskit/qiskit-aer#1325. So it is a qiskit bug.

@scarrazza
Member

@stavros11 I have installed the aer master locally and indeed the GPU problem is fixed. On the other hand, their performance is a factor of 2 slower than qibojit.

@stavros11
Member Author

@scarrazza here are some plots using the circuits and libraries we have so far for CPU:

Total dry run time (= creation + dry run) for QFT

image

Total time (= creation + simulation) for QFT

image

Dry run time for 20 qubits

image

Simulation time for 20 qubits

image

Total dry run time (creation + dry run) for 20 qubits

image

Total simulation time (creation + simulation) for 20 qubits

image

Total dry run time (creation + dry run) for 25 qubits

image

Total simulation time (creation + simulation) for 25 qubits

image

It seems that creation time is the main bottleneck for some libraries and circuits. This is the time required to convert the circuit from Qasm to the library's format. For Qulacs I do this conversion manually, as I could not find a Qasm parser in their docs, but for Qiskit I am using QuantumCircuit.from_qasm_str. Anyway, this time is logged separately from the simulation and dry run times, so we may choose not to include it in the plots if we wish, even though it will appear when simulating in practice, as the circuit needs to be created.
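To make the three logged quantities concrete, here is a minimal sketch of how creation, dry run and simulation times can be measured separately and then combined into the totals used in the plots. The `timed` and `run_benchmark` helpers and the toy stand-ins for parsing and execution are hypothetical and do not reproduce the actual benchmark script:

```python
import time


def timed(func, *args):
    """Call ``func(*args)`` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run_benchmark(create_circuit, execute, qasm, nreps=3):
    """Time circuit creation, the first execution (dry run) and repeated executions."""
    circuit, creation_time = timed(create_circuit, qasm)
    _, dry_run_time = timed(execute, circuit)  # first execution
    # Average over the subsequent executions of the same circuit object.
    simulation_time = sum(timed(execute, circuit)[1] for _ in range(nreps)) / nreps
    return {
        "creation_time": creation_time,
        "dry_run_time": dry_run_time,
        "simulation_time": simulation_time,
        "total_dry_run_time": creation_time + dry_run_time,
        "total_simulation_time": creation_time + simulation_time,
    }


if __name__ == "__main__":
    # Toy stand-ins: "parsing" splits the QASM text, "executing" counts its lines.
    qasm = "OPENQASM 2.0;\nqreg q[2];\nh q[0];\nh q[1];"
    print(run_benchmark(str.splitlines, len, qasm))
```

Keeping the creation time as its own entry makes it possible to plot either the bare execution times or the totals from the same log.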

Other than that, I will try to run some single precision benchmarks with Qibo, Cirq, Qiskit and TFQ, because I could not find how to switch TFQ to double precision, and also some GPU benchmarks with Qibo, QCGPU, Qulacs and Qiskit (if their GPU simulator is fixed). Let me know what other configurations and plots would be interesting.

@scarrazza
Member

Cool, thanks for these interesting results.

@scarrazza
Member

I think we should have a look at the dry-run. I suspect our initialization overhead is not 100% due to *jit, but maybe also to object allocations (gate matrices, etc.).

@stavros11
Member Author

I think we should have a look at the dry-run. I suspect our initialization overhead is not 100% due to *jit, but maybe also to object allocations (gate matrices, etc.).

That is a good point and makes sense because some elements such as gate matrices are allocated during the first execution (which is the dry run) and cached for subsequent runs. However I tried executing the benchmark by recreating a new circuit object before every execution (dry run and simulation) and the difference between dry run and simulation remains. Here are some numbers:

qft
nqubits dry_run dry_run_recreation simulation simulation_recreation
3 0.15707 0.15781 0.00016 0.00028
4 0.15898 0.15498 0.00023 0.00044
5 0.15525 0.15473 0.00031 0.00054
6 0.15515 0.15456 0.00041 0.00080
7 0.15875 0.15547 0.00054 0.00097
8 0.15593 0.15827 0.00068 0.00127
9 0.16144 0.15744 0.00085 0.00158
10 0.15839 0.15660 0.00097 0.00186
11 0.15855 0.15709 0.00117 0.00216
12 0.15825 0.15653 0.00138 0.00279
13 0.15841 0.15906 0.00167 0.00308
14 0.16396 0.15718 0.00193 0.00368
15 0.16583 0.15940 0.00247 0.00430
16 0.16377 0.16107 0.00311 0.00538
17 0.16657 0.16659 0.00478 0.00747
18 0.17491 0.16635 0.00820 0.00985
19 0.17749 0.17763 0.01323 0.01655
20 0.18860 0.18337 0.02189 0.02375
21 0.22588 0.19885 0.04432 0.04593
22 0.34758 0.31781 0.15806 0.16899
23 0.68696 0.71270 0.52971 0.53924
24 1.45951 1.45149 1.28259 1.29806
25 2.92720 2.91710 2.75042 2.76086
26 6.04791 5.99328 5.83122 5.84658
27 12.52631 12.50224 12.85258 12.28759
28 26.20651 26.31926 27.02812 26.11188
variational
nqubits dry_run dry_run_recreation simulation simulation_recreation
3 0.14938 0.15226 0.00018 0.00037
4 0.15371 0.15332 0.00025 0.00053
5 0.15404 0.15019 0.00027 0.00058
6 0.15008 0.15739 0.00031 0.00072
7 0.15250 0.15245 0.00035 0.00078
8 0.15032 0.15541 0.00044 0.00089
9 0.15122 0.15354 0.00046 0.00097
10 0.15075 0.15808 0.00051 0.00108
11 0.17387 0.15102 0.00058 0.00113
12 0.15500 0.15272 0.00083 0.00128
13 0.15667 0.15273 0.00071 0.00159
14 0.15319 0.15159 0.00083 0.00162
15 0.15240 0.15151 0.00097 0.00182
16 0.15540 0.15854 0.00132 0.00249
17 0.15737 0.15790 0.00239 0.00345
18 0.16196 0.16037 0.00428 0.00534
19 0.16351 0.16368 0.00775 0.00891
20 0.17372 0.16792 0.01338 0.01195
21 0.19686 0.19006 0.02439 0.02384
22 0.26057 0.26137 0.09740 0.10637
23 0.43137 0.43235 0.26274 0.26495
24 0.73790 0.71873 0.56391 0.56679
25 1.33550 1.33740 1.15508 1.16081
26 2.58018 2.60355 2.44414 2.43976
27 5.11979 5.15588 4.93943 4.95783
28 10.57878 10.61298 10.44146 10.46713
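The allocate-on-first-use behaviour discussed above can be pictured with a toy gate that builds its matrix lazily and caches it. This is a simplified sketch with a hypothetical `ToyHGate` class, not Qibo's actual gate implementation:

```python
import math


class ToyHGate:
    """Toy Hadamard gate whose matrix is built on first access and then cached."""

    def __init__(self):
        self._matrix = None
        self.allocations = 0  # counts how many times the matrix was built

    @property
    def matrix(self):
        if self._matrix is None:
            # First access: allocate the matrix (this cost lands in the dry run).
            self.allocations += 1
            s = 1 / math.sqrt(2)
            self._matrix = [[s, s], [s, -s]]
        return self._matrix


if __name__ == "__main__":
    gate = ToyHGate()
    gate.matrix  # first execution: allocates
    gate.matrix  # later executions: reuse the cached matrix
    print(gate.allocations)

    # Recreating the object before every execution discards the cache,
    # so each fresh gate allocates once again.
    print(sum(1 for _ in range(3) if ToyHGate().matrix))
```

If allocations like this dominated the dry run, recreating the circuit before each execution would push the simulation times up towards the dry run times, which is not what the numbers above show.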

@scarrazza
Member

Thanks for checking, this sounds like some for loop overhead. At some point, after completing the codes / libs for this exercise, one should go step by step, profile the function calls and identify where we lose performance.

@stavros11
Member Author

Thanks for checking, this sounds like some for loop overhead. At some point, after completing the codes / libs for this exercise, one should go step by step, profile the function calls and identify where we lose performance.

I am not sure if this helps, but I tried profiling the benchmark script using cProfile and I noticed that the difference between the logged dry run time and simulation time is similar to the cumulative time of numba's Dispatcher.compile, which is logged in the profiling result file. So I tried profiling for multiple qubit numbers and circuit configurations, and there appears to be some kind of agreement:

qft
nqubits dry_run simulation numba_compile dry_run_minus_compile_minus_simulation
3 0.23906 0.00029 0.23 0.00877
4 0.23490 0.00042 0.226 0.00848
5 0.23044 0.00055 0.221 0.00889
6 0.23567 0.00072 0.226 0.00895
7 0.24020 0.00097 0.23 0.00923
8 0.24235 0.00115 0.231 0.01020
9 0.24071 0.00142 0.23 0.00929
10 0.24070 0.00167 0.229 0.01002
11 0.24643 0.00188 0.234 0.01056
12 0.24060 0.00237 0.227 0.01123
13 0.23433 0.00258 0.221 0.01076
14 0.24708 0.00324 0.232 0.01183
15 0.24257 0.00379 0.227 0.01178
16 0.24428 0.00498 0.227 0.01230
17 0.24666 0.00725 0.226 0.01341
18 0.26335 0.01618 0.234 0.01317
19 0.26668 0.01708 0.232 0.01760
20 0.26835 0.02764 0.228 0.01271
21 0.31629 0.04397 0.234 0.03831
22 0.45049 0.17176 0.23 0.04873
23 0.84555 0.57817 0.229 0.03838
24 1.60636 1.32032 0.23 0.05604
25 3.03952 2.83520 0.233 -0.02868
26 6.25979 5.97193 0.232 0.05586
variational
nqubits dry_run simulation numba_compile dry_run_minus_compile_minus_simulation
3 0.22925 0.00031 0.221 0.00794
4 0.22741 0.00039 0.219 0.00802
5 0.23640 0.00047 0.227 0.00893
6 0.22941 0.00058 0.22 0.00883
7 0.22409 0.00059 0.215 0.00850
8 0.23157 0.00073 0.222 0.00884
9 0.23885 0.00082 0.229 0.00904
10 0.23040 0.00085 0.221 0.00855
11 0.23060 0.00091 0.221 0.00869
12 0.23717 0.00104 0.227 0.00913
13 0.23834 0.00113 0.227 0.01021
14 0.23341 0.00138 0.221 0.01102
15 0.22592 0.00147 0.215 0.00945
16 0.26394 0.00218 0.25 0.01175
17 0.23769 0.00374 0.221 0.01294
18 0.23274 0.00395 0.218 0.01079
19 0.24199 0.00633 0.223 0.01266
20 0.24995 0.01121 0.225 0.01374
21 0.27429 0.02264 0.229 0.02265
22 0.35193 0.10792 0.224 0.02001
23 0.51482 0.27490 0.223 0.01691
24 0.83086 0.57535 0.23 0.02551
25 1.38692 1.18283 0.225 -0.02091
26 2.80882 2.47557 0.222 0.11125
supremacy
nqubits dry_run simulation numba_compile dry_run_minus_compile_minus_simulation
3 0.25817 0.00036 0.25 0.00781
4 0.26056 0.00043 0.252 0.00813
5 0.26628 0.00052 0.257 0.00877
6 0.26703 0.00059 0.258 0.00845
7 0.26437 0.00068 0.255 0.00868
8 0.27798 0.00084 0.268 0.00913
9 0.28658 0.00083 0.277 0.00875
10 0.26633 0.00093 0.256 0.00939
11 0.26570 0.00098 0.256 0.00872
12 0.27453 0.00115 0.264 0.00938
13 0.27708 0.00122 0.266 0.00986
14 0.28162 0.00146 0.271 0.00917
15 0.26878 0.00181 0.257 0.00997
16 0.27061 0.00226 0.259 0.00935
17 0.27021 0.00395 0.256 0.01025
18 0.27924 0.00699 0.261 0.01125
19 0.27765 0.00779 0.257 0.01287
20 0.27933 0.01420 0.253 0.01213
21 0.32220 0.04193 0.26 0.02027
22 0.40820 0.12379 0.259 0.02541
23 0.62679 0.35215 0.263 0.01164
24 1.00601 0.70773 0.263 0.03527
25 1.76145 1.48079 0.261 0.01966
26 3.42488 3.10351 0.261 0.06037

Here dry run and simulation are logged by the benchmark script, while the numba compile is read from the cProfile output as the cumulative time of the Dispatcher.compile calls. It appears that this function explains a large part of the dry run overhead. The only thing that I cannot really explain is the negative difference that appears in two cases for 25 qubits qft and variational.
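As a rough illustration of how such a number can be read out of a profile, the sketch below sums the cumulative time of all profiled functions with a given name using the standard `cProfile` and `pstats` modules, and computes the last table column. The helpers `cumulative_time` and `residual` and the stand-in functions `fake_compile` and `fake_dry_run` are hypothetical; for the real benchmark the function name would be `compile`, which may also match other functions of the same name:

```python
import cProfile
import pstats
import time


def cumulative_time(profiler, function_name):
    """Sum the cumulative time of every profiled function called ``function_name``."""
    stats = pstats.Stats(profiler)
    total = 0.0
    # Keys are (filename, line, name); values are (cc, nc, tottime, cumtime, callers).
    for (_file, _line, name), (_cc, _nc, _tt, cumtime, _callers) in stats.stats.items():
        if name == function_name:
            total += cumtime
    return total


def residual(dry_run, simulation, compile_time):
    """Dry run time left unexplained after removing compilation and simulation."""
    return dry_run - compile_time - simulation


def fake_compile():
    time.sleep(0.02)  # stand-in for JIT compilation


def fake_dry_run():
    fake_compile()
    return sum(range(1000))  # stand-in for the actual simulation


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(fake_dry_run)
    print(cumulative_time(profiler, "fake_compile"))
    # First qft row of the table above: 0.23906 - 0.23 - 0.00029
    print(round(residual(0.23906, 0.00029, 0.23), 5))
```

Because the logged times and the profiled compile time come from separate measurements with their own noise, a small negative residual is possible when the simulation itself takes seconds.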

@stavros11
Member Author

Here are some single precision plots including tfq:

dry run time QFT

image

simulation time QFT

image

dry run time 20 qubits

image

simulation time 20 qubits

image

dry run time 28 qubits

image

simulation time 28 qubits

image

@scarrazza
Member

Here dry run and simulation are logged by the benchmark script, while the numba compile is read from the cProfile output as the cumulative time of the Dispatcher.compile calls. It appears that this function explains a large part of the dry run overhead. The only thing that I cannot really explain is the negative difference that appears in two cases for 25 qubits qft and variational.

Thanks @stavros11, could you please rerun one of these examples after removing the cache=True flag? I find it quite strange that we have ~0.23s for loading, maybe this cache flag is not working?

@scarrazza scarrazza marked this pull request as ready for review March 14, 2022 15:14
@scarrazza scarrazza merged commit eb62385 into main Mar 15, 2022
Development

Successfully merging this pull request may close these issues.

QASM error with Supremacy circuit
Include external codes benchmark
4 participants