[Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule #12141

Merged 16 commits on Sep 26, 2022

Contributor

@shingjan shingjan commented Jul 19, 2022

This PR updates the custom callback function for xgboost in meta schedule.

This change was tested against xgboost 1.2.0, 1.5.2, and 1.6.0 by running tests/python/unittest/test_meta_schedule_cost_model.py, to ensure backward compatibility.
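For readers unfamiliar with the class-based callback interface, here is a minimal sketch of what a custom callback that works across xgboost versions could look like. It is not the code from this PR: `PatienceStop` and `simulate` are hypothetical names, and the fallback base class is an assumption about how one might cope with xgboost releases that lack `xgboost.callback.TrainingCallback` (to my knowledge, it is available from 1.3 onward).

```python
try:
    # Class-based callback interface (to my knowledge, xgboost >= 1.3).
    from xgboost.callback import TrainingCallback
except ImportError:
    # Assumption: with an older or missing xgboost, use a plain stand-in base.
    class TrainingCallback:  # type: ignore[no-redef]
        pass


class PatienceStop(TrainingCallback):
    """Hypothetical callback: stop once the metric stalls for `patience` rounds."""

    def __init__(self, data_name: str, metric_name: str, patience: int):
        super().__init__()
        self.data_name = data_name
        self.metric_name = metric_name
        self.patience = patience
        self.best_score = None
        self.best_epoch = 0

    def after_iteration(self, model, epoch, evals_log):
        # evals_log maps data name -> metric name -> list of per-round scores.
        score = evals_log[self.data_name][self.metric_name][-1]
        if self.best_score is None or score < self.best_score:
            self.best_score, self.best_epoch = score, epoch
        # Returning True asks the training loop to stop.
        return epoch - self.best_epoch >= self.patience


def simulate(callback, scores):
    """Feed scores round by round without training; return the stop epoch or None."""
    history = []
    for epoch, score in enumerate(scores):
        history.append(score)
        if callback.after_iteration(None, epoch, {"valid": {"rmse": history}}):
            return epoch
    return None


stop_epoch = simulate(PatienceStop("valid", "rmse", patience=2),
                      [1.0, 0.8, 0.9, 0.95, 0.97])
```

With a recent xgboost installed, an instance like this would be passed through the `callbacks` argument of `xgboost.train`; the `simulate` helper only exists so the stopping logic can be checked without training a model.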

This is related to the second action item in #12009.

cc: @zxybazh @junrushao1994

@zxybazh zxybazh marked this pull request as ready for review July 22, 2022 22:45
Member

@zxybazh zxybazh left a comment

Thanks for the contribution! Generally it looks good, but some more unit tests are required, and I have a few nitpicks. I would also expect some local integration tests with the tune_relay functions to make sure tuning works fine with the migrated cost model.

Two resolved (outdated) review threads on python/tvm/meta_schedule/cost_model/xgb_model.py
@shingjan shingjan force-pushed the meta_schedule_xgboost_callback branch from 2fda92d to 2f42667 Compare July 27, 2022 01:37
@shingjan shingjan requested a review from zxybazh July 27, 2022 01:39
@shingjan shingjan force-pushed the meta_schedule_xgboost_callback branch from 2f42667 to 2c80287 Compare July 27, 2022 03:21
Member

@zxybazh zxybazh left a comment

Thanks for the changes and feedback. The test looks OK to me, and I will be fine with it as long as it passes CI. Let's finish the PR and get it in once the local integration tests are done!

@shingjan shingjan force-pushed the meta_schedule_xgboost_callback branch from be956d7 to 9444134 Compare July 27, 2022 19:01
@shingjan shingjan requested a review from zxybazh July 27, 2022 20:07
Member

@zxybazh zxybazh left a comment

Just one nit, otherwise LGTM. Would you please also run some local integration tests and let me know the results?

@shingjan
Contributor Author

Local integration test for resnet18/llvm:

 ID |                                                                        Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                         fused_nn_conv2d_add |  12870144 |      1 |       375.4210 |      34.2819 |               34.2819 |     32 |          Y 
  1 |                                                       fused_nn_conv2d_add_1 |  12895232 |      1 |       398.5375 |      32.3564 |               32.3564 |     32 |          Y 
  2 |                                                       fused_nn_conv2d_add_2 |  12945408 |      1 |       464.8020 |      27.8514 |               27.8514 |     32 |          Y 
  3 |                                                      fused_layout_transform |         1 |      1 |         0.0002 |       5.7608 |                5.7608 |      2 |          Y 
  4 |                                                 fused_nn_conv2d_add_nn_relu | 237633536 |      1 |       387.8015 |     612.7711 |              612.7711 |     32 |          Y 
  5 |                                                         fused_nn_max_pool2d |   1806336 |      1 |       157.2717 |      11.4854 |               11.4854 |     32 |          Y 
  6 |                                               fused_nn_conv2d_add_nn_relu_1 | 231612416 |      2 |       383.7106 |     603.6122 |             1207.2245 |     32 |          Y 
  7 |                                             fused_nn_conv2d_add_add_nn_relu | 231813120 |      2 |       442.1804 |     524.2501 |             1048.5002 |     32 |          Y 
  8 |                                               fused_nn_conv2d_add_nn_relu_2 | 115806208 |      1 |       362.1544 |     319.7703 |              319.7703 |     32 |          Y 
  9 |       fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu |  93227008 |      1 |       293.8712 |     317.2377 |              317.2377 |     32 |          Y 
 10 |   fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu |  93327360 |      2 |       281.1145 |     331.9906 |              663.9812 |     32 |          Y 
 11 |                                               fused_nn_conv2d_add_nn_relu_3 | 115705856 |      1 |       437.5283 |     264.4534 |              264.4534 |     32 |          Y 
 12 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 |  98600960 |      1 |       330.2098 |     298.6010 |              298.6010 |     32 |          Y 
 13 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1 |  98651136 |      2 |       298.1799 |     330.8444 |              661.6887 |     32 |          Y 
 14 |                                               fused_nn_conv2d_add_nn_relu_4 | 115655680 |      1 |       381.0609 |     303.5097 |              303.5097 |     32 |          Y 
 15 |                                               fused_nn_conv2d_add_nn_relu_5 | 231261184 |      1 |       408.4514 |     566.1902 |              566.1902 |     32 |          Y 
 16 |                                           fused_nn_conv2d_add_add_nn_relu_1 | 231286272 |      2 |       332.2502 |     696.1209 |             1392.2417 |     32 |          Y 
 17 |                                                fused_nn_adaptive_avg_pool2d |     25600 |      1 |         5.7029 |       4.4890 |                4.4890 |     32 |          Y 
 18 |                                      fused_layout_transform_reshape_squeeze |         1 |      1 |         0.0003 |       3.6907 |                3.6907 |      1 |            
 19 |                                                          fused_nn_dense_add |   1025000 |      1 |       161.2829 |       6.3553 |                6.3553 |     32 |          Y 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
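The columns in the table above appear to be related by simple arithmetic: speed is FLOP divided by latency, and weighted latency is latency times weight. The source does not state these formulas; the sketch below only checks them against two rows copied from the table, and the helper names are hypothetical.

```python
def speed_gflops(flop: int, latency_us: float) -> float:
    """FLOP per microsecond is MFLOP/s; divide by 1e3 to get GFLOP/s."""
    return flop / latency_us / 1e3


def weighted_latency_us(latency_us: float, weight: int) -> float:
    """Latency scaled by how many times the task occurs in the model."""
    return latency_us * weight


# Row 0: fused_nn_conv2d_add (FLOP 12870144, weight 1, latency 34.2819 us)
row0_speed = speed_gflops(12870144, 34.2819)

# Row 6: fused_nn_conv2d_add_nn_relu_1 (FLOP 231612416, weight 2, latency 603.6122 us)
row6_speed = speed_gflops(231612416, 603.6122)
row6_weighted = weighted_latency_us(603.6122, 2)

print(f"{row0_speed:.4f} {row6_speed:.4f} {row6_weighted:.4f}")
```

The computed values agree with the reported 375.4210 GFLOPS, 383.7106 GFLOPS, and 1207.2245 us up to rounding of the printed latencies.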

Profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    61.9612 |   100.0000 
  1 |                   EvoSearch/Evolve/Mutation |    30.6629 |    49.4873 
  2 |              EvoSearch/SampleInitPopulation |     8.4259 |    13.5987 
  3 |                               SendToBuilder |     8.4254 |    13.5979 
  4 |                       EvoSearch/Evolve/Misc |     6.0477 |     9.7604 
  5 |     EvoSearch/Evolve/PredictNormalizedScore |     3.3436 |     5.3962 
  6 |                                SendToRunner |     2.3616 |     3.8115 
  7 |                            ApplyHistoryBest |     1.5547 |     2.5091 
  8 |                              TaskExtraction |     0.4576 |     0.7386 
  9 |             MeasureCallback/UpdateCostModel |     0.1178 |     0.1901 
 10 |                              InitializeTask |     0.1092 |     0.1762 
 11 |               MeasureCallback/AddToDatabase |     0.0181 |     0.0292 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0152 |     0.0245 
 13 |              EvoSearch/PickBestFromDatabase |     0.0149 |     0.0241 
 14 |              MeasureCallback/EchoStatistics |     0.0050 |     0.0081 
 15 |         MeasureCallback/RemoveBuildArtifact |     0.0009 |     0.0015 
 16 |                           JoinRunnerFutures |     0.0003 |     0.0005 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------
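The Percentage column of the profiler table seems to be each entry's time divided by the total. A small sketch that recomputes it for a few entries copied from the table above (the dictionary and variable names are only for illustration):

```python
# Times in minutes, copied from the resnet18/llvm profiler table.
total_min = 61.9612
times_min = {
    "EvoSearch/Evolve/Mutation": 30.6629,
    "EvoSearch/SampleInitPopulation": 8.4259,
    "SendToBuilder": 8.4254,
    "MeasureCallback/UpdateCostModel": 0.1178,
}

# Share of total tuning time spent in each scope, in percent.
percentages = {name: t / total_min * 100.0 for name, t in times_min.items()}

for name, pct in percentages.items():
    print(f"{name:>40} | {pct:8.4f}")
```

Read this way, the cost model update accounts for roughly 0.19% of this run, while mutation in the evolutionary search takes about half of it.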

@shingjan
Contributor Author

@tvm-bot rerun

@shingjan
Contributor Author

bert base llvm:

 ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                        fused_take |         1 |      1 |         0.0002 |       5.2482 |                5.2482 |      1 |            
  1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |       114.1531 |      10.5492 |               10.5492 |     32 |          Y 
  2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |         1.7456 |      28.1570 |              337.8840 |      1 |          Y 
  3 |                                                    fused_variance |    147520 |     25 |        17.0011 |       8.6771 |              216.9272 |     32 |          Y 
  4 |                                                        fused_mean |     49216 |     25 |         6.1650 |       7.9831 |              199.5783 |     32 |          Y 
  5 |                                               fused_cast_take_add |     49152 |      1 |         2.5341 |      19.3960 |               19.3960 |      2 |          Y 
  6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |         4.7436 |      10.3617 |              248.6808 |      1 |          Y 
  7 |                                          fused_reshape_divide_add |     98304 |     12 |        12.9576 |       7.5866 |               91.0392 |      2 |          Y 
  8 |                                             fused_nn_fast_softmax |   4374528 |     12 |        89.9165 |      48.6510 |              583.8123 |     32 |          Y 
  9 |                                                     fused_reshape |         1 |     12 |         0.0000 |     117.9034 |             1414.8410 |      1 |          Y 
 10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       141.6181 |      44.4255 |             1066.2123 |     32 |          Y 
 11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0000 |      29.6311 |              355.5735 |      1 |          Y 
 12 |                                                    fused_nn_dense |  75497472 |     48 |       191.4222 |     394.4030 |            18931.3435 |     32 |          Y 
 13 |                                                   fused_reshape_1 |         1 |     24 |         0.0001 |      10.0435 |              241.0438 |      1 |          Y 
 14 |                                                  fused_nn_dense_1 | 301989888 |     12 |       178.7608 |    1689.3522 |            20272.2265 |     32 |          Y 
 15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |         4.0672 |    3818.8959 |            45826.7502 |      1 |          Y 
 16 |                                                  fused_nn_dense_2 | 301989888 |     12 |       240.5859 |    1255.2267 |            15062.7200 |     32 |          Y 
 17 |                                             fused_reshape_add_add |     98304 |     24 |        12.5405 |       7.8389 |              188.1338 |      2 |          Y 
 18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |        12.7443 |      15.4322 |              385.8043 |      2 |          Y 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
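To compare runs like these, it can help to load the task table into records and sum the weighted latencies. The following is a rough sketch, assuming the pipe-separated layout shown above with nine columns; `TaskRow` and `parse_rows` are hypothetical helpers, not part of TVM, and the sample contains only four rows copied from the bert base llvm table.

```python
from typing import List, NamedTuple


class TaskRow(NamedTuple):
    task_id: int
    name: str
    flop: int
    weight: int
    gflops: float
    latency_us: float
    weighted_latency_us: float
    trials: int
    terminated: bool


def parse_rows(table: str) -> List[TaskRow]:
    """Parse data rows; header and separator lines are skipped."""
    rows = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows have nine cells and start with a numeric ID.
        if len(cells) != 9 or not cells[0].isdigit():
            continue
        rows.append(TaskRow(
            int(cells[0]), cells[1], int(cells[2]), int(cells[3]),
            float(cells[4]), float(cells[5]), float(cells[6]),
            int(cells[7]), cells[8] == "Y",  # empty cell means not terminated
        ))
    return rows


SAMPLE = """
 ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
----------------------------------------------------------------------------------------------------
  0 | fused_take       |         1 |  1 |   0.0002 |    5.2482 |     5.2482 |  1 |
 12 | fused_nn_dense   |  75497472 | 48 | 191.4222 |  394.4030 | 18931.3435 | 32 | Y
 14 | fused_nn_dense_1 | 301989888 | 12 | 178.7608 | 1689.3522 | 20272.2265 | 32 | Y
 16 | fused_nn_dense_2 | 301989888 | 12 | 240.5859 | 1255.2267 | 15062.7200 | 32 | Y
"""

rows = parse_rows(SAMPLE)
# Sum of weighted latencies over the sampled rows only, in microseconds.
sample_total_us = sum(r.weighted_latency_us for r in rows)
```

Summing the same column over a full table gives a rough per-model latency estimate that can be compared between targets or xgboost versions.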

profiler table

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    15.5728 |   100.0000 
  1 |             MeasureCallback/UpdateCostModel |     5.2182 |    33.5082 
  2 |     EvoSearch/Evolve/PredictNormalizedScore |     2.4700 |    15.8609 
  3 |                   EvoSearch/Evolve/Mutation |     2.1395 |    13.7387 
  4 |                                SendToRunner |     1.5999 |    10.2737 
  5 |                       EvoSearch/Evolve/Misc |     1.5694 |    10.0778 
  6 |                               SendToBuilder |     0.9345 |     6.0006 
  7 |              EvoSearch/SampleInitPopulation |     0.7653 |     4.9146 
  8 |                            ApplyHistoryBest |     0.5334 |     3.4250 
  9 |                              TaskExtraction |     0.1634 |     1.0490 
 10 |                              InitializeTask |     0.0280 |     0.1798 
 11 |                 EvoSearch/PickWithEpsGreedy |     0.0047 |     0.0304 
 12 |              EvoSearch/PickBestFromDatabase |     0.0036 |     0.0232 
 13 |               MeasureCallback/AddToDatabase |     0.0019 |     0.0125 
 14 |         MeasureCallback/RemoveBuildArtifact |     0.0004 |     0.0028 
 15 |              MeasureCallback/EchoStatistics |     0.0002 |     0.0010 
 16 |                           JoinRunnerFutures |     0.0002 |     0.0010 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

@shingjan
Contributor Author

bert base cuda:

 ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                        fused_take |         1 |      1 |         0.0005 |       2.1319 |                2.1319 |      5 |            
  1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        36.6140 |      32.8897 |               32.8897 |     32 |          Y 
  2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |        13.5008 |       3.6407 |               43.6879 |      6 |          Y 
  3 |                                                    fused_variance |    147520 |     25 |        65.9260 |       2.2377 |               55.9415 |     32 |          Y 
  4 |                                                        fused_mean |     49216 |     25 |        21.9872 |       2.2384 |               55.9597 |     32 |          Y 
  5 |                                               fused_cast_take_add |     49152 |      1 |        20.9740 |       2.3435 |                2.3435 |      6 |            
  6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |        20.6382 |       2.3816 |               57.1585 |      6 |          Y 
  7 |                                          fused_reshape_divide_add |     98304 |     12 |        43.8752 |       2.2405 |               26.8864 |      6 |          Y 
  8 |                                             fused_nn_fast_softmax |   4374528 |     12 |      1141.5252 |       3.8322 |               45.9861 |     32 |          Y 
  9 |                                                     fused_reshape |         1 |     12 |         0.0005 |       2.1836 |               26.2035 |      6 |          Y 
 10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       684.4451 |       9.1921 |              220.6093 |     32 |          Y 
 11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0005 |       2.1763 |               26.1151 |      6 |          Y 
 12 |                                                    fused_nn_dense |  75497472 |     48 |       918.1956 |      82.2237 |             3946.7393 |     32 |          Y 
 13 |                                                   fused_reshape_1 |         1 |     24 |         0.0005 |       2.1895 |               52.5487 |      6 |          Y 
 14 |                                                  fused_nn_dense_1 | 301989888 |     12 |      2381.8300 |     126.7890 |             1521.4682 |     32 |          Y 
 15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |      4892.7944 |       3.1745 |               38.0936 |      6 |          Y 
 16 |                                                  fused_nn_dense_2 | 301989888 |     12 |      1758.6493 |     171.7170 |             2060.6034 |     32 |          Y 
 17 |                                             fused_reshape_add_add |     98304 |     24 |        39.4395 |       2.4925 |               59.8207 |      6 |          Y 
 18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |        72.4898 |       2.7131 |               67.8275 |      6 |          Y 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    22.2403 |   100.0000 
  1 |     EvoSearch/Evolve/PredictNormalizedScore |     9.5203 |    42.8065 
  2 |                   EvoSearch/Evolve/Mutation |     3.3615 |    15.1146 
  3 |                               SendToBuilder |     2.3562 |    10.5943 
  4 |              EvoSearch/SampleInitPopulation |     2.3124 |    10.3975 
  5 |                       EvoSearch/Evolve/Misc |     2.1767 |     9.7870 
  6 |                                SendToRunner |     1.6900 |     7.5987 
  7 |                            ApplyHistoryBest |     0.3483 |     1.5662 
  8 |                              TaskExtraction |     0.2121 |     0.9535 
  9 |             MeasureCallback/UpdateCostModel |     0.0500 |     0.2248 
 10 |              EvoSearch/PickBestFromDatabase |     0.0158 |     0.0710 
 11 |                              InitializeTask |     0.0095 |     0.0429 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0069 |     0.0310 
 13 |               MeasureCallback/AddToDatabase |     0.0029 |     0.0130 
 14 |         MeasureCallback/RemoveBuildArtifact |     0.0008 |     0.0037 
 15 |              MeasureCallback/EchoStatistics |     0.0006 |     0.0028 
 16 |                           JoinRunnerFutures |     0.0003 |     0.0012 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

@shingjan
Contributor Author

resnet18 cuda:

 ID |                                                                        Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                         fused_nn_conv2d_add |  12870144 |      1 |       965.5146 |      13.3298 |               13.3298 |     32 |          Y 
  1 |                                                       fused_nn_conv2d_add_1 |  12895232 |      1 |      1330.3102 |       9.6934 |                9.6934 |     32 |          Y 
  2 |                                                       fused_nn_conv2d_add_2 |  12945408 |      1 |      2103.2869 |       6.1548 |                6.1548 |     32 |          Y 
  3 |                                                      fused_layout_transform |         1 |      1 |         0.0002 |       5.0254 |                5.0254 |      6 |          Y 
  4 |                                                 fused_nn_conv2d_add_nn_relu | 237633536 |      1 |      6085.8811 |      39.0467 |               39.0467 |     32 |          Y 
  5 |                                                         fused_nn_max_pool2d |   1806336 |      1 |       328.9316 |       5.4915 |                5.4915 |     30 |          Y 
  6 |       fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu | 128651264 |      2 |      2512.3525 |      51.2075 |              102.4150 |     32 |          Y 
  7 |   fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu | 128851968 |      2 |      1360.9069 |      94.6810 |              189.3619 |     32 |          Y 
  8 |                                               fused_nn_conv2d_add_nn_relu_1 | 115806208 |      1 |      2482.7300 |      46.6447 |               46.6447 |     32 |          Y 
  9 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 | 127045632 |      1 |      3352.8175 |      37.8922 |               37.8922 |     32 |          Y 
 10 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1 | 127145984 |      2 |      1854.8033 |      68.5496 |              137.0992 |     32 |          Y 
 11 |                                               fused_nn_conv2d_add_nn_relu_2 | 115705856 |      1 |      3359.4190 |      34.4422 |               34.4422 |     32 |          Y 
 12 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2 | 114903040 |      1 |      2106.9193 |      54.5360 |               54.5360 |     32 |          Y 
 13 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_2 | 114953216 |      2 |      1723.1163 |      66.7124 |              133.4248 |     32 |          Y 
 14 |                                               fused_nn_conv2d_add_nn_relu_3 | 115655680 |      1 |      1007.9003 |     114.7491 |              114.7491 |     32 |          Y 
 15 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_3 | 142132224 |      1 |      1615.3274 |      87.9897 |               87.9897 |     32 |          Y 
 16 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_3 | 142157312 |      2 |      1053.2288 |     134.9729 |              269.9457 |     32 |          Y 
 17 |                                                fused_nn_adaptive_avg_pool2d |     25600 |      1 |         5.8995 |       4.3393 |                4.3393 |     32 |          Y 
 18 |                                      fused_layout_transform_reshape_squeeze |         1 |      1 |         0.0003 |       3.2615 |                3.2615 |      5 |            
 19 |                                                          fused_nn_dense_add |   1025000 |      1 |        68.4000 |      14.9854 |               14.9854 |     32 |          Y 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    83.8914 |   100.0000 
  1 |                   EvoSearch/Evolve/Mutation |    48.0123 |    57.2314 
  2 |                               SendToBuilder |    13.7197 |    16.3541 
  3 |              EvoSearch/SampleInitPopulation |     7.9949 |     9.5300 
  4 |     EvoSearch/Evolve/PredictNormalizedScore |     4.0848 |     4.8691 
  5 |                                SendToRunner |     3.6898 |     4.3983 
  6 |                       EvoSearch/Evolve/Misc |     2.7077 |     3.2277 
  7 |             MeasureCallback/UpdateCostModel |     1.8705 |     2.2297 
  8 |                            ApplyHistoryBest |     0.7764 |     0.9254 
  9 |                              TaskExtraction |     0.5058 |     0.6030 
 10 |                              InitializeTask |     0.0267 |     0.0318 
 11 |               MeasureCallback/AddToDatabase |     0.0142 |     0.0170 
 12 |              EvoSearch/PickBestFromDatabase |     0.0131 |     0.0157 
 13 |                 EvoSearch/PickWithEpsGreedy |     0.0100 |     0.0119 
 14 |              MeasureCallback/EchoStatistics |     0.0037 |     0.0044 
 15 |         MeasureCallback/RemoveBuildArtifact |     0.0019 |     0.0023 
 16 |                           JoinRunnerFutures |     0.0005 |     0.0006 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

@shingjan
Contributor Author

mobilenetv2 on cuda

 ID |                                   Name |     FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
-----------------------------------------------------------------------------------------------------------------------------------------------
  0 |                 fused_layout_transform |        1 |      1 |         0.0004 |       2.2798 |                2.2798 |      6 |            
  1 |               fused_nn_conv2d_add_clip | 22880256 |      1 |      3151.3187 |       7.2605 |                7.2605 |     32 |          Y 
  2 |             fused_nn_conv2d_add_clip_1 |  8429568 |      1 |      1285.2570 |       6.5587 |                6.5587 |     32 |          Y 
  3 |                    fused_nn_conv2d_add | 13045760 |      1 |      2104.8376 |       6.1980 |                6.1980 |     32 |          Y 
  4 |             fused_nn_conv2d_add_clip_2 | 42147840 |      1 |      2994.8494 |      14.0734 |               14.0734 |     32 |          Y 
  5 |             fused_nn_conv2d_add_clip_3 |  6322176 |      1 |       682.4610 |       9.2638 |                9.2638 |     32 |          Y 
  6 |                  fused_nn_conv2d_add_1 | 14525952 |      1 |      1936.7547 |       7.5002 |                7.5002 |     32 |          Y 
  7 |             fused_nn_conv2d_add_clip_4 |  9483264 |      1 |      1537.1009 |       6.1696 |                6.1696 |     32 |          Y 
  8 |                fused_nn_conv2d_add_add | 21826560 |      1 |      2005.0549 |      10.8858 |               10.8858 |     32 |          Y 
  9 |             fused_nn_conv2d_add_clip_5 | 23030784 |      2 |      1914.2627 |      12.0312 |               24.0623 |     32 |          Y 
 10 |             fused_nn_conv2d_add_clip_6 |  2370816 |      1 |       393.0634 |       6.0316 |                6.0316 |     32 |          Y 
 11 |                  fused_nn_conv2d_add_2 |  7250432 |      1 |       917.3106 |       7.9040 |                7.9040 |     32 |          Y 
 12 |             fused_nn_conv2d_add_clip_7 |  3161088 |      2 |       262.2023 |      12.0559 |               24.1118 |     32 |          Y 
 13 |              fused_nn_conv2d_add_add_1 |  9683968 |      2 |      1061.2357 |       9.1252 |               18.2504 |     32 |          Y 
 14 |             fused_nn_conv2d_add_clip_8 | 10085376 |      3 |       737.1134 |      13.6823 |               41.0468 |     32 |          Y 
 15 |             fused_nn_conv2d_add_clip_9 |   790272 |      1 |       170.2160 |       4.6428 |                4.6428 |     32 |          Y 
 16 |                  fused_nn_conv2d_add_3 |  4829440 |      1 |       957.4766 |       5.0439 |                5.0439 |     32 |          Y 
 17 |              fused_nn_conv2d_add_add_2 |  9658880 |      3 |       919.5057 |      10.5044 |               31.5133 |     32 |          Y 
 18 |            fused_nn_conv2d_add_clip_10 |  9859584 |      4 |      1410.7424 |       6.9889 |               27.9557 |     32 |          Y 
 19 |            fused_nn_conv2d_add_clip_11 |  1580544 |      4 |       361.8447 |       4.3680 |               17.4721 |     32 |          Y 
 20 |                  fused_nn_conv2d_add_4 | 14469504 |      1 |       739.5858 |      19.5643 |               19.5643 |     32 |          Y 
 21 |            fused_nn_conv2d_add_clip_12 |  2370816 |      2 |       503.2051 |       4.7114 |                9.4229 |     32 |          Y 
 22 |              fused_nn_conv2d_add_add_3 | 21713664 |      2 |      1405.4021 |      15.4501 |               30.9003 |     32 |          Y 
 23 |            fused_nn_conv2d_add_clip_13 | 22014720 |      3 |      2486.7910 |       8.8527 |               26.5580 |     32 |          Y 
 24 |            fused_nn_conv2d_add_clip_14 |   592704 |      1 |       125.2444 |       4.7324 |                4.7324 |     32 |          Y 
 25 |                  fused_nn_conv2d_add_5 |  9039520 |      1 |       410.3605 |      22.0282 |               22.0282 |     32 |          Y 
 26 |              fused_nn_conv2d_add_add_4 | 15068480 |      2 |       411.3220 |      36.6343 |               73.2685 |     32 |          Y 
 27 |            fused_nn_conv2d_add_clip_15 | 15193920 |      3 |      1503.2292 |      10.1075 |               30.3226 |     32 |          Y 
 28 |            fused_nn_conv2d_add_clip_16 |   987840 |      3 |       224.3443 |       4.4032 |               13.2097 |     32 |          Y 
 29 |                  fused_nn_conv2d_add_6 | 30121280 |      1 |      1749.6604 |      17.2155 |               17.2155 |     32 |          Y 
 30 |            fused_nn_conv2d_add_clip_17 | 40328960 |      1 |      2609.1046 |      15.4570 |               15.4570 |     32 |          Y 
 31 |           fused_nn_adaptive_avg_pool2d |    64000 |      1 |        16.9965 |       3.7655 |                3.7655 |     32 |          Y 
 32 | fused_layout_transform_reshape_squeeze |        1 |      1 |         0.0002 |       4.3296 |                4.3296 |      6 |          Y 
 33 |                     fused_nn_dense_add |  2561000 |      1 |        66.7119 |      38.3890 |               38.3890 |     32 |          Y 
-----------------------------------------------------------------------------------------------------------------------------------------------

profiler table

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    82.0160 |   100.0000 
  1 |                   EvoSearch/Evolve/Mutation |    42.1468 |    51.3885 
  2 |                               SendToBuilder |    15.1365 |    18.4556 
  3 |              EvoSearch/SampleInitPopulation |     7.9139 |     9.6492 
  4 |                                SendToRunner |     6.4504 |     7.8648 
  5 |     EvoSearch/Evolve/PredictNormalizedScore |     2.7957 |     3.4087 
  6 |             MeasureCallback/UpdateCostModel |     2.7350 |     3.3348 
  7 |                       EvoSearch/Evolve/Misc |     2.6240 |     3.1994 
  8 |                            ApplyHistoryBest |     1.1672 |     1.4232 
  9 |                              TaskExtraction |     0.4503 |     0.5490 
 10 |                              InitializeTask |     0.0250 |     0.0304 
 11 |               MeasureCallback/AddToDatabase |     0.0198 |     0.0241 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0100 |     0.0122 
 13 |         MeasureCallback/RemoveBuildArtifact |     0.0034 |     0.0041 
 14 |              EvoSearch/PickBestFromDatabase |     0.0032 |     0.0039 
 15 |              MeasureCallback/EchoStatistics |     0.0027 |     0.0033 
 16 |                           JoinRunnerFutures |     0.0013 |     0.0016 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

@shingjan

bert base on llvm 20k trials:

 ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                        fused_take |         1 |      1 |         0.0001 |      12.9686 |               12.9686 |      1 |          Y 
  1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        84.5479 |      14.2431 |               14.2431 |     32 |          Y 
  2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |         5.3101 |       9.2562 |              111.0749 |      1 |          Y 
  3 |                                                    fused_variance |    147520 |     25 |        21.8394 |       6.7548 |              168.8690 |    191 |          Y 
  4 |                                                        fused_mean |     49216 |     25 |        11.7478 |       4.1894 |              104.7344 |    159 |          Y 
  5 |                                               fused_cast_take_add |     49152 |      1 |         3.6734 |      13.3805 |               13.3805 |      2 |          Y 
  6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |         0.4843 |     101.4931 |             2435.8337 |      1 |          Y 
  7 |                                          fused_reshape_divide_add |     98304 |     12 |        12.6803 |       7.7525 |               93.0296 |      2 |          Y 
  8 |                                             fused_nn_fast_softmax |   4374528 |     12 |       207.0953 |      21.1233 |              253.4791 |    288 |          Y 
  9 |                                                     fused_reshape |         1 |     12 |         0.0001 |      12.0269 |              144.3223 |      1 |          Y 
 10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       462.0523 |      13.6163 |              326.7919 |    384 |          Y 
 11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0000 |      66.8140 |              801.7686 |      1 |          Y 
 12 |                                                    fused_nn_dense |  75497472 |     48 |       613.1287 |     123.1348 |             5910.4700 |   6656 |            
 13 |                                                   fused_reshape_1 |         1 |     24 |         0.0000 |      49.1952 |             1180.6855 |      1 |          Y 
 14 |                                                  fused_nn_dense_1 | 301989888 |     12 |       664.1287 |     454.7159 |             5456.5913 |   6144 |            
 15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |        32.6868 |     475.1782 |             5702.1385 |      1 |          Y 
 16 |                                                  fused_nn_dense_2 | 301989888 |     12 |       662.0116 |     456.1701 |             5474.0410 |   6144 |            
 17 |                                             fused_reshape_add_add |     98304 |     24 |         1.3333 |      73.7283 |             1769.4793 |      2 |          Y 
 18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |         2.6162 |      75.1739 |             1879.3469 |      2 |          Y 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 20013
Total latency (us): 31853.2

profiler table

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |   359.8455 |   100.0000 
  1 |                                SendToRunner |   118.7806 |    33.0088 
  2 |     EvoSearch/Evolve/PredictNormalizedScore |    62.0087 |    17.2320 
  3 |                               SendToBuilder |    56.9247 |    15.8192 
  4 |             MeasureCallback/UpdateCostModel |    42.1284 |    11.7074 
  5 |                   EvoSearch/Evolve/Mutation |    40.9665 |    11.3845 
  6 |                       EvoSearch/Evolve/Misc |    21.9481 |     6.0993 
  7 |              EvoSearch/SampleInitPopulation |     7.9898 |     2.2203 
  8 |              EvoSearch/PickBestFromDatabase |     2.4416 |     0.6785 
  9 |                            ApplyHistoryBest |     0.5137 |     0.1428 
 10 |               MeasureCallback/AddToDatabase |     0.1833 |     0.0509 
 11 |                              TaskExtraction |     0.1798 |     0.0500 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0540 |     0.0150 
 13 |         MeasureCallback/RemoveBuildArtifact |     0.0453 |     0.0126 
 14 |                              InitializeTask |     0.0440 |     0.0122 
 15 |              MeasureCallback/EchoStatistics |     0.0310 |     0.0086 
 16 |                           JoinRunnerFutures |     0.0118 |     0.0033 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0116 |     0.0032 
----------------------------------------------------------------------------

@shingjan shingjan requested a review from zxybazh August 8, 2022 19:11
@shingjan shingjan force-pushed the meta_schedule_xgboost_callback branch from 087ed69 to 918ea89 Compare September 14, 2022 19:32
@zxybazh zxybazh force-pushed the meta_schedule_xgboost_callback branch from 918ea89 to 33eb891 Compare September 20, 2022 22:13
@shingjan

@zxybazh The pandas warning should be suppressed now with the last commit.

@zxybazh zxybazh force-pushed the meta_schedule_xgboost_callback branch from 46507b3 to 33eb891 Compare September 23, 2022 22:50
@shingjan shingjan force-pushed the meta_schedule_xgboost_callback branch from 33eb891 to 08663ae Compare September 23, 2022 23:09
@zxybazh zxybazh left a comment

Thanks @shingjan for the hard work and @junrushao for reviewing!

@junrushao junrushao left a comment

LGTM

@zxybazh

zxybazh commented Sep 25, 2022

@tvm-bot rerun

@zxybazh zxybazh merged commit c8423a6 into apache:main Sep 26, 2022
junrushao pushed a commit that referenced this pull request Sep 30, 2022
The previous upgrade introduced an import of xgboost in meta_schedule; the current version removes it by using a function that returns the callback class.

We recently upgraded the XGBoost model to support the new xgboost callback class in #12141. That PR used a function called `optional_xgboost_callback` to avoid a compatibility issue (xgboost 1.5.2 vs. 1.6.0). The function tries to import the newly introduced xgboost callback class and creates a new class with it as the base class. As a side effect, xgboost was imported whenever meta_schedule was imported, which is not ideal: xgboost is not a dependency of tvm or meta_schedule and should only be required when the xgboost cost model is used. This PR fixes the problem by moving the class and the function mentioned above into a function that returns the class when needed, which avoids the unwanted import of xgboost in meta_schedule.
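
The deferred-import pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code merged into TVM: `get_early_stop_callback_class`, `EarlyStopCallback`, the `patience` parameter, and the `"validation"`/`"rmse"` keys are hypothetical names, and falling back to a plain base class when xgboost is missing is an assumption made so the sketch runs without xgboost installed.

```python
def get_early_stop_callback_class():
    """Build the callback class on demand.

    The xgboost import lives inside this function, so importing the
    enclosing module does not import xgboost as a side effect.
    """
    try:
        # Base class for custom callbacks in newer xgboost releases.
        from xgboost.callback import TrainingCallback
    except ImportError:
        # Assumption for this sketch only: use a plain base class when
        # xgboost is not installed, so the example stays runnable.
        class TrainingCallback:  # hypothetical stand-in
            pass

    class EarlyStopCallback(TrainingCallback):  # hypothetical name
        """Stop training once the validation metric stops improving."""

        def __init__(self, patience):
            super().__init__()
            self.patience = patience
            self.best = float("inf")
            self.stale_rounds = 0

        def after_iteration(self, model, epoch, evals_log):
            # evals_log maps dataset name -> metric name -> score history.
            score = evals_log["validation"]["rmse"][-1]
            if score < self.best:
                self.best = score
                self.stale_rounds = 0
            else:
                self.stale_rounds += 1
            # Returning True asks the training loop to stop.
            return self.stale_rounds >= self.patience

    return EarlyStopCallback
```

A caller would obtain the class only at the point where the xgboost cost model is actually trained, for example `callback = get_early_stop_callback_class()(patience=2)`, so users who never select that cost model never trigger the import.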
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…st in meta schedule (apache#12141)

* update the custom callback function of xgboost

* fix lint

* fix ci

* fix lint

* add unit test

* remove unused code

* fix lint

* add decorator

* address comment

* fix lint

* address comments

* fix mypy

* fix lint

* remove unused comments

* address comments

* Fix xgboost unit test import.

Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022