CI monitoring rotation schedule #11462

Closed
driazati opened this issue May 25, 2022 · 16 comments

@driazati
Member

driazati commented May 25, 2022

See the CI Monitoring Runbook for context.

This schedule tracks the CI monitoring rotation. If you would like to join, please comment on this issue or, if you are a committer, edit this issue directly.

| Week (MM/DD) | On-call |
| --- | --- |
| 8/1 - 8/5 | @driazati |
| 8/8 - 8/12 | @gigiblender |
| 8/15 - 8/19 | @shingjan |
| 8/22 - 8/26 | @cconvey |
| 8/29 - 9/2 | @sunggg |
| 9/5 - 9/9 | @driazati |
| 9/12 - 9/16 | @nverke |
| 9/19 - 9/23 | --- |
| 9/26 - 9/30 | --- |
| 10/3 - 10/7 | --- |

@driazati changed the title from "[ci] Monitoring rotation schedule" to "CI Monitoring rotation schedule" on May 25, 2022
@driazati changed the title from "CI Monitoring rotation schedule" to "CI monitoring rotation schedule" on May 25, 2022
@Mousius pinned this issue on May 26, 2022

@driazati
Member Author

driazati commented Jun 1, 2022

This isn't part of the regular rotation, but I thought I'd post a summary of my week so everyone has more visibility into the process. Last week saw lots of infra problems, which are hopefully fixed now that we've increased capacity limits and fixed our cleanup logic.

@hpanda-naut
Collaborator

@areusch @driazati I can take next week

@gigiblender
Contributor

Happy to sign up for 6/27-7/1 if someone could edit the issue for me :)

@nverke
Contributor

nverke commented Jun 27, 2022

Week of 6/20 - 6/27:

@sunggg
Contributor

sunggg commented Jun 29, 2022

I can help on 7/11-7/15 :)

@gigiblender
Contributor

gigiblender commented Jul 4, 2022

@sunggg
Contributor

sunggg commented Jul 12, 2022

@yongwww
Member

yongwww commented Jul 29, 2022

Summary for 7/25/22 - 7/29/22:

jenkins on eada707 - failure on android-rpc build
jenkins on 88bbb40 - failure on git push to https://github.com/apache/tvm-site.git
jenkins on 03aed78 - failure on ios-rpc
jenkins on 195e60b - failure on duplicate global packedfunc
jenkins on 9a4d80c - filed [Flaky Test] tests/python/unittest/test_custom_datatypes.py::test_myfloat #12238
jenkins on aeda760 - filed #12238
jenkins on 578ef03 - filed #12238

@driazati
Member Author

driazati commented Aug 9, 2022

Summary for 2022-07-30 to 2022-08-06

| Run | Commit | Mitigation |
| --- | --- | --- |
| tvm-ci/branch | 1f97f1f from #11809 | Timeout fixed by #12334 |
| tvm-ci/branch | 485bfaf from #12306 | Fixed by #12325 |
| tvm-ci/branch | 4158738 from #12301 | Flaky test, filed #12311 and opened #12312 to improve reporting |
| tvm-ci/branch | 2bfd52f from #12278 | Fixed by #12282 |
| tvm-ci/branch | 12502cc from #12251 | Fixed by #12268 |
| tvm-ci/branch | a231a1d from #12245 | Added retries to fix: #12306 |
| tvm-ci/branch | fb87c21 from #12234 | Fixed by #12306 |
| tvm-ci/branch | dff5c97 from #11037 | Fixed by #12306 |
| tvm-ci/branch | db4380c from #12230 | Fixed by #12268 |

Generated by https://gist.github.com/driazati/80cd48e86c6548cd90a6b39be010b921

@gigiblender
Contributor

Summary for 2022-08-07 - 2022-08-14:
jenkins on fc411dc - Failure due to timeout in Cortex-M shards. Fixed in #12334.
jenkins on 9b86009 - Failure due to timeout in Cortex-M shards. Fixed in #12334.
jenkins on 8e133b1 - Sent interrupt signal to the CI.
jenkins on 7f800e4 - Fixed by #12341.
CI MacOS on 52d6b59 - Filed #12449.
jenkins on 2210206 - Failure due to linting the Jenkinsfile. Fixed by #12360.
jenkins on 5d72bc1 - Failure due to linting the Jenkinsfile. Fixed by #12360.
jenkins on 7f10015 - Failure due to linting the Jenkinsfile. Fixed by #12360.
jenkins on 52152e0 - Failure due to linting the Jenkinsfile. Fixed by #12387.
jenkins on 48354de - Failure due to linting the Jenkinsfile. Fixed by #12387.
jenkins on 5deb95a - Deploy docs failed to git push in the CI.
jenkins on c3c7c4c - Failed due to flaky test. #12451.
jenkins on 3eb6734 - Failed due to flaky test. #12451.
jenkins on 1737308 - Linting failed with exit code 4.

@shingjan
Contributor

shingjan commented Aug 16, 2022

Summary for 2022-08-15 to 2022-08-22

| Run | Commit | Mitigation |
| --- | --- | --- |
| tvm-ci/branch | d805ae3 from #12425 | Internet errors - flaky test reported in #12465 |
| tvm-ci/branch | 1ba17fe from #12401 | Timeout errors - flaky test reported in #12464 |
| tvm-ci/branch | bd56231 from #12443 | Timeout errors - flaky test reported in #12464 |
| tvm-ci/branch | #12478 | flaky test reported in #12511 |
| tvm-ci/branch | #12441 | flaky test reported in #12511 |
| tvm-ci/branch | #12483 | flaky test reported in #12511 |
| tvm-ci/branch | #12513 | flaky test reported in #12511 |
| tvm-ci/branch | #12532 | flaky test reported in #12511 |
| tvm-ci/branch | #12508 | flaky test reported in #12511 |
| tvm-ci/branch | #12551 | flaky test reported in #12511 |
| tvm-ci/branch | #12539 | doc build failed |

Generated by https://gist.github.com/driazati/80cd48e86c6548cd90a6b39be010b921

@cconvey
Contributor

cconvey commented Aug 29, 2022

Summary for 2022-08-22 to 2022-08-29

| Run | Commit | Mitigation |
| --- | --- | --- |
| tvm-ci/branch | 1afd059 from #12340 | Checks API failure (#12602) |
| tvm-ci/branch | 90b2f0d from #12557 | maven problem (#12601) |
| CI / Android | 8174d08 | android_rpc build failure (#12599) |
| tvm-ci/branch | 13ebbfb from #12562 | Deploy docs failed to git push in the CI (#12600) |
| tvm-ci/branch | 52779f1 from #12353 | ethos-u failures (#12511) |
| tvm-ci/branch | 3983a47 from #12543 | ethos-u failures (#12511) |
| tvm-ci/branch | d26bf80 from #12541 | ethos-u failures (#12511) |
| tvm-ci/branch | 5344128 from PR #12623 | internal pytest failure during ethos-u testing (#12634) |

Generated by https://gist.github.com/driazati/80cd48e86c6548cd90a6b39be010b921

@sunggg
Contributor

sunggg commented Sep 2, 2022

Summary for 8/29 - 9/2

jenkins on 0de2219 - Unable to find image 'tlcpack/ci-lint:20220810-060142-fae79bbc3' locally
jenkins on 0de2219 - Segmentation fault (core dumped)
jenkins on 74988d3 - Failed unit tests
jenkins on 58ee935 - Failed unit tests
jenkins on a399e6c - Failed unit tests
jenkins on aa6c7123d0a2cdd93256c6a4576ff029008fd375 - segfault in tests/scripts/setup-pytest-env.sh
jenkins on 50dad0d - failed to push due to merge conflict
jenkins on eecb7fd - segfault in tests/scripts/setup-pytest-env.sh
jenkins on b2d6600 - ERROR tests/python/frontend/darknet/test_forward.py - urllib.error.HTTPError
jenkins on bb56f2a - HTTP error 502 on `tests/python/frontend/tflite/test_forward.py`
jenkins on 0549a08 - urllib.error.HTTPError: HTTP Error 503: Service Unavailable

@driazati unpinned this issue on Sep 19, 2022
@areusch added the needs-triage label on Oct 19, 2022
@hpanda-naut added the dev:ci label and removed the needs-triage label on Nov 15, 2022
@tqchen closed this as completed on Sep 20, 2024