Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in tokio time wheel #1452

Closed
teor2345 opened this issue Dec 4, 2020 · 5 comments · Fixed by #1453 or #1511
Closed

Panic in tokio time wheel #1452

teor2345 opened this issue Dec 4, 2020 · 5 comments · Fixed by #1453 or #1511
Labels
C-bug Category: This is a bug

Comments

@teor2345
Copy link
Collaborator

teor2345 commented Dec 4, 2020

Analysis

This looks like tokio-rs/tokio#2789, which seems to happen due to a race condition in the multithreaded runtime.

Some users have reported that it happens during shutdown, but from the logs, it looks like Zebra was just syncing with a small number of peers.

Versions

zebrad commit ef7e91c on main, 2020-12-04 (#1443).

zebrad commit 548be51 in #1444, 2020-12-03.

This commit corresponds to rebased commit 7f823e7 partway through #1444, the most recent shared commit with main is c0bbac8 (#1441).

Platform

Linux ... 5.4.75 #1-NixOS SMP Thu Nov 5 10:43:38 UTC 2020 x86_64 GNU/Linux

Error

Message:  elapsed=55412452; when=55412452                                                                                                                                                      
Location: /home/dev/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.3.5/src/time/driver/wheel/mod.rs:291   

Metadata

key value
version 3.0.0-alpha.0
git commit 548be51
location /home/dev/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.3.5/src/time/driver/wheel/mod.rs:291:5

Logs

Click the triangle to show detailed logs:

Logs: ``` Dec 04 06:42:40.969 INFO {zebrad="548be51f"}:sync:obtain_tips: zebra_network::peer_set::set: network request with no ready peers: finding more peers, waiting for 2 peers to answer requests Dec 04 06:42:46.970 INFO {zebrad="548be51f"}:sync: zebrad::components::sync: exhausted prospective tip set Dec 04 06:42:46.970 INFO {zebrad="548be51f"}:sync: zebrad::components::sync: waiting to restart sync timeout=45s Dec 04 06:43:31.972 INFO {zebrad="548be51f"}:sync: zebrad::components::sync: starting sync, obtaining new tips Dec 04 06:43:31.972 INFO {zebrad="548be51f"}:sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(1063879) min_locator_height=1063780 locators=[Height(1063879), Height(1063878), Height(1063877), Height(1063875), Height(1063871), Height(1063863), Height(1063847), Height(1063815), Height(1063780)] Dec 04 06:43:37.973 INFO {zebrad="548be51f"}:sync: zebrad::components::sync: exhausted prospective tip set Dec 04 06:43:37.973 INFO {zebrad="548be51f"}:sync: zebrad::components::sync: waiting to restart sync timeout=45s Dec 04 06:44:22.974 INFO {zebrad="548be51f"}:sync: zebrad::components::sync: starting sync, obtaining new tips Dec 04 06:44:22.974 INFO {zebrad="548be51f"}:sync:obtain_tips:state: zebra_state::util: created block locator tip_height=Height(1063879) min_locator_height=1063780 locators=[Height(1063879), Height(1063878), Height(1063877), Height(1063875), Height(1063871), Height(1063863), Height(1063847), Height(1063815), Height(1063780)] Dec 04 06:44:22.974 INFO {zebrad="548be51f"}:sync:obtain_tips: zebra_network::peer_set::set: network request with no ready peers: finding more peers, waiting for 2 peers to answer requests The application panicked (crashed). Message: elapsed=55412452; when=55412452 Location: /home/dev/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.3.5/src/time/driver/wheel/mod.rs:291 ```

SpanTrace:

   0: zebrad::components::inbound::downloads::download_and_verify
           with hash=0000000000a3a91dbb56391a0a60f3ea127bb9a99b1e9d9eb83c967b4efe2c17
             at zebrad/src/components/inbound/downloads.rs:107
   1: zebrad::components::inbound::inbound
             at zebrad/src/components/inbound.rs:141
   2: zebra_network::peer::connection::msg_as_req
           with msg=inv
             at zebra-network/src/peer/connection.rs:564
   3: zebra_network::peer::handshake::peer
           with addr=[2400:8902::f03c:91ff:fe1c:1859]:40211
             at zebra-network/src/peer/handshake.rs:198
   4: zebrad::commands::
           with zebrad="548be51f"
             at zebrad/src/commands.rs:56
@teor2345 teor2345 added C-bug Category: This is a bug S-needs-triage Status: A bug report needs triage labels Dec 4, 2020
@teor2345 teor2345 added this to the First Alpha Release milestone Dec 4, 2020
@teor2345
Copy link
Collaborator Author

teor2345 commented Dec 4, 2020

I just saw this panic again about 20 minutes after sync reached the tip.

Based on my testing of #1453, it looks like tokio-rs/tokio#3080 (or a similar change) made these panics a lot more frequent for us.

@teor2345 teor2345 reopened this Dec 6, 2020
@teor2345
Copy link
Collaborator Author

teor2345 commented Dec 6, 2020

This bug also occurs on tokio 0.3.4, but a lot less frequently on my machine (20 minutes vs 72 hours).

@teor2345 teor2345 removed the S-needs-triage Status: A bug report needs triage label Dec 7, 2020
bdonlan pushed a commit to bdonlan/tokio that referenced this issue Dec 7, 2020
Dropping a timer on the millisecond that it was scheduled for, when it was on
the pending list, could result in a panic previously, as we did not record the
pending-list state in cached_when.

Hopefully fixes: ZcashFoundation/zebra#1452
@carllerche
Copy link

We have a potential fix here: tokio-rs/tokio#3229 (review) Are you able to check this PR to see if it fixes your case?

carllerche pushed a commit to tokio-rs/tokio that referenced this issue Dec 9, 2020
Dropping a timer on the millisecond that it was scheduled for, when it was on
the pending list, could result in a panic previously, as we did not record the
pending-list state in cached_when.

Hopefully fixes: ZcashFoundation/zebra#1452
teor2345 added a commit to teor2345/zebra that referenced this issue Dec 14, 2020
Updates the tokio dependency to the commit that merged
tokio-rs/tokio#3229, which should fix the time wheel panic in ZcashFoundation#1452.
@teor2345
Copy link
Collaborator Author

We have a potential fix here: tokio-rs/tokio#3229 (review) Are you able to check this PR to see if it fixes your case?

I just opened #1511, and I'm testing it now.

We'll have a pretty good idea if it's better than tokio 0.3.5 after a few hours, and we'll know if it's better than tokio 0.3.4 after a few days. (Assuming there aren't any other panics in Zebra!)

dconnolly pushed a commit that referenced this issue Dec 14, 2020
Updates the tokio dependency to the commit that merged
tokio-rs/tokio#3229, which should fix the time wheel panic in #1452.
@teor2345
Copy link
Collaborator Author

I've been running zebrad for about a week, and I haven't seen any of these panics. So it looks like they're fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug
Projects
None yet
3 participants