Replay gets stuck on some ships #717

Open
dosullivan opened this issue Aug 28, 2024 · 3 comments

@dosullivan
Collaborator

dosullivan commented Aug 28, 2024

I've seen a handful of ships that get stuck on replay when starting up and never catch up. Their log output looks like this:

> urbit sampel-palnet
~
urbit 3.1
boot: home is /tmp/sampel-palnet
disk: loaded epoch 0i3624838
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
---------------- playback starting ----------------
play: events 4295664-4295665

It will stay there for hours and never finish.

@pkova
Collaborator

pkova commented Aug 28, 2024

Could you test replaying these ships with vere 3.0, just so we can see whether that makes a difference? Replay was changed to happen in a subprocess in 3.1.

@dosullivan
Collaborator Author

It's the same on vere 3.0. The CPU sits at 100% when this happens, and replay remains stuck on that particular event.

@dosullivan
Collaborator Author

Here's the info output:

loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
disk: loaded epoch 0i3624838

urbit: sigsed-pasfus at event 4295663
  disk: live=&, event=4295665

epocs:
  0i3161006
  0i3624838

lmdb info:
  map size: 1099511627776
  page size: 4096
  max pages: 268435456
  number of pages used: 39090
  last transaction ID: 669867
  max readers: 126
  number of readers used: 0
  file size (page): 160112640
  file size (stat): 160112640

It looks like there's one problematic event. If I replay up to the event before it, everything is fine, but if I try to play up to that event itself, it hangs:

urbit play -n 4295663 sampel-palnet
disk: loaded epoch 0i3624838
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
mars: already computed 4295663
      state=4295663, log=4295665
disk: snapshot (event 4295663) is out of date
      (latest event is 4295665
start/shutdown your pier gracefully first

@ # urbit play -n 4295664 sampel-palnet
disk: loaded epoch 0i3624838
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
---------------- playback starting ----------------
play: event 4295665
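
For anyone trying to reproduce this on other piers, the manual check above could be wrapped in a timeout so a hung replay is reported instead of waiting for hours. The following is a minimal sketch, not a tested tool: it assumes the `urbit` binary is on `PATH` and accepts `play -n <event> <pier>` exactly as in the commands above, and the helper names and the timeout value are hypothetical. Note that `subprocess.run` only kills the direct child on timeout, so on 3.1, where replay runs in a subprocess, a leftover process may need to be cleaned up by hand.

```python
import subprocess
import sys


def play_command(pier, event, binary="urbit"):
    """Build the argv for `urbit play -n <event> <pier>` (form taken from the issue)."""
    return [binary, "play", "-n", str(event), pier]


def times_out(cmd, timeout_s):
    """Run cmd and return True if it is still running after timeout_s seconds."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the direct child at this point.
        return True
    return False


def check(pier, event, timeout_s=600, binary="urbit"):
    """Report on stderr whether replaying up to `event` finishes within the timeout."""
    hung = times_out(play_command(pier, event, binary), timeout_s)
    status = "did not finish" if hung else "finished"
    print(f"{pier}: play -n {event} {status} within {timeout_s}s", file=sys.stderr)
    return hung
```

Calling `check("sampel-palnet", 4295664)` on a copy of the pier would then flag the hang after ten minutes. Running it against a copy rather than the live pier is the safer choice, since the process is killed on timeout.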
