Create new cursor file if existing one is corrupt #2

Open
juni-b-queer opened this issue Jul 11, 2024 · 2 comments

Comments

@juni-b-queer

Several times now, my jetstream container has stopped abruptly without saving the cursor file properly. When it restarts, the cursor file exists but is empty or corrupt, so jetstream throws an error when it tries to start its subscription. This prevents the container from coming back online without manual intervention (removing the cursor file).

I'll likely implement this in my fork, but it would be helpful if:

  1. Every so often, the current valid cursor file is backed up to a separate file
  2. If jetstream starts and the cursor file is corrupt, attempt to use the latest backup or create a new one. Don't block jetstream from starting

I'm going to try to implement this, but I have very, very little Go experience.

@juni-b-queer
Author

I've implemented this in my fork: https://github.com/juni-b-queer/jetstream

With these updates,

  • it will create a cursor backup file every 5 minutes
  • when a backup is created, it will check how many backups exist and delete the oldest one to have a max of 10 backups
  • if jetstream starts and the cursor file is corrupt or not present, it will try the backup files, newest first, until one loads successfully. If none load, it will delete the backups and the cursor file and start from live (instead of shutting down and failing)

@ericvolp12
Owner

This is neat, I'm glad to see people getting some use out of Jetstream and am sorry you're running into these issues.

TBH I think I'd rather figure out why/how the cursor file is being corrupted when saving and fix that, rather than add a cursor backup feature at the moment. If possible I'd want to keep the complexity of the service down. Backups feel like they address the symptoms of a reliability problem, and it'd be neat to fix the reliability problem at the root.

I'll dig into it a bit and see if I can find anything. If you've got any logs from an improper shutdown or anything like that, I'd be interested to see whether the cursor manager logs a failure, or how the program might be getting killed in such a way that it can't take the few milliseconds it needs to shut down safely.

If it's being interrupted mid-write, it makes me think that somehow we're not waiting properly for the program to exit, or something is hard killing the process while it's in the process of shutting down.
