background_jobs crashes makes other instances not receive federation updates #1820

dessalines · 2021-10-06T18:51:47Z

This seems to happen on lemmy intermittently, sometimes after a week or so of running fine. background_jobs will stop showing up in the logs.

cc @asonix @Nutomic

This issue happened again, twice in one day. A restart seemed to fix it again.

edit: I'm tailing the log right now:

sudo docker-compose logs -f --tail=1000 lemmy | grep activity_queue
lemmy_1     | [2021-10-06T23:13:25Z INFO  lemmy_apub::activity_queue] Sending activity https://lemmy.ml/activities/dislike/d1dbd77f-8e84-4367-ac94-92c3057f6681
lemmy_1     | [2021-10-06T23:13:29Z INFO  lemmy_apub::activity_queue] Sending activity https://lemmy.ml/activities/undo/8f646974-c257-4f01-aac7-24c3fdb765f2
lemmy_1     | [2021-10-06T23:14:52Z INFO  lemmy_apub::activity_queue] Sending activity https://lemmy.ml/activities/dislike/b6e8b8af-a6e0-484b-aac3-4496b8fe02bb
lemmy_1     | [2021-10-06T23:15:13Z INFO  lemmy_apub::activity_queue] Sending activity https://lemmy.ml/activities/undo/2846bd5c-a041-4b23-92a6-e7dcc93416be

asonix · 2021-10-07T01:38:12Z

The only thing I can think of is the arbiter the jobs are running in is dying for some reason. You could maybe get around this by running the jobs on their own arbiter:

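// Spawn a dedicated arbiter (its own thread and event loop) for the job workers.
let arbiter = actix_web::rt::Arbiter::new();

// `MyState` and `queue_handle` come from the application's existing setup.
WorkerConfig::new(|| MyState {
  client: Client::default(),
})
.register::<SendActivityTask>()
.start_in_arbiter(&arbiter, queue_handle.clone());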

asonix · 2021-10-07T01:40:34Z

There's also a warn level log that emits when a worker closes cleanly: Worker {id} closing
I imagine these aren't closing cleanly, though.

asonix · 2021-10-07T02:23:20Z

I've just published a new version of background-jobs-actix which should warn when workers drop, whether or not they close cleanly, so you can try updating to that version to catch when they drop.

background-jobs-actix = "0.9.3"

You should be able to pull it in automatically with cargo update.

dessalines · 2021-10-07T14:45:09Z

Thx, I'll try to get new builds out for this today, including the arbiter fix. I'm not sure what's causing the crash, but unfortunately there's nothing in the logs. It crashed sometime last night, and new actions aren't ending up in the activity queue.

lemmy_1     | [2021-10-07T01:53:48Z WARN  lemmy_apub::activity_queue] error sending request for url (https://forum.purplerabbit.xyz/inbox): error trying to connect: tcp connect error: Operation timed out (os error 110)
lemmy_1     | [2021-10-07T01:54:02Z INFO  lemmy_apub::activity_queue] Sending activity https://lemmy.ml/activities/announce/4181cbf5-469c-431f-9372-5f0df07ef2db

dessalines · 2021-10-07T15:41:46Z

@asonix background-jobs and background-jobs-actix are still at 0.10.0: https://crates.io/crates/background-jobs/versions

Let me know when you get those updated, or if I can go to a git version.

dessalines · 2021-10-07T15:47:00Z

nm, I downgraded to 0.9.0, and that picked up the new one. Running checks now.

dessalines · 2021-10-07T17:08:58Z

Okay I deployed 0.13.1 on lemmy.ml with the arbiter fix and extra logging above. I'll keep tailing the log to see if there's any problems.

asonix · 2021-10-07T17:11:56Z

yeah 0.10 switched log out for tracing so i figured until y'all move your logging to tracing as well I'll release fixes for 0.9

dessalines · 2021-10-08T19:24:42Z

The arbiter seems to do it, I'll re-open if we encounter this again.

dessalines · 2021-10-29T18:35:27Z

This bug cropped up again after 3 weeks of running fine. I didn't run the correct log check to make sure it's good, but it's:

sudo docker-compose logs -f --tail=100 lemmy | grep background_jobs

I'll make sure to check that next time we have another issue.

Nutomic · 2021-11-01T11:09:19Z

There's a new version of background-jobs which we should upgrade to.

kromonos · 2021-11-17T07:22:51Z

Can you please take a look again? Looks like my instance fapsi.be doesn't get any updates from lemmy.ml at the moment.

dessalines · 2021-11-17T13:56:31Z

Seems like it crashed yesterday, with nothing in the logs again:

sudo docker logs --since 24h lemmyml_lemmy_1 2>&1 | grep background_jobs

These messages are pretty common, but it's the last one:

[2021-11-16T19:56:39Z INFO  background_jobs_core::processor_map] Job 395c85d9-139a-421a-a32c-cb8fbcfff6f3 SendActivityTask completed 0.361524
[2021-11-16T19:56:39Z INFO  background_jobs_core::processor_map] Job 32abbf6d-e034-495d-8000-888d1b915277 SendActivityTask errored Error performing job: Failed to send activity {"actor":["https://lemmy.ml/c/linux"],"to":["https://www.w3.org/ns/activitystreams#Public"],"object":{"actor":["https://lemmy.ml/u/VonMax"],"to":["https://www.w3.org/ns/activitystreams#Public"],"object":["https://lemmy.ml/comment/92138"],"cc":[["https://lemmy.ml/c/linux"]],"type":"Like","id":"https://lemmy.ml/activities/like/a843fb84-b192-4e40-803d-cab8aa4ca1ff","@context":["https://www.w3.org/ns/activitystreams",{"comments_enabled":{"type":"sc:Boolean","id":"pt:commentsEnabled"},"moderators":"as:moderators","matrixUserId":{"type":"sc:Text","id":"as:alsoKnownAs"},"stickied":"as:stickied","sc":"http://schema.org#","sensitive":"as:sensitive","pt":"https://join-lemmy.org#"},"https://w3id.org/security/v1"]},"cc":["https://lemmy.ml/c/linux/followers"],"type":"Announce","id":"https://lemmy.ml/activities/announce/0ebc2f61-3916-4442-9505-8b68817d2fa7","@context":["https://www.w3.org/ns/activitystreams",{"stickied":"as:stickied","sc":"http://schema.org#","sensitive":"as:sensitive","moderators":"as:moderators","pt":"https://join-lemmy.org#","comments_enabled":{"type":"sc:Boolean","id":"pt:commentsEnabled"},"matrixUserId":{"type":"sc:Text","id":"as:alsoKnownAs"}},"https://w3id.org/security/v1"]} to https://forum.purplerabbit.xyz/inbox 0.012036

I've saved the entire log now, and restarted lemmy.

Okay I've searched the log for a lot of different terms, and unfortunately it seems that background_jobs crashes without an error message. @asonix
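The manual log search above can be approximated in code. This is a minimal sketch, not part of lemmy or background-jobs: `last_seen` is a hypothetical helper, and it assumes log lines in the format shown in this thread (a bracketed timestamp first, optionally preceded by the docker-compose service prefix). It returns the timestamp of the last line that mentions a given target, which is the value to compare against the current time when deciding whether the queue has gone quiet.

```rust
/// Returns the timestamp of the last log line that contains `needle`.
///
/// Assumes lines shaped like the ones in this thread, for example
/// `[2021-11-16T19:56:39Z INFO  background_jobs_core::processor_map] ...`,
/// optionally prefixed by a docker-compose tag such as `lemmy_1     | `.
/// Hypothetical helper for illustration only.
fn last_seen<'a>(log: &'a str, needle: &str) -> Option<&'a str> {
    log.lines()
        .rev() // walk from the newest line backwards
        .filter(|line| line.contains(needle))
        .find_map(|line| {
            // The timestamp sits between the first '[' and the next space.
            let start = line.find('[')? + 1;
            let rest = &line[start..];
            let end = rest.find(' ')?;
            Some(&rest[..end])
        })
}
```

Calling `last_seen(log, "background_jobs")` on a saved log gives the last time the job processor wrote anything; `None` means the target never appears at all.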

kromonos · 2021-11-17T14:04:29Z

Can confirm that federation is working again.

dessalines · 2021-11-17T14:05:08Z

Sweet, sorry about this again.

asonix · 2021-11-17T17:27:34Z

@dessalines are you sure there's no background jobs log about the Ticker stopping? if there's not, then this doesn't look like a crash, it just looks like it stops doing anything
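One way to tell a silent stall from a crash is a heartbeat that the worker updates whenever it makes progress, checked by a separate watchdog. The sketch below is only an illustration of that idea using the standard library; `Heartbeat` and its methods are hypothetical names, not part of background-jobs, and timestamps are passed in as plain seconds so the logic stays easy to test.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Records when a worker last made progress, in seconds since some fixed epoch.
/// Hypothetical type for illustration; not a background-jobs API.
struct Heartbeat {
    last_beat: AtomicU64,
}

impl Heartbeat {
    fn new(now_secs: u64) -> Self {
        Self { last_beat: AtomicU64::new(now_secs) }
    }

    /// Called by the worker each time it picks up or finishes a job.
    fn beat(&self, now_secs: u64) {
        self.last_beat.store(now_secs, Ordering::Relaxed);
    }

    /// True if more than `threshold_secs` have passed since the last beat.
    fn is_stalled(&self, now_secs: u64, threshold_secs: u64) -> bool {
        let last = self.last_beat.load(Ordering::Relaxed);
        now_secs.saturating_sub(last) > threshold_secs
    }
}
```

A watchdog that sees `is_stalled` return true can log a warning even when the worker itself emits nothing. An idle queue looks the same as a stalled one here, so the threshold would need to account for quiet periods.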

dessalines · 2021-11-17T17:47:39Z

No ticker or Ticker in the logs

asonix · 2021-11-17T18:01:18Z

@dessalines does it look like there could be jobs that started but didn't finish? Probably not since the last log line involves a job finishing but

kromonos · 2021-12-10T17:04:27Z

Is it possible that this bug hit lemmy.ml again?
Lemmy.ml: https://lemmy.ml/post/110385
Same on fapsi.be: https://fapsi.be/post/13991

asonix · 2021-12-10T17:32:16Z

I wonder if this is related:

I think maybe a bit more logging is in order to confirm where the problem lies, though

dessalines · 2021-12-10T20:47:22Z

@kromonos k I just did a lemmy restart, it temporarily fixed it. I'll re-add restarts to our nightly cron job until we can figure out why this keeps happening.

Nutomic · 2022-06-21T11:34:53Z

I think this problem was caused by a worker count that was too low and an incorrect implementation, which meant that failed activity sends would not be retried. At least I haven't heard of any similar problems since fixing those.
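To illustrate the retry half of that diagnosis, here is a minimal sketch of retrying a failed send with capped exponential backoff. It is not lemmy's actual fix: `backoff_delay` and `send_with_retry` are hypothetical names, the 60 second base and one hour cap are arbitrary assumptions, and the waiting step is passed in as a closure so the caller decides how to sleep or reschedule.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Calls `send` until it succeeds or `max_attempts` tries have been made
/// (at least one try is always made). Between tries, `wait` receives the
/// backoff delay. Returns the number of tries on success, or the last error.
/// Hypothetical helper; base and cap values are assumptions.
fn send_with_retry<E>(
    max_attempts: u32,
    mut send: impl FnMut() -> Result<(), E>,
    mut wait: impl FnMut(Duration),
) -> Result<u32, E> {
    let base = Duration::from_secs(60);
    let max = Duration::from_secs(3600);
    let mut attempt = 0;
    loop {
        match send() {
            Ok(()) => return Ok(attempt + 1),
            // Out of tries: surface the last error instead of dropping it.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                wait(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

In a synchronous context `wait` could be `std::thread::sleep`; in an async worker the delay would more likely be used to schedule the job's next run, so that a slow or unreachable inbox does not occupy a worker while waiting.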

dessalines added the bug label Oct 6, 2021

dessalines added a commit that referenced this issue Oct 7, 2021

Trying a background_jobs fix. #1820

f001193

dessalines added a commit that referenced this issue Oct 8, 2021

Trying a background_jobs fix. #1820

66b51ae

Nutomic pushed a commit that referenced this issue Oct 8, 2021

Trying a background_jobs fix. #1820 (#1822)

53a2b6d

dessalines closed this as completed Oct 8, 2021

dessalines reopened this Oct 29, 2021

dessalines mentioned this issue Oct 29, 2021

Broken federation? #1869

Closed

dessalines added a commit that referenced this issue Nov 2, 2021

Upgrade background_jobs to 0.9.1 #1820

9bd3c95

Nutomic pushed a commit that referenced this issue Nov 2, 2021

Upgrade background_jobs to 0.9.1 #1820 (#1875)

d475304

dessalines added a commit that referenced this issue Nov 17, 2021

Upgrading background-jobs-core and actix. #1820

4166f30

Nutomic closed this as completed Jun 21, 2022
