You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
delivery:
backoffDelay: PT1S
backoffPolicy: linear
retry: 60
Sending random events and randomly deleting mt-broker-filter pods (to simulate failover on node-evictions or upgrades),
events are sometimes not delivered and retries are not attempted, with errors in the inmemorychannel-dispatcher logs like:
{"level":"error","ts":"2020-11-02T20:17:43.621Z","logger":"inmemorychannel-dispatcher","caller":"fanout/fanout_message_handler.go:189","msg":"Fanout had an error","error":"unable to complete request to http://broker-filter.knative-eventing.svc.cluster.local/triggers/imc-broker-retries-4/receiver-0/8fedd9ad-e54f-49fe-a4ac-116bc4fcb891: Post \"http://broker-filter.knative-eventing.svc.cluster.local/triggers/imc-broker-retries-4/receiver-0/8fedd9ad-e54f-49fe-a4ac-116bc4fcb891\": dial tcp 172.30.184.71:80: connect: connection refused","stacktrace":"knative.dev/eventing/pkg/channel/fanout.(*MessageHandler).dispatch\n\t/root/.gvm/pkgsets/go1.14/global/src/knative.dev/eventing/pkg/channel/fanout/fanout_message_handler.go:189\nknative.dev/eventing/pkg/channel/fanout.createMessageReceiverFunction.func1.1\n\t/root/.gvm/pkgsets/go1.14/global/src/knative.dev/eventing/pkg/channel/fanout/fanout_message_handler.go:143"}
{"level":"error","ts":"2020-11-02T18:46:00.131Z","logger":"inmemorychannel-dispatcher","caller":"fanout/fanout_message_handler.go:189","msg":"Fanout had an error","error":"unable to complete request to http://broker-filter.knative-eventing.svc.cluster.local/triggers/imc-broker-retries-2/receiver-0/003b2107-d672-4f01-b07a-f66b2f5c2f5c: Post \"http://broker-filter.knative-eventing.svc.cluster.local/triggers/imc-broker-retries-2/receiver-0/003b2107-d672-4f01-b07a-f66b2f5c2f5c\": EOF","stacktrace":"knative.dev/eventing/pkg/channel/fanout.(*MessageHandler).dispatch\n\t/opt/app-root/src/go/src/knative.dev/eventing/pkg/channel/fanout/fanout_message_handler.go:189\nknative.dev/eventing/pkg/channel/fanout.createMessageReceiverFunction.func1.1\n\t/opt/app-root/src/go/src/knative.dev/eventing/pkg/channel/fanout/fanout_message_handler.go:143"}
The problem is that checkRetry doesn't retry on network errors (and retries only when the HTTP request was successful, see
Expected behavior
connection refused and EOF can commonly happen when the target pod is being restarted due to eviction, node restart or upgrade, Channels/Brokers should retry in these situations when configured to do so.
The current behaviour also defeats the purpose of #4440 , as retrying won't always save events during upgrades.
To Reproduce
Hard to reproduce, seen only in a few of the "broker filter pod delete" events , I'll try to create a concise reproducer...
Knative release version
Reproduced on 0.17.2, but affects any version with the current checkRetry logic.
If a duplicate event is re-sent (e.g. due to a network error) it MAY have the same id. Consumers MAY assume that Events with identical source and id are duplicates.
The text was updated successfully, but these errors were encountered:
Describe the bug
Having a Broker with retries
Sending random events and randomly deleting mt-broker-filter pods (to simulate failover on node-evictions or upgrades),
events are sometimes not delivered and retries are not attempted, with errors in the inmemorychannel-dispatcher logs like:
The problem is that
checkRetry
doesn't retry on network errors (and retries only when the HTTP request was successful, seehttps://github.com/knative/eventing/blob/master/pkg/kncloudevents/message_sender.go#L192-L194
Expected behavior
connection refused and EOF can commonly happen when the target pod is being restarted due to eviction, node restart or upgrade, Channels/Brokers should retry in these situations when configured to do so.
The current behaviour also defeats the purpose of #4440 , as retrying won't always save events during upgrades.
To Reproduce
Hard to reproduce, seen only in a few of the "broker filter pod delete" events , I'll try to create a concise reproducer...
Knative release version
Reproduced on 0.17.2, but affects any version with the current checkRetry logic.
Additional context
Note that cloudevents spec allows resending the same events in case of network errors (see https://github.com/cloudevents/spec/blob/v1.0/spec.md#id ):
The text was updated successfully, but these errors were encountered: