Add Reliablity Guarantees section in README #403

Merged
merged 4 commits into from
Mar 29, 2018
13 changes: 13 additions & 0 deletions README.md
@@ -42,6 +42,7 @@ yet ready for production applications._
1. [Comparison](#comparison)
1. [Architecture](#architecture)
1. [System Overview](#system-overview)
1. [Reliability Guarantees](#reliability-guarantees)
1. [Clustering](#clustering)
1. [Background](#background)

@@ -842,6 +843,18 @@ directly.
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

## Reliability Guarantees

### Events are not durable

An event received by Event Gateway is stored only in memory; it is not persisted to disk during processing. This means that, in case of a hardware failure or software crash, the event may not be delivered to the subscriber. For a synchronous subscription (`http` or `invoke` event), this can manifest as an error message returned to the requester. If there are multiple subscribers to the same custom event type, a failure may mean that the event is not delivered to all of them.
Contributor

What are the error code and error message returned? Are they the same under all failure conditions? Are these error codes and messages documented as such in our API documentation?

Contributor Author

If there is a hardware failure or software crash, nothing is returned from EG, for the obvious reason. What I meant here was that the client will get some kind of error that depends on the setup, e.g. for a locally hosted EG it will be a simple connection error, for a self-hosted EG behind an AWS ELB there will be an error from the ELB, etc.

Contributor

I understand, but we should have the codes and messages clearly documented. Is that in the API docs?
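
As an aside on the "Events are not durable" paragraph discussed in this thread, the effect of memory-only buffering can be illustrated with a small simulation. This is a toy sketch, not Event Gateway's actual code: the `InMemoryGateway` class, its methods, and the event names are all hypothetical and exist only for illustration.

```python
from collections import deque


class InMemoryGateway:
    """Hypothetical stand-in for a gateway that buffers events only in memory."""

    def __init__(self):
        self.pending = deque()  # lives only in process memory, never on disk
        self.delivered = []

    def receive(self, event):
        self.pending.append(event)

    def process_one(self):
        # Hand the oldest buffered event to the (simulated) subscriber.
        self.delivered.append(self.pending.popleft())

    def crash(self):
        # A crash discards process memory. Nothing was persisted,
        # so there is nothing to recover after a restart.
        lost = list(self.pending)
        self.pending.clear()
        return lost


gateway = InMemoryGateway()
for name in ("user.created", "user.updated", "user.deleted"):
    gateway.receive(name)

gateway.process_one()
lost = gateway.crash()
print("delivered:", gateway.delivered)
print("lost:", lost)
```

In this sketch only the first event reaches the subscriber; the two events still buffered at the time of the simulated crash are gone for good.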


### Events are delivered _at most once_

Event Gateway tries to deliver an event only once; there is no retry mechanism. Together with the lack of durability, this implies that an event will be delivered to the subscriber _at most once_. Even though Event Gateway itself doesn't retry failed function invocations, the AWS SDK used by a few providers does so internally by default. That retry logic is triggered only for very specific reasons and should not cause the same event to be delivered multiple times. For more information, see the [AWS documentation on API retries](https://docs.aws.amazon.com/general/latest/gr/api-retries.html).
Contributor

Would suggest rephrasing

"Event Gateway attempts delivery fulfillment for an event only once and consequently any event received successfully by the Event Gateway is guaranteed to be received by the subscriber at most once. That said, the nature of Event Gateway provider implementation could result in retries under specific circumstances, but these should not cause delivering the same event multiple times. For example, providers for AWS services that use the AWS SDK are subject to auto-retry logic that's built into the SDK (see [AWS documentation on API retries](https://docs.aws.amazon.com/general/latest/gr/api-retries.html))."


The AWS Lambda provider uses the `RequestResponse` invocation type, which means that the retry logic for asynchronous AWS events doesn't apply here. Among other things, this means that failed deliveries of custom events are not sent to a DLQ. For more information, see the "Synchronous invocation" section of [Understanding Retry Behavior](https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html).
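
The at-most-once behaviour described in this section can be sketched as a dispatcher that calls each subscriber a single time and never retries. This is an illustrative simulation under stated assumptions, not Event Gateway's implementation: `dispatch_once`, the subscriber names, and the handlers are hypothetical.

```python
def dispatch_once(event, subscribers):
    """Call every subscriber exactly one time; never retry a failure."""
    results = {}
    for name, handler in subscribers.items():
        try:
            handler(event)
            results[name] = "delivered"
        except Exception as exc:
            # No retry and no dead-letter queue: the event is simply
            # lost for this subscriber.
            results[name] = f"lost ({exc})"
    return results


received = []


def healthy(event):
    # Hypothetical subscriber that always succeeds.
    received.append(event)


def failing(event):
    # Hypothetical subscriber whose invocation always fails.
    raise RuntimeError("function invocation failed")


outcome = dispatch_once("user.created", {"audit": healthy, "email": failing})
print(outcome)
```

Here the `audit` subscriber receives the event once, while the `email` subscriber never receives it. With several subscribers to the same event type, a failure therefore affects only some of them, which matches the "may not be delivered to all of them" caveat above.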

## Background

SOA came along with a new set of challenges. In monolithic architectures, it was simple to call a built-in library or