Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): manage webhook event lifecycle in webhook service #228

Merged
merged 15 commits into from
Feb 17, 2022

Conversation

wilsonianb
Copy link
Contributor

@wilsonianb wilsonianb commented Feb 7, 2022

Changes proposed in this pull request

  • Consolidate webhook event lifecycle management within webhook service. Failed delivery attempts are retried with a backoff.
  • Add depositEventLiquidity and withdrawEventLiquidity mutation resolvers for updating account liquidity based on webhook events. (Partially reverts fix(backend): manage payment liquidity on webhook response #214) Re-add OutgoingPaymentService.fund, but not requote or cancel yet since those may end up in the Open Payments API.
  • Add optional onCredit function to LiquidityAccount interface for triggering balance based webhook events. (This will be used for Web monetization liquidity withdrawal #220).

Context

Checklist

  • Related issues linked using fixes #number
  • Tests added/updated
  • Documentation added
  • Make sure that all checks pass

@github-actions github-actions bot added pkg: backend Changes in the backend package. type: source Changes business logic type: tests Testing related labels Feb 7, 2022
@wilsonianb wilsonianb marked this pull request as ready for review February 7, 2022 17:35
Comment on lines 47 to 63
return await Invoice.transaction(async (trx) => {
await this.$query(trx).patch({
active: false
})
const event = await InvoiceEvent.query(trx).insertAndFetch({
type: InvoiceEventType.InvoicePaid,
data: this.toData(balance),
// TODO:
// Add 30 seconds to allow a prepared (but not yet fulfilled/rejected) packet to finish before being deactivated.
processAt: new Date()
})
const error = await createWithdrawal({
id: event.id,
account: this,
amount: balance,
timeout: BigInt(RETRY_LIMIT_MS) * BigInt(1e6) // ms -> ns
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, there's the possibility of the withdrawal transfer being created and the postgres transaction commit failing.
In which case, you'd need onCredit to be called again after the withdrawal times out or count on the balance to be withdrawn when the invoice expires. But there's a chance the invoice expiration handling happens before this withdrawal times out...
Alternatives include:

  • the webhook service creates withdrawals on the first attempt to send the event
  • every service with liquidity balance triggered webhooks (including asset, peer, open payment accounts) has workers handling accounts with recently updated balances, like outgoing payment and invoice services do/did.

@wilsonianb
Copy link
Contributor Author

👆 I forgot to actually specify the withdrawal details when creating invoice and outgoing payments webhook events.
I'm also working on addressing this TODO:
https://github.com/interledger/rafiki/pull/228/files#diff-433531f3fc5977a5508f223cd70f14153302a878af7f9c2896901392360617aeR50-R52

If invoice event withdrawal hasn't been created, update the amount.
Otherwise, create new invoice event for subsequent received amounts.
@@ -87,47 +87,45 @@ An expired invoice that has never received money is deleted.

Rafiki sends webhook events to notify the wallet of payment lifecycle states that require liquidity to be added or removed.

Webhook event handlers must be idempotent.
Webhook event handlers must be idempotent and return `200`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "return 200 on success"? What happens if they don't return 200; is the webhook retried by Rafiki?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes

packages/backend/src/accounting/service.ts Outdated Show resolved Hide resolved
@@ -55,9 +58,15 @@ export interface Withdrawal extends Deposit {
timeout: bigint
}

export interface TransferAccount extends LiquidityAccount {
asset: LiquidityAccount['asset'] & {
asset: LiquidityAccount['asset']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for the nested asset property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needed for createTransfer to populate source/destination accounts an the asset liquidity account in order to call onCredit/onDebit with a LiquidityAccount.
https://github.com/interledger/rafiki/pull/228/files#diff-72a5530c2c89f105f75e071479a382e8f8408459d37968fb93b546cca4783b31R233-R251
(It happens to also prevent createTransfer from being called with an asset liquidity account.)
That said we're not currently calling onCredit/onDebit for asset liquidity accounts...

if (account.onDebit) {
const balance = await getAccountBalance(deps, account.id)
assert.ok(balance !== undefined)
await account.onDebit(balance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is onDebit implemented? I grepped the code and didn't find it. Or is it just for symmetry with onCredit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will likely be used for asset and peer liquidity webhooks, but is currently unused and could be removed.

Comment on lines 337 to 341
for (const account of destinationAccounts) {
if (account.onCredit) {
const balance = await getAccountBalance(deps, account.id)
assert.ok(balance !== undefined)
await account.onCredit(balance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a getAccountBalance or onCredit fails?

Right now:

  • commit's promise will reject.
  • Any other onCredit functions later in the loop were never executed.

Is that safe here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

07661cf makes this only call onCredit for createTransfer's single destinationAccount.
@matdehaast mentioned elsewhere that we don't need balance-triggered liquidity webhook events for assets or peers in the short term. This should suffice for invoice and web monetization events for now.

table.timestamp('createdAt').defaultTo(knex.fn.now())
table.timestamp('updatedAt').defaultTo(knex.fn.now())

table.index(['accountId', 'createdAt', 'id'])

table.index('processAt')
table.index('expiresAt')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In packages/backend/src/open_payments/invoice/service.ts, processNextInvoice queries the next "job" via expiresAt < now AND active=true. I think this index should be table.index(['expiresAt', 'active']) so that it doesn't need to scan all expired invoices to find the active ones.

And eventually the active=true part could be a partial index.

Remove onDebit.
Remove TransferAccount.
Update invoices index.
Update transaction-api.md docs.
@wilsonianb
Copy link
Contributor Author

@matdehaast
@sentientwaffle and I were discussing these changes and wondering about the following:

  • We're thinking the wallet should be able to query and handle undelivered webhook events. But if querying events is possible, does Rafiki need to bother retrying delivering webhook events at all? Does it depends on how promptly the wallet wants to withdraw liquidity?
  • I'm imagining some kind of bulk operation could be used for time-based web monetization balance withdraws (Web monetization liquidity withdrawal #220). If prompt withdrawals weren't a priority, could something similar be used to handle leftover payment account balances? (in order to have simpler webhook management)
  • In order to avoid handling subsequent received amounts for invoices or web monetization after a withdrawal amount has been baked into a webhook event's data, could webhook events contain the latest cumulative received amount which the wallet would pass to Rafiki's withdrawEventLiquidity?

@matdehaast
Copy link
Collaborator

@wilsonianb so interesting you bring it up. When I was first looking at the webhook stuff there were proposals by some people that actually advocate for a pull based model rather than pushed based. There is an interesting HN thread on it.

From our last call as I understood, a successful 200 would withdraw/add liquidity to the webhook. I haven't reviewed this PR yet so not sure if that has changed. How does that work in a query based model? ie if I query the event and get the webhooks, is that just assumed to be processed? If we are sticking to auto liquidity with a webhook, we can't do events IMO.

I'm imagining some kind of bulk operation could be used for time-based web monetization balance withdraws (#220). If prompt withdrawals weren't a priority, could something similar be used to handle leftover payment account balances? (in order to have simpler webhook management)

Can you elaborate more here? ie, who does the bulk operation and how is it triggered?

@wilsonianb
Copy link
Contributor Author

wilsonianb commented Feb 11, 2022

From our last call as I understood, a successful 200 would withdraw/add liquidity to the webhook. I haven't reviewed this PR yet so not sure if that has changed. How does that work in a query based model? ie if I query the event and get the webhooks, is that just assumed to be processed? If we are sticking to auto liquidity with a webhook, we can't do events IMO.

This pr changed it so Rafiki just retries sending events if there isn't a successful 200 response. The wallet is expected call withdrawEventLiquidity mutation resolver after receiving a withdrawal webhook event. So supporting events queries is on the table.

Can you elaborate more here? ie, who does the bulk operation and how is it triggered?

For time-based web monetization withdrawals, either the wallet or a (kubernetes) cron or Rafiki internally could trigger a query in Rafiki for accounts with a web monetization balance (in both postgres and tigerbeetle). That could either produce one event per account or we could support the wallet taking the result of that query and having Rafiki withdraw from all of those balances at once.

Looking at that HN thread, I'm leaning towards one attempt to send a webhook event and support event querying. However, it seems like querying for events could actually just be querying for accounts, invoices, outgoing payments with outstanding balances to be withdrawn (as opposed to Rafiki storing point in time events in postgres). (@sentientwaffle suggested that we could store hints on postgres rows about whether there's a non-empty tigerbeetle balance to lookup.)
I had the thought the query response could be centered on accounts, as in here's a list of accounts with their invoices and/or web monetization and/or outgoing payments that have balances to be withdrawn.

Comment on lines +299 to +302
.catch((err) => {
this.logger.warn({ error: err.message }, 'processWebhook error')
return true
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a single one fails, does this whole process crash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning true in the .catch causes the .then 👇 to be called

@@ -70,6 +72,8 @@ export const resolvers: Resolvers = {
createPeerLiquidityWithdrawal: createPeerLiquidityWithdrawal,
createAccountWithdrawal,
finalizeLiquidityWithdrawal: finalizeLiquidityWithdrawal,
rollbackLiquidityWithdrawal: rollbackLiquidityWithdrawal
rollbackLiquidityWithdrawal: rollbackLiquidityWithdrawal,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the equivalent of rejecting the 2 phase commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it rejects a withdrawal transfer from createAssetLiquidityWithdrawal/createPeerLiquidityWithdrawal/createAccountWithdrawal

outcome: {
amountSent: amountSent.toString()
},
balance: balance.toString()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the balance remaining in the payment account?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ yes

@matdehaast
Copy link
Collaborator

This pr changed it so Rafiki just retries sending events if there isn't a successful 200 response. The wallet is expected call withdrawEventLiquidity mutation resolver after receiving a withdrawal webhook event. So supporting events queries is on the table.

Is it expected they call that whilst processing the event? Or can that be called after the 200?

For time-based web monetization withdrawals, either the wallet or a (kubernetes) cron or Rafiki internally could trigger a query in Rafiki for accounts with a web monetization balance (in both postgres and tigerbeetle). That could either produce one event per account or we could support the wallet taking the result of that query and having Rafiki withdraw from all of those balances at once.

Ok so we should keep this internal if we do go this way so its agnostic to how people are running Rafiki, ie more self-contained. Ideally they handle each one atomically individually as doing bulk operations like that can be difficult for ledgers and replayability etc.

Looking at that HN thread, I'm leaning towards one attempt to send a webhook event and support event querying. However, it seems like querying for events could actually just be querying for accounts, invoices, outgoing payments with outstanding balances to be withdrawn (as opposed to Rafiki storing point in time events in postgres). (@sentientwaffle suggested that we could store hints on postgres rows about whether there's a non-empty tigerbeetle balance to lookup.)
I had the thought the query response could be centered on accounts, as in here's a list of accounts with their invoices and/or web monetization and/or outgoing payments that have balances to be withdrawn.

Single attempt with query support seems good. We just need to make sure the client knows where they are and if they need to go a query to catchup. How would that mechanism work to sync it? Seems tricky to write code to handle that nicely.

(@sentientwaffle suggested that we could store hints on postgres rows about whether there's a non-empty tigerbeetle balance to lookup.)
I had the thought the query response could be centered on accounts, as in here's a list of accounts with their invoices and/or web monetization and/or outgoing payments that have balances to be withdrawn.

Again I'm trying to think how we handle this from the client side consuming this. It feels like you can easily make a mistake known if an account needs to be withdrawn/deposit liquidity based if its completed incoming payment or failed outgoing payment etc. One nice thing about events is it makes it easier to switch and do logic against those cases.

Happy to discuss some more

@sentientwaffle
Copy link
Contributor

Is it expected they call that whilst processing the event? Or can that be called after the 200?

It shouldn't make a difference from Rafiki's point of view. The "event" (invoice update) isn't considered "delivered" until the balance has been withdrawn. Though as mentioned later, Rafiki only makes one push attempt, after which it assumes the invoice will be picked up by periodic queries from the wallet.

Single attempt with query support seems good. We just need to make sure the client knows where they are and if they need to go a query to catchup. How would that mechanism work to sync it? Seems tricky to write code to handle that nicely.

I think the client (wallet) should just make catchup queries constantly in the background (i.e. every few seconds or minutes, depending on how much delay in withdrawal is acceptable).

I'm not quite sure what you refer to by mechanism .. to sync it. I'm imagining a flow like this:

  1. One of the following occurs:
    • Rafiki sends an invoice I₁ to the wallet.
    • The wallet queries a list of invoices with money to withdraw, which includes I₁.
  2. The Rafiki invoice includes the total amount (I₁.amount) that the invoice has received (including any money that has already been withdrawn — this is key to preserve idempotency).
  3. The wallet executes a transaction in its own database:
    • Select the invoice record I₂ with the id of I₁ (from the wallet's database, not Rafiki's).
    • Let amountToWithdraw = I₁.amount - I₂.amount.
    • If amountToWithdraw != 0:
      • Update I₂.amount = I₁.amount.
      • Increase the user's account balance by amountToWithdraw.
    • Commit transaction.
  4. Send a withdrawEventLiquidity mutation to Rafiki with I₁.amount. If amountToWithdraw == 0, then this mutation was already sent, but must have failed. Since all amounts are cumulative, it can be safely (idempotently) retried.
  5. Rafiki transfers money from the invoice's account in TB such that its new balance is exactly totalAmountReceivedByInvoice - I₁.amount.
  6. Rafiki updates the invoice in Postgres to mark whether there is or isn't any money left to withdraw.

@wilsonianb
Copy link
Contributor Author

The wallet queries a list of invoices with money to withdraw, which includes I₁.

Should Rafiki support one general events query or several payment type specific queries for invoices or outgoingPayments (etc) with liquidity ready for withdrawal?
Which would result in either withdrawEventLiquidity or withdrawInvoiceLiquidity, withdrawOutgoingPaymentLiquidity, etc.

@wilsonianb
Copy link
Contributor Author

👆 We discussed that a single events query would be easier to use for the wallet.

@wilsonianb
Copy link
Contributor Author

@sentientwaffle @matdehaast
I've tried to reduce the changes in this pr to address the original linked issues, and have opened a separate issue for querying events:

@@ -52,7 +54,7 @@ export interface Deposit {
}

export interface Withdrawal extends Deposit {
timeout: bigint
timeout?: bigint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and other) timeout parameters should be named with the unit; e.g. timeoutNano, timeoutMilli.
JS timeouts are typically milliseconds (e.g. setTimeout) but TigerBeetle uses nanoseconds. Since we need to use both (in different contexts), I think its worth disambiguating them in the property name so that there's no confusion.

Copy link
Contributor Author

@wilsonianb wilsonianb Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we kept the name timeout but keep it consistent as milliseconds and convert to nano-seconds where passing to tigerbeetle-node?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That works.


expect(response.success).toBe(true)
expect(response.code).toEqual('200')
expect(response.error).toBeNull()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify that the deposit was created?

sentientwaffle
sentientwaffle previously approved these changes Feb 16, 2022
@wilsonianb wilsonianb merged commit 8efb70b into main Feb 17, 2022
@wilsonianb wilsonianb deleted the bw-webhook-events branch February 17, 2022 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: backend Changes in the backend package. type: source Changes business logic type: tests Testing related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate webhook lifecycle management Duplicate transfer id on webhook retries
3 participants