Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle stop of price submission in the oracle gracefully #851

Closed
nud3l opened this issue Jan 13, 2023 · 10 comments · Fixed by #1119
Closed

Handle stop of price submission in the oracle gracefully #851

nud3l opened this issue Jan 13, 2023 · 10 comments · Fixed by #1119
Assignees
Labels
bug Something isn't working

Comments

@nud3l
Copy link
Member

nud3l commented Jan 13, 2023

Describe the bug
When the oracle stops submitting prices for an asset, the parachain goes into an error state ORACLE_OFFLINE. However, there is a valid reason to stop submitting prices for an asset: The asset is delisted from the bridge or lending protocol, so the prices are no longer needed.

Also, it's possible that a single asset could just be removed from our oracle source. Delisting might happen due to issues with the underlying asset, lack of demand, or more.

Expected behavior

  1. When a single oracle asset is not reported anymore, we should not stop the whole bridge and the whole lending market. Instead, the affected assets should be paused.

  2. Delisting assets requires governance. Within the same governance to delist a market or a collateral asset from the bridge, governance should be able to delist the oracle price reporting of an asset without the parachain entering the ORACLE_OFFLINE state.

@nud3l nud3l added the bug Something isn't working label Jan 13, 2023
@nud3l nud3l added this to the Security milestone Jan 13, 2023
@sander2
Copy link
Member

sander2 commented Jan 23, 2023

When a single oracle asset is not reported anymore, we should not stop the whole bridge and the whole lending market. Instead, the affected assets should be paused.

This can most be achieved by just removing the error state altogether. That would let everything continue as normal, except that any calls that do a currency conversion to that collateral will fail. We might also want disable rewards for vault with that collateral. However, what would we want to do when an asset is delisted? Would we liquidate all vaults and borrows? There wouldn't be a straightforward exit path

@nud3l
Copy link
Member Author

nud3l commented Feb 2, 2023

If we remove the error state, how would the downstream components like the UI or squid know that we stopped submitting that asset? Possibly, we need still some state change but instead of Error it would be Error(Asset) so the UI could check that.

For delisting, what other protocols are doing is setting the ceiling to 0. I'm not sure if that would break our code though. In these other protocols, setting the ceiling to 0 allows vaults/borrowers to withdraw their assets but not deposit more collateral. Likewise, the capacity for borrows/bridging for that asset would be 0.

@gregdhill gregdhill modified the milestones: Security, Kintsugi De-Fi Hub Feb 2, 2023
@nud3l
Copy link
Member Author

nud3l commented Feb 3, 2023

  • Only handle oracle going offline for one or more assets in this issue.
  • Move the delisting of assets to separate issue.

For oracle going offline:

  • Assume asset price will be available again after timeout period.

Questions:

  • How would that affect liquidations? If price crashes when oracle is coming online again, vaults/borrowers will get liquidated. Liquidations are not possible in the bridge while oracle is offline. Check how lending protocol handles that, likely liquidations also would not work.
  • Vault stake: should the vault stake be set to 0 while the oracle is offline, i.e., should vaults receive rewards while users cannot bridge from/to that asset?
  • Lending: can users repay their loans?
  • Remove all security states? (Running, Error, Shutdown) -> replaced by tx-pause.
  • Active block number: this is an issue. Active block number is not "asset aware". Either oracle works or it does not. When one asset is not reported anymore, we should give users/vaults more time - not count active blocks - for that asset. But not for the other assets.

Plan of action:

  • Remove OracleOffline error state to not halt the entire bridge when one or more assets are not reported by oracle anymore.

@nud3l nud3l removed this from the Kintsugi De-Fi Hub milestone Feb 3, 2023
@sander2
Copy link
Member

sander2 commented May 12, 2023

  • Active block number: this is an issue. Active block number is not "asset aware". Either oracle works or it does not. When one asset is not reported anymore, we should give users/vaults more time - not count active blocks - for that asset. But not for the other assets.

So let's just make ActiveBlockNumber a map of CurrencyId -> ActiveBlockNumber. In the on_initialize hook, we can just iterate over the currencies and increment those that are still active. The on_initialize hook can return a dynamic weight so that shouldn't be a problem. We could also optimize this in the future by only recording a (ActiveBlockNumber, BlockNumber, IsOnline) tuple when there is a state change between [online, offline]. The client can then add (currentBockNumber - recordedBlockNumber) to ActiveBlockNumber to determine the current value. But that's premature optimization, let's stick with the loop in on_initialize for now.

@gregdhill
Copy link
Member

Seems like a substantial escalation of complexity but I haven't reviewed the problem in-depth to understand if there is an alternate solution. Which value would you propose we use for issue, redeem and replace requests?

@sander2
Copy link
Member

sander2 commented May 19, 2023

Which value would you propose we use for issue, redeem and replace requests?

I was thinking for issue/redeem, the active block number associated with the collateral currency. But actually, now that I think about it.. Do we need the active block number at all? Right now we need it because you can't execute when the oracle is offline. But if we remove that limitation, I think we may actually be able to just execute if the oracle is down. Probably not all codepaths would work - for issue overpayment we need to check vault minting capacity, so we need the oracle to be online.. But it may be acceptable to not mint extra tokens tokens in that case. I'll investigate this idea further

@sander2
Copy link
Member

sander2 commented May 22, 2023

Which value would you propose we use for issue, redeem and replace requests?

I was thinking for issue/redeem, the active block number associated with the collateral currency. But actually, now that I think about it.. Do we need the active block number at all? Right now we need it because you can't execute when the oracle is offline. But if we remove that limitation, I think we may actually be able to just execute if the oracle is down. Probably not all codepaths would work - for issue overpayment we need to check vault minting capacity, so we need the oracle to be online.. But it may be acceptable to not mint extra tokens tokens in that case. I'll investigate this idea further

What should work as normal:

  • execute_issue of pending issue with amount less than or equal to expected amount
  • execute_redeem (although the vault in question wouldn't get rewards since stake would be 0)
  • execute_replace of pending request
  • cancel_issue

What wouldn't work (or work differently):

  • execute_issue of pending issue with greater amount than expected: they wouldn't get more KBTC than they issued
  • execute_issue of cancelled issue would not work (can't check collateralization rate of vault)
  • cancel_redeem: requires price to determine compensation
  • request_replace
  • accept_replace
  • execute_replace of cancelled request: needs price for new_vault's collateral currency

All in all, I think this would be acceptable - there is no need to extend the time allocated for execution if the oracle goes down. All actors are able to process open requests as normal, as long as they act correctly. By that I mean that we can't provide some of the mitigations in place for user error (e.g. issue overpayment). But since these mitigations are "best effort" anyway, I think that's acceptable. The only problem is cancel_redeem, which requires the oracle to determine the compensation amount. I think it's acceptable to require users to wait with cancellation in the event of the oracle being down - that should be a very exception case anyway. If required, though, we can think about changes to the cancel_redeem s.t. it still works, but it comes at the cost of increased complexity and I'm not sure that's worth it

@gregdhill
Copy link
Member

@sander2 so the new proposal is to remove the error status and active block handling? For the latter I would assume we would still want to increase the global counter for now pending some future migration to revert back to the system height. I can't recall if there was another reason to extend the time allocated for execution but I'm open to this approach.

@sander2
Copy link
Member

sander2 commented May 23, 2023

@sander2 so the new proposal is to remove the error status and active block handling? For the latter I would assume we would still want to increase the global counter for now pending some future migration to revert back to the system height. I can't recall if there was another reason to extend the time allocated for execution but I'm open to this approach.

Yea that's right. Basically, we'd allow every function that doesn't fail in the oracle conversion to be callable.

Thanks for your reply, I'll discuss with Dom as well when he's back

@nud3l
Copy link
Member Author

nud3l commented May 29, 2023

I can't recall if there was another reason to extend the time allocated for execution but I'm open to this approach.

Yes, in case we would pause the parachain via SHUTDOWN state or the tx-pause module. In that case, users and vaults would not be able to execute requests.

However, this is not currency-specific. The tx-pause/SHUTDOWN would come into play when governance decide if there's emergency action to be taken.

So, I'd be in favor of the following change:

  • Remove the ORACLE_OFFLINE state
  • Stop counting the active blocks when the parachain is in SHUTDOWN state/tx-pause is active (not sure there's a nice way to combine the two?)
  • As Sander said, allow every function that doesn't fail in the oracle conversion to be callable
  • Handle oracle errors upstream: (1) The UI should show a validation error when users try to call functions that err when called. (2) The vault client should log that it can't call certain functions due to the oracle being offline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants