Emergency Procedures for Porter Finance

Introduction

This document details the procedures and guidelines to follow in the event of an emergency situation. Its purpose is to minimize the risk of loss of funds for Porter's users, Treasury, and Smart Contracts.

Definitions and Examples of Emergencies

For the purposes of this document, an emergency situation is defined to be:

Any situation that may lead to a considerable loss of funds for Porter users, Porter's Treasury, or Smart Contracts deployed by Porter.

This is a non-exhaustive list of possible emergency scenarios:

  1. Bug/Exploit in Bond code that can cause a loss of funds for users
  2. Potential exploit discovered by team or bounty program researcher
  3. Active exploit / hack in progress discovered by unknown party

Roles

In the event of an emergency situation, the following roles should be assigned to the Porter team members working to resolve the situation. Assignees are listed left to right in order of preference.

  • Facilitator (@RusseII, @jordanalexandermeyer)
  • Multi-sig Herder (@RusseII, @Geczy, @jordanalexandermeyer)
  • Strategist Lead (@RusseII, @jordanalexandermeyer)
  • Core Dev Lead (@Namaskar-1F64F, @RusseII)
  • Web Lead (@Geczy, @Namaskar-1F64F, @RusseII)
  • Ops (@jordanalexandermeyer, @RusseII)

A contributor may be assigned up to three of these roles concurrently.
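The assignment rule above (preferred assignees left to right, at most three roles per contributor) can be sketched in Python. This is a minimal illustration, not a Porter tool: the preference lists are copied from the Roles list, while the function name `assign_roles`, the greedy role-by-role order, and the `online` set are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

# Preference lists copied from the Roles list (most preferred first).
ROLE_PREFERENCES: Dict[str, List[str]] = {
    "Facilitator": ["@RusseII", "@jordanalexandermeyer"],
    "Multi-sig Herder": ["@RusseII", "@Geczy", "@jordanalexandermeyer"],
    "Strategist Lead": ["@RusseII", "@jordanalexandermeyer"],
    "Core Dev Lead": ["@Namaskar-1F64F", "@RusseII"],
    "Web Lead": ["@Geczy", "@Namaskar-1F64F", "@RusseII"],
    "Ops": ["@jordanalexandermeyer", "@RusseII"],
}

# "A contributor may be assigned up to three of these roles concurrently."
MAX_ROLES_PER_PERSON = 3


def assign_roles(online: Set[str]) -> Dict[str, Optional[str]]:
    """Greedily give each role to its most preferred online contributor.

    Roles are processed in document order (an assumption of this sketch).
    A role maps to None when nobody eligible is online.
    """
    load: Dict[str, int] = {}
    assignment: Dict[str, Optional[str]] = {}
    for role, candidates in ROLE_PREFERENCES.items():
        assignment[role] = None
        for person in candidates:
            if person in online and load.get(person, 0) < MAX_ROLES_PER_PERSON:
                assignment[role] = person
                load[person] = load.get(person, 0) + 1
                break
    return assignment
```

Under these assumptions, if only @RusseII is online, the first three roles are filled and the remaining ones stay unassigned until someone else joins, which makes the staffing gap visible to the Facilitator.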

Facilitator

Facilitates the emergency handling and ensures the process described in this document is followed, engaging with the correct stakeholders and teams so that the necessary decisions can be made quickly. A suitable Facilitator is any person who is familiar with the process and confident that they can drive the team to follow through. The person assigned to this role is expected to have relevant experience, either from working real scenarios or from drill training.

Multi-sig Herder

The entire team needs to be contacted if there is an issue. This person is responsible for gathering all individuals into the War Room and for ensuring that the different Porter teams' Multi-sig wallets (i.e. bookland.porterfinance.eth, jordan.porterfinance.eth, namaskar.porterfinance.eth) are able to execute transactions in a timely manner during the emergency.

Main responsibilities:

Strategist Lead

In charge of coordinating quick changes to management and strategist roles during the emergency, including but not limited to:

  • Prepare and Execute Strategist Multi-sig transactions and operations
  • Update issuer list
  • Update allow list

Core Dev Lead

Coordinates quick changes to Governance and Guardian roles during the emergency, including but not limited to:

  • Any smart contract changes
  • Any graph changes
  • Any backend changes

Web Lead

Coordinates quick changes to UI and Websites as required, including but not limited to:

  • Disable redeem/convert/createBond through the UI
  • Display alerts and banners
  • Other UI related work

Ops

In charge of coordinating comms and operations assistance as required:

  • Clear with War Room what information and communication can be published during and after the incident
  • Coordinate Communications
  • Take note of timelines and events for disclosure
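One way Ops could keep the timeline for a later disclosure is a small append-only log with a flag for what the War Room has cleared for publication. The sketch below is illustrative only: `TimelineEvent`, `IncidentTimeline`, and the `public` flag are hypothetical names, not part of any existing Porter tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime          # when the event happened (UTC)
    author: str           # who reported it
    note: str             # what happened
    public: bool = False  # cleared by the War Room for publication?


@dataclass
class IncidentTimeline:
    events: List[TimelineEvent] = field(default_factory=list)

    def record(self, author: str, note: str,
               at: Optional[datetime] = None, public: bool = False) -> TimelineEvent:
        """Append an event, stamping it with the current UTC time by default."""
        event = TimelineEvent(at or datetime.now(timezone.utc), author, note, public)
        self.events.append(event)
        return event

    def render(self, public_only: bool = False) -> str:
        """Return the events in chronological order, one per line."""
        rows = sorted(self.events, key=lambda e: e.at)
        if public_only:
            rows = [e for e in rows if e.public]
        return "\n".join(
            f"{e.at:%Y-%m-%d %H:%M} UTC | {e.author} | {e.note}" for e in rows
        )
```

Calling `render(public_only=True)` yields only the entries cleared for disclosure, which keeps private War Room details out of any published timeline by default.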

Emergency Steps

See also the Emergency Checklist and Tools sections.

This acts as a guideline to follow when an incident is reported requiring immediate attention.

The primary objective is to minimize the loss of funds, in particular for Porter's users. All decisions made should be driven by this goal.

  1. Open a private chat room (War Room) with a voice channel and invite only the team members who are online and can cover the roles described above. The War Room is limited to members acting in the designated roles, plus any additional persons who can provide critical insight into the circumstances of the issue and how it can best be resolved.
  2. All the information that is gathered during the War Room should be considered private to the chat and not to be shared with third parties. Relevant data should be pinned and updated by the Facilitator for the team to have handy.
  3. The team's first milestone is to assess the situation as quickly as possible: confirm the reported information and determine how critical the incident is. A few questions to guide this process:
    • Is there confirmation from several team members/sources that the issue is valid? Are there example transactions that show the incident occurring? (Pin these in the War Room)
    • Is the Strategist that knows the most about the code in the War Room? Can the Strategist in question be reached? If not, can we reach the backup Strategist?
    • Are funds presently at risk? Is immediate action required?
    • Is the issue isolated or does it affect several Bonds? Can the affected contracts be identified? (Pin these in the War Room)
    • Which Multi-sig will require signing to address the issue? The Multi-sig Herder should begin to notify signers and clear the queue in preparation for emergency transactions.
    • If there is no immediate risk for loss of funds, does the team still need to take preventive action or some other mitigation?
    • Is there agreement in the team that the situation is under control and that the War Room can be closed?
  4. Once the issue has been confirmed as valid, the next step is to take immediate corrective action to prevent further loss of funds. If the root cause requires further research, the team must err on the side of caution and take emergency preventive actions while the situation continues to be assessed. A few questions to guide the decisions of the team:
    • Should createBond be disabled for the affected contracts? Should createBond, convert, redeem, withdraw, pay be removed from the UI?
    • Are multiple team members able to confirm, through local Hardhat fork testing, that the corrective actions will stop the immediate risk? The Strategist Lead and Core Dev Lead in particular should confirm this step.
  5. Once corrective measures are in place and there is confirmation by multiple sources that funds are no longer at risk, the next objective is to identify the root cause. A few questions/actions during this step that can help the team make decisions:
    • What communications should be made public at this point in time?
    • Can research among members of the War Room be divided? This step can be open for team members to do live debug sessions sharing screens to help identify the problem using the sample transactions.
  6. Once the cause is identified, the team can brainstorm to come up with the most suitable remediation plan and its code implementation (if required). A few questions that can help during this time:
    • If there are many possible solutions, can the team prioritize them by weighing each option's time to implement against its minimization of losses?
    • Can the possible solutions be tested and compared to confirm the end state fixes the issue?
    • Is there agreement in the War Room about the best solution? If not, can the objections be identified and a path for how to reach consensus on the approach be worked out, prioritizing the minimization of losses?
    • If a solution will take longer than a few hours, are there any further communications and preventive actions needed while the fix is developed?
    • Does the solution require a longer term plan? Are there identified owners for the tasks/steps of the plan's execution?
  7. Once a solution has been implemented, the team will confirm the solution resolves the issue and minimizes the loss of funds. Possible actions needed during this step:
    • Run simulations of the end state in a Hardhat fork to confirm the proposed solution(s)
    • Coordinate signatures from multi-sig signers and execution
    • Enable UI changes to normalize operations as needed
  8. Assign a lead to prepare a disclosure (should it be required), preparing a timeline of the events that took place.
  9. The team agrees when the War Room can be dismantled. The Facilitator breaks down the War Room and sets reminders if it takes longer than a few hours for members to reconvene.
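Step 3 asks which Multi-sig will need to sign and has the Multi-sig Herder start notifying signers. A quick way to reason about readiness is to compare the number of reachable signers with the wallet's confirmation threshold. The following Python sketch assumes a threshold-style multi-sig; the wallet label, owner handles, and threshold used in the usage note are hypothetical and do not describe Porter's actual wallets.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class MultiSig:
    name: str               # label for the wallet (hypothetical)
    owners: FrozenSet[str]  # signer handles (hypothetical)
    threshold: int          # confirmations required to execute a transaction

    def confirmations_short(self, reachable: Set[str]) -> int:
        """Number of additional signers that must be reached before executing.

        Only people who are both owners and reachable count toward the threshold.
        """
        available = len(self.owners & reachable)
        return max(0, self.threshold - available)

    def ready(self, reachable: Set[str]) -> bool:
        """True when enough owners are reachable to meet the threshold."""
        return self.confirmations_short(reachable) == 0
```

For example, a wallet built as `MultiSig("example", frozenset({"a", "b", "c"}), threshold=2)` with only `"a"` reachable reports one confirmation short, so the Herder knows to keep contacting signers before emergency transactions are queued.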

Emergency Checklist

This checklist should be used together with the Emergency Steps above.

  • Create War room with audio
  • Assign Key Roles to War Room members
  • Add Strategist or other Expert (or their backup) to the War Room
  • Clear related Multi-sig queues
  • Disable createBond and/or redeem/convert/withdraw/pay as needed in the web UI
  • Confirm and identify issue
  • Take immediate corrective/preventive actions in order to prevent (further) loss of funds
  • Communicate the current situation internally and externally (as appropriate)
  • Determine the root cause
  • Propose workable solutions
  • Implement and validate solutions
  • Prioritize solutions
  • Reach agreement in Team on best solution
  • Execute solution
  • Confirm incident has been resolved
  • Assign ownership of security disclosure report
  • Disband War Room
  • Conduct immediate debrief
  • Schedule a Post Mortem

Tools

List of tools and alternatives in case primary tools are not available during an incident.

| Description | Primary | Secondary |
| --- | --- | --- |
| Code Sharing | GitHub | HackMd, CodeShare |
| Communications* | Discord | Telegram |
| Transaction Details | Etherscan | EthTxInfo |
| Debugging | Hardhat | Tenderly |
| Transaction Builder | gnosis-safe | Backup if gnosis safe Api is not working? |
| Screen Sharing* | Discord | Google Hangouts |

The Facilitator is responsible for ensuring that no unauthorized persons enter the War Room or join these tools via leaked invite links.

Incident Post Mortem

A Post Mortem should be conducted after an incident to gather data and feedback from War Room participants in order to produce actionable improvements for Porter processes such as this one.

Following the dissolution of a War Room, the Facilitator should ideally conduct an immediate informal debrief to gather initial notes before they are forgotten by participants.

This can then be complemented by a more extensive Post Mortem as outlined below.

The Post Mortem should be conducted no later than one week after the incident so that participants' recollection is still fresh.

It is key that most of the War Room participants are involved in this session so that the events that took place can be assessed accurately. Discussion is encouraged. The objective is to collect constructive feedback on how the process can be improved, not to assign blame to any War Room participant.

Participants are encouraged to provide inputs on each of the steps. If a participant is not giving inputs, the Facilitator is expected to try to obtain more feedback by asking questions.

Post Mortem Outputs

  • List of what went well
  • List of what could be improved
  • List of questions that came up in the Post Mortem
  • List of insights from the process
  • Root Cause Analysis along with concrete measures required to prevent the incident from ever happening again.
  • List of action items assigned to owners with estimates for completion.

Post Mortem Steps

  1. Facilitator runs the session in a voice channel and shares a screen for participants to follow notes.
  2. Facilitator runs through an agenda to obtain the necessary outputs.
  3. For the Root Cause Analysis part, the Facilitator conducts an exercise to write the problem statement first and then confirm with the participants that the statement is correct and understood.
  4. Root causes can be identified with the following tools:
  5. Once Root Causes have been identified, action items can be written and assigned to willing participants who can own the tasks. It is recommended that each item be given an estimated time for completion. A later process can track completion of the assignments. Note: the action items need to be clear, actionable, and measurable for completion.
  6. The Facilitator tracks completion of action items. The end result of the process should be an actionable improvement in the process. Some possible improvements:
    • Changes in the process and documentation
    • Changes in code and tests to validate
    • Changes in tools implemented and incorporated into the process
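The action items from steps 5 and 6 (an owner, an estimate for completion, and tracking by the Facilitator) map naturally onto a small record type. The Python sketch below is one possible shape, not an existing Porter tool; `ActionItem`, `overdue`, and `completion_ratio` are names invented for the illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ActionItem:
    description: str   # clear, actionable, measurable task
    owner: str         # participant who owns the task
    due: date          # estimated completion date
    done: bool = False


def overdue(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Open items whose estimated completion date has already passed."""
    return [item for item in items if not item.done and item.due < today]


def completion_ratio(items: List[ActionItem]) -> float:
    """Fraction of items marked done; 1.0 when there are no items."""
    if not items:
        return 1.0
    return sum(1 for item in items if item.done) / len(items)
```

The Facilitator could run `overdue(items, date.today())` at each follow-up to see which owners need a reminder.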