Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover historical ledger secrets from the ledger #2200

Merged

Conversation

jumaffre
Copy link
Contributor

@jumaffre jumaffre commented Feb 16, 2021

Resolves #1648

After recovering from a snapshot, a node doesn't have the history of ledger secrets required to deserialise historical ledger entries. This PR adds support for historical ledger secrets recovery from the ledger.

This is done by hoping through the ledger backwards, at the exact seqno at which the encrypted previous ledger secret was written to the public:ccf.internal.historical_encrypted_ledger_secret table, decrypting the previous ledger with the last recovered one, etc. The historical state cache keeps track of all recovered ledger secrets in network.historical_ledger_secrets so that further requests won't need to recover those ledger secrets (*).

To do so, each LedgerSecret now keeps track of the seqno at which the previous ledger secret was written to the store. This is fairly trivial for new ledger secrets on rekey but isn't for the first ledger secret after recovery. To solve this, I've had to introduce a new hook on the public:ccf.internal.historical_encrypted_ledger_secret table which only triggers once, on all nodes, after the private recovery is complete and the new ledger secret has been written to the ledger.

Edit: Also updated the schema for the public:ccf.internal.encrypted_ledger_secrets table which is now keyed at 0. This prevents infinite growth as more nodes are added to the service and simplifies the LedgerSecrets class which no longer needs to know about the node id.

(*) For now, the main and historical ledger secrets aren't compacted/garbage collected (follow-up work: #2199)

@jumaffre jumaffre requested a review from a team as a code owner February 16, 2021 17:00
search = changes.find(ccf::Tables::ENCRYPTED_PAST_LEDGER_SECRET);
if (search != changes.end())
{
success = ApplyResult::PASS_ENCRYPTED_PAST_LEDGER_SECRET;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we write to this table in transactions that touch multiple tables? This return scheme originally worked for PASS_SIGNATURES, which worked because signatures is a special table with a bunch of other requirements on it. I'm wary of repeating this pattern for too many tables, especially if we're not checking that there's only one table/value written.

Probably out of scope for this PR but: since we now split deserialise from application, and its returned value is really the raw changes, we could remove all of these enum values and just return PASS/FAIL, and leave the "did this produce some tables I care about" decision up to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the broader store's deserialise() refactor (I've raised #2208).

Currently, the encrypted past ledger secret map isn't the only map touched by this transaction but the other maps touched aren't considered in the deserialise return code, so I think this change is safe for now. If in the future another map of interest is touched by the rekey transaction, the unit test for historical queries will fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this be done with a hook?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new hook interface, it seems that we can use them but I'd rather do that in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR is fine of course, it's the trend I'm worried about.

std::list<consensus::Index> recent_requests;

// To trust an index, we currently need to fetch a sequence of entries
// around it - these aren't user requests, so we don't store them, but we do
// need to distinguish things-we-asked-for from junk-from-the-host
std::set<consensus::Index> pending_fetches;

ccf::VersionedLedgerSecret get_first_known_ledger_secret()
{
auto tx = network.tables->create_read_only_tx();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid creating this tx here, and instead have get_latest/get_first variants that don't take a Tx and don't have a transactional dependency, iff doing this non-transactionally is really safe.

At the very least, we should create 2 different transactions for passing to the different stores: The assert means this Tx holds views over 2 different tables of the same name on different stores (in Debug only!). This is completely unexpected behaviour, and its probably doing something wrong under-the-hood.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction created here is read-only and never commits so it is only created to match the current interface. Unless I'm missing something, there's only one store involved here though (the main one, i.e. network.tables) so I think it is preferable to keep the current interface as it is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, sorry I was way off here. I thought LedgerSecrets had a reference to the Store and was getting secrets from that store's tables, but of course this doesn't happen - the tables are only referred to by name, they're not per-Store.

I'm tempted to say that get_first doesn't need a Tx though, and then this tx is still unnecessary? get_latest is a cache/interpretation of the current value, so needs a transactional dependency to make sure its still 'current'. But first is an independent cache unrelated to the transaction - if first changes, it doesn't need to invalidate any in-flight transactions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I've now removed the tx argument to get_first().

src/node/secret_broadcast.h Outdated Show resolved Hide resolved
tests/infra/network.py Outdated Show resolved Hide resolved
"Node id should be set before taking dependency on secrets table");
}
secrets->get(self.value());
secrets->get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to grok how this relates to the rest of the PR - why do we no longer store secrets per-node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still do. The previous schema of the secrets table was NodeId to EncryptedLedgerSecrets. This meant two things: LedgerSecrets had to know their node ID and the table was forever growing, even when will we delete retired nodes from the store. Passing the node ID to the historical ledger secrets was awkward so I've changed the schema for the table so that we only store secrets at key 0.

@achamayou achamayou merged commit af6836b into microsoft:main Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for historical queries after recovery from snapshot
4 participants