Recover historical ledger secrets from the ledger #2200

jumaffre · 2021-02-16T17:00:42Z

Resolves #1648

After recovering from a snapshot, a node doesn't have the history of ledger secrets required to deserialise historical ledger entries. This PR adds support for historical ledger secrets recovery from the ledger.

This is done by hopping backwards through the ledger, reading the entry at the exact seqno at which the encrypted previous ledger secret was written to the public:ccf.internal.historical_encrypted_ledger_secret table, decrypting the previous ledger secret with the last recovered one, and so on. The historical state cache keeps track of all recovered ledger secrets in network.historical_ledger_secrets so that further requests won't need to recover those ledger secrets (*).
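As a rough illustration of the backwards walk described above, here is a minimal, self-contained sketch. It is not the code in this PR: the types `EncryptedPastSecret`, `toy_crypt` and `recover_secrets` are hypothetical, the ledger is modelled as a plain map from seqno to entry, and the "encryption" is a toy XOR used only so that the example runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified types: these are not CCF's actual classes.
using SeqNo = uint64_t;
using Secret = std::string;

// What an entry in the historical encrypted ledger secret table carries in
// this sketch.
struct EncryptedPastSecret
{
  Secret encrypted_previous; // previous secret, encrypted with its successor
  std::optional<SeqNo> previous_written_at; // where to hop next, if anywhere
};

// Toy "encryption": XOR with a repeating key, so applying it twice with the
// same key gives back the input. A stand-in for real authenticated encryption.
inline Secret toy_crypt(const Secret& data, const Secret& key)
{
  assert(!key.empty());
  Secret out = data;
  for (std::size_t i = 0; i < out.size(); ++i)
  {
    out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
  }
  return out;
}

// Walk backwards from the latest known secret, decrypting each earlier secret
// with the one recovered just before it. Returns secrets newest to oldest.
inline std::vector<Secret> recover_secrets(
  const std::map<SeqNo, EncryptedPastSecret>& ledger,
  const Secret& latest,
  std::optional<SeqNo> start)
{
  std::vector<Secret> recovered{latest};
  auto next = start;
  while (next.has_value())
  {
    const auto it = ledger.find(*next);
    if (it == ledger.end())
    {
      break; // entry not available in this sketch: stop the walk here
    }
    recovered.push_back(
      toy_crypt(it->second.encrypted_previous, recovered.back()));
    next = it->second.previous_written_at;
  }
  return recovered;
}
```

Each hop only needs the secret recovered by the previous hop plus the seqno stored alongside it, which is why every ledger secret has to remember where its predecessor was written.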

To do so, each LedgerSecret now keeps track of the seqno at which the previous ledger secret was written to the store. This is fairly trivial for new ledger secrets on rekey but isn't for the first ledger secret after recovery. To solve this, I've had to introduce a new hook on the public:ccf.internal.historical_encrypted_ledger_secret table which only triggers once, on all nodes, after the private recovery is complete and the new ledger secret has been written to the ledger.

Edit: Also updated the schema for the public:ccf.internal.encrypted_ledger_secrets table which is now keyed at 0. This prevents infinite growth as more nodes are added to the service and simplifies the LedgerSecrets class which no longer needs to know about the node id.

(*) For now, the main and historical ledger secrets aren't compacted/garbage collected (follow-up work: #2199)

eddyashton · 2021-02-17T15:53:46Z

src/kv/deserialise.h

+      search = changes.find(ccf::Tables::ENCRYPTED_PAST_LEDGER_SECRET);
+      if (search != changes.end())
+      {
+        success = ApplyResult::PASS_ENCRYPTED_PAST_LEDGER_SECRET;


Do we write to this table in transactions that touch multiple tables? This return scheme originally worked for PASS_SIGNATURES, which worked because signatures is a special table with a bunch of other requirements on it. I'm wary of repeating this pattern for too many tables, especially if we're not checking that there's only one table/value written.

Probably out of scope for this PR but: since we now split deserialise from application, and its returned value is really the raw changes, we could remove all of these enum values and just return PASS/FAIL, and leave the "did this produce some tables I care about" decision up to the caller.
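The refactor suggested in this comment could look roughly like the sketch below. This is only an illustration under stated assumptions: `DeserialiseOutcome`, `Changes` and `tables_of_interest` are hypothetical names, the table names in any usage are placeholders, and the real store's deserialise() interface is more involved.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the store's types; names are illustrative only.
enum class ApplyResult
{
  PASS,
  FAIL
};

using Changes = std::map<std::string, std::string>; // table name -> raw writes

struct DeserialiseOutcome
{
  ApplyResult result;
  Changes changes;
};

// Deserialise reports only success or failure, plus the raw changes.
inline DeserialiseOutcome deserialise(const Changes& entry, bool valid)
{
  if (!valid)
  {
    return {ApplyResult::FAIL, {}};
  }
  return {ApplyResult::PASS, entry};
}

// The caller, not deserialise, decides which tables it cares about.
inline std::vector<std::string> tables_of_interest(
  const DeserialiseOutcome& outcome, const std::vector<std::string>& wanted)
{
  assert(!wanted.empty());
  std::vector<std::string> found;
  if (outcome.result != ApplyResult::PASS)
  {
    return found;
  }
  for (const auto& name : wanted)
  {
    if (outcome.changes.count(name) > 0)
    {
      found.push_back(name);
    }
  }
  return found;
}
```

With this shape, a transaction that touches several interesting tables is no longer collapsed into a single enum value, which is the concern raised about repeating the PASS_SIGNATURES pattern.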

Agreed on the broader store's deserialise() refactor (I've raised #2208).

Currently, the encrypted past ledger secret map isn't the only map touched by this transaction but the other maps touched aren't considered in the deserialise return code, so I think this change is safe for now. If in the future another map of interest is touched by the rekey transaction, the unit test for historical queries will fail.

Why can't this be done with a hook?

With the new hook interface, it seems that we can use them but I'd rather do that in a separate PR.

Separate PR is fine of course, it's the trend I'm worried about.

eddyashton · 2021-02-17T16:00:31Z

src/node/historical_queries.h

    std::list<consensus::Index> recent_requests;

    // To trust an index, we currently need to fetch a sequence of entries
    // around it - these aren't user requests, so we don't store them, but we do
    // need to distinguish things-we-asked-for from junk-from-the-host
    std::set<consensus::Index> pending_fetches;

+    ccf::VersionedLedgerSecret get_first_known_ledger_secret()
+    {
+      auto tx = network.tables->create_read_only_tx();


I think we should avoid creating this tx here, and instead have get_latest/get_first variants that don't take a Tx and don't have a transactional dependency, iff doing this non-transactionally is really safe.

At the very least, we should create 2 different transactions for passing to the different stores: the assert means this Tx holds views over 2 different tables of the same name on different stores (in Debug only!). This is completely unexpected behaviour, and it's probably doing something wrong under the hood.

The transaction created here is read-only and never commits so it is only created to match the current interface. Unless I'm missing something, there's only one store involved here though (the main one, i.e. network.tables) so I think it is preferable to keep the current interface as it is?

Yeah you're right, sorry I was way off here. I thought LedgerSecrets had a reference to the Store and was getting secrets from that store's tables, but of course this doesn't happen - the tables are only referred to by name, they're not per-Store.

I'm tempted to say that get_first doesn't need a Tx though, and then this tx is still unnecessary? get_latest is a cache/interpretation of the current value, so it needs a transactional dependency to make sure it's still 'current'. But first is an independent cache unrelated to the transaction - if first changes, it doesn't need to invalidate any in-flight transactions?

Yep. I've now removed the tx argument to get_first().

src/node/historical_queries.h

src/node/secret_broadcast.h

tests/infra/network.py

eddyashton · 2021-02-17T16:43:54Z

src/node/ledger_secrets.h

-          "Node id should be set before taking dependency on secrets table");
-      }
-      secrets->get(self.value());
+      secrets->get(0);


I'm unable to grok how this relates to the rest of the PR - why do we no longer store secrets per-node?

We still do. The previous schema of the secrets table was NodeId to EncryptedLedgerSecrets. This meant two things: LedgerSecrets had to know their node ID, and the table was forever growing, even when we delete retired nodes from the store. Passing the node ID to the historical ledger secrets was awkward, so I've changed the schema for the table so that we only store secrets at key 0.
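To make the difference between the two schemas concrete, here is a small sketch that models the table as a plain map. It is an assumption-laden simplification: `SecretsTable`, `write_per_node`, `write_single` and `read_single` are hypothetical names, and the real table is a KV map on the store rather than a std::map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical simplification: a std::map standing in for the KV table.
using EncryptedLedgerSecrets = std::string;
using SecretsTable = std::map<uint64_t, EncryptedLedgerSecrets>;

constexpr uint64_t SINGLE_KEY = 0;

// Old scheme, as described above: one entry per node id, so the table gains
// a key for every node that ever writes to it.
inline void write_per_node(
  SecretsTable& table, uint64_t node_id, const EncryptedLedgerSecrets& secrets)
{
  table[node_id] = secrets;
}

// New scheme: every write overwrites the value at key 0, so the writer does
// not need to know its node id.
inline void write_single(
  SecretsTable& table, const EncryptedLedgerSecrets& secrets)
{
  table[SINGLE_KEY] = secrets;
}

// Reads the only entry; the table never holds more than one key here.
inline EncryptedLedgerSecrets read_single(const SecretsTable& table)
{
  assert(table.size() <= 1);
  const auto it = table.find(SINGLE_KEY);
  return it == table.end() ? EncryptedLedgerSecrets{} : it->second;
}
```

In this model the per-node table ends up with one key per writer, while the single-key table stays at one entry regardless of how many nodes join.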

Julien Maffre added 29 commits February 8, 2021 15:28

WIP

e64843d

Store previous secret stored version in ledger secrets

207a354

Doesn't work with snapshots

4d2ce9f

WIP

d42bec9

Merge remote-tracking branch 'upstream/main' into historical_query_after_recovery_snapshot

f3b8b64

Works for past secrets

d32cbb1

Write previous version in latest secret entry

08a1058

Recovery of all secrets previous stored version on primary

def2af4

WIP: start to fetch historical ledger secrets

99d5886

Versions are set right again!

e38533d

Copy recover ledger secrets to map

965aad4

End-to-end test works

49a024e

Add special one off hook to adjust ledger secret

c7d51bb

Unit test works again

c5983c9

Unit test WIP

22d83ba

WIP unit test

a314f2c

WIP

8d5912f

Works for first idx

017d6b3

Recover all the historical secrets!

dff77c1

Refactor #1

712a95d

More refactor

fec26bb

Ledger secrets are shared pointers

99bbfd3

Cleanup

7ebdfa3

Cleanup

cad5962

More cleanup

3cee257

Fix issue with node id

ff79f0c

Format

f448d8f

Simplify tests

c4a93da

More cleanup

c0fdd82

jumaffre requested a review from a team as a code owner February 16, 2021 17:00

Julien Maffre added 4 commits February 17, 2021 11:54

Oops

f882104

Merge branch 'historical_query_after_recovery_snapshot' of github.com:jumaffre/CCF into historical_query_after_recovery_snapshot

1df20ab

Fix node frontend unit test

c008c9f

Don't default to recovering from snapshot in test suite

e43e812

eddyashton reviewed Feb 17, 2021

src/node/historical_queries.h (outdated)

jumaffre and others added 3 commits February 17, 2021 16:14

Merge branch 'main' into historical_query_after_recovery_snapshot

291b73c

Merge branch 'main' into historical_query_after_recovery_snapshot

c724457

Don't check for tx_id from response

d2b5c24

eddyashton reviewed Feb 17, 2021

jumaffre and others added 12 commits February 17, 2021 16:52

Update tests/infra/network.py

46cca2e

Co-authored-by: Eddy Ashton <ashton.eddy@gmail.com>

Do not throw if a ledger secret isn't present

95632a0

Check that the transaction is still marked as committed

4bd26bb

Merge branch 'historical_query_after_recovery_snapshot' of github.com:jumaffre/CCF into historical_query_after_recovery_snapshot

d8a1b40

Decompose

18c942f

Merge branch 'main' into historical_query_after_recovery_snapshot

3423c17

Simplify serialization

eaabfcb

Remove tx in ls->get_first()

b783e35

Format

87f4ac3

Merge remote-tracking branch 'upstream/main' into historical_query_after_recovery_snapshot

097d7fe

fmt

9d8c697

Format cmake

ee52dbb

eddyashton approved these changes Feb 18, 2021

jumaffre and others added 3 commits February 18, 2021 17:23

Merge branch 'main' into historical_query_after_recovery_snapshot

a89d3df

Changelog

a25bd0f

Merge branch 'historical_query_after_recovery_snapshot' of github.com:jumaffre/CCF into historical_query_after_recovery_snapshot

dd40527

achamayou approved these changes Feb 18, 2021

Merge branch 'main' into historical_query_after_recovery_snapshot

8ffae17

achamayou merged commit af6836b into microsoft:main Feb 19, 2021
