
[fastx] How to facilitate Bulk Sync between authorities, authorities and full replicas #194

Closed

gdanezis opened this issue Jan 17, 2022 · 17 comments

Labels: Priority: High (Very important task, not blocking but potentially delaying milestones or limiting our offering), sui-node

@gdanezis
Collaborator

Authorities need reassurance at some point that they have ingested all updates from other authorities up to a given point. Similarly, full replicas (i.e. nodes that replicate all state but do not yet validate) also need to track the full state of the system, and to do so need to know they have received all updates from authorities up to a point. This design task is about designing that mechanism.

@gdanezis
Collaborator Author

gdanezis commented Jan 17, 2022

@huitseeker has written on some ingredients we can use to solve this, here: https://docs.google.com/document/d/1LXVNqhHSbY499G-seenTYCPRHXormZLK9BXzMSmG7L4/edit?usp=sharing

Key readings:

  • Eppstein, David, Michael T. Goodrich, Frank Uyeda, and George Varghese. "What's the difference? Efficient set reconciliation without prior context." ACM SIGCOMM Computer Communication Review 41, no. 4 (2011): 218-229.
  • Ozisik, A. Pinar, Gavin Andresen, George Bissias, Amir Houmansadr, and Brian Levine. "Graphene: A new protocol for block propagation using set reconciliation." In Data Privacy Management, Cryptocurrencies and Blockchain Technology, pp. 420-428. Springer, Cham, 2017.
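For intuition on the set-reconciliation papers above, here is a minimal IBLT-style "difference digest" sketch in the spirit of Eppstein et al. (a simplified toy, not the paper's exact construction: cell layout, hash choice, and sizing are all assumptions here):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const CELLS: usize = 64; // sized for small symmetric differences (assumed)
const HASHES: u64 = 3;   // each key is folded into 3 cells

#[derive(Clone, Copy, Default)]
struct Cell { count: i64, key_xor: u64, check_xor: u64 }

fn h(x: u64, seed: u64) -> u64 {
    let mut s = DefaultHasher::new();
    (x, seed).hash(&mut s);
    s.finish()
}

struct Iblt { cells: Vec<Cell> }

impl Iblt {
    fn new() -> Self { Iblt { cells: vec![Cell::default(); CELLS] } }

    fn insert(&mut self, key: u64) {
        for seed in 0..HASHES {
            let c = &mut self.cells[h(key, seed) as usize % CELLS];
            c.count += 1;
            c.key_xor ^= key;
            c.check_xor ^= h(key, 0xdead);
        }
    }

    /// Cell-wise subtraction: the result encodes only the symmetric
    /// difference, however large the two underlying sets were.
    fn subtract(&self, other: &Iblt) -> Iblt {
        let mut out = Iblt::new();
        for i in 0..CELLS {
            out.cells[i].count = self.cells[i].count - other.cells[i].count;
            out.cells[i].key_xor = self.cells[i].key_xor ^ other.cells[i].key_xor;
            out.cells[i].check_xor = self.cells[i].check_xor ^ other.cells[i].check_xor;
        }
        out
    }

    /// Peel "pure" cells (count = ±1 with matching checksum) to list the diff:
    /// keys only I have vs. keys only the other side has.
    fn peel(mut self) -> Option<(Vec<u64>, Vec<u64>)> {
        let (mut mine, mut theirs) = (vec![], vec![]);
        loop {
            let pure = (0..CELLS).find(|&i| {
                let c = self.cells[i];
                c.count.abs() == 1 && c.check_xor == h(c.key_xor, 0xdead)
            });
            let Some(i) = pure else { break };
            let (key, sign) = (self.cells[i].key_xor, self.cells[i].count);
            if sign == 1 { mine.push(key) } else { theirs.push(key) }
            for seed in 0..HASHES {
                let c = &mut self.cells[h(key, seed) as usize % CELLS];
                c.count -= sign;
                c.key_xor ^= key;
                c.check_xor ^= h(key, 0xdead);
            }
        }
        // Leftover non-empty cells mean the table was undersized for the diff.
        self.cells.iter().all(|c| c.count == 0 && c.key_xor == 0)
            .then_some((mine, theirs))
    }
}
```

Each side inserts its transaction hashes, ships its constant-size table, and the receiver subtracts and peels: communication is O(diff) and independent of set size, which is what makes these digests attractive for authority-to-authority sync.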

@lxfind lxfind changed the title [fastx] For to facilitate Bulk Sync between authorities, authorities and full replicas (Design Task) [fastx] For to facilitate Bulk Sync between authorities, authorities and full replicas Jan 21, 2022
@gdanezis gdanezis changed the title [fastx] For to facilitate Bulk Sync between authorities, authorities and full replicas [fastx] How to facilitate Bulk Sync between authorities, authorities and full replicas Jan 24, 2022
@gdanezis
Collaborator Author

gdanezis commented Jan 24, 2022

Here are my current thoughts on Sync design:
https://docs.google.com/document/d/10SpQluu2Rpc9loUBbxyaCfXqCtkB-FwIZuucnDF_DSQ/edit?usp=sharing
@sblackshear , @huitseeker any comments welcome.

@huitseeker
Contributor

huitseeker commented Jan 31, 2022

The core of the argument against the way you're implementing the push-pull gossip (and then using it to argue that the byzantine authorities never flood anybody else) is that:

  • if, as an authority, I only communicate with a small, slowly-evolving constant set of peers, then I'm vulnerable to eclipse attacks. The same issue, seen at scale, would show that Byzantine authorities are able to implement a min-cut of the peerage graph to split the network into two connected components, breaking BFT assumptions.
  • hence there must be out-of-peerage communications. The entire causality chain of that stream of communication cannot be strictly initiated by the recipient (though they can finish it in a "pull"), because the recipient may not be aware (through the silencing method above) that there is something to "pull" in the first place.
  • hence as an honest authority I must accept out-of-peerage messages pushed to me from random other authorities: the intent is that another honest (but out-of-peerage) sender authority must be able to make me aware of a new interesting message to pull. That "awareness" push does not have to be the message itself; it can just be a pointer to it. The rest of the message can be pulled afterwards. The overall message propagation pattern looks like a Lévy flight.
  • the ways I can "control" these initial awareness out-of-peerage messages fall into two broad groups:
    • a priori, on the form (e.g. messages whose size is limited by a constant, whose admission is conditioned on fair resource allocation across senders...)
    • a posteriori, on the semantics: a new message is only re-transmitted further (through the peerage graph) if I, as the initial recipient authority, can vouch for its validity (e.g. it's not a duplicate, has not yet been seen by all my peers, etc.)
  • hence there must be enough data transmitted for me, as an authority, to check that the message is not a duplicate. In our case, the only thing that I, as an authority, can compare against my own knowledge is the transaction hashes claimed to be in this new message.
  • to summarize, I must, as an honest authority:
    • be open to receive a claim to an interesting payload (e.g. a block containing new transactions)
    • from out-of-peerage authorities
    • and I must dereference this claim at least up to knowing which transactions it purports to contain (through hashes).

QED: honest authorities must be open to receiving, up to fairness, a stream of transaction hashes claimed to be newly produced by any other authority, including Byzantine ones. Those messages must be accepted as long as they are sent fairly (i.e. a Byzantine authority does not produce more data than any other would), and the allowance must be at least the volume of transaction hashes produced by the median honest authority over the same period of time.

If an honest authority produces transactions at a rate of V bytes per second, the corresponding stream of transaction hashes is a V/n data stream, so honest authorities must be open to receiving V/n bytes per second from a Byzantine authority, without being able to constrain those hashes to match valid or new transactions. As a consequence, with f Byzantine authorities, the ingress an honest node must accept is V(f/n), which is enough to cause a significant slowdown when f is concretely large (average transaction: 512 bytes, average hash: 32 bytes, hence the ratio n = 16).
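To put rough numbers on that parenthesis, a back-of-envelope helper (the figures are the assumed averages above, not measurements):

```rust
// Assumed averages from the text, not measurements.
const TX_BYTES: u64 = 512;  // average transaction size
const HASH_BYTES: u64 = 32; // average hash size
const RATIO_N: u64 = TX_BYTES / HASH_BYTES; // n = 16

/// Ingress (bytes/s) an honest node must leave open for f Byzantine
/// authorities each pushing a hash stream at the honest rate: V * f / n,
/// where V is the byte rate of transactions an honest authority produces.
fn byzantine_hash_ingress(v_bytes_per_sec: u64, f: u64) -> u64 {
    v_bytes_per_sec * f / RATIO_N
}
```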

@huitseeker
Contributor

There's quite a bit more on IBLT-based solutions, notably:

@gdanezis
Collaborator Author

Here I discuss one more ingredient we can use for end-of-epoch or state-commitment sync, to get an agreed checkpoint with most of the work done asynchronously and a tiny bit synchronously: https://docs.google.com/presentation/d/1TSs1FPC9INZMpipP1vP2dK7XMmOXtHk5CLWFKO6s7p8/edit?usp=sharing

@huitseeker
Contributor

Here's a new design that aims at producing a checkpoint as well, but hints at limited amounts of state synchronization along the way:
https://docs.google.com/document/d/17KjGGLl8L8MSkJ0RBS5au-jR4Jp475BFi_1-rJCS6mk/edit?usp=sharing

@huitseeker
Contributor

huitseeker commented Feb 24, 2022

In a neighboring PR, @gdanezis was asking:

I am still unsure how a facility for a (say byzantine) node to commit to its sequence, and to allow clients to ask [for stuff] can lead to any kind of DoS or resource exhaustion.

It's not about the pull-based logic

The notion of "asking once you've seen the commitments", like any pull-based logic, is not a subject of concern. In fact, it's very likely that the commitments themselves, if they point me to finite-size payloads of data that each carry authentication, make it safe for me to ask for unseen data productively. Concretely, once I ask the server using a hash it signed over, it had better deliver me the correct and valid payload, or I'll have the cryptographic assets to reveal it as Byzantine.
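A sketch of why the pull side is accountable, assuming payloads are addressed by a hash the server signed (the hash function below is a toy stand-in for the protocol's real one, e.g. SHA-256):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for the protocol's real cryptographic hash; illustration only.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut s = DefaultHasher::new();
        (i as u64, data).hash(&mut s);
        chunk.copy_from_slice(&s.finish().to_le_bytes());
    }
    out
}

/// The client pulled `payload` after seeing the server's *signed* commitment
/// to `committed_hash`. On a match, the data is exactly what was promised; on
/// a mismatch, the signature over the hash is transferable evidence that the
/// server is Byzantine.
fn check_pulled_payload(committed_hash: [u8; 32], payload: &[u8]) -> Result<(), &'static str> {
    if toy_hash(payload) == committed_hash {
        Ok(())
    } else {
        Err("payload does not match signed commitment: retain the signature as a fraud proof")
    }
}
```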

The issue is that upstream of that, commitments without throughput limits are in themselves a "damned if you do, damned if you don't" proposition: if as a node I don't subscribe to enough commitment streams, I'm likely to be unable to sync with the network, and if I do subscribe to Byzantine authorities, I'm likely to be bogged down in unproductive resource consumption (through the commitments alone) until I can determine for sure that the commitment data I'm receiving from a subscribee is not valuable.

Replay attacks

The problem is that determining the value of commitment data itself (without committing to further resource consumption) is very hard, and quickly becomes a cat-and-mouse game over how much ingress throughput a subscribee can steal from the subscriber without revealing itself as Byzantine. That's because there's not much I can do with commitments except read them and see if I already have something for them, by which time the damage is done. The typical attack here is not a DDoS; it's a replay attack, where the replayed information is 💯 valid but known by the Byzantine node not to make the client node progress.

In itself, this commitment replay attack is an amplification of the resource consumption of the sync process, and because it occurs at the expense of non-sync activities of the node (e.g. ingesting new transactions), it becomes a global throughput limit on the whole system.

One simple heuristic for clamping down on these shenanigans is to say that, within a given unit of time, a client must be able to learn whether there are interesting new elements to download anywhere in the system after seeing at most s bytes of commitments per server, where (3f+1)·s is well below the ingress limits of the node.
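Concretely, the heuristic could look like this (the 10% ingress slice is an arbitrary illustration, not a recommendation):

```rust
/// Per the heuristic above: with a committee of 3f+1 servers each allowed to
/// send `s` bytes of commitments per unit of time, keep (3f+1)*s well below
/// the node's ingress limit -- here, under an assumed 10% slice of it.
fn max_commitment_bytes_per_server(f: u64, ingress_bytes_per_unit_time: u64) -> u64 {
    let committee = 3 * f + 1;
    let commitment_budget = ingress_bytes_per_unit_time / 10; // assumption
    commitment_budget / committee
}
```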

It's of course a heuristic: I'm not saying it's impossible to have a protocol that works without a hard global cap on the throughput of commitments. For example, the cap could be adaptive and readjusted periodically depending on load, just as difficulty is for PoW blockchains. It just gets complicated quickly, and needs to account for f actors serving a commitment replay attack at the highest throughput they can get away with.

Absolutely zero structure on batch production (no max batch size, no batch frequency), though, is an additional risk, because it lets any Byzantine node do whatever it wants and leaves subscribers with only network-based QoS as a defense (evenly dividing the throughput dedicated to commitment download among subscribees). Network-based QoS is possible, but it's not easy to implement, and the implementations are not efficient on the most common network infrastructures today.

How to fit things into a constant?

I don't have the full story, but one thing that helps is this:

If a node has recently seen a batch of transactions, a lot of them probably refer to transactions in the same batch, depending on each other.
E.g. if I have seen TX1 and TX2 (which depends on TX1) within the same batch, I don't need to commit to TX1 within the batch. If I commit to TX2 alone, an honest node that has neither TX1 nor TX2 will ask me for it, check that TX2 is valid, and come back with an ask for TX1 (through causal pointers).

How likely is that to occur within the same batch? Figure 2 of the Utreexo paper makes me think this is a pretty frequent case, for Bitcoin at least. A sketch of the pruning idea follows.
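Here is a minimal sketch of that pruning, with hypothetical types ("frontier" below means the transactions in a batch that no other transaction in the same batch depends on):

```rust
use std::collections::HashSet;

type TxHash = [u8; 32];

struct BatchTx {
    hash: TxHash,
    deps: Vec<TxHash>, // causal pointers to parent transactions
}

/// Commit only to the batch's frontier: every pruned transaction (like TX1)
/// is reachable from a committed one (like TX2) through causal pointers, so
/// an honest puller will still fetch it.
fn commitment_frontier(batch: &[BatchTx]) -> Vec<TxHash> {
    let referenced: HashSet<TxHash> =
        batch.iter().flat_map(|tx| tx.deps.iter().copied()).collect();
    batch
        .iter()
        .map(|tx| tx.hash)
        .filter(|h| !referenced.contains(h))
        .collect()
}
```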

The question I'm asking myself is: can we structure the presentation of commitments so that we transmit commitments to useful data more sparsely?

@gdanezis
Collaborator Author

I got a few more resources from Martin Kleppmann:

Martin Kleppmann martin.kleppmann@cl.cam.ac.uk
Thu 24/02/2022 11:18
Hi George,

That sounds good! On the problem of exchanging differences between sets with lots of overlap, there has been some interesting work on BFT set reconciliation algorithms:
https://arxiv.org/abs/1905.10518
https://github.com/sipa/minisketch

If you're willing to organise the elements of the set into a hash graph, you can also use a graph reconciliation algorithm, which I've worked on:
https://martin.kleppmann.com/2020/12/02/bloom-filter-hash-graph-sync.html
https://arxiv.org/abs/2012.00472

Best wishes,
Martin

@huitseeker
Contributor

huitseeker commented Feb 25, 2022

I'm very familiar with Erlay, and have worked on extending and fuzzing Rust bindings for minisketch. They're a nice constant-factor improvement on IBLTs that works along the same principles (this is what I was mentioning to @asonnino), and we had already integrated them into our thinking.

I'm also very familiar with the hash graph Martin is suggesting (@gdanezis I believe I mentioned it before I joined -- you then shared that you had attended Martin's presentation in person in 2019), but it works by reconstituting causal broadcast from scratch, and has a prohibitive communication-complexity cost.

@gdanezis
Collaborator Author

gdanezis commented Feb 28, 2022

It feels like we now have a critical mass of eyes on this issue, as well as design options / ingredients / proposals, so let's try to make a plan for incremental progress, while still thinking through / designing the best mechanisms for today and down the line.

Priorities

In my mind we have the following priorities:

  • [Priority A - GDC] Enable explorers and other services in the short term to read the set of transactions processed by one or multiple authorities, although this may not be a globally complete set of processed transactions, either because these authorities are not entirely up to date with all transactions or because they are faulty / Byzantine. Some solutions to this can be part of a solution to the next one, priority B.
  • [Priority B - Dev/Testnet] Enable honest authorities to directly or indirectly learn quickly, securely and cheaply of new transactions processed by other honest authorities, to reduce the burden on clients (particularly clients that are less reliable, or can only keep less state) of dealing with many honest authorities that are not up to date. Honest authorities having a fuller set of processed transactions also helps with the next priority, C.
  • [Priority C - Testnet] Enable checkpoints and state commitments for the ever-increasing global state across authorities, to support a variety of functions: epoch reconfiguration, light clients doing total reads, new authority on-boarding, and better replica / service mirroring of global state (i.e. priority A).

I think that so far we (well, me first of all, mea culpa) have interchangeably referred to all of these with variants of sync / synchronization names (the S-word), leading to confusion. These functions are interrelated, and solving one well can support the others. In fact it sometimes feels like we are going in circles, since (as suggested in the bullet points above) there can be a cyclical dependency between their solutions. But I think it helps to mentally separate these needs, if only to plan their development.

Summary of proposals and match to priority

I think this is how the different strands of proposals fit into the above priorities:

  • PR [authority sync] Provide an interface for explorers to sync with single authority #509 narrowly addresses Priority A, and leaves open some potential to speak to Priority B, with natural impact on Priority C. Specifically, it implements in each authority a local sequence of processed transactions, with the aim that services can read this local sequence historically and get updates as it grows. The batches currently only provide an index into this local sequence (A). These should be hashed and signed to enable secure caching. But the comments suggest they can be augmented with structures (Merkle trees, IBLTs, what not) to support B down the line.

  • Using event-time / cert bitmaps addresses Priority B, and therefore naturally supports Priority C (https://docs.google.com/document/d/1RF8YkbwMSvQmhkdHjRJc0Oi1s_eq2AE9hDMK39Bl-AM/edit?usp=sharing). Promoting a quasi-consistent sequence on all honest authorities, and then using efficient representations such as bitmaps to let authorities / third parties detect and fill in disparities, can support authorities learning of transactions other authorities have processed (a minimal bitmap-diff sketch follows this list). Once authorities are more, rather than less, up to date with each other, they should find it easier to agree on state for C.

  • Our various earlier explorations around using authority-local Merkle trees for state, IBLTs, bloom filters, and push-pull gossip also largely address priority B (see: https://docs.google.com/document/d/1LXVNqhHSbY499G-seenTYCPRHXormZLK9BXzMSmG7L4/edit?usp=sharing, https://docs.google.com/document/d/10SpQluu2Rpc9loUBbxyaCfXqCtkB-FwIZuucnDF_DSQ/edit?usp=sharing). These are mechanisms to disseminate processed transactions cheaply (in O(n) rather than the O(n^2) of gossip), as well as to allow efficient (O(diff) for IBLTs / Merkle trees) discovery of discrepancies between authorities, so they can exchange more complete information. Merkle trees on agreed state also speak to C (state commitments in various forms), and if we have one implementation we can reuse it.

  • The proposal for "lazy" state agreement (my words, to avoid using the S-word; see https://docs.google.com/document/d/17KjGGLl8L8MSkJ0RBS5au-jR4Jp475BFi_1-rJCS6mk/edit?usp=sharing) speaks to priority C. This mechanism leverages an effective mechanism for B plus rough timestamps to reach agreement on somewhat old state, without the need to re-transmit all transactions that everyone should already have. Of course this also speaks to A and B for cases where the information needed is somewhat older and can use a global state commitment as a reference, rather than a partial view (A) or a real-time view (B).

  • The deck on spanning-tree checkpointing similarly addresses priority C (https://docs.google.com/presentation/d/1TSs1FPC9INZMpipP1vP2dK7XMmOXtHk5CLWFKO6s7p8/edit?usp=sharing). It leverages mechanisms speaking to B to pairwise update authorities, and then uses these pairwise relations and an agreement protocol to get checkpoints. It naturally speaks to A and B if the information of interest is older than checkpoints rather than real-time.
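As promised above, a minimal sketch of the bitmap-diff step of the cert-bitmap proposal (the word layout and the assumption of a shared quasi-consistent sequence are mine, not necessarily the linked doc's):

```rust
/// Given two bitmaps over the same quasi-consistent certificate sequence
/// (bit i set = "I have processed certificate i"), bitwise comparison yields
/// the disparity; here we list the indices set for them but not for me,
/// i.e. the certificates I should pull.
fn missing_indices(mine: &[u64], theirs: &[u64]) -> Vec<usize> {
    mine.iter()
        .zip(theirs)
        .enumerate()
        .flat_map(|(word, (&m, &t))| {
            let mut diff = !m & t; // they have it, I don't
            std::iter::from_fn(move || {
                if diff == 0 {
                    return None;
                }
                let bit = diff.trailing_zeros() as usize;
                diff &= diff - 1; // clear the lowest set bit
                Some(word * 64 + bit)
            })
        })
        .collect()
}
```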

Design decisions, and their nature

My aim here is to separate design decisions that impact the protocol -- namely, those that all authorities / clients need to agree upon for the system to function -- from design decisions that are more flexible, in that authorities and clients can exercise a certain amount of discretion, and as a result we can afford to keep designing the best techniques incrementally, through testnet/mainnet and beyond. This is an important distinction for decentralised platform protocols: we need to impose the smallest amount of obligations on all, and maximise the flexibility for peer innovation down the line.

Core Protocol decisions:

  • Certificate timestamps: both event-time / cert bitmaps and "lazy" state agreement directions rely on all honest authorities adding timestamps when they sign transactions. Such timestamps can also be used as part of push-pull gossip designs. Should we have these? What is their cost? What are their benefits? What are their semantics? Do they require fundamental architectural changes? (reminder: these are about B and C)
  • State agreement: by its very nature this is a protocol that all honest authorities must participate in and contribute to. Here we need to discuss how frequently it happens (the less frequently, the less we can rely on it to supplement A and B), as well as the minimum functions each authority should perform to make it work. By the nature of this problem we can only have one state agreement mechanism at a time. (This is firmly C, but can support less real-time needs for B and A.)

Non-core Protocol decisions:

  • What mechanism should each authority provide for services and replicas to sync with it (this is priority A)? This is largely a peering decision, and one that can evolve over time. It's OK to have a v0 protocol (that, say, works for smaller volumes) and then provide a v1 protocol that supports more features / is more efficient. This shapes the relationship between an authority and its clients. Of course, we prefer something that many will find useful, as what we come out with on mainnet is likely to be the default for some time.
  • What mechanisms do authorities use amongst each other to share recent transactions in a pairwise or group-wise manner (priority B)? Assuming transactions carry some agreed information (signatures, timestamps, etc.), authorities can innovate in terms of the additional structures they support to facilitate priority B. More importantly, improvements can be rolled out incrementally as long as the transaction structure / authenticators support some base shared information (see the core protocol decision about timestamps). Again, we want something that works for our initial mainnet and some time after. But choosing one or two mechanisms for this does not preclude having more / better ones down the line.

Call to action: @huitseeker, @velvia, @lxfind, @laura-makdah, @sblackshear -- this is my thinking. Let's have a meeting this week or the next to ensure we are all on the same page; make a schedule of when / how to take the core protocol decisions above; agree on which decisions should be core and which we can be more flexible about (non-core); and then move forward.

@huitseeker
Contributor

huitseeker commented Feb 28, 2022

Inverted index:

Then come the hybrid proposals, both based on timestamping, which categorize some state as "behind us" and therefore help both sync (there's some state we no longer have to talk about) and agreement (there's some part of the state we're confident we're done with):

  • proposal logical event time & cert bitmaps uses logical timestamping to help 2 authorities compare their state and address state sync (B & C)
  • proposal median timestamping uses median timestamping to put rough timestamps on all events and drive state agreement (B & C)

There's also IIUC a topology proposal:

  • proposal spanning trees speaks to (B & C) as a supplement, in that it takes a pairwise sync and transforms it into a mechanism for global sync.

Personal note

Here are cross-cutting concerns I have with many of the proposals above, very much including some of mine:

  • complexity: we're reaching very fast for very complex tools. Gossip protocols and IBLTs are not easy to build or deploy, and there are few production-grade tools for them. For example, I no longer think using IBLTs is sound (I would rather use BCH sketches, which are more production-ready).
  • synchronization without any state agreement: some form of state commitment (even on very old state) means we can leave the actual synchronization of that part of the state to a third party (full node, service provider) without trust assumptions. If we have none, then at any moment validators are responsible for synchronizing all the state since genesis, between themselves and with the outside world.
    • that's much harder to scale: since each authority can originate new state, everybody needs to read everybody's signals (even if they don't read their data),
    • we're dealing with Byzantine authorities, which can troll us by sending useless signals about the state they have,
    • the resulting designs cannot show they make progress, since there is no portion of state with respect to which we could measure that progress,
    • by progress I mean decreasing the amount of un-replicated state over time -- which should be trivial, since every validator is supposed to receive the same broadcasts as everyone else.
  • monitoring & protocol impact: the obvious challenge will be not to impact the main bandwidth of the nodes (devoted to voting and transaction ingestion), and it will be a concern both to convince ourselves that what we field addresses this challenge, and to monitor its impact over time.

@lanvidr
Contributor

lanvidr commented Feb 28, 2022

At Fastly, the first way we handled syncing of distributed rate-limiting data was using bloom filters to detect differences in state. But that ended up being far more expensive than we imagined, and a more naive solution that sends the interesting set on a regular basis was much faster in practice. We did have the advantage of idempotency, though, so merging data that was already known was not an issue. The larger global cache network would communicate using a variation of push-pull gossip, which allowed a message to propagate globally very quickly indeed.

@velvia
Contributor

velvia commented Feb 28, 2022

@gdanezis thank you for a very comprehensive summary, that really helps. Will try to read through the spanning tree proposal before the meeting tomorrow.

What mechanism should each authority provide for services and replicas to sync with it (this is priority A)? This is largely a peering decision, and one that can evolve over time. It's OK to have a v0 protocol (that, say, works for smaller volumes) and then provide a v1 protocol that supports more features / is more efficient. This shapes the relationship between an authority and its clients. Of course, we prefer something that many will find useful, as what we come out with on mainnet is likely to be the default for some time.

What I'm hoping for from a data architecture perspective is that:

  1. Through successive state agreements, we can have an increasingly large portion of the older state globally agreed, and
  2. this agreed-upon older state can live outside of authorities, e.g. in warm or cold storage somewhere (hence we want a tiered storage architecture),
  3. which allows the bulk of analysis, replicas, and state sync to happen without the involvement of authorities, which can then focus on the latest transactions.

I know this is a tangential concern, but I'm just noting it.

@velvia
Contributor

velvia commented Feb 28, 2022

BTW @gdanezis, I think the spanning-tree checkpointing is brilliant. I like it much more than using modulo math to do n-way set reconciliation. I do wonder who keeps track of the graph, etc., though.

There is one failure mode I haven't seen discussed much. I suspect that, much more frequently than Byzantine attacks, we will see the scenario where some authority was offline for a few minutes or so, leaving a big "hole" in its data. That hole would need to be filled first for the O(diff) algorithms to be efficient.

@gdanezis
Collaborator Author

gdanezis commented Mar 1, 2022

Edit from @huitseeker : I backed up this long comment copy-pasting the chat bar of one of our discussions from March 1st here:
https://gist.github.com/huitseeker/90ea22a19309b207ee8015c1cd8fbfa4

@huitseeker
Contributor

huitseeker commented Mar 4, 2022

State Synchronization discussions summary

George's pre-game summary in this comment is a good picture of where we were (esp. the proposals) before the talks on March 1st.

Problems

In a nutshell, we agree we are going to have to solve all three of the following problems:

  • problem A, the Follower: a subscriber wants to stream all the transaction information generated by a server. Eventually, the server side may be a pool of machines¹, but as of today this issue is mostly defined by the needs of the block explorer, and the server is one authority.
    • The demand is for Certified transactions and their effects,
    • Transactions can be served one by one, there's no client-side need to "batch" them,
    • The default is to subscribe to all the data processed by the server
    • It would be great to have the ability to register for a historical subscription, i.e. to ask for all the data the node ever processed since genesis,
    • The subscription, insofar as it returns some transactions, can only be "late" in a loose sense. In particular, it will never return something "contradictory" or "speculative".
      • there's fine-print on what late means,
      • obviously, very late is a possibility,
      • that fine-print also includes stale reads, intermediate reads, and non-monotonic reads (with conditions).
  • problem B, the Reconciliation: two machines want an exhaustive and fresh view of which processed certified transactions each other knows, and communicate to achieve this. The natural extension is to consider the same problem for $n > 2$ machines. This is motivated mainly by authorities that must recover from a partial broadcast from a client that only "hit" some replicas.
    • It's a liveness requirement to have such a functionality,
    • It also lets us drop the current cumbersome behavior of the client "back-filling" authorities when they lack a fragment of state,
    • We expect there will eventually be other use cases, including the "full node" role (covering auditing the chain),
    • In a "happy" world where all broadcasts in the protocol are complete and never fail, there's no need for this: all honest authorities receive the same broadcasts and have precisely the same state.
  • problem C, the Agreement: the machines in the network want to build an Accumulator iteratively. That Accumulator is a continuously maintained subset of the transactions in the system, and the maintenance consists of growing that subset by increments, while keeping the invariant that all authorities agree on what is or isn't in the set. The motivation is to serve at least the agreement at epoch change, and possibly to help with problem B above and augment the scalability of exhaustive blockchain reads.
    • This requires consensus, nearly by definition,
    • however, off-the-shelf consensus protocols usually bundle data dissemination with the process of carving consensual increments to an accumulator,
    • As mentioned above, most of our data is already disseminated.
    • We note that generating periodic consensus by solving problem C is not a systematic substitute for all reads of the system: once somebody shows up with the latest transaction on a particular account, we know how to read anything that led to that transaction. Waiting for consensus only really makes sense when trying to get a view of the whole system.

We note that while a Follower can afford to trust its server, the reconciliation and agreement problems have to be designed for a Byzantine environment.

Solutions

Solving problem A is relatively uncontroversial, in that there's only one proposal for it (which will benefit from some of the insights above).

For problems B & C, we agree that from the moment we know the hash of a new transaction from a certain sender, we know how to safely² synchronize everything that is a causal antecedent of it, including all prior transactions from this sender. So the main obstacle is an exchange of metadata: the hashes of the latest transactions from each account.
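A sketch of that observation, with placeholder types and a hypothetical `fetch` standing in for a verified, content-addressed network read (a Byzantine server cannot substitute data without failing the hash check):

```rust
use std::collections::HashSet;

type TxHash = [u8; 32];

struct CertifiedTx {
    hash: TxHash,
    parents: Vec<TxHash>, // causal antecedents, incl. the sender's prior tx
}

/// Walk backwards from a known hash, pulling every causal antecedent we lack.
/// Safe against Byzantine servers because every fetched payload is checked
/// against the hash that requested it.
fn sync_causal_closure(
    root: TxHash,
    have: &mut HashSet<TxHash>,
    fetch: &mut dyn FnMut(TxHash) -> CertifiedTx,
) {
    let mut stack = vec![root];
    while let Some(h) = stack.pop() {
        if !have.insert(h) {
            continue; // already known, so its antecedents are known too
        }
        let tx = fetch(h);
        debug_assert_eq!(tx.hash, h); // stand-in for real hash verification
        stack.extend(tx.parents);
    }
}
```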

In solving B & C, we have several categories of proposals:

  • the spanner: it says "assume we have a two-way synchronization protocol, how do we make it work n-way?": [1]
  • the timestampers: those that say "let's add more structure to the data (i.e. timestamps of some form) so that it becomes easier to compare portions of the state": [2] [3]
  • the matchmakers: those that focus on B and say "assume most data is already reconciled; how do we go about quickly finding the precious fragment that remains?": [4] [5]

The challenge in these protocols is to make sure they are parsimonious in their bandwidth usage (by all rights, we should exchange but a tiny amount of data).

In the short term, we can focus on just the proposals that require modifications to the core protocol (these are better engineered early). Only the timestamper approaches have that requirement.

The core advantage of these approaches is that they can define rough segments of state that were produced at a limited rate in the first place:

  • any time we speak of events within a rough timestamp interval (e.g. "time stamped between Tuesday at 2pm and Tuesday at 3pm"), we are talking about a reasonable and finite subset of the state (all certificates in the system),
  • finite is because authorities have only produced so many events in that interval,
  • reasonable is because claiming that an inordinate proportion of the system's events come from that interval would require having made authorities produce those timestamps in that interval in the first place.

Where this helps reconciliation is that instead of entering an open-ended reconciliation problem over all data since genesis, we can have sub-protocols that address timestamp ranges one after the other, making synchronization incremental.

Where this helps agreement is that we can agree over fragments of the past that we commit to having seen: if we assume a reconciliation mechanism that ensures delivery within some time bound, all honest authorities can assemble sets of certificates far older than the time it takes to disseminate them, put these in a state commitment, and try to get agreement on it (2f+1 signatures).
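A hypothetical shape for that incremental loop: bucket certificates by rough timestamp interval, and reconcile (or commit) closed buckets one at a time. Interval length, fingerprint choice, and types are all assumptions here; in particular the XOR fingerprint is illustrative only, and a real protocol would want a collision-resistant set digest:

```rust
use std::collections::BTreeMap;

type TxHash = [u8; 32];

const INTERVAL_SECS: u64 = 3600; // e.g. one-hour buckets (assumed)

/// Group certificate hashes by rough timestamp interval. Closed intervals are
/// finite and bounded, so each can be reconciled or committed on its own
/// instead of reconciling everything since genesis at once.
fn bucket_by_interval(certs: &[(u64, TxHash)]) -> BTreeMap<u64, Vec<TxHash>> {
    let mut buckets: BTreeMap<u64, Vec<TxHash>> = BTreeMap::new();
    for &(timestamp, hash) in certs {
        buckets.entry(timestamp / INTERVAL_SECS).or_default().push(hash);
    }
    buckets
}

/// Cheap order-independent fingerprint per bucket: if two authorities'
/// fingerprints match, skip the bucket; if not, run a detailed
/// (IBLT/minisketch-style) reconciliation on that bucket only.
fn bucket_fingerprint(hashes: &[TxHash]) -> TxHash {
    let mut acc = [0u8; 32];
    for h in hashes {
        for (a, b) in acc.iter_mut().zip(h) {
            *a ^= b;
        }
    }
    acc
}
```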

We then discussed some details of proposal [3], namely:

  • the proposal is a specific form of timestamping called median timestamping,
  • there is no unique timestamp in the system: for the same event, different authorities have different timestamps,
  • however, their timestamps do not differ by more than a bound,
  • that bound (valid with high probability) is the longest time it takes to send a message between any two authorities,
  • in practice, that's about 1/2 second max,
  • it requires authorities to have well-synchronized clocks,
  • the timestamps are not used to order transactions, as we already have a causal order for that,
  • where some impact on execution may be useful in the future is (approximate) MEV mitigation in those transactions affecting shared state.
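For the median step itself, a minimal sketch (assuming one signed timestamp per authority for the event; with at most f Byzantine reporters out of 3f+1, the median is bracketed by honest timestamps, which differ by at most the bound above):

```rust
/// Median of the timestamps reported for one event across authorities.
/// Byzantine reporters can shift the median only within the range of
/// honest timestamps, i.e. within the network-delay bound.
fn median_timestamp(mut reported: Vec<u64>) -> Option<u64> {
    if reported.is_empty() {
        return None;
    }
    reported.sort_unstable();
    Some(reported[reported.len() / 2])
}
```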

At the end of the meeting, we have a positive outlook on integrating some time stamps to the protocol.

Footnotes

  1. e.g. the pool of machines could publish their collective data over a gossip network.

  2. i.e. in a way resistant to Byzantine participants.

@gdanezis
Collaborator Author

This is the background discussion to issue #1099.

@gdanezis gdanezis added this to the Pre Testnet milestone Apr 19, 2022
@gdanezis gdanezis added Priority: High Very important task, not blocking but potentially delaying milestones or limiting our offering sui-node labels Apr 19, 2022
@lxfind lxfind removed this from the Pre Testnet milestone May 9, 2022
@gdanezis gdanezis added this to the Pre Testnet milestone May 26, 2022
@MystenLabs MystenLabs locked and limited conversation to collaborators Jun 16, 2022
@gdanezis gdanezis converted this issue into discussion #2588 Jun 16, 2022
