
[authority sync] Provide an interface for explorers to sync with single authority #509

Merged: 36 commits into main from explorer-authority-data, Mar 3, 2022

Conversation

@gdanezis (Collaborator) commented Feb 21, 2022

We augment the authority:

  • A table with a sequence of executed certificates / effects.
  • A table of blocks batching executed certificates / effects.
  • Create batches (asynchronously).
  • Chain hash and sign batches.

Next PR:

  • Offer a client interface to download a block / part of the sequence.
  • Offer a client interface to stream updates of executed certificates / effects.
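For orientation, a minimal sketch of the shape of these additions, using plain in-memory maps as stand-ins for the PR's DBMap tables and mirroring the batch fields quoted later in this review; the field and table names follow the PR, everything else is illustrative.

    use std::collections::BTreeMap;

    // Stand-ins for the PR's types; TransactionDigest, BatchDigest and DBMap are assumed.
    type TxSequenceNumber = u64;
    type TransactionDigest = [u8; 32];
    type BatchDigest = [u8; 32];

    /// Per-authority batch record, mirroring the AuthorityBatch fields reviewed below.
    pub struct AuthorityBatch {
        total_size: u64,                      // number of items executed up to this batch
        previous_total_size: u64,             // number of items up to the previous batch
        previous_digest: Option<BatchDigest>, // hash link to the previous batch
        transactions_digest: [u8; 32],        // digest over the transaction digests in this batch
    }

    /// In-memory stand-in for the two new tables.
    pub struct BatchTables {
        /// Sequence of executed certificate / effect digests, keyed by sequence number.
        pub executed_sequence: BTreeMap<TxSequenceNumber, TransactionDigest>,
        /// Batches indexing contiguous ranges of the executed sequence.
        pub batches: BTreeMap<TxSequenceNumber, AuthorityBatch>,
    }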

@gdanezis gdanezis marked this pull request as draft February 21, 2022 18:05
@lxfind (Contributor) commented Feb 21, 2022

A few high-level questions to help me understand the blocks concept:

  1. What would be some of the differences between the block here and the blocks of other blockchains? For instance, in our case it's OK to have different block order/content among different authorities, right? We don't care about branching because the individual orders already guarantee their validity.
  2. Why do we need the digest of the previous block in our block? Is it only used for ordering by the client, or does it have other uses?
  3. Would Batch be a better name, if the concept of block here is very different from that in most block-based blockchains?
  4. In production deployment, do we expect an explorer to create a channel with multiple authorities and try to join the information somehow? If so, will we be creating the channels in the AuthorityAggregator that's responsible for aggregating blocks? And in that case, what is the reason for the authorities to form blocks instead of just sending out individual orders and letting the authority aggregator on the client side form blocks?

@gdanezis (Collaborator, Author):

Very good questions @lxfind , some going way beyond this PR:

What would be some of the differences between the block here and the blocks of other blockchains? For instance, in our case, it's OK to have different block order/content among different authorities, right? We don't care about branching because the individual orders guarantee the validity of them already.

Yes, these are a per-authority account of what this authority has processed in an order that allows all causal dependencies to be satisfied. The first use case is the explorer. Other use cases are full replicas as well as authority sync, authority checkpointing mechanisms, and end of epoch checkpointing.

Why do we need the digest of the previous block in our block? Is it only used for ordering by the client, or does it have other uses?

I want all authority structures to be self-authenticating: this way we can host the data as static files / items on a CDN and have their users authenticate them by hash chain & signature. I also want the option for anyone to gossip the structures, enabling more efficient (sparser) network architectures. If they are not hashed & signed, clients and other users must access them directly from the authority.
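A hedged sketch of the hash-chaining idea, assuming the sha2 crate (the PR's actual hashing and signing primitives may differ): each batch digest folds in the previous batch's digest, so a signature over the latest digest authenticates the whole history even when the files are fetched from a CDN or received via gossip.

    use sha2::{Digest, Sha256}; // assumed dependency for illustration

    type BatchDigest = [u8; 32];

    /// Chain the current batch onto the previous one: hash(previous digest || transactions digest).
    fn chain_digest(previous: Option<&BatchDigest>, transactions_digest: &[u8; 32]) -> BatchDigest {
        let mut hasher = Sha256::new();
        if let Some(prev) = previous {
            hasher.update(prev);
        }
        hasher.update(transactions_digest);
        hasher.finalize().into()
    }

A verifier can then re-walk the chain offline and only needs the authority's signature over the head digest.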

Would Batch be a better name, if the concept of block here is very different from that in most block-based blockchains?

Agreed, Block is getting a little overloaded in a blockchain context. I will indeed call them Batch.

In production deployment, do we expect an explorer to create a channel with multiple authorities and try to join the information somehow? If so, will we be creating the channels in AuthorityAggregator that's responsible for aggregating blocks? And in that case, what is the reason for the authorities to form blocks instead of just sending out individual orders and let authority aggregator on the client side to form blocks?

The client we currently have is a light client. An explorer is more likely to run a full replica that downloads all certs from one or more authorities, re-executes them to reconstruct all state, and likely even keeps more of the historical objects around to display. So I am not yet sure how much to re-use the authority aggregator. As per the above, the blocks are needed to authenticate authority data even if the channel is not authenticated and they are cached / gossiped. They are also needed to hold data that facilitates sync between authorities, or between authorities and replicas.

For more context on the discussions around sync see here:
#194

@gdanezis gdanezis force-pushed the explorer-authority-data branch from 36c3e5d to 722776e Compare February 23, 2022 21:12
@gdanezis gdanezis marked this pull request as ready for review February 23, 2022 21:15
@gdanezis gdanezis changed the title from "[authority sync] Provide an interface for explorers to sync with single authority (WIP)" to "[authority sync] Provide an interface for explorers to sync with single authority" Feb 23, 2022
@huitseeker (Contributor) left a comment:

Is this for use in a BFT-resistant context?

If so, it seems the current thinking is to let the current authority decide the size of a batch (through a floor) and the frequency of batches (a branching point whenever there's an execution) based on local conditions. Shouldn't this instead be decided by global convention?

If not, is it possible to make the trust assumptions on batch listening clear?

Outdated review threads (resolved): sui_core/src/authority/authority_store.rs (×3), sui_core/src/authority_batch.rs (×2)
Comment on lines +229 to +225
while loose_transactions.contains_key(&next_sequence_number) {
    let next_item = (next_sequence_number, loose_transactions.remove(&next_sequence_number).unwrap());
Contributor:

This would, I suspect, be cheaper with a binary heap. But that may be a premature concern at this point.

Contributor:

It is very likely that the bounds on the sequence numbers coming in are very small, and this is just used for low-level reordering. A BTree might not be that bad, but an array-based queue where one can insert in arbitrary order, like a PriorityQueue, would also be fine. You mostly want to remove at one end anyway.

Also, what happens if the next sequence number has not arrived yet? That doesn't seem to be handled here.

Collaborator (Author):

Also what happens if the next sequence number has not arrived yet? That doesn't seem to be handled here.

The loop exits, and we check whether to make a batch / exit, then wait for more. Let's double-check this logic is correct.
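For reference, a minimal sketch of the reorder-and-wait behaviour under discussion, with illustrative names rather than the PR's exact code:

    use std::collections::BTreeMap;
    use tokio::sync::mpsc::Receiver;

    type TxSequenceNumber = u64;
    type TransactionDigest = [u8; 32];

    // Out-of-order (seq, digest) pairs are parked until the next expected sequence
    // number arrives; if it has not arrived yet, the outer loop simply waits for
    // more input. When the sender side closes, the loop cuts a final batch and exits.
    async fn reorder_loop(mut rx: Receiver<(TxSequenceNumber, TransactionDigest)>) {
        let mut loose_transactions: BTreeMap<TxSequenceNumber, TransactionDigest> = BTreeMap::new();
        let mut next_sequence_number: TxSequenceNumber = 0;
        let mut current_batch: Vec<TransactionDigest> = Vec::new();

        while let Some((seq, digest)) = rx.recv().await {
            loose_transactions.insert(seq, digest);

            // Drain everything that is now contiguous.
            while let Some(d) = loose_transactions.remove(&next_sequence_number) {
                current_batch.push(d);
                next_sequence_number += 1;
            }

            // ... here the real code decides whether to cut a batch (size / timer).
        }

        // Channel closed: make a last batch from `current_batch`, then exit.
    }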

sui_core/src/authority_batch.rs (review thread resolved)
Comment on lines 48 to 56
/// Either a freshly sequenced transaction hash or a batch
#[derive(Eq, PartialEq, Ord, PartialOrd, Copy, Clone, Hash, Debug, Serialize, Deserialize)]
pub enum UpdateItem {
    Transaction((usize, TransactionDigest)),
    Batch(AuthorityBatch),
}
Contributor:

Aren't the Batches supposed to absolve me from the task of following every transaction?

Collaborator (Author):

Only if you care about historical data. But you might want to care about the very low-latency current events. And by you here I mean a number of actors: (1) the authority sync component that needs to know when a transaction was processed by itself without polling the DB; (2) Other authorities / replicas that are replicating this authority in real-time.

Otherwise if the batch time is 5 sec, you need to wait 5 sec to get a notification, then download the batch, then download the transactions, then sync.

@huitseeker (Contributor), Feb 24, 2022:

OK, that makes sense.

How do we let clients with different needs select the updates they want? It sounds like, in channel terms, we want a pub/sub system (rather than an everything-to-everyone broadcast) or, in async terms, an observable pattern.

Now, it sounds like:

  • an actual observable is out of scope for this PR,
  • the per-TX update, while valuable, doesn't exactly fit the block explorer need,

Would it be simpler if, for now, we had a single receiver for all the updates, and then made that receiver dispatch everything that's a batch to a downstream broadcast channel? Then, in a second pass, we can figure out how to let subscribers register their interest in the two types of UpdateItem, and that same receiver would dispatch accordingly.
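A hedged sketch of that single-receiver dispatch, assuming tokio mpsc/broadcast channels as used elsewhere in the PR; UpdateItem is reduced to a stand-in here.

    use tokio::sync::{broadcast, mpsc};

    // Stand-in for the PR's UpdateItem, which carries a TransactionDigest or an AuthorityBatch.
    #[derive(Clone, Debug)]
    enum UpdateItem {
        Transaction(u64),
        Batch(u64),
    }

    // One task drains every update and forwards only the batch items to a downstream
    // broadcast channel that explorer-style subscribers listen on. A later pass could
    // replace the filter with per-subscriber interest registration.
    async fn dispatch_batches(
        mut all_updates: mpsc::Receiver<UpdateItem>,
        batch_updates: broadcast::Sender<UpdateItem>,
    ) {
        while let Some(item) = all_updates.recv().await {
            if matches!(item, UpdateItem::Batch(_)) {
                // send() only errors when there are currently no subscribers; ignore that here.
                let _ = batch_updates.send(item);
            }
        }
    }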

Collaborator (Author):

I agree with the above, and this is probably the logic an event indexing / subscribing system should provide, i.e. #297.

Contributor:

I thought #297 was for events only, not certificates, but I guess they are kinda the same? (Tx effects = events)

I also agree that a single receiver is better. In fact I would favour a super simple abstraction: just one stream of ordered Tx's. The receiver can then be layered to transform them into batches as needed. This is just API design advice: keep the base layer super simple and build batches (or another abstraction) as an additional layer, so the stream works as follows:

  1. Unordered Tx's are sent in.
  2. A TxReceiver then orders the Tx's, as in the BatchReceiver code in this PR, and sends them on over another channel.
  3. The receiver of the channel from step 2 can then batch them.

The above is exactly the observable / streaming transformation pattern that is so useful.

Comment on lines +222 to +221
match item_option {
    None => {
        make_batch = true;
        exit = true;
    },
Contributor:

Is this exit supposed to capture a notion of "at quiescence"?

Collaborator (Author):

The exit here is very mundane: when the authority state closes (it is the only thing that should really have a sender for the channel), the batcher makes the last batch and also closes.

Contributor:

There are many such names for those handles throughout the code base (e.g. the complete of the SpawnedServer). It would be great to use a single one for the concept. The one I've seen most frequently used for this is a "cancellation" channel.

@gdanezis (Collaborator, Author) commented Feb 24, 2022

Many thanks for the feedback @huitseeker -- fixing many of the above.

Is this for use in a BFT-resistant context?

This is a facility for one authority to record, and notify about (in this PR itself), the sequence of transactions it has executed. It also batches them into contiguous blocks so that down the line it can sign the sequence and provide facilities for doing sync one way or the other based on the block metadata. This is to support a replica or another authority (or anyone) that wants to download the current or historic transactions processed by this authority, either to replicate it, audit it, or whatnot. The immediate use case is the explorer.

There is no notion of BFT-resistance, since there is only one authority involved.

@gdanezis (Collaborator, Author):

If so, it seems the current thinking is to let the current authority decide the size of a batch (through a floor), and the frequency of batch (branching point whenever there's an execution), based on local conditions. Shouldn't this instead be decided by global convention?
If not, is it possible to make the trust assumptions on batch listening clear?

I am not sure what you are asking for here?

@huitseeker (Contributor) left a comment:

If this mechanism is meant to be exposed to people not trusting the authority (e.g. sent through a gossip mechanism, used for sync), then the total throughput (in bytes per second) of this mechanism should be under a global limit that is not at the discretion of the authority.

That means that all three metrics below need to be bounded by a global configuration constant:

  • size of a batch (in TX digests),
  • number of batches per second (in Hz),

At the moment, the code defines those metrics purely adaptively:

  • the size of a batch is whatever the node has received since the last block [1]
  • the block production frequency is the first time the node notices the number of pending transactions exceeding the min_block_size, or when it flows past an interval tick, whichever comes first

Footnotes

  1. This alone makes IBLT / sketching approaches moot, or more exactly makes them flow really quickly into an inefficient regime where their use needs to rely on iterative doubling. That's because all those techniques rely on an estimate of the size of the relative difference, which is now impossible to compute except globally.
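For concreteness, a hedged sketch of the adaptive policy as described above (illustrative names, not the PR's code): a batch is cut when the pending count reaches min_block_size or when the periodic tick fires, whichever comes first, and both knobs are local to the authority.

    use std::time::Duration;
    use tokio::sync::mpsc::Receiver;

    async fn adaptive_batching(mut rx: Receiver<[u8; 32]>, min_block_size: usize, period: Duration) {
        let mut ticker = tokio::time::interval(period);
        let mut pending: Vec<[u8; 32]> = Vec::new();

        loop {
            tokio::select! {
                maybe_digest = rx.recv() => match maybe_digest {
                    Some(digest) => {
                        pending.push(digest);
                        if pending.len() >= min_block_size {
                            // make_batch(&pending) would go here.
                            pending.clear();
                        }
                    }
                    None => break, // sender closed: cut a final batch, then exit
                },
                _ = ticker.tick() => {
                    if !pending.is_empty() {
                        // make_batch(&pending) on the timer path as well.
                        pending.clear();
                    }
                }
            }
        }
    }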

@gdanezis (Collaborator, Author) commented Feb 24, 2022

That means that all three metrics below need to be bounded by a global configuration constant:
size of a batch (in TX digests),
number of batches per second (in Hz),

If we bound globally both the size of a batch & the number of batches per second, this defines a global throughput limit on the whole system. I think this concern echoes the concerns in the comment on issue #194.
I am still unsure how a facility for a (say Byzantine) node to commit to its sequence, and to allow clients to ask for it and read old executed transactions, or ask for a stream of the latest transactions, can lead to any kind of DoS or resource exhaustion. Clients are welcome to drop the connection when they want.

@huitseeker (Contributor) commented Feb 24, 2022

To answer the somewhat academic point (because within the bounds of this PR, I'm fully satisfied by this component having a big rustdoc that says "This connects to a single authority and hence assumes the subscriber trusts it"), I added a comment on issue 194.

@huitseeker (Contributor) left a comment:

On the whole this looks good, here are the points that are still on top of my mind:

  • fleshing out a level of indirection w.r.t recipients of the UpdateItems,
  • fleshing out the AuthorityBatch format a bit more,

I left comments on both. Let me know if you want to take on any of the above in the current PR, or if we should just stamp and iterate.

@gdanezis (Collaborator, Author):

These are all fair comments, so let's flesh this out a bit before we merge. This is not on the critical path for the moment.

@velvia (Contributor) commented Feb 25, 2022

Hi @gdanezis , this is really interesting work. The functionality of replicas, witnesses getting state from authorities is definitely critical and this is one step towards that. I'm diving into the details, but I have concerns on the overall approach of using locally defined blocks ("batches"). I know many of these comments really dovetail with #194 which I need to catch up on, and I'll add my comments there as well.

The client we currently have is a light client. And explorer is more likely to run a full replica, that downloads all certs from one or more authorities and re-executes them to reconstruct all state, and likely even keep more of the historical objects around to display.

My concern with an approach where a full client / replica just replicates from one authority that locally defines blocks is that a single authority does not have the full state. Thus, in order to get an accurate replica of the state, such a replica needs to hear from all authorities, which is not practical.

While state agreement/sync will eventually solve this problem, in that the state of an authority catches up to others, it is still important for any potential witness or replica to understand how caught up this authority is, or what the state of an authority is.

The fundamental issue is that "batches" are locally defined, therefore they cannot be easily compared between authorities.

I'm thinking that an essential property of a data model for authorities, one that allows replicas and witnesses to accurately assess what state can be replicated and what state is left to replicate, is the following:

  1. The API for getting subsets of the overall state should use a measure that is globally comparable across authorities.
  2. Such an API (not needed now, but later) also needs to define, for some range of the measure, what has been agreed to globally.

I favour using event time as a measure that is globally comparable (assuming that each transaction can return a consistent event time). The API can then be to fetch certificates in a range of event times, and with state agreement and sync we can ascertain what ranges of event times have been globally agreed on.
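A purely hypothetical sketch of the API shape this proposal implies; none of these names exist in the PR.

    // EventTime, CertificateDigest and the trait below are illustrative only.
    type EventTime = u64;
    type CertificateDigest = [u8; 32];

    trait CertificateRangeApi {
        /// Fetch the certificates whose event time falls in [from, to), a measure
        /// intended to be comparable across authorities.
        fn certificates_in_range(&self, from: EventTime, to: EventTime) -> Vec<CertificateDigest>;
    }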

Once we have state agreement, a consistent snapshot of certificates / permanent state does not need to reside with different authorities; it can be offloaded to cloud storage, IPFS, etc., and a definitive, single, authoritative copy can be used. This would offload replication entirely from authorities.

Thus, a globally comparable data model is, I believe, an important fundamental towards our end goals of sustainable witness/replicas.

@velvia (Contributor) commented Feb 25, 2022

@gdanezis one proposal I would have which dovetails with my comments above:

Suppose, instead of having each "batch" be locally defined, batch intervals were instead defined as globally agreed upon intervals of an event clock (one based on a consistent, single event time per transaction). Then, batches could be easily compared between authorities.

@gdanezis (Collaborator, Author):

Hey @todd-mystenlabs this is the PR I mentioned in our call.

@velvia (Contributor) left a comment:

This change, as I understand it, basically atomically assigns an increasing sequence number to each transaction, logs (seq no -> tx digest) to a new table, sends them out via a channel, and uses a BatchManager to order and group transactions into batches. The end goal is to have listeners listen to this stream of transactions and batches.

I think the parts that line up well with other ideas @huitseeker and I are thinking are the sequencing of transactions and sending them into a stream/channel.

I think we should hold off on the batching part of it though and think through it a bit more. I'm not sure I get the utility of the batches from an architectural and state/data model perspective. It's easy to add batching later. However the code in BatchManager that resequences the tx based on seq ID is useful so we can send out an ordered stream. Maybe we can just send out an ordered stream for now and leave out the batch send for later?

sui_core/src/authority/authority_store.rs (outdated review thread, resolved)

// Tables used for authority block structure
/// A sequence on all executed certificates and effects.
pub executed_sequence: DBMap<u64, TransactionDigest>,
Contributor:

Might be better to typedef the u64 as something like TxSequenceNumber, so that if we need to upgrade it (say to u128) or use a different struct, it would be easier.

Collaborator (Author):

Good call!
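The suggestion amounts to something like the following sketch (the DBMap declaration is shown only as a comment, since it lives in the PR's store code):

    // Give the sequence number its own alias so a later change (e.g. widening the
    // type) touches a single definition.
    pub type TxSequenceNumber = u64;

    // The table declaration then reads:
    // pub executed_sequence: DBMap<TxSequenceNumber, TransactionDigest>,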

pub executed_sequence: DBMap<u64, TransactionDigest>,

/// A sequence of blocks indexing into the sequence of executed transactions.
pub batches: DBMap<u64, AuthorityBatch>,
Contributor:

Is the u64 here a batch number? Would be good to comment it.
I think it might be more useful, if the tx sequence number is the universal measure used within an authority, to just use the starting Tx sequence number instead of a separate batch number.
(Basically the starting event time of the batch)

Collaborator (Author):

Yep now I used a typedef for it

/// A sequence of blocks indexing into the sequence of executed transactions.
pub batches: DBMap<u64, AuthorityBatch>,

/// The size of the executed transactions sequence, used to timestamp the next
Contributor:

This might not be the size of the sequence. Over time this would get really huge and we'll need to lop off older ones for sure. I think better to just say it's the next sequence number.

Collaborator (Author):

If we process 2^64 transactions we are winning big, and this will be known as the SUI-Seq2^64 problem, and the whole world will devote resources to solving it :)

Contributor:

My main point is that the comment is more accurate if it says it is the next sequence number to be used. Size won't be accurate if there are gaps.

sui_core/src/authority/authority_store.rs (outdated review thread, resolved)
) -> Result<tokio::task::JoinHandle<()>, SuiError> {
let last_batch = self.init_from_database().await?;

let join_handle = tokio::spawn(async move {
Contributor:

Isn't this a forever running thing? What's the point of waiting for the handle? Maybe just use a dedicated thread?

Collaborator (Author):

Spawn creates a separate task, which is now independently and cooperatively scheduled by the tokio runtime. This is our concurrency framework that allows us to handle IO efficiently.

Contributor:

I understand what a task is, but isn't the task itself a forever loop? So it is just occupying one thread forever? Or does it periodically reschedule itself? That would require the loop to terminate periodically.

Contributor:

Returning the join handle does not put a burden on the caller to join the handle. It affords the caller the opportunity to poll the task to react to a panic or completion ... or to drop the handle, detach the task, and let the thing run.

Here's the standard-issue tokio example of the latter behavior: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=80eeb1d8944285bb297caab95b470faf
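A small sketch of the two options, assuming nothing beyond tokio::spawn:

    use tokio::task::JoinHandle;

    async fn spawn_batch_task() {
        let handle: JoinHandle<()> = tokio::spawn(async {
            // the long-running batch loop would live here
        });

        // Option A: keep the handle and react to completion or a panic.
        // if let Err(e) = handle.await { eprintln!("batch task failed: {e}"); }

        // Option B: drop the handle; the task is detached and keeps running.
        drop(handle);
    }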


if transactions.len() as u64 != total_seq - last_batch.total_size {
// NOTE: The database is corrupt, namely we have a higher maximum transaction sequence
// than number of items, which means there is a hole in the sequence. This can happen
Contributor:

Should be "number of items since the end of the last batch"

// Delete all old transactions
let db_batch = db_batch.delete_batch(
&self.db.executed_sequence,
transactions.iter().map(|(k, _)| *k),
Contributor:

Is this an iterator being passed in?
This is going to blow up and fail when the number of transactions grows into the millions, which it will very quickly.
We need to use a range API here, to delete from last_batch.total_size (which is not named correctly btw, it should be like end_seq_no + 1 or something) to total_seq

@gdanezis (Collaborator, Author), Mar 1, 2022:

The variable transactions here is the sequence of transactions that are not in a batch when we restart. This should be a small number compared with the total sequence. I think this is ok?

Contributor:

What is the frequency of the batches?

.zip(transactions.into_iter().map(|(_, v)| v))
.collect();

let db_batch = db_batch
Contributor:

I don't really get what deleting the transactions and rewriting them solves. The transactions were read out of the DB, they are deleted and re-written, and the contents are not going to be any different from the original transactions that were read. If there was a hole, there will still be a hole now. Rewriting the next seq number might fix a discrepancy there, but I don't get what this does. It seems that if we really wanted to recreate the transaction list since the last batch, we should reconstitute it from a list of transactions instead (but for that we need a sequence number in the transactions themselves... which is where event time comes in. :)

Collaborator (Author):

I wrote the following comment above it, since others will also have the same question:

                // Update transactions
                //
                // Here we re-write the transactions in the same order but using a new sequential 
                // sequence number (the range) to ensure the sequence of transactions contains no
                // gaps in the sequence.
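A hedged sketch of that re-sequencing step, with a BTreeMap standing in for the executed_sequence table; the real code does this inside a single DB write batch, and the PR later relaxed the contiguity requirement to tolerate gaps (see the closing comments).

    use std::collections::BTreeMap;

    type TxSequenceNumber = u64;
    type TransactionDigest = [u8; 32];

    // Transactions found after the last batch are rewritten, in the same order,
    // under fresh contiguous sequence numbers so the recovered sequence has no gaps.
    fn resequence(
        executed_sequence: &mut BTreeMap<TxSequenceNumber, TransactionDigest>,
        loose: Vec<(TxSequenceNumber, TransactionDigest)>,
        first_free_seq: TxSequenceNumber,
    ) {
        // Delete the old, possibly gappy entries...
        for (old_seq, _) in &loose {
            executed_sequence.remove(old_seq);
        }
        // ...and re-insert the same digests under a contiguous range.
        for (offset, (_, digest)) in loose.into_iter().enumerate() {
            executed_sequence.insert(first_free_seq + offset as TxSequenceNumber, digest);
        }
    }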

sui_core/src/authority_batch.rs (outdated review thread, resolved)
@lanvidr (Contributor) left a comment:

To echo what others have already mentioned: when we use the reverse bloom filter to compare batches from different authorities, I imagine it will be useful to have a constant batch size among all authorities, so that we can match up batches that happened at similar time intervals for comparison. That ensures the conditions of mostly similar and only a few differing entries for batches further back in time (which have perhaps already been partially synced), and maximizes the benefits those data structures promise. If authority A has a batch size of 100 and authority B has a batch size of 200, when they compare their sets it will always appear to B that it has items unknown to A, although A may have those items in a different batch.

// Delete all old transactions
let db_batch = db_batch.delete_batch(
&self.db.executed_sequence,
transactions.iter().map(|(k, _)| *k),
Contributor:

What kind of issues arise if there is a hole in the sequence? Is it possible we shift the sequence instead of doing deletions?

Collaborator (Author):

That's a good question; maybe we can do away with the need for sequential numbers and instead tolerate gaps.


let db_batch = self.db.executed_sequence.batch();

// Delete all old transactions
Contributor:

By old transactions, do you mean the most recent transactions that did not yet get added to a batch?

Collaborator (Author):

Yeah, that is confusing; now it reads:

// Delete the transactions that we read above, that are out of a batch and we are about
// to re-sequence with new numbers.

@gdanezis (Collaborator, Author) commented Mar 1, 2022

I have now dropped the comment below from the description of batches, since it confuses things between priorities A and B.

    TODO: Add the following information:
    - Authenticator of previous block (digest)
    - Authenticator of this block header + contents (digest)
    - Signature on block + authenticators
    - Structures to facilitate sync, eg. IBLT or Merkle Tree.
    - Maybe: a timestamp (wall clock time)?
 

@gdanezis gdanezis force-pushed the explorer-authority-data branch from 8eb1fc9 to dd4f5b5 Compare March 1, 2022 15:09
Outdated review threads (resolved): sui_core/src/authority_batch.rs (×2), sui_core/src/authority.rs, sui_core/tests/format.rs
@huitseeker (Contributor) left a comment:

So, I think there are still simplifications to be found in there; the main one is @laura-makdah 's comment on relaxing the contiguity requirement for the sequence numbers (especially now that the AuthorityBatch emits counts that we can use to sanity-check what happened).

I tried to point out an (arguable) one on the BatchManager's constructor, where I think we can hide half of the BroadcastPair.

I like the whole PR, it LGTM, and I'll be happy with whatever simplifications you find the time to do from where it is now.

I'm looking into the CI unit test issue you're having (which I suspect is down to a file encoding concern now 😓 ).

Comment on lines +90 to +101
) -> (BatchSender, BatchManager, BroadcastPair) {
    let (tx_send, tx_recv) = channel(capacity);
    let (tx_broadcast, rx_broadcast) = tokio::sync::broadcast::channel(capacity);
    let sender = BatchSender { tx_send };
    let manager = BatchManager {
        tx_recv,
        tx_broadcast: tx_broadcast.clone(),
        db,
    };

    (sender, manager, (tx_broadcast, rx_broadcast))
}
Contributor:

Would it be possible to have this function return

-> (BatchSender, BatchManager, tokio::sync::broadcast::Receiver<UpdateItem>)

instead, and make BroadcastPair private?

If not, is there another need the caller might have for the broadcast sender that I'm not envisioning here?

If so, could that need be better served by having a method on the BatchManager instance allowing the caller to get a copy of the sender after initialization?

Collaborator (Author):

Here is the subtlety: the Receiver is not Clone. Instead the Sender has a function called .subscribe() that gives you new receivers. So we sadly need the sender in order to instantiate receivers that we will use in tasks serving updates to clients.
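A tiny illustration of that constraint with tokio's broadcast channel (the payload type here is a stand-in for UpdateItem):

    use tokio::sync::broadcast;

    // broadcast::Receiver is not Clone, but the Sender hands out fresh receivers via
    // subscribe(), so the Sender must stay reachable wherever new per-client
    // subscriptions are created.
    fn new_update_subscriber(tx: &broadcast::Sender<u64>) -> broadcast::Receiver<u64> {
        tx.subscribe()
    }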

Comment on lines 330 to 303
pub struct AuthorityBatch {
    // TODO: Add epoch
    /// The total number of items executed by this authority.
    total_size: u64,

    /// The number of items in the previous block.
    previous_total_size: u64,

    /// The digest of the previous block, if there is one
    previous_digest: Option<BatchDigest>,

    // The digest of all transactions digests in this batch
    transactions_digest: [u8; 32],
}
Contributor:

You know what? I like it. I think combined with @laura-makdah 's idea of not being too strict with sequence gaps, this AuthorityBatch is simple and makes sense as a periodic unit of accounting for monitoring the flow of authority transactions.

@velvia (Contributor) left a comment:

Just some minor places where u64 -> TxSequenceNumber? Otherwise looks fine to me

@@ -708,9 +741,10 @@ impl AuthorityState {
async fn update_state(
&self,
temporary_store: AuthorityTemporaryStore,

Contributor:

Extraneous line can be removed

@@ -330,7 +369,7 @@ impl AuthorityStore {
temporary_store: AuthorityTemporaryStore,
certificate: CertifiedTransaction,
signed_effects: SignedTransactionEffects,
) -> Result<TransactionInfoResponse, SuiError> {
) -> Result<(u64, TransactionInfoResponse), SuiError> {
Contributor:

Shouldn't this be a TxSequenceNumber?

Collaborator (Author):

Done!

signed_effects: Some(signed_effects),
})
// Safe to unwrap since the "true" flag ensures we get a sequence value back.
let seq: u64 = self
Contributor:

Same here, TxSequenceNumber?

Collaborator (Author):

Done!

@huitseeker (Contributor) commented Mar 3, 2022

@gdanezis I'm 80% sure I've fixed your issue on the format test with the first commit of #626, now merged. If rebasing on main does not fix your test failure, just #[ignore] the test, open an issue & assign it to me, and I'll work on it until it performs reliably cross-platform.

@gdanezis gdanezis force-pushed the explorer-authority-data branch from 6429ee1 to 3c94310 Compare March 3, 2022 12:36
@gdanezis (Collaborator, Author) commented Mar 3, 2022

Many thanks again, all, for the comments. I have now simplified the code so it does not require complete sequence numbers (gaps are allowed if we crash), and simplified the respective code as @laura-makdah suggested.

@gdanezis gdanezis merged commit 793e260 into main Mar 3, 2022
@gdanezis gdanezis deleted the explorer-authority-data branch March 3, 2022 14:00
@lanvidr (Contributor) commented Mar 3, 2022

Nice! :)


if !transactions.is_empty() {
// Make a new batch, to put the old transactions not in a batch in.
let last_signed_batch = SignedBatch::new(
Contributor:

Do we want a size limit on one batch? What if there are a lot of unbatched transactions?
A gigantic batch could cause problems down the line when we are requesting batches/transactions of a range.


@lxfind (Contributor), Mar 8, 2022:

This function runs before that, right? It's batching all transactions since the last batch from db into one batch first, before starting the service.

&*secret,
authority_name,
);
self.db.batches.insert(
Contributor:

What happens if, some new transactions just got generated in-between the above db read at line 155 and here?
Would these transactions be forever missing from any batch?

Collaborator (Author):

The batch is generated (barring crashes) once we have a complete sequence of transactions from the previous batch to the end of this one. Any newer transactions should have a higher txSequenceNumber; they will be sent over the channel and inserted in the next batch.

I would be thankful if you could check that this is the case. I also have an out-of-order test to check that.

Contributor:

But we haven't started listening/receiving on the channel yet at this point.
Maybe I misunderstood the intention of the function init_from_database?
