Full chain archive sync at protocol level #3092

Open
johndavies24 opened this issue Oct 8, 2019 · 10 comments

Comments

@johndavies24

The lack of a trustless, permissionless method of full blockchain archive sync creates an unfair disparity in access to valuable data. At the very least, this creates a barrier to entry for building blockchain tools. At worst, this data has value that can be used to the benefit or detriment of users.

Describe the solution you'd like
Archive nodes should have to opt-out of sharing rather than opt-in. That is, only archive nodes that do not participate in outgoing communications should be able to avoid sharing their blocks. Maybe refusal to share archive blocks should result in protocol level blacklisting of the node, and all archive syncs should be reconciled to the largest dataset so all archive nodes have the same data.

Describe alternatives you've considered
Until there is time/bandwidth to code this, a full archive snapshot could be hosted by Grin or Grin community members.

@antiochp
Member

antiochp commented Oct 9, 2019

Thanks for opening this issue.

> Archive nodes should have to opt-out of sharing rather than opt-in.

Agreed. 👍 When we were discussing opt-in/opt-out I was thinking in terms of archive/non-archive. I was assuming that archive nodes would opt-in by default to sharing historical blocks. But you would need to explicitly opt-in to being an archive node in the first place.
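
The two defaults described here (opt in to being an archive node, opt out of sharing) can be sketched as a small settings object. This is only an illustration: `NodeConfig`, `archive_mode`, and `share_archive_blocks` are names assumed for the sketch and are not taken from Grin's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical node settings; the field names are assumptions for this sketch."""

    # Becoming an archive node is an explicit opt-in, so it defaults to off.
    archive_mode: bool = False
    # Sharing historical blocks is on by default; an operator must opt out.
    share_archive_blocks: bool = True

    def serves_historical_blocks(self) -> bool:
        # Only a node that keeps full history and has not opted out can serve it.
        return self.archive_mode and self.share_archive_blocks


default_node = NodeConfig()
archive_node = NodeConfig(archive_mode=True)
private_archive = NodeConfig(archive_mode=True, share_archive_blocks=False)
```

With these defaults, an operator who only turns on archive mode ends up serving historical blocks without any further setting.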

> all archive syncs should be reconciled to the largest dataset so all archive nodes have the same data.

Agreed. I think this will naturally happen if we make the sync robust enough. i.e. You keep asking other archive nodes for missing blocks and eventually they propagate through the network. If I receive a block that I was originally missing I can now make this available to others etc.
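
A toy simulation of that propagation might look like the following. It is a sketch under simplifying assumptions, not Grin's sync code: blocks are represented only by their heights, every block is assumed valid, and the names `sync_round`, `reconcile`, and the peer topology are hypothetical.

```python
def sync_round(blocks, peers):
    """Each node asks each of its peers for blocks it is missing.

    `blocks` maps node name -> set of block heights held.
    `peers` maps node name -> list of neighbouring node names.
    Peers answer from what they held at the start of the round, so a block
    moves at most one hop per round. Returns the number of blocks transferred.
    """
    snapshot = {name: set(held) for name, held in blocks.items()}
    transferred = 0
    for name, neighbours in peers.items():
        for peer in neighbours:
            missing = snapshot[peer] - blocks[name]
            blocks[name] |= missing
            transferred += len(missing)
    return transferred


def reconcile(blocks, peers, max_rounds=100):
    """Repeat sync rounds until nothing moves; return the number of useful rounds."""
    for round_no in range(1, max_rounds + 1):
        if sync_round(blocks, peers) == 0:
            return round_no - 1
    raise RuntimeError("archive nodes did not converge")


# Three archive nodes in a line: a <-> b <-> c. Only `a` has block 2.
blocks = {"a": {1, 2, 3}, "b": {1, 3}, "c": {1}}
peers = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
rounds_needed = reconcile(blocks, peers)
```

In this example block 2 reaches `c` only after `b` has received it from `a`, which is the "I can now make this available to others" step.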

> Maybe refusal to share archive blocks should result in protocol level blacklisting of the node

We may want to do this but it may be hard to do reliably until a majority of archive nodes have reconciled to the largest data set. Otherwise we cannot reliably differentiate between refusal to provide these blocks and inability to provide them as they are missing.

We don't ban regular nodes for refusing to provide blocks as this may be for a variety of reasons. We only ban currently if peers provide "bad" data, i.e. invalid blocks.
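
The current policy as described above can be summarized in a few lines. This is a hedged sketch: the `Response` and `Action` enums and `handle_response` are hypothetical names for illustration and do not mirror Grin's peer-handling code.

```python
from enum import Enum, auto


class Response(Enum):
    VALID_BLOCK = auto()
    INVALID_BLOCK = auto()
    # Covers both "refuses to share" and "does not have it": from the
    # requester's side these two cases look the same.
    NOT_PROVIDED = auto()


class Action(Enum):
    ACCEPT = auto()
    BAN = auto()
    TRY_ANOTHER_PEER = auto()


def handle_response(response: Response) -> Action:
    """Ban only for provably bad data; never for a missing answer."""
    if response is Response.INVALID_BLOCK:
        return Action.BAN
    if response is Response.VALID_BLOCK:
        return Action.ACCEPT
    # Refusal and inability are indistinguishable, so just ask someone else.
    return Action.TRY_ANOTHER_PEER
```

Banning on `NOT_PROVIDED` would only become safe once archive nodes could be expected to hold the same data, which is the caveat raised above.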

@johndavies24
Author

I'm not exactly sure how nodes store data during any of the Dandelion phases. One of the things I meant by "largest dataset" is that a node which captured data prior to any tx aggregation would also have to share that information. But I really don't understand how it works, or whether this idea is valid at all.

@antiochp
Member

You want to consider including tx data in scope for "archive"?
Interesting - definitely needs some more thought around this.

I think it would be technically possible to do this for "fluffed" (post Dandelion) broadcast transactions.

I suspect we would not want to consider Dandelion stem phase txs "in scope" here - it would defeat one of the aims of Dandelion. Only a limited subset of nodes (on the stem path) see a particular unaggregated tx and that is by design.

If archive nodes were to start archiving these and sharing this information we would quickly find that non-archive nodes would simply refuse to relay to archive nodes when stemming transactions.
Archive nodes would be excluded from participating in Dandelion and would never see these txs.

@DavidBurkett
Contributor

I doubt anyone other than Chainalysis and the NSA would even still have the original tx boundaries. Likely more hassle than it's worth. I like the idea of providing archive sync because it allows anyone to run a block explorer, validate the history, etc. But I don't think we should go out of our way to leak more privacy than a typical block explorer would.

@johndavies24
Author

> Only a limited subset of nodes (on the stem path) see a particular unaggregated tx and that is by design.

Do these nodes store this data? Obviously they can with custom code, but is it possible with the software offered on the Grin GitHub? If they don't, then I don't want this information to start being stored in future releases.

I guess this phase comes before a tx makes it into a block, so it shouldn't create a situation where one archive node has different data than another archive node.

> If archive nodes were to start archiving these and sharing this information we would quickly find that non-archive nodes would simply refuse to relay to archive nodes when stemming transactions.
> Archive nodes would be excluded from participating in Dandelion and would never see these txs.

This would be a good idea if any of this data is stored without custom code/scripts. We can't really stop people from finding non-supported solutions for acquiring more data, and it's pointless to try, because "non-archive" nodes could just build custom archive solutions from the data their nodes see.

My issue proposal is only within the scope of "officially" supported data storage for archive nodes.

@antiochp
Member

> Do these nodes store this data?

Ok no - official nodes only maintain the txpool (and stempool) in memory. So if you only want to keep things in scope for "archive" that are stored (presumably on disk) then we only really care about block data.
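
To make the scoping concrete, here is a minimal sketch of the distinction between in-memory pools and stored block data. `ToyNode` and its fields are hypothetical stand-ins, and a plain dict stands in for on-disk storage; none of this is Grin's real data model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical node: pools live in memory, blocks stand in for disk storage."""

    txpool: list = field(default_factory=list)          # fluffed txs, memory only
    stempool: list = field(default_factory=list)        # stem-phase txs, memory only
    blocks_on_disk: dict = field(default_factory=dict)  # height -> block

    def restart(self) -> None:
        # Nothing in the pools is persisted, so a restart loses them.
        self.txpool.clear()
        self.stempool.clear()

    def archive_data(self) -> dict:
        # The only data in scope for "archive" is the stored block data.
        return dict(self.blocks_on_disk)


node = ToyNode()
node.stempool.append("unaggregated tx seen on a stem path")
node.txpool.append("fluffed tx awaiting a block")
node.blocks_on_disk[1] = "block 1"
node.restart()
```

After the restart only the block survives, so an archive built from stored data never contains stem-phase transactions.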

@MCM-Mike
Contributor

I am also in favor of having the possibility to run an archive node. As @DavidBurkett said, "I like the idea of providing archive sync because it allows anyone to run a block explorer, validate the history, etc."

We had the idea of posting monthly snapshots, but it would make more sense to have it built in, with an opt-in for running a full archive node.

@JustAResearcher

> I am also in favor of having the possibility to run an archive node. As @DavidBurkett said, "I like the idea of providing archive sync because it allows anyone to run a block explorer, validate the history, etc."
>
> We had the idea of posting monthly snapshots, but it would make more sense to have it built in, with an opt-in for running a full archive node.

Agreed.

@MCM-Mike
Contributor

A general decision was made at today's dev meeting: mimblewimble/grin-pm#248

> Out of scope
>
> 9 | P? | Node | Full chain archive sync at protocol level #3092 | No dev taking task for 4.0.0. Not consensus breaking, can be done in future release.

Let's try it again at the next release.

General question:
Since it's not implemented in Grin v4.0.0, is it even technically possible to sync older blocks once it is implemented?

@lehnberg
Collaborator

> Let's try it again at the next release

In the meantime, a strong way to improve the chances of getting this implemented is to begin an RFC writing process, covering motivation, alternatives, high-level requirements, and pros and cons, and to start building consensus around a particular approach.


6 participants