Validator can't handle alive accounts spanning just hours-worth slots #8931

Closed
ryoqun opened this issue Mar 18, 2020 · 3 comments
Labels: security (Pull requests that address a security vulnerability)
Milestone: v1.1.0

Comments

@ryoqun
Member

ryoqun commented Mar 18, 2020

Problem

Currently, each slot that contains even one created-and-forgotten alive account keeps one AccountsDB AppendVec (and therefore one mmap) open indefinitely. This means a validator can handle only 65530 such slots (the default /proc/sys/vm/max_map_count).
So an attacker just needs to send a small rent-exempt (does it need to be?) amount of lamports to a new random account in each new slot, around 70000 accounts in total (~10 hours).
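As a rough check of the "~10 hours" figure, here is a minimal sketch that converts a slot count into wall-clock hours. The slot durations are assumptions, not measurements from this issue: 400 ms is the cluster's target slot time and 500 ms stands in for a slower observed cluster.

```rust
/// Hours needed to cover `slots` consecutive slots, given an assumed
/// average slot duration in milliseconds.
fn attack_hours(slots: u64, slot_ms: u64) -> f64 {
    (slots * slot_ms) as f64 / 1000.0 / 3600.0
}

fn main() {
    // 65530: default max_map_count; 70000: the attack size suggested above.
    for &slots in &[65_530u64, 70_000] {
        // Assumed average slot durations in milliseconds (not measured here).
        for &slot_ms in &[400u64, 500] {
            println!(
                "{} slots at {} ms/slot ~ {:.1} hours",
                slots,
                slot_ms,
                attack_hours(slots, slot_ms)
            );
        }
    }
}
```

At 400 ms per slot, 70000 slots take roughly 7.8 hours; at 500 ms, roughly 9.7 hours, which is in line with the estimate above.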

Then, unless validators are specifically configured for it (which the docs don't currently advise), the cluster dies. This makes it a DoS vulnerability.

We encountered this error before (#5432), but it wasn't regarded as important at the time because it was caused by unrooted banks.

However, as I demonstrate with unit-test code and an integration test (#8932), this threat is real.

An ad-hoc test is also running here: https://metrics.solana.com:3000/d/V5LPmn_Zk/testnet-monitor-edge-ryoqun?orgId=2&from=now-3h&to=now&var-datasource=Solana%20Metrics%20(read-only)&var-testnet=testnet-dev-ryoqun&var-hostid=All

Proposed Solution

Increasing /proc/sys/vm/max_map_count is one mitigation. Indeed, several well-known mmap-based middlewares recommend doing so (see refs).
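Before and after raising the limit, an operator may want to see how much mmap headroom is left. This is a minimal sketch using only the Rust standard library. It assumes Linux procfs, where each line of a /proc/&lt;pid&gt;/maps file is one mapping. It inspects its own process (/proc/self/maps), so to check a running validator you would read that validator's /proc/&lt;pid&gt;/maps instead.

```rust
use std::fs;

/// Parses the single integer stored in /proc/sys/vm/max_map_count.
fn parse_max_map_count(contents: &str) -> Option<u64> {
    contents.trim().parse().ok()
}

/// Counts mappings: each non-empty line of /proc/<pid>/maps is one mapping.
fn count_mappings(maps: &str) -> usize {
    maps.lines().filter(|line| !line.trim().is_empty()).count()
}

fn main() {
    // Linux-only paths; on other systems these reads simply fail.
    let limit = fs::read_to_string("/proc/sys/vm/max_map_count")
        .ok()
        .and_then(|s| parse_max_map_count(&s));
    let used = fs::read_to_string("/proc/self/maps")
        .ok()
        .map(|s| count_mappings(&s));

    match (limit, used) {
        (Some(limit), Some(used)) => println!(
            "{} of {} mappings in use ({} left)",
            used,
            limit,
            limit.saturating_sub(used as u64)
        ),
        _ => println!("procfs not available; cannot check mmap headroom"),
    }
}
```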

Nevertheless, to mitigate attacks spanning days, I think we need to introduce LRU eviction of old AppendVecs, or some similar mechanism, so that remote parties cannot drive mmap usage up without bound.
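The LRU idea can be sketched as a cache that caps the number of open mappings. This is a hypothetical illustration, not AccountsDB's API. `Mapping` stands in for one AppendVec's mmap, and evicting an entry models unmapping it while its data stays on disk to be re-mapped on the next access. The trade-off is that accesses to evicted slots pay a re-mapping cost.

```rust
use std::collections::{BTreeMap, HashMap};

type Slot = u64;

/// Hypothetical stand-in for one AppendVec's memory mapping. In a real
/// store, dropping it would unmap the file (freeing one map_count entry).
struct Mapping {
    slot: Slot,
}

/// Keeps at most `capacity` slot mappings open. On a miss with the cache
/// full, the least recently used mapping is dropped.
struct MmapLru {
    capacity: usize,
    tick: u64,                           // monotonically increasing access counter
    open: HashMap<Slot, (Mapping, u64)>, // slot -> (mapping, last-use tick)
    by_use: BTreeMap<u64, Slot>,         // last-use tick -> slot, oldest first
    remaps: u64,                         // number of (re)mappings performed
}

impl MmapLru {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        MmapLru {
            capacity,
            tick: 0,
            open: HashMap::new(),
            by_use: BTreeMap::new(),
            remaps: 0,
        }
    }

    /// Returns the mapping for `slot`, mapping it on a miss and evicting
    /// the least recently used mapping if the cap has been reached.
    fn get(&mut self, slot: Slot) -> &Mapping {
        self.tick += 1;
        let tick = self.tick;
        if let Some((_, last_use)) = self.open.get_mut(&slot) {
            // Hit: move this slot to the most-recently-used position.
            let previous = *last_use;
            *last_use = tick;
            self.by_use.remove(&previous);
        } else {
            if self.open.len() == self.capacity {
                let (&oldest_tick, &oldest_slot) = self.by_use.iter().next().unwrap();
                self.by_use.remove(&oldest_tick);
                self.open.remove(&oldest_slot); // drop = unmap in a real store
            }
            self.remaps += 1;
            self.open.insert(slot, (Mapping { slot }, tick));
        }
        self.by_use.insert(tick, slot);
        &self.open[&slot].0
    }
}

fn main() {
    // Cap of 2 open mappings, accessed by an attacker-like sequence of slots.
    let mut lru = MmapLru::new(2);
    for slot in [1u64, 2, 1, 3, 2].iter().copied() {
        let mapping = lru.get(slot);
        println!("accessed slot {}", mapping.slot);
    }
    println!("open mappings: {}, (re)mappings: {}", lru.open.len(), lru.remaps);
}
```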
A background service that aggregates old slots is also conceivable, but I'm not sure its added complexity is justified on top of LRU eviction.
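The aggregation alternative can be sketched as merging the sparse stores of old slots into a single store, so that many slots share one mapping. This is a hypothetical model. Each slot's store is a list of (pubkey, lamports) pairs, `Pubkey` is simplified to a `u64`, and the merged store is kept under the newest merged slot. If a key appears in several old slots, only its newest version is kept.

```rust
use std::collections::BTreeMap;

type Slot = u64;
type Pubkey = u64; // hypothetical stand-in for a real 32-byte public key

/// Merges the stores of all slots older than `cutoff` into one store kept
/// under the newest merged slot. Returns that slot, or None if fewer than
/// two stores qualify (merging would free no mappings).
fn aggregate_old_slots(
    stores: &mut BTreeMap<Slot, Vec<(Pubkey, u64)>>,
    cutoff: Slot,
) -> Option<Slot> {
    let old: Vec<Slot> = stores.range(..cutoff).map(|(slot, _)| *slot).collect();
    if old.len() < 2 {
        return None;
    }
    let target = *old.last().unwrap();
    let mut latest: BTreeMap<Pubkey, u64> = BTreeMap::new();
    for slot in &old {
        // Slots are visited in ascending order, so newer versions overwrite older ones.
        for (key, lamports) in stores.remove(slot).unwrap() {
            latest.insert(key, lamports);
        }
    }
    stores.insert(target, latest.into_iter().collect());
    Some(target)
}

fn main() {
    let mut stores: BTreeMap<Slot, Vec<(Pubkey, u64)>> = BTreeMap::new();
    // Attack pattern: one tiny alive account per slot.
    for slot in 0..5u64 {
        stores.insert(slot, vec![(100 + slot, 1_000)]);
    }
    let merged = aggregate_old_slots(&mut stores, 4);
    println!("merged into {:?}; {} stores remain", merged, stores.len());
}
```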

refs (max_map_count values recommended by other mmap-based software)

128000: MongoDB: https://docs.mongodb.com/manual/administration/production-checklist-operations/
262144: ElasticSearch: https://www.elastic.co/guide/en/elasticsearch/reference/master/docker.html#docker-prod-prerequisites
524288: Varnish: https://image.slidesharecdn.com/fastlyvarnishnycmeetupfinal-140804141851-phpapp02/95/fastly-inaugural-nyc-varnish-meetup-25-638.jpg?cb=1407162101

ryoqun added the security label on Mar 18, 2020
mvines added this to the v1.1.0 milestone on Mar 18, 2020
@sakridge
Member

In the short term, we could document this and add it to sys-tuner.

@sakridge
Member

sakridge commented May 15, 2020

@ryoqun Want to prioritize this? Maybe increasing rent could be a solution.
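To gauge how much a rent increase would raise the cost of this attack, this minimal sketch estimates the lamports an attacker must lock up to keep one empty, rent-exempt account alive per slot. The rent parameters are assumptions for illustration: 3480 lamports per byte-year, a 2-year exemption threshold (an integer here for simplicity), and 128 bytes of per-account storage overhead. Check the cluster's actual Rent sysvar before relying on the numbers. The lamports are locked in the accounts rather than burned.

```rust
/// Assumed rent parameters (illustrative only; see the cluster's Rent sysvar).
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;
const LAMPORTS_PER_SOL: u64 = 1_000_000_000;

/// Minimum balance for an account with `data_len` data bytes to be rent-exempt.
fn rent_exempt_minimum(data_len: u64, lamports_per_byte_year: u64) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len) * lamports_per_byte_year * EXEMPTION_THRESHOLD_YEARS
}

/// Total lamports locked by funding one empty rent-exempt account per slot.
fn attack_cost(slots: u64, lamports_per_byte_year: u64) -> u64 {
    slots * rent_exempt_minimum(0, lamports_per_byte_year)
}

fn main() {
    // Compare the current (assumed) rent with hypothetical 10x and 100x increases.
    for multiplier in [1u64, 10, 100].iter().copied() {
        let cost = attack_cost(70_000, LAMPORTS_PER_BYTE_YEAR * multiplier);
        println!(
            "rent x{}: {} lamports (~{} SOL)",
            multiplier,
            cost,
            cost / LAMPORTS_PER_SOL
        );
    }
}
```

Under these assumptions, the cost grows linearly with rent. For 70000 slots it is 62,361,600,000 lamports (about 62 SOL) at the base rate.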

@ryoqun
Member Author

ryoqun commented May 18, 2020

@sakridge Yeah I think we want to do so. :)

So how about closing this issue (since it's fixed by #8940 and #9527) and creating another issue titled "Increase rent" or similar?
