-
Notifications
You must be signed in to change notification settings - Fork 699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PVF: Compromised artifact file integrity can lead to disputes #677
Comments
For 30-40mb artifacts (and they can get much larger), 6866 MiB/s would take about 4-6 ms. This is negligible compared to preparation, which is on the order of seconds, but not compared to execution which is on the order of 10 ms. Considering that such disk corruptions are quite rare, this hardly seems worth checking before every execution. So probably the most realistic solution here is to hash after preparation, but only check it on node restart and not before every execution. @s0me0ne-unkn0wn suggested keeping prepared artifacts in RAM only, which would mitigate the issue of disk/FS corruptions. At first blush it doesn't seem feasible, especially with on-demand coming, but it could be worth considering... |
If you're up to some experiment you can try to compress them with |
@s0me0ne-unkn0wn Hmm that would add decompression overhead to each execution, definitely worth considering though. |
Not sure I follow. You mean, hash a prepared artifact right after the preparation, include its hash into its name, and check the hash before execution? But how do you know the name of the artifact you're going to open for execution, then? Sounds like a chicken and egg situation. |
I was thinking on node startup, we just list all the artifacts in the cache path, check consistency with the hash, and load them into the artifacts table? As mentioned above, hashing before each execution would be too much of a performance hit. |
Aha, thanks for the elaboration, that does make sense. But that would protect from accidental artifact corruption but not from malicious action. On the other hand, I don't see any easy way to protect against malicious artifacts while still keeping them on disk anyway. (Sign them with the validator key maybe? But that would require some effort) |
How big are artifacts? A few megabytes? blake2 claims 3.08 cycles per byte, so 2/1000th of a second? You;d need to load the PVF only once of course, but that'd avoid a race condition too. It's otoh just one of many reasons validataors need to avoid their machines being compromised, so whatever. |
Please help me understand. What would happen if someone deliberately swaps the cache after preparation? And what if they do that in a large scale? Is there any incentive of doing that? |
@burdges I did a quick analysis in #677 (comment). Using the published numbers of blake3, there would be a ~5ms performance hit which I'm not sure is acceptable in PVF execution. What do you mean by a race condition? Loading all the artifacts once into memory is an idea we had, but probably not feasible with on-demand.
Yeah, though a malicious attack in this way seems really unlikely now that the PVF processes have filesystem sandboxing. They are only able to access the artifacts they need to. My main concern is avoiding unneeded disputes due to a hardware failure (corrupted disk) - but if that happens while the node is running, there's probably not much we can do. OTOH, checking the hashes on node startup is acceptable and may bring some small benefit for no real cost.
@eagr The validator would vote wrongly, so I don't think there's any incentive to doing that deliberately or a possible attack vector there. They would just get slashed. |
These cause invalid disputes, which disables the validator. We need not punish those too heavily, unless correlated across validators, so maybe the solution is just to take this into consideration in the slashing level? |
@burdges Related to that, another idea we had with @eskimor is to only hash the file if a candidate is found to be invalid, after the fact. Reason being that this is a rare case so this additional check shouldn't affect performance too much. If an artifact was corrupted it is very unlikely to be found valid, so we shouldn't need to check file integrity in that case. |
@mrcnski sounds like a really good idea! |
It's actually more important that the validator be hard to hack, but yeah sure. |
I guess what's left is to check integrity on execution errors and act accordingly? |
Yes, I filed a follow-up here: #2399 (comment) Please coordinate with @jpserrat as he commented there, I'm not sure if he started on this yet or not! |
Yes blake3 is fast, but it's speed beyond blake2 comes from merkle trees being more parallelizable, including both SIMD and multiple threads. The SIMD is a clear win, but polkadot should be using those other CPU cores, although blake3 improves cache locallity maybe. Anyways, we might still prefer to do this check only when some validation fails. |
ISSUE
Overview
If we try to execute a corrupt PVF artifact it will likely lead to a dispute. See the comment below for an example of this.
A simple mitigation is to just put the hash of the file contents in the filename. Artifacts are not that big, on the order of several dozen MB (I think?). If we use something like https://github.com/BLAKE3-team/BLAKE3, which claims 6866 MiB/s throughput, then the time spent hashing would barely register.
Original comment
As far as I understand, the main concern about not clearing the artifact cache is:
prepare()
and they are not just some random files put there by someone.Not that we can guarantee nobody will overwrite them during the node run time. I believe the cache pruning was implemented as a measure to mitigate those unwanted outcomes. But I don't have a strong opinion if it makes any sense. If someone wants to screw things up, he will.
_Originally posted by @s0me0ne-unkn0wn in #685
The text was updated successfully, but these errors were encountered: