Rewrite dirty files discovery #45
Conversation
This seems correct to me, after a thorough read. Love the approach and solution, and it does seem accurate—we should be marking as dirty all files which import, either directly or indirectly, a dirty file, and I think we've seen this out in the wild, with cleaning the cache being the solution.
I reviewed the code and it seems good, just some small nits, but I'll leave approval & merging to @mattsse
// This is a repro and a regression test for https://github.com/foundry-rs/compilers/pull/45
#[test]
fn dirty_files_discovery() {
I wonder if we could add another test or two that explicitly check that only dirty files get recompiled? And maybe also a test like this one, but where everything except the top contract is a library.
Co-authored-by: Enrique <hi@enriqueortiz.dev>
this is great!
makes sense!
left a suggestion re recursion
/// Reverse of `edges`. That is, `rev_edges[0]` is the set of incoming edges for `nodes[0]`.
rev_edges: Vec<Vec<usize>>,
ah, that makes sense
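As a tiny illustration (a made-up three-file example, not from the crate): if files 1 and 2 both import file 0, the two adjacency lists relate like this:

```rust
fn main() {
    // edges[i]     = files that file i imports (outgoing edges).
    // rev_edges[i] = files that import file i (incoming edges).
    let edges: Vec<Vec<usize>> = vec![vec![], vec![0], vec![0]];
    let rev_edges: Vec<Vec<usize>> = vec![vec![1, 2], vec![], vec![]];

    // Every outgoing edge (i -> j) in `edges` shows up as (j -> i) in `rev_edges`.
    assert!(edges[1].contains(&0) && rev_edges[0].contains(&1));
    assert!(edges[2].contains(&0) && rev_edges[0].contains(&2));
}
```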
for file in self.edges.importers(file) {
    // If file is marked as dirty we either have already visited it or it was marked as
    // dirty initially and will be visited at some point later.
    if !dirty_files.contains(file) {
        dirty_files.insert(file.to_path_buf());
        self.populate_dirty_files(file, dirty_files);
I don't think this recursion will cause stack issues, but I'd rather be more cautious here and solve this iteratively if possible.
I guess this could be solved with a `stack: VecDeque` or similar.
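For reference, a minimal sketch of what that iterative variant could look like, using a plain `Vec` as the explicit stack (a `VecDeque` would work just as well). The `Graph` type and the `importers` lookup below are stand-ins for the example, not the crate's actual API:

```rust
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};

// Stand-in type for the example; the real graph lives in the crate.
struct Graph {
    /// For each file, the files that import it (i.e. the reversed edges).
    rev_edges: HashMap<PathBuf, Vec<PathBuf>>,
}

impl Graph {
    fn importers(&self, file: &Path) -> &[PathBuf] {
        self.rev_edges.get(file).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Iterative take on `populate_dirty_files`: an explicit stack replaces the
    /// call stack, so traversal depth is bounded by heap memory, not stack size.
    fn populate_dirty_files(&self, file: &Path, dirty_files: &mut HashSet<PathBuf>) {
        let mut stack = vec![file.to_path_buf()];
        while let Some(current) = stack.pop() {
            for importer in self.importers(&current) {
                // `insert` returns false for files that are already dirty: those were
                // either visited here already or were dirty initially.
                if dirty_files.insert(importer.clone()) {
                    stack.push(importer.clone());
                }
            }
        }
    }
}
```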
The previous solution was also recursive fwiw, but I do prefer an iterative approach to reduce any stack usage and so we can fail more gracefully in the weird case something happens during the DFS.
I see the previous solution was also recursive without any issues, so this should be fine.
wdyt @klkvr?
Previously, the recursion couldn't reach a depth of more than 2. The limit on recursion depth is 128, right? That can only be hit if a project has an import path of 128 files, which shouldn't happen for most codebases.
Motivation
Currently, dirty files discovery does not go deep enough into the graph, so it often fails to discover files whose dirty imports sit at depth > 1. In the worst case we may only recompile the directly changed files and the files that directly import them.
This PR changes that, so that all files which have a dirty file anywhere in their import tree are marked as dirty.
This should solve a lot of weird verification issues, because this bug is very easy to trigger and can result in, for example, a script holding and deploying outdated bytecode for a contract, bytecode which does not match the actual artifact.
Solution
If we treat this as a graph theory problem, we are dealing with a directed graph of `N` nodes representing files and `M` directed edges between them representing imports. Some of the `N` nodes are initially marked as dirty, and we want to mark as dirty every node from which a path exists to any of the dirty nodes.

Before the fix, the (incorrect) solution had a complexity of roughly `O(M)`, so I didn't want to increase it too much. One possible solution without many changes to the code would be to run a deeper recursion from each node and try to reach a dirty node from it. However, such a solution would have a worst-case complexity of `O(N*M)`.

The solution I've implemented precomputes `rev_edges` to store the reversed edges, which simplifies the problem to finding all nodes that are reachable from the dirty ones in the reversed graph. It works by running a DFS from all nodes that are known to be dirty at the start and keeping track of visited (dirty) nodes to avoid visiting the same node twice. The complexity of this approach is `O(M)` in the worst case because we never use the same edge twice.
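For clarity, here is a self-contained sketch of that approach over node indices. It is illustrative only, not the crate's actual code; the function name and signature are made up for the example:

```rust
/// Given outgoing `edges` (edges[i] = files that file i imports) and the indices of
/// the initially dirty files, mark every file from which a dirty file is reachable.
fn find_dirty_nodes(edges: &[Vec<usize>], initially_dirty: &[usize]) -> Vec<bool> {
    let n = edges.len();

    // Precompute reversed edges: rev_edges[j] lists the files that import file j.
    let mut rev_edges: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (from, targets) in edges.iter().enumerate() {
        for &to in targets {
            rev_edges[to].push(from);
        }
    }

    // Iterative DFS over the reversed graph, started from every initially dirty node.
    // Each edge is traversed at most once, since we only push nodes that were not
    // dirty before.
    let mut dirty = vec![false; n];
    for &node in initially_dirty {
        dirty[node] = true;
    }
    let mut stack: Vec<usize> = initially_dirty.to_vec();
    while let Some(node) = stack.pop() {
        for &importer in &rev_edges[node] {
            if !dirty[importer] {
                dirty[importer] = true;
                stack.push(importer);
            }
        }
    }
    dirty
}

fn main() {
    // 0 = Lib, 1 = Contract (imports Lib), 2 = Script (imports Contract).
    let edges: Vec<Vec<usize>> = vec![vec![], vec![0], vec![1]];
    // Only Lib changed, but Contract and Script must be recompiled as well.
    assert_eq!(find_dirty_nodes(&edges, &[0]), vec![true, true, true]);
}
```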