
Rewrite dirty files discovery #45

Merged (4 commits) on Jan 19, 2024

Conversation

klkvr (Member) commented Jan 19, 2024

Motivation

Currently, dirty files discovery does not go deep enough into the graph, so it often fails to discover files whose dirty imports sit at depth > 1. In the worst case, we may recompile only the directly changed files and the files that directly import them.

This PR changes this so that any file with a dirty file at any depth of its imports is marked as dirty.

This should solve a lot of weird verification issues, because this bug is very easy to trigger and can result in, for example, a script building and deploying outdated contract bytecode that does not match the actual artifact.

Solution

If we treat this as a graph theory problem, we are dealing with a directed graph of N nodes representing files and M directed edges between them representing imports. Some of the N nodes are initially marked as dirty, and we want to also mark as dirty every node from which a path exists to any dirty node.

Before the fix, the (incorrect) solution's complexity was around O(M), so I didn't want to increase it too much.

One possible solution without many code changes would be to run a deeper recursion from each node and try to reach a dirty node from it. However, such a solution would have a worst-case complexity of O(N*M).

The solution I've implemented precomputes rev_edges to store reversed edges, reducing the problem to finding all nodes reachable from the dirty ones in the reversed graph. It runs a DFS from all initially dirty nodes, keeping track of visited (dirty) nodes to avoid visiting the same node twice. The worst-case complexity of this approach is O(M), because no edge is traversed twice.
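The approach above can be sketched as a standalone graph structure. This is a minimal illustration, not the PR's actual code: the `Graph` type, node indices, and `all_dirty` helper here are hypothetical stand-ins for the compiler cache's real types, though `edges`/`rev_edges` follow the PR's naming.

```rust
use std::collections::HashSet;

/// Directed import graph: `edges[n]` lists the nodes that node `n` imports.
struct Graph {
    edges: Vec<Vec<usize>>,
    /// Reverse of `edges`: `rev_edges[n]` lists the nodes that import node `n`.
    rev_edges: Vec<Vec<usize>>,
}

impl Graph {
    fn new(edges: Vec<Vec<usize>>) -> Self {
        // Precompute the reversed edges once, in O(N + M).
        let mut rev_edges = vec![Vec::new(); edges.len()];
        for (from, imports) in edges.iter().enumerate() {
            for &to in imports {
                rev_edges[to].push(from);
            }
        }
        Self { edges, rev_edges }
    }

    /// Marks as dirty every node from which a path reaches an initially dirty
    /// node, by walking the reversed graph. Each edge is traversed at most
    /// once, so the traversal is O(N + M).
    fn all_dirty(&self, initially_dirty: &[usize]) -> HashSet<usize> {
        let mut dirty: HashSet<usize> = initially_dirty.iter().copied().collect();
        let mut stack: Vec<usize> = initially_dirty.to_vec();
        while let Some(node) = stack.pop() {
            for &importer in &self.rev_edges[node] {
                // `insert` returns false for nodes already marked dirty,
                // i.e. already visited or queued, so we skip them.
                if dirty.insert(importer) {
                    stack.push(importer);
                }
            }
        }
        dirty
    }
}
```

For example, with node 2 importing node 1 and node 1 importing node 0, dirtying node 0 marks all three nodes dirty, even though node 2 only reaches node 0 at depth 2.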

@DaniPopes DaniPopes requested a review from mattsse January 19, 2024 10:57
@Evalir Evalir left a comment


This seems correct to me after a thorough read. Love the approach and solution, and it does seem accurate: we should be marking as dirty all files which import, directly or indirectly, a dirty file. I think we've seen this out in the wild, with cleaning the cache being the workaround.

I reviewed the code and it looks good; just some small nits, but I'll leave approval and merging to @mattsse.


// This is a repro and a regression test for https://github.com/foundry-rs/compilers/pull/45
#[test]
fn dirty_files_discovery() {

Wonder if we can have another test or two that explicitly check that only dirty files are recompiled? And maybe also a test like this one, but where everything is a library except the top contract.

src/cache.rs (review thread resolved)
Co-authored-by: Enrique <hi@enriqueortiz.dev>
@mattsse mattsse left a comment


this is great!

makes sense!
left a suggestion re recursion

Comment on lines +75 to +76
/// Reverse of `edges`. That is, `rev_edges[0]` is the set of incoming edges for `nodes[0]`.
rev_edges: Vec<Vec<usize>>,

ah, that makes sense

Comment on lines +751 to +756
for file in self.edges.importers(file) {
// If file is marked as dirty we either have already visited it or it was marked as
// dirty initially and will be visited at some point later.
if !dirty_files.contains(file) {
dirty_files.insert(file.to_path_buf());
self.populate_dirty_files(file, dirty_files);

I don't think this recursion will cause stack issues, but I'd rather be more cautious here and solve this iteratively if possible.

I guess this could be solved with a stack, e.g. a VecDeque or similar.
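A rough iterative sketch of that suggestion, under stated assumptions: the free function and the `importers` closure below are hypothetical stand-ins for the cache's actual `self.edges.importers(file)` API, and a plain `Vec` serves as the DFS stack (a `VecDeque` would give BFS order; either visits the same set of files).

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Iterative variant of the recursive `populate_dirty_files`: instead of
/// recursing into each importer, push it onto an explicit work stack.
/// `importers` maps a file to the files that import it.
fn populate_dirty_files_iter(
    start: &Path,
    dirty_files: &mut HashSet<PathBuf>,
    importers: impl Fn(&Path) -> Vec<PathBuf>,
) {
    let mut stack = vec![start.to_path_buf()];
    while let Some(file) = stack.pop() {
        for importer in importers(&file) {
            // `insert` returns false for files already marked dirty, i.e.
            // already visited or queued initially, so we skip them. This
            // preserves the O(M) bound: each edge is walked at most once.
            if dirty_files.insert(importer.clone()) {
                stack.push(importer);
            }
        }
    }
}
```

The trade-off is only about stack usage: the recursive version consumes one call frame per import-path level, while this version bounds stack depth regardless of how deep the import chains go.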


The previous solution was also recursive, FWIW, but I do prefer an iterative approach to reduce stack usage and so we can fail more gracefully in the weird case that something happens during the DFS.

@mattsse mattsse left a comment


I see the previous solution was also recursive without any issues, so this should be fine.

wdyt @klkvr ?

@Evalir Evalir left a comment


this lgtm. wdyt about a foundry companion PR to run CI over this and check against other repos @mattsse / @klkvr ?

klkvr (Member, Author) commented Jan 19, 2024

Previously the recursion couldn't reach a depth of more than 2.

The recursion depth limit is 128, right? It can only be reached if a project has an import path through 128 files, which shouldn't happen for most codebases.

@mattsse mattsse merged commit 76bacac into foundry-rs:main Jan 19, 2024
20 checks passed