
Extend VFS to watch files #433

Closed
matklad opened this issue Jan 5, 2019 · 17 comments
Labels
E-hard, fun (A technically challenging issue with high impact)

Comments

@matklad
Member

matklad commented Jan 5, 2019

Currently, VFS reads files once at startup, and then only uses text-editor notifications to change the state of files.

This works surprisingly well in practice, but we do miss some file system changes (when, for example, switching branches in git). We should add file-system watching to vfs.

This is tricky in itself, and it is not made easier by the fact that the main filesystem-watching library, https://github.com/passcod/notify/, is transitioning from v4 (which is frozen) to v5 (which is not released yet). v5 does not seem like it would be done in the nearish term, so we should perhaps rely on v4 for the time being.

The code for vfs is here:

https://github.com/rust-analyzer/rust-analyzer/blob/53ffa2a030be52ad6ddcf3c038f08c337894fbe7/crates/ra_vfs/src/lib.rs

matklad added the E-hard and fun labels on Jan 5, 2019
@flodiebold
Member

Shouldn't we use the LSP capabilities for that? The LSP spec says

Servers are allowed to run their own file watching mechanism and not rely on clients to provide file events. However this is not recommended due to the following reasons:

@vemoo
Contributor

vemoo commented Jan 5, 2019

Shouldn't we use the LSP capabilities for that? The LSP spec says

Servers are allowed to run their own file watching mechanism and not rely on clients to provide file events. However this is not recommended due to the following reasons:

It looks like it would be easier that way. I can try to implement it.

It looks like it's just adding a new block on https://github.com/rust-analyzer/rust-analyzer/blob/44e42edc8b7729716e0b410ced2e96c33573e2c0/crates/ra_lsp_server/src/main_loop.rs#L315-L396

@matklad
Member Author

matklad commented Jan 5, 2019

I've seen that, and I explicitly disagree :-)

I think we should primarily rely on our own file-watching implementation, for the following reasons:

  • avoiding vendor lock-in: I think not every client will implement file-watching, and I think no two clients will have exactly the same watching semantics.

  • performance: we need to watch ~/.cargo/registry because that's where our dependencies are, but the client need not be aware of those folders at all. Routing all of those events through a (potentially slow) client could result in worse performance than watching what we need ourselves.

  • reusability: I try hard not to tie rust-analyzer to a particular protocol implementation. If we have our own file-watching service, it would be easier to extract a self-contained component out of rust-analyzer.

I think that the "multiple servers" concern should not apply to rust-analyzer: a single instance of the server should handle many separate projects simultaneously (that makes sense, because deps & stdlib will most likely be shared anyway).

I fully understand that creating a correct, performant & cross-platform file-watching service is an enormously hard task, but that's actually the perfect problem to solve in Rust :-) I think (in the long-run) it makes sense for VS Code to switch to Rust for the implementation of this core component :)

In the short run, a "mostly correct" implementation will actually give us 80% of the benefits, I'd guess?

We could also start with an LSP-based impl and switch to a native one later, but I'd rather avoid that route, because it would remove the incentive to write the native impl.

@vemoo
Contributor

vemoo commented Jan 5, 2019

After reading your reasons I agree.

Also, there's at least one other issue https://code.visualstudio.com/api/references/vscode-api#2261:

Note that only files within the current workspace folders can be watched.

so it wouldn't be possible to watch ~/.cargo/registry with vscode.

But since it didn't look too difficult I implemented it here https://github.com/vemoo/rust-analyzer/tree/lsp-file-system.
Let me know if it's worth merging.

@matklad
Member Author

matklad commented Jan 5, 2019

@vemoo I'd still prefer to start with notify: the bare-bones implementation shouldn't be too hard either I think?

@vemoo
Contributor

vemoo commented Jan 5, 2019

Ok, I will start working on a bare-bones impl with notify.

@vemoo
Contributor

vemoo commented Jan 6, 2019

It's mostly implemented here https://github.com/vemoo/rust-analyzer/tree/vfs-watch.

There's still something that's not quite working.
https://github.com/vemoo/rust-analyzer/blob/b4f06b05ab51a1cd6a2e883fe0d5c0b63cd8a3b6/crates/ra_vfs/src/watcher.rs#L79-L90
I thought that dropping the watcher would cause the Receiver to receive an error and terminate the thread, but the thread never terminates.

I will investigate further, maybe I'm missing something.

@matklad
Member Author

matklad commented Jan 6, 2019

I've created a Watcher struct similar to how thread_worker is structured but with a mpsc::channel as input and a crossbeam_channel as output

Interesting! I wonder if it is possible to make both the scanner and the watcher use the same channel to output events. That is, I originally envisioned that TaskResult becomes an enum, and that all watching is handled entirely inside vfs. The overlay functions are not for watched files; they are specifically for "file contents in editor differs from contents on disk". Sorry for the obscure reply, gotta run now!

@vemoo
Contributor

vemoo commented Jan 7, 2019

Interesting! I wonder if it is possible to make both the scanner and the watcher use the same channel to output events. That is, I originally envisioned that TaskResult becomes an enum, and that all watching is handled entirely inside vfs.

I will try to implement that.

The overlay functions are not for watched files; they are specifically for "file contents in editor differs from contents on disk".

I figured something like that, because of how remove_file_overlay was implemented, but I wasn't sure how the watcher events should interact with that. Now I have some questions:

  • After thinking about it, I see another flaw with the current watcher impl: all the blocking IO operations happen on the main loop. To fix that, WatcherChange should include the file contents. But that means the watcher must know about ignored files to avoid reading them; maybe has_rs_extension is enough for now?

  • Should VfsFileData include something like overlay_text: Option<Arc<String>>? That way, when a watcher event happens, we won't overwrite it. We would also avoid the blocking IO operation in remove_file_overlay that's happening on the main loop now.

  • What should happen if there's a WatcherChange event but there's an overlay, should we update the VfsFileData but not generate a VfsChange?

@vemoo
Contributor

vemoo commented Jan 7, 2019

I've started looking at what's needed to use the same channel for scanner and watcher events and I think I have an idea of how it can be done.

  • Task also needs to become an enum, with a variant for the current scanner task and a new variant for the watcher events. Something like:
enum Task {
  AddRoot(...),
  WatcherChange(...)
}
  • TaskResult becomes an enum, and the variant for the watcher includes the file contents; that way all the blocking IO is performed on the io thread worker

This way we can send Task::AddRoot or Task::WatcherChange to the same channel, and Vfs::handle_task would handle both cases.

@matklad
Member Author

matklad commented Jan 7, 2019

But that means that the watcher must know about ignored files to avoid reading them, maybe has_rs_extension is enough for now?

The Task includes pub(crate) filter: Box<Fn(&DirEntry) -> bool + Send>; this should be used by both the scanner and the watcher. We might want to change Fn to some trait here, to allow filtering to work with both the walkdir and notify APIs. But the filtering logic should be the same. In general, I'd prefer to think about this as a single operation: "bulk-scan the source root and then send notifications about updates".

Should VfsFileData include something like overlay_text: Option<Arc<String>>?

Yep, that would be the ideal solution. The simple solution would be to store an is_overlayed: bool field, and just re-read the text in remove_overlay. This approach has the nice property that we guarantee to read the text after the overlay was disabled, so we necessarily get what was written to disk. I wouldn't worry too much about reading a single file synchronously in the main loop.

What should happen if there's a WatcherChange event but there's an overlay, should we update the VfsFileData but not generate a VfsChange?

VfsChange should not be generated. As for the update, is_overlayed: bool means that the event is just ignored, while overlay_text: Arc<String> will record the text in the other field.

I have an idea of how it can be done.

Yes, that's precisely the intention. We could probably do watching in the io module as well (or in a submodule of the io module).

@vemoo
Contributor

vemoo commented Jan 12, 2019

I've made some progress here.

To be able to filter watcher events I've turned Task into an enum:

pub(crate) enum Task {
    AddRoot {
        root: VfsRoot,
        path: PathBuf,
        filter: Box<Fn(&DirEntry) -> bool + Send>,
    },
    LoadChange(crate::watcher::WatcherChange),
}

And added another case TaskResult:

pub enum TaskResult {
    AddRoot(AddRootResult),
    HandleChange(WatcherChange),
    LoadChange(Option<WatcherChangeData>),
}

That way, in Vfs::handle_task we can first filter: https://github.com/vemoo/rust-analyzer/blob/ea7667c226b373dcf98735d18fd1319b0ae15453/crates/ra_vfs/src/lib.rs#L219 and send the task back to the io thread to load the file contents, and then we load the data: https://github.com/vemoo/rust-analyzer/blob/ea7667c226b373dcf98735d18fd1319b0ae15453/crates/ra_vfs/src/lib.rs#L232.

There's another notify event that I didn't initially handle, DebouncedEvent::Rescan, but I think we should. The docs say:

Rescan is emitted immediately after a problem has been detected that makes it necessary to re-scan the watched directories.

But unless we keep a separate watcher per root, we will have to reload all the roots. We could remove all files (except ones with an overlay) and then send Task::AddRoot for each root.

I also figured out why the watcher thread wasn't shutting down. It was a bug in notify. I created a PR to fix it here: notify-rs/notify#170

@matklad
Member Author

matklad commented Jan 12, 2019

I've looked through the changes and they look good to me! I think it's OK not to handle Rescan to start with: given that nothing works at all at the moment, even without rescan this will be a huge improvement. I don't think we should send Task::AddRoot on Rescan. At least, we definitely should not emit VfsChange::AddRoot a second time for a root.

I also figured out why the watcher thread wasn't shutting down. It was a bug in notify

Awesome! I bet it was tricky to debug! And it's cool that we actually hit that bug: it means that we do put some effort on our side into cleaning up the threads properly. I think it's OK to just let that thread hang for the time being.

@vemoo
Contributor

vemoo commented Jan 12, 2019

The problem is that the Watcher's thread holds a Sender for the Worker's channel, and since that Sender is never dropped, the channel doesn't close, so the worker also doesn't terminate. And that causes a bunch of tests to fail.

@passcod

passcod commented Jan 13, 2019

Note that Notify is more "passively maintained" than "frozen". If you need changes/features (or fixes, as you've seen!) and can contribute an impl I'll be more than happy to merge, even for breaking changes (with a higher threshold, tho). I'm just not going to spend lots of time on it.

Current estimation is that I won't release v5 before async/await is in stable rust, but that could change.

@matklad
Member Author

matklad commented Jan 13, 2019

Awesome! Thanks for chiming in, @passcod !

vemoo mentioned this issue on Jan 15, 2019
@matklad
Member Author

matklad commented Jan 26, 2019

Fixed!

matklad closed this as completed on Jan 26, 2019