Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Easy walk-dir like API #175

Open
matklad opened this issue Jan 26, 2019 · 21 comments
Open

Easy walk-dir like API #175

matklad opened this issue Jan 26, 2019 · 21 comments
Labels
A-enhancement Z-needs implementation Needs an implementation, will accept PRs

Comments

@matklad
Copy link

matklad commented Jan 26, 2019

Hi!

We've been using notify in rust-analyzer, and one thing I've noticed that a task like "watch src/**.rs glob" requires quite a bit of manual implementation. Specifically, I think two bits are complex:

  • setting up recursive watching with exclusion
  • handling of rescan event which requires repeating the recursive logic again

I feel like there's some higher-level API missing here... Ideally I'd love to use something like walkdir:

for event in WatchDir::new("foo").filter_entry(|e| !is_hidden(e)) {
    match event {
        Event::Create => (),
        Event::Change => (),
        Event::Delete => (),
     }
}
@passcod
Copy link
Member

passcod commented Jan 26, 2019

Oh, that's a great idea! Certainly something I'll use for vNext, but would also be great if it was implemented for the current branch... not sure how feasible. Thanks for the lovely idea!

@passcod
Copy link
Member

passcod commented Jan 29, 2019

In the mean time, hotwatch looks pretty cool: https://crates.io/crates/hotwatch (good job @francesca64):

use hotwatch::{Hotwatch, Event};

let mut hotwatch = Hotwatch::new().expect("Hotwatch failed to initialize.");
hotwatch.watch("war.png", |event: Event| {
    if let Event::Write(path) = event {
        println!("War has changed.");
    }
}).expect("Failed to watch file!");

I liked it so much I put it in the readme.

@matklad
Copy link
Author

matklad commented Feb 1, 2019

Hotwatch API looks nice (especially due to not exposing mpsc channel), but it doesn't handle the most tricky bit: filtering the watched directories. That's a pretty important bit for rust-analyzer: if we just watch the whole directory with Cargo project, we get flooded (up to 100% CPU) with events for the target directory when Cargo starts building the project.

Also cc @vemoo, you might be interested in helping out with this :)

@matklad
Copy link
Author

matklad commented Feb 1, 2019

Another thought: for this API, it would be important to specify the consistency semantics of the events. Obviously, if I receive Event::Create, I can't just blindly read the file: it might have been deleted in the next moment! Similarly, I'd rather not have a Rescan event in the API and instead delegate rescaning to the library.

I think what we want here is quiescent consistency: if there are no changes for a period of time, the last event for each path should correspond to the actual state. In other words, in the sum of the events between two quiescent periods must be equal to the difference between two walkdir traversals.

@passcod
Copy link
Member

passcod commented Feb 1, 2019

Agreed on the filtering.

Rescan is a historical mistake, really. It means two things on two platforms, and has no equivalent on any others. In inotify it's really a fatal error event, and in fsevents it's an indication that something was missed, but fsevents misses things all the time by design (there's a huge discussion elsewhere in issues about fsevents with the tl;dr and conclusion that fsevents should not be used for this library, see #144, #147; alas, time).

Probably what that library should do to achieve such consistency is to attempt to attr files right after the quiet down and then returns Attrs (or whatever the struct is called) alongside the path and Op. If the file is non existent it could emit/modify the event to actually reflect reality, and exposing the Attrs would save the caller from doing a subsequent (and possibly outdated by then) lookup.

I don't know that guaranteeing "the sum of the events between two quiescent periods must be equal to the difference between two walkdir traversals" is practical, because I have low confidence in the actual reliability of these platforms APIs. Appending this guarantee with "in an ideal world" is probably good? No, I am being overly cynical here. Relying only on native events is on the order of 99.9% accurate in 90% of cases. The 10% include things like unsupported filesystems, monitoring huge trees, networked shares, permission issues, and ghosts. As maintainer, I tend to see those 10% a lot more.

@matklad
Copy link
Author

matklad commented Feb 1, 2019

Appending this guarantee with "in an ideal world" is probably good?

Yeah, that's what I've meant. Specifically, this clause about "consistency" was about a sublte bug we had in rust-analyzer.

Initially, we did recursive walkdiring on the one thread, and watching on another. We also are interested in the actual "contents" of the files, so both walkdiring thread and watching thread were reading files from disk and sending the results to the single channel, which was the "API".

The problem with this setup is that one thread can read file at time t1, another thread can read the file at t2 > t1, but the events in the channel could be in a swapped order. We fixed this by making sure we always read file contents from the first thread, and made the watching thread to only send requests for reading the files to the first one: that way we guarantee that file contents only progresses forward in time.

That is, besides infinite ways in which native APIs are broken, there's a fair amount of places where we might accidentally break logical "happens before" relation, and we need to think about that :)

@vemoo
Copy link
Contributor

vemoo commented Feb 1, 2019

Yes, I can try to implement this.

For v4 at least for inotify shouldn't be too hard since it's already using WalkDir internally. For windows and mac I'll have to take a closer look at the code. It seems they nativelly suport recursive watches, but maybe when a filter is provided instead of creating a recursive watcher we should handle the recursion ourselves.

From taking a quick look at the next v5 code the issue I mention before is more clear thanks to the Capability but the solution could be the same. What I think is missing is a way of adding watches after the Bakend is created, that way we can implement the filtering on top of the backend.

@passcod
Copy link
Member

passcod commented Feb 1, 2019

In v4 you get a different implementation per platform that exposes the same interface, in v5 you'd never actually interact with Backends yourself. vNext has very strong separation of abstraction between what Backend implementations do and what the consumers' concerns are, so "vNext Backends" are immutable (makes them super easy to write), and "vNext Notify" itself is a layer that manages adding and removing watches (aka creating and deleting backends) and processing events.

In the current design of vNext what you'd do is implement a Processor that does the recursion and another that does the filtering, and add them in when .recurse() or .filter(|f| !is_hidden(f)) are called on this hypothetical WatchDir builder. So I like this API design very much because it's exactly what vNext is made to be! (For more, see this draft of a presentation/announce.)

(The current state is that Processors are described and there's a trait, but they're not hooked into the Notify mechanic yet. I keep getting bogged down by async code. I anticipate refining/adjusting how the work/look once or while I get them working.)

@vemoo
Copy link
Contributor

vemoo commented Feb 8, 2019

I've started working on the feature for v4, and I have a few questions.

  • I think ideally the filtering should in RecursiveMode something like:

    pub enum RecursiveMode {
        Recursive,
        NonRecursive,
        Filtered(...)
    }

    But that would be a breaking change, so I guess the best thing is to create a new method to create a watcher, new_with_filter maybe?

  • Should we filter files also? It would be a nicer api, but that would mean possibly turning rename events into create or delete if one of the names is filtered, that wouldn't be too bad for debounced events but I don't know how it should be handled for raw events. I was thinking of initially implementing directory filtering only. Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

  • In the current inotify implementation for recursive watching, when a directory is created it's walked recursively and the found directories are also watched. But shouldn't we also emit create events for all the found files and directories?

Regarding the v5, I think the backend trait is missing a way to configure the type of watch when creating it, specifically I'm thinking of creating a non recursive folder watch on windows for example. Also if I'm following the code correctly, for recursive watches with backends that don't support it nativelly, it will end up allocating Vec<PathBuf> each time a directory is created.

@passcod
Copy link
Member

passcod commented Feb 9, 2019 via email

@passcod
Copy link
Member

passcod commented Feb 9, 2019

Can you explain what you mean by

Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

@passcod
Copy link
Member

passcod commented Feb 9, 2019

  • What I mean by "wondering whether adding a variant to that enum would really be a breaking change" is that RecursiveMode is never returned from the API, so there's no reason to match on it. I think the exception to that would be a library that re-exports RecursiveMode to their users and then pattern-matches on it rather than passing it straight through to Notify.

    I'm weakly leaning more towards bumping the major anyway. I think a large new feature like this is worth it, especially if it means doing it right. Upgrading would be trivial, even for libraries as described above. Adoption would be slower, but the nicer API would be a good incentive.

  • Directory filtering at least as a first approach sounds good, as that would be most of the value.

  • shouldn't we also emit create events for all the found files and directories?

    I think that would make sense, but I wonder what others do:

    • What do our other backends do?
    • and what do other notify libraries in other languages do? (or in Rust, if applicable)

    I think this could be handled separately, though, so it doesn't block this issue.

@passcod passcod added the Z-needs implementation Needs an implementation, will accept PRs label Feb 9, 2019
@vemoo
Copy link
Contributor

vemoo commented Feb 9, 2019

Can you explain what you mean by

Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

So that the user could set the available options in WalkDir. But now that I think about it the only one that makes sense is follow_links, since min_depth and max_depth would only be usefull relative to the watch root, not each folder on which we well be using the WalkDir. So I'm thinking the filter could be:

struct RecursionFilter<P>
where
    P: Fn(&Path) -> bool,
{
    filter_dir: P,
    follow_links: bool,
}

@passcod
Copy link
Member

passcod commented Feb 9, 2019

Right, sounds good.

@vemoo
Copy link
Contributor

vemoo commented Feb 11, 2019

I think I found a way to do it here. I've created an abstraction for the watcher implementation: WatcherInternal and that way I can implement the recursive filtering and keep track of the which nested watches belong to which actual watch in RecursionAdapter.

It's mosly an idea, I haven't tried to implement it yet for any backends. I'll try inotify first, by implementing WatcherInternal for inotify::EventLoop, and see if it works.

@matklad
Copy link
Author

matklad commented Feb 13, 2019

Random thought: in rust-analyzer, we've noticed that on mac we can get Crate event when the write is expected. That is, distinguishing between the kind of event reliably seems tricky. So perhaps instead of

        Event::Create => (),
        Event::Change => (),
        Event::Delete => (),

We should have

        Event::MaybeChanged => (),

?

@vemoo
Copy link
Contributor

vemoo commented Feb 17, 2019

I have it mostly working for inotify (https://github.com/vemoo/notify/tree/watch-filter). There's 1 test (watch_recursive_move_out) that fails always and some other that fails ocasionally. I'll investigate.

I have yet to add tests for the filtering, but I think the hardest part is done. Adding it to the other watches should be easier since they don't have a recursive implementation that has to be replaced.

@passcod
Copy link
Member

passcod commented Feb 18, 2019 via email

@vemoo
Copy link
Contributor

vemoo commented Feb 24, 2019

I'm slowly making progress.

The test that was failing was a timing issue, I fixed it here: vemoo@51d80bc

I ended up implementing filtering for files also, because it wasn't that much more work. This is the filter struct: RecursionFilter and this is the item to filter on: FilterItem

To implement the filtering I needed to know if a rename event was the "from" or the "to" part, to try to add or remove watches. To keep it simple I check if the path exists and if it does I treat it as a rename "to", otherwise as a rename "from". But inotify and windows distinguish between the two events, so I though that it would be usefull to separate the RENAME op in two.

I also found something in Debounce that might be a bug. Given this raw events (filtered):

[
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/file1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/file1",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/non_ignored_dir",
        RENAME,
        Some(
            2384
        )
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2",
        RENAME,
        Some(
            2386
        )
    )
]

the debounced events are:

[
    NoticeRemove(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/non_ignored_dir"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/file1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file2"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1/file1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2"
    )
]

(this is the out put of running the tests https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L68 and https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L84 on linux)
I think having separate "from" and "to" events will help fix this.

I've also started working on integrating it in windows, but it's a bit harder than I thought.
For mac I won't be able to try it, unless there's a way to do it from linux.

@passcod
Copy link
Member

passcod commented Mar 11, 2019 via email

@vemoo
Copy link
Contributor

vemoo commented Mar 13, 2019

It seem this approach doesn't work for windows. Because windows doesn't allow to delete the watched directory neither renaming the parent directory, and in this approach we create a non recursive watch for each directory. There are some tests that I didn't see until after I hit this issue:
https://github.com/vemoo/notify/blob/dc2e46b6bd4c3c9d3f9edaa6bcfc705b11c24137/tests/watcher.rs#L1117
https://github.com/vemoo/notify/blob/dc2e46b6bd4c3c9d3f9edaa6bcfc705b11c24137/tests/watcher.rs#L1461
that are ignored for windows for these reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-enhancement Z-needs implementation Needs an implementation, will accept PRs
Projects
None yet
Development

No branches or pull requests

3 participants