Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support simple glob patterns for ignore rules #106

Merged
merged 14 commits into from
Jan 5, 2023
Merged

Support simple glob patterns for ignore rules #106

merged 14 commits into from
Jan 5, 2023

Conversation

joaomoreno
Copy link
Contributor

Fixes #64

@devongovett
Copy link
Member

This looks like a good start. Are you planning to continue work on it?

@joaomoreno
Copy link
Contributor Author

joaomoreno commented Nov 4, 2022

Definitely planning to continue this, apologies for the silence. Very busy over here lately.

@erha19
Copy link

erha19 commented Nov 7, 2022

HI, @joaomoreno thanks for your contribution ~

I'm facing the same issue liked: microsoft/vscode#151827.

When I watch a large project with a big node_modules directory on Linux, it will be fair cause the @parcel/watcher watch all the files on InotifyBackend.cc#L67 and the ignore options is only used to filter events after the events have fired, so it will cost a lot of unnecessary times or memory even cause our program to fail.

I looked at your code and found that it did not handle this well, have you considered applying glob rules to filter the directories when watching directories?

@erha19
Copy link

erha19 commented Nov 14, 2022

@devongovett do you have any suggestions here?

@bpasero
Copy link
Contributor

bpasero commented Dec 17, 2022

It took us the longest time to understand test failures on Linux and we still cannot fully explain them but may now have found a workaround. A bit of detail: we use micromatch.makeRe to convert a glob pattern to a regular expression and then use that in C++ directly, assuming the regular expressions are compatible because C++11 introduced ECMAScript regular expressions.

The result of ** is a very complicate regular expression with a negative lookahead:

/^(?:(?!\.)(?:(?:(?!(?:^|\/)\.).)*?)\/?)$/

For a strange reason, on Linux only, this pattern would not match on simple input such as foo.bar. Same works fine on macOS and Windows. The problem seems to originate from the usage of ^ in the negative lookahead: (?!(?:^|\/).

In the end I asked SO (https://stackoverflow.com/questions/74796098/c-regular-expression-with-negative-lookahead-and-matches-on-macos-but-not-l) but did not get a good answer.

The verdict is to set dot: true as micromatch option which will return a regular expression that does not seem to suffer from this 🤷 . Would still be great to better understand this though because maybe there are other patterns that exhibit a similar result...

@devongovett
Copy link
Member

devongovett commented Dec 18, 2022

Is it platform specific or compiler specific? You mentioned you compiled with clang on macOS and gcc on Linux. See if gcc on macOS has the same issue? Anyway, sounds like a possible compiler bug to me.

Edit: or maybe standard library bug?

@bpasero
Copy link
Contributor

bpasero commented Dec 18, 2022

Yeah good point, I would totally put my money on compiler specific and not platform specific, but have not verified yet. One thing we did try is to use newer versions of GCC (up to 11) to see if it still reproduces and there was no difference. https://godbolt.org/ is a great playground to check (e.g. https://godbolt.org/z/YK6GhKcqx).

@bpasero
Copy link
Contributor

bpasero commented Dec 19, 2022

@devongovett I think this is good now for a review from you (Joao is out today, but he will mark the PR as non-draft later).

Besides the trouble with the complex regex where we now have the dot: true workaround, I think this PR is straightforward:

  • is-glob check for telling apart paths and globs keeping the API same as before
  • micromatch to convert the glob to a RegEx that we can use in C++
  • updated bool isIgnored(std::string path) to also check for the glob pattern

Some discussion items to you:

  • glob patterns match on the relative path (e.g. when you watch /Users/somename/foo and there is an event /Users/somename/foo/bar/file.txt the path to match will be bar/file.txt (I think this is a good decision but want to bring it up here in case you disagree and it should be absolute)
  • I noticed in some cases you directly use watcher.mIgnore.count... for seeing if a path is ignored but felt it should use watcher.isIgnored (specifically in fts.cc and legacy.cc). Imho the only locations where watcher.mIgnore is relevant is where the underlying watcher has support for exclude patterns (macOS & Watchman)
  • I did not find glob support on the level of macOS and Watchman, so did not bother trying to push the glob patterns further down
  • I updated the README.md to explain the use of glob patterns

@devongovett
Copy link
Member

Awesome! I'll take a look soon.

@bpasero
Copy link
Contributor

bpasero commented Jan 3, 2023

Hi @devongovett (happy new year!). Was wondering if you had a chance to review 👍

@devongovett devongovett self-requested a review January 3, 2023 06:42
Copy link
Member

@devongovett devongovett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good! I'll test it out later tonight and get a release out for you this week. Thanks again for working on this. 😍

@bpasero
Copy link
Contributor

bpasero commented Jan 3, 2023

@devongovett wonderful ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support simple glob patterns for ignore rules
4 participants