jj tracks ignored files #2051

Closed
0xdeafbeef opened this issue Aug 12, 2023 · 11 comments

Comments

@0xdeafbeef
Collaborator

Steps to Reproduce the Problem

repro.zip

  1. cd $(mktemp -d) && wget https://github.com/martinvonz/jj/files/12327348/repro.zip && unzip repro.zip && cd repro
  2. jj st
  3. observe that files in target are not actually ignored

Expected Behavior

Files are ignored

Specifications

  • Platform: fedora 38
  • Version: jj 0.8.0-1abd36725a47ca1b25b815b684fab2eb1b71fc0d
@qfel
Collaborator

qfel commented Aug 17, 2023

Reproducing test:

#[test]
fn test_status_gitignore() {
    let test_env = TestEnvironment::default();
    test_env.jj_cmd_success(test_env.env_root(), &["init", "repo", "--git"]);
    let repo_path = test_env.env_root().join("repo");
    std::fs::write(repo_path.join("test_file"), "bar").unwrap();
    // Commenting out the line below makes the test pass.
    test_env.jj_cmd_success(&repo_path, &["status"]);
    std::fs::write(repo_path.join(".gitignore"), "test_file\n").unwrap();
    let stdout = test_env.jj_cmd_success(&repo_path, &["status"]);
    assert!(!stdout.contains("test_file"));
}

@qfel
Collaborator

qfel commented Aug 18, 2023

Hmm, though I suspect the example above is actually working as intended (never mind whether it's unintuitive). But then you should be able to fix it with jj untrack, except in your specific case it doesn't seem to recognize those files as ignored.

@martinvonz
Owner

We talked a bit about this on Discord (feel free to join at https://discord.com/invite/dkmfj3aGQN) before @0xdeafbeef filed this bug report. It's something about incorrectly respecting a .gitignore inside a .gitignore'd directory. Sorry we didn't provide that context here.

@martinvonz
Owner

Let me give some more detail. The .gitignore in the root directory in the repro above looks like this:

/target
.idea/
db/
config.yaml
requests.log
ton-global.config.json
queue/
.env
!/contrib/config.yaml
!/contrib/ton-global.config.json

The relevant bits there are /target and the two negative patterns at the end. Note that, per Git's spec, later patterns override earlier ones, so the negative patterns override the /target pattern. That means that we can't skip walking target/ even though it's ignored, because there might be a /contrib/config.yaml file in there. Well, since that expression is rooted, it can't, so that's just a bug. Wait, I just found the more important bug. From https://git-scm.com/docs/gitignore#_pattern_format:

An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined.

I'm pretty sure I had missed the second sentence there before. So that's saying that e.g. !/contrib/config.yaml should be ignored inside target/ even if it had not been rooted.
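
To make the quoted rule concrete, here is a minimal sketch of it. It is not the code in lib/src/gitignore.rs: the `Rule` type and both functions are hypothetical, and patterns are reduced to literal rooted paths (no globs, no unrooted matching). Within one path the last matching rule wins, but a path is checked from the top down, so a negated rule below an excluded directory is never reached.

```rust
/// One .gitignore line, reduced to a literal rooted path for illustration.
/// Hypothetical type: real patterns also support globs and unrooted matching.
struct Rule {
    /// Path relative to the repository root, without leading or trailing slash.
    path: &'static str,
    /// True for lines that start with `!`.
    negated: bool,
}

/// Applies "later patterns override earlier ones" to one exact path.
fn last_match_ignored(rules: &[Rule], path: &str) -> bool {
    let mut ignored = false;
    for rule in rules {
        if rule.path == path {
            ignored = !rule.negated;
        }
    }
    ignored
}

/// A path is ignored if it, or any of its parent directories, is ignored.
/// Parents are checked first, so once a directory is excluded, negated
/// rules for anything inside it have no effect.
fn is_ignored(rules: &[Rule], path: &str) -> bool {
    let mut prefix = String::new();
    for component in path.split('/') {
        if !prefix.is_empty() {
            prefix.push('/');
        }
        prefix.push_str(component);
        if last_match_ignored(rules, &prefix) {
            return true;
        }
    }
    false
}
```

With the rules `target` followed by `!target/keep.txt`, `is_ignored` reports `target/keep.txt` as ignored, because the walk stops at `target`. A negated rule still works when no parent directory is excluded.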

@qfel, do you feel like fixing that? The code is in https://github.com/martinvonz/jj/blob/main/lib/src/gitignore.rs

@qfel
Collaborator

qfel commented Aug 18, 2023

I don't quite follow.

That means that we can't skip walking

That sounds like performance, but the problem is correctness.

From what you're saying, it sounds like the minimal reproduction could be:

/target
config.yaml
!/contrib/config.yaml

and then put something inside target. But if I do that, things look fine (files under /target are ignored).

Also in the provided repo:

$ touch target/debug/build/tikv-jemalloc-sys-e2bcbb41c8161192/out/build/test/unit/x
$ jj untrack target/debug/build/tikv-jemalloc-sys-e2bcbb41c8161192/out/build/test/unit/x
$ touch target/debug/build/tikv-jemalloc-sys-e2bcbb41c8161192/out/build/test/unit/x.c
$ jj untrack target/debug/build/tikv-jemalloc-sys-e2bcbb41c8161192/out/build/test/unit/x.c
Error: 'target/debug/build/tikv-jemalloc-sys-e2bcbb41c8161192/out/build/test/unit/x.c' is not ignored.
Hint: Files that are not ignored will be added back by the next command.
Make sure they're ignored, then try again.

So it seems there's some problem regarding files with extensions.

@martinvonz
Owner

I don't quite follow.

That means that we can't skip walking

That sounds like performance, but the problem is correctness.

Hmm, sorry, I stopped my explanation halfway when I discovered the part of the spec I had missed :(

The part I missed was that since the negative pattern means - I thought! - that we need to walk the target/ directory, we would do that like any non-ignored directory. The problem was if target/some/subdir/.gitignore contains patterns like !*, then we'd read that and start not ignoring anything in that directory.

I hope that clarifies. The fix should be to not let negative patterns override earlier patterns matching a directory.

@Dr-Emann
Collaborator

Is there a possibility to use burntsushi's ignore crate for gitignore processing?

@martinvonz
Owner

Is there a possibility to use burntsushi's ignore crate for gitignore processing?

Maybe. We discussed that briefly in #2109 (comment). It sounds like @qfel is going to look into it. When I took a quick look earlier, it seemed like it was more targeted at walking a directory while respecting the .gitignores. I suspect we won't want to reuse its directory-walking mechanism, but maybe we can use just the pattern-matching bits. Another thing to keep in mind is that we currently read .gitignores in subdirectories and apply them on top of an instance from the parent directories we already had. I don't know if the ignore crate supports that, but I wouldn't be surprised if that's an unnecessary optimization we get rid of anyway.
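
For readers unfamiliar with that layering, the following is a rough sketch of applying a subdirectory's .gitignore on top of the rules inherited from its parents. It is an assumption-laden illustration, not jj's implementation and not the ignore crate's API: `IgnoreChain`, `layer`, and `enter_dir` are hypothetical names, and patterns are limited to literal file names and `*`. It also shows the failure mode discussed earlier in this thread: if a nested `!*` is layered on without first checking whether the directory itself is ignored, everything below it stops being ignored.

```rust
use std::sync::Arc;

/// Rules from one .gitignore file, layered on top of the rules inherited
/// from parent directories. Hypothetical type for illustration only.
struct IgnoreChain {
    parent: Option<Arc<IgnoreChain>>,
    /// (name pattern, negated). Only literal names and "*" are supported.
    rules: Vec<(String, bool)>,
}

impl IgnoreChain {
    /// Parses lines such as "target" or "!*" into a new layer.
    fn layer(parent: Option<Arc<IgnoreChain>>, lines: &[&str]) -> Arc<IgnoreChain> {
        let rules = lines
            .iter()
            .map(|line| match line.strip_prefix('!') {
                Some(rest) => (rest.to_string(), true),
                None => (line.to_string(), false),
            })
            .collect();
        Arc::new(IgnoreChain { parent, rules })
    }

    /// The last matching rule in the innermost layer wins; a layer without
    /// any matching rule defers to its parent.
    fn is_ignored(&self, name: &str) -> bool {
        for (pattern, negated) in self.rules.iter().rev() {
            if pattern == "*" || pattern == name {
                return !negated;
            }
        }
        match &self.parent {
            Some(parent) => parent.is_ignored(name),
            None => false,
        }
    }
}

/// Returns the chain to use inside `dir_name`, or `None` if the directory
/// is itself ignored, in which case its nested .gitignore is never read.
fn enter_dir(
    chain: &Arc<IgnoreChain>,
    dir_name: &str,
    nested_lines: &[&str],
) -> Option<Arc<IgnoreChain>> {
    if chain.is_ignored(dir_name) {
        return None;
    }
    Some(IgnoreChain::layer(Some(Arc::clone(chain)), nested_lines))
}
```

Sharing the parent through `Arc` means each directory only stores its own rules. The check in `enter_dir` is the part that matters for this issue: calling `IgnoreChain::layer` directly for an ignored directory whose .gitignore contains `!*` would report its files as not ignored.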

@martinvonz
Owner

This was fixed by #2109

@qfel
Collaborator

qfel commented Sep 7, 2023

I suspect we won't want to reuse its directory-walking mechanism

Yeah, that library heavily splits deciding which directories to walk/filter out and the processing logic. It would be hard to fit the current logic into it while taking advantage of its directory walking code.

In general, it looks like the visit_directory code grew quite complicated and it's hard to do anything with it. It also has no unit tests, and somebody made it even more complicated to optimize for extra concurrency, but added no benchmark, so keeping whatever performance boosts were there is tricky.

If we're going to change it, I think a sensible (but expensive) approach could be:

  • Add unit tests
  • Add some benchmarks good enough to distinguish the parallelized version vs regular
  • Factor out pieces of existing code into multiple shorter functions, so it's easier to understand what's happening
  • Think about potentially replacing the gitignore code with some library. This will likely require a refactor, but by then we'll be in a good position to get there without a correctness or performance hit.

If this sounds sensible I can slowly [try to] do those things, but it's a lot of work and I don't have that much free time nowadays, so not sure how long it would take me.

@martinvonz
Owner

I think this code is going to change a bit more in the near future. We're going to add a custom working copy implementation at Google for our distributed file system. @hooper is going to work on that. That will probably involve a fair amount of refactoring here to be able to reuse code. It might not involve much refactoring of visit_directory() since our distributed file system will instead tell us which paths have changed so we shouldn't have to call that function. My guess is that it's still not worth changing the code until that's done.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Oct 12, 2023
[0.10.0] - 2023-10-04

### Breaking changes

* A default revset-alias function `trunk()` now exists. If you previously defined
  your own `trunk()` alias it will continue to overwrite the built-in one.
  Check [revsets.toml](cli/src/config/revsets.toml) and [revsets.md](docs/revsets.md)
  to understand how the function can be adapted.

### New features

* The `ancestors()` revset function now takes an optional `depth` argument
  to limit the depth of the ancestor set. For example, use `jj log -r
  'ancestors(@, 5)'` to view the last 5 commits.

* Support for the Watchman filesystem monitor is now bundled by default. Set
  `core.fsmonitor = "watchman"` in your repo to enable.

* You can now configure the set of immutable commits via
  `revset-aliases.immutable_heads()`. For example, set it to
  `"remote_branches() | tags()"` to prevent rewriting those. Their
  ancestors are implicitly also immutable.

* `jj op log` now supports `--no-graph`.

* Templates now support an additional escape: `\0`. This will output a literal
  null byte. This may be useful for e.g.
  `jj log -T 'description ++ "\0"' --no-graph` to output descriptions only, but
  be able to tell where the boundaries are

* jj now bundles a TUI tool to use as the default diff and merge editors. (The
  previous default was `meld`.)

* `jj split` supports the `--interactive` flag. (This is already the default if
  no paths are provided.)

* `jj commit` accepts an optional list of paths indicating a subset of files to
  include in the first commit

* `jj commit` accepts the `--interactive` flag.

### Fixed bugs

### Contributors

Thanks to the people who made this release happen!

* Austin Seipp (@thoughtpolice)
* Emily Kyle Fox (@emilykfox)
* glencbz (@glencbz)
* Hong Shin (@honglooker)
* Ilya Grigoriev (@ilyagr)
* James Sully (@sullyj3)
* Martin von Zweigbergk (@martinvonz)
* Philip Metzger (@PhilipMetzger)
* Ruben Slabbert (@rslabbert)
* Vamsi Avula (@avamsi)
* Waleed Khan (@arxanas)
* Willian Mori (@wmrmrx)
* Yuya Nishihara (@yuja)
* Zachary Dremann (@Dr-Emann)


[0.9.0] - 2023-09-06

### Breaking changes

* The minimum supported Rust version (MSRV) is now 1.71.0.

* The storage format of branches, tags, and git refs has changed. Newly-stored
  repository data will no longer be loadable by older binaries.

* The `:` revset operator is deprecated. Use `::` instead. We plan to delete the
  `:` form in jj 0.15+.

* The `--allow-large-revsets` flag for `jj rebase` and `jj new` was replaced by
  an `all:` prefix before the revset. For example, use `jj rebase -d 'all:foo-'`
  instead of `jj rebase --allow-large-revsets -d 'foo-'`.

* The `--allow-large-revsets` flag for `jj rebase` and `jj new` can no longer be
  used for allowing duplicate destinations. Include the potential duplicates
  in a single expression instead (e.g. `jj new 'all:x|y'`).

* The `push.branch-prefix` option was renamed to `git.push-branch-prefix`.

* The default editor on Windows is now `Notepad` instead of `pico`.

* `jj` will fail attempts to snapshot new files larger than 1MiB by default.
  This behavior can be customized with the `snapshot.max-new-file-size`
  config option.

* Author and committer signatures now use empty strings to represent unset
  names and email addresses. The `author`/`committer` template keywords and
  methods also return empty strings.
  Older binaries may not warn the user when attempting to `git push` commits
  with such signatures.

* In revsets, the working-copy or remote symbols (such as `@`, `workspace_id@`,
  and `branch@remote`) can no longer be quoted as a unit. If a workspace or
  branch name contains whitespace, quote the name like `"branch name"@remote`.
  Also, these symbols will not be resolved as revset aliases or function
  parameters. For example, `author(foo@)` is now an error, and the revset alias
  `'revset-aliases.foo@' = '@'` will fail to parse.

* The `root` revset symbol has been converted to function `root()`.

* The `..x` revset is now evaluated to `root()..x`, which means the root commit
  is no longer included.

* `jj git push` will now push all branches in the range `remote_branches()..@`
  instead of only branches pointing to `@` or `@-`.

* It's no longer allowed to create a Git remote named "git". Use `jj git remote
  rename` to rename the existing remote.
  [#1690](martinvonz/jj#1690)

* Revset expression like `origin/main` will no longer resolve to a
  remote-tracking branch. Use `main@origin` instead.

### New features

* Default template for `jj log` now does not show irrelevant information
  (timestamp, empty, message placeholder etc.) about the root commit.

* Commit templates now support the `root` keyword, which is `true` for the root
  commit and `false` for every other commit.

* `jj init --git-repo` now works with bare repositories.

* `jj config edit --user` and `jj config set --user` will now pick a default
  config location if no existing file is found, potentially creating parent
  directories.

* `jj log` output is now topologically grouped.
  [#242](martinvonz/jj#242)

* `jj git clone` now supports the `--colocate` flag to create the git repo
  in the same directory as the jj repo.

* `jj restore` gained a new option `--changes-in` to restore files
  from a merge revision's parents. This undoes the changes that `jj diff -r`
  would show.

* `jj diff`/`log` now supports `--tool <name>` option to generate diffs by
  external program. For configuration, see [the documentation](docs/config.md).
  [#1886](martinvonz/jj#1886)

* A new experimental diff editor `meld-3` is introduced that sets up Meld to
  allow you to see both sides of the original diff while editing. This can be
  used with `jj split`, `jj move -i`, etc.

* `jj log`/`obslog`/`op log` now supports `--limit N` option to show the first
  `N` entries.

* Added the `ui.paginate` option to enable/disable pager usage in commands

* `jj checkout`/`jj describe`/`jj commit`/`jj new`/`jj squash` can take repeated
  `-m/--message` arguments. Each passed message will be combined into paragraphs
  (separated by a blank line)

* It is now possible to set a default description using the new
  `ui.default-description` option, to use when describing changes with an empty
  description.

* `jj split` will now leave the description empty on the second part if the
  description was empty on the input commit.

* `branches()`/`remote_branches()`/`author()`/`committer()`/`description()`
  revsets now support exact matching. For example, `branches(exact:main)`
  selects the branch named "main", but not "maint". `description(exact:"")`
  selects commits whose description is empty.

* Revsets gained a new function `mine()` that aliases `author(exact:"your_email")`.

* Added support for `::` and `..` revset operators with both left and right
  operands omitted. These expressions are equivalent to `all()` and `~root()`
  respectively.

* `jj log` timestamp format now accepts `.utc()` to convert a timestamp to UTC.

* Templates now support additional string methods `.starts_with(x)`, `.ends_with(x)`,
  `.remove_prefix(x)`, `.remove_suffix(x)`, and `.substr(start, end)`.

* `jj next` and `jj prev` are added. These allow you to traverse the history
  in a linear style. For people coming from Sapling and `git-branchless`,
  see [#2126](martinvonz/jj#2126) for
  further pending improvements.

* `jj diff --stat` has been implemented. It shows a histogram of the changes,
  same as `git diff --stat`. Fixes [#2066](martinvonz/jj#2066)

* `jj git fetch --all-remotes` has been implemented. It fetches all remotes
  instead of just the default remote

### Fixed bugs

* Fix issues related to .gitignore handling of untracked directories
  [#2051](martinvonz/jj#2051).

* `jj config set --user` and `jj config edit --user` can now be used outside of
  any repository.

* SSH authentication could hang when ssh-agent couldn't be reached
  [#1970](martinvonz/jj#1970)

* SSH authentication can now use ed25519 and ed25519-sk keys. They still need
  to be password-less.

* Git repository managed by the repo tool can now be detected as a "colocated"
  repository.
  [#2011](martinvonz/jj#2011)

### Contributors

Thanks to the people who made this release happen!

* Alexander Potashev (@aspotashev)
* Anton Bulakh (@necauqua)
* Austin Seipp (@thoughtpolice)
* Benjamin Brittain (@benbrittain)
* Benjamin Saunders (@Ralith)
* Christophe Poucet (@poucet)
* Emily Kyle Fox (@emilykfox)
* Glen Choo (@chooglen)
* Ilya Grigoriev (@ilyagr)
* Kevin Liao (@kevincliao)
* Linus Arver (@listx)
* Martin Clausen (@maacl)
* Martin von Zweigbergk (@martinvonz)
* Matt Freitas-Stavola (@mbStavola)
* Oscar Bonilla (@ob)
* Philip Metzger (@PhilipMetzger)
* Piotr Kufel (@qfel)
* Preston Van Loon (@prestonvanloon)
* Tal Pressman (@talpr)
* Vamsi Avula (@avamsi)
* Vincent Breitmoser (@Valodim)
* Vladimir (@0xdeafbeef)
* Waleed Khan (@arxanas)
* Yuya Nishihara (@yuja)
* Zachary Dremann (@Dr-Emann)