Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude duplicate entries #24

Closed
kashav opened this issue May 24, 2017 · 3 comments
Closed

Exclude duplicate entries #24

kashav opened this issue May 24, 2017 · 3 comments
Labels

Comments

@kashav
Copy link
Owner

kashav commented May 24, 2017

Duplicate entries are shown when any of the directories has a leading/trailing slash. This only applies to queries which select from multiple directories with at least one being a child directory of another.

Assume structure is as follows:

.
├── foo
│   ├── ...
│   ├── ...

And you run the following queries

$ fsql "select all from ., foo"
...
$ fsql "select all from ., foo/"
...
$ fsql "select all from ., ./foo"
...
$ fsql "select all from ., ./foo/"
...

You should see that the first doesn't include foo twice, but the next 3 do.


This can easily be solved with a new Excluder implementation which maintains a set (map[string]bool) of seen directories to determine if the directory has already been traversed. I'm fairly certain that the best solution for this would be to use regex, but it may actually be more efficient to just add multiple entries to the map per file (no trailing/leading slash, trailing, leading, and both). There's a naive implementation of this in Query.execute.

Very open to hearing other ideas!

@kashav kashav added the bug label May 24, 2017
@ghost
Copy link

ghost commented May 25, 2017

@kshvmdn do you mean add multiple to the map per path that you're selecting from?

Is there any reason why we can't normalize the paths to get rid of insignificant preceding and trailing /'s?
e.g:
./foo/ is equivalent to foo,
./some/path/foo is equivalent to ./some/path/foo/

if the user query's with
select all from ., ./some/path/foo/ then we know that once the . branch runs it will eventually add ./some/path/foo` to seen. if we normalize the trailing slashes we don't have to run a regex on each entry, and we don't have to add each form to the map.

Seems like as long as you preserve everything between (./) blah/something/else/other/file (/) we should be ok

Maybe I'm missing an edge case though. What do you think?

essentially remove the noise and then the problem becomes a bit easier

@kashav
Copy link
Owner Author

kashav commented May 25, 2017

@jonesbrianc26 You're absolutely correct, not sure why I missed this idea. There's actually a method, filepath.Clean, that does exactly this:

1. Replace multiple Separator elements with a single one.
2. Eliminate each . path name element (the current directory).
3. Eliminate each inner .. path name element (the parent directory)
   along with the non-.. element that precedes it.
4. Eliminate .. elements that begin a rooted path:
   that is, replace "/.." by "/" at the beginning of a path,
   assuming Separator is '/'.

@kashav
Copy link
Owner Author

kashav commented May 25, 2017

Fixed in #25!

@kashav kashav closed this as completed May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant