pathspec should expose the information of what matched in a string/path #30

Closed
ftrofin opened this issue Jan 13, 2020 · 9 comments


ftrofin commented Jan 13, 2020

This library is very cool and I want to use it in a project. However, I'm running up against a severe limitation: I can't tell which part of a filename was matched.
Consider this example: I have several match patterns (let's call them SpecEntries). They specify what I'm looking for and, optionally, what the matched thing should be remapped to. Example:
'/Documentation/': 'docs/',
'/*.html': 'docs/',
'**/Examples/SDK/': 'docs/'

In the above examples, all the patterns on the left side are remapped to a 'docs/' folder.
Now I'm using match_tree to iterate over a directory and compare its files against the patterns specified by my SpecEntries.

  1. The first issue is that the results returned by match_tree don't specify which pattern matched which file. I worked around this by iterating through my patterns, compiling each one and calling match_files against it. Doable but inefficient (consider this an improvement request).

  2. After the matches are returned, I'd like to remap those paths according to the right-hand side of the SpecEntry, for example:

    '/Documentation/foo.txt' -> 'docs/foo.txt'
    'foo.html' -> 'docs/foo.html'
    'blah/Examples/SDK/bar.txt' -> 'docs/bar.txt'

The problem is that there isn't an API that will allow me to do this. The pathspec library knows which part of the path matched my specifier, but it doesn't expose that information to me, so I can't do this remapping. (I considered using regular expressions or fnmatch, but they won't easily match pathspec's capabilities; for example, there is no easy way to match '**'.)

Is it possible to expose the matching logic in the library API so callers can implement this kind of remapping feature?
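
To make the request concrete, here is a minimal sketch of the remapping, assuming hand-written regular expressions that stand in for the three patterns above. The regexes, the `prefix` group name, `SPEC_ENTRIES`, and `remap()` are all hypothetical and are not part of pathspec; they only show what becomes possible once the matched part of a path is available to the caller.

```python
import re

# Hypothetical stand-ins for the three SpecEntries. Each regex was written by
# hand (it is NOT what pathspec generates) and captures, in a group named
# "prefix", the part of the path that should be replaced by the target.
SPEC_ENTRIES = [
    (re.compile(r"^(?P<prefix>/?Documentation/)"), "docs/"),
    (re.compile(r"^(?P<prefix>/?)[^/]*\.html$"), "docs/"),
    (re.compile(r"^(?P<prefix>(?:.*/)?Examples/SDK/)"), "docs/"),
]


def remap(path):
    """Return the remapped path, or None if no entry matches."""
    for regex, target in SPEC_ENTRIES:
        match = regex.match(path)
        if match:
            # Drop the matched prefix and keep the remainder of the path.
            return target + path[match.end("prefix"):]
    return None


if __name__ == "__main__":
    for p in ("/Documentation/foo.txt", "foo.html", "blah/Examples/SDK/bar.txt"):
        print(p, "->", remap(p))
```

The first entry that matches wins, which is a simplification chosen for this sketch; it does not reproduce gitignore-style precedence or negation.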

Owner

cpburnz commented Jan 14, 2020

@ftrofin

For (1) it would be straightforward for me to implement a version of match_tree()/match_files() that returns additional information, such as which pattern a file or directory was matched with.

For (2) what you're trying to do makes sense. I'll have to evaluate the current code to determine the feasibility of the feature. Currently the patterns are converted to regular expressions and those are matched against the file paths without capturing the matching part.

Author

ftrofin commented Jan 16, 2020

Thank you so much for the prompt response and for the great news! (too many times I've heard "sorry, we can't do that"). Let me know if I can help in any way.

Author

ftrofin commented Jan 29, 2020

@cpburnz Hi Caleb, any update on this?

Owner

cpburnz commented Jan 30, 2020

@ftrofin I have a work-in-progress that I should be able to complete this weekend.

Owner

cpburnz commented Feb 3, 2020

@ftrofin

I've made changes to the master branch that will at least support (1).

I'm not currently sure how to implement the feature for (2). Even if I passed the regex match result through, the gitwildmatch format would escape any regex capturing groups. So the only way to use this potential feature would be to construct RegexPattern objects directly.
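
The escaping concern can be illustrated with the standard library alone. The sketch below uses `re.escape` as a stand-in for a glob-to-regex translation that treats regex metacharacters as literals; it is an analogy under that assumption, not pathspec's actual translation code, and `user_pattern` is a made-up input.

```python
import re

# A user writes parentheses hoping they will act as a capturing group.
user_pattern = "(Documentation)/"

# A translation step that escapes regex metacharacters (modelled here with
# re.escape) turns the parentheses into literal characters.
translated = re.compile(re.escape(user_pattern))

# The compiled regex therefore contains no capturing groups ...
print(translated.groups)

# ... and it only matches paths that contain literal parentheses.
print(translated.match("(Documentation)/foo.txt") is not None)
print(translated.match("Documentation/foo.txt") is not None)

# Writing the regex directly keeps the group, so the matched part is available.
direct = re.compile(r"(Documentation)/")
print(direct.match("Documentation/foo.txt").group(1))
```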

Author

ftrofin commented Feb 4, 2020

Thank you, Caleb, for taking the time to address this. Can you please make a git release, or bump the version and upload these changes to PyPI, so I can test them? I might have an idea on how to implement 2. If 1 is working now, it means that I know which pattern matched which SpecEntry (the left side), right? So maybe I could just re-apply the regex used during matching (if I could get it somehow) to that particular entry and get the capturing groups...
I'll have to think more about it.

Author

ftrofin commented Feb 5, 2020

@cpburnz Looking at the new code, I don't see util.detailed_match_files() being used anywhere... Is this still a work in progress?

Owner

cpburnz commented Feb 8, 2020

@ftrofin That function is not being used elsewhere yet. I'm not sure if I'm completely satisfied with its implementation, so I would consider it a work in progress. That's why I haven't uploaded a proper release to PyPI.

Yes, (1) is working now but it's only in that utility function at present.

Owner

cpburnz commented Apr 9, 2020

v0.8.0 has been released. detailed_match_files() is still only a utility function. I've been delaying too long and might as well release what's available. The way you would use it is:

patterns = ...
files = pathspec.util.iter_tree_files(...) # Or your own file list.
matches = pathspec.util.detailed_match_files(patterns, files)

See:

@cpburnz cpburnz closed this as completed Apr 9, 2020
bors bot added a commit to rehandalal/therapist that referenced this issue Apr 20, 2020
118: Update pathspec to 0.8.0 r=rehandalal a=pyup-bot


This PR updates [pathspec](https://pypi.org/project/pathspec) from **0.7.0** to **0.8.0**.



<details>
  <summary>Changelog</summary>
  
  
   ### 0.8.0
   ```
   ------------------

- `Issue 30`_: Expose what patterns matched paths. Added `util.detailed_match_files()`.
- `Issue 31`_: `match_tree()` doesn't return symlinks.
- `Issue 34`_: Support `pathlib.Path`\ s.
- Add `PathSpec.match_tree_entries` and `util.iter_tree_entries()` to support directories and symlinks.
- API change: `match_tree()` has been renamed to `match_tree_files()`. The old name `match_tree()` is still available as an alias.
- API change: `match_tree_files()` now returns symlinks. This is a bug fix but it will change the returned results.

.. _`Issue 30`: cpburnz/python-pathspec#30
.. _`Issue 31`: cpburnz/python-pathspec#31
.. _`Issue 34`: cpburnz/python-pathspec#34
   ```
   
  
</details>


 

<details>
  <summary>Links</summary>
  
  - PyPI: https://pypi.org/project/pathspec
  - Changelog: https://pyup.io/changelogs/pathspec/
  - Repo: https://github.com/cpburnz/python-path-specification
</details>



Co-authored-by: pyup-bot <github-bot@pyup.io>