Respect .gitignore #49

Closed
rstacruz opened this issue Jan 16, 2016 · 29 comments

@rstacruz

I'm not sure if this was ever filed, but it'd be nice to have .gitignore respected. is that possible?

@rstacruz
Author

As a workaround, this can work:

cloc $(git ls-files)

...as long as you don't have spaces in your path. A bit cumbersome, though.
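To illustrate the caveat, here is a minimal sketch of what happens to a path with a space in it. Both helpers are hypothetical stand-ins that are not part of the thread: `list_files` plays the role of `git ls-files` and `count_args` plays the role of cloc.

```shell
# count_args: hypothetical stand-in for cloc; prints how many arguments it got.
count_args() { printf '%s\n' "$#"; }

# list_files: hypothetical stand-in for `git ls-files`; one path has a space.
list_files() { printf '%s\n' 'src/main.c' 'docs/user guide.md'; }

# Unquoted $(...) is split on spaces, tabs, and newlines (the default IFS),
# so 'docs/user guide.md' arrives as two separate arguments.
split_count=$(count_args $(list_files))

# Restricting IFS to a newline (and disabling globbing) keeps each line whole.
old_ifs=$IFS
IFS='
'
set -f
whole_count=$(count_args $(list_files))
set +f
IFS=$old_ifs

echo "default IFS: $split_count arguments; newline IFS: $whole_count arguments"
```

With the default IFS the two paths become three arguments; with a newline-only IFS they stay as two.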

@rstacruz
Author

also, maybe this might be as simple as having a flag to read filenames from stdin?

git ls-files | cloc --stdin-files

@zbeekman

also, maybe this might be as simple as having a flag to read filenames from stdin?

You can use xargs too, shorter than adding an option no one will remember to read file names from stdin:

mkdir empty
cd empty
cloc .
git ls-files .. | xargs cloc

@rstacruz
Author

wouldn't that suffer from the spaces problem too?

@zbeekman

First of all, my apologies, I did miss that point; I'm quite tired at the moment. However, how do you propose that the --stdin-files flag will fix the word splitting issue? Force one file per line? I'm not convinced that it's cloc's job to figure out what an appropriate IFS is... What if I want to do something like echo */*.c | cloc and the file names or directories have spaces in them? Are we supposed to assume the IFS will be limited to \n?

As a workaround for this particular case, git ls-files -z | xargs -0 cloc definitely works when paths have spaces in them...
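A small sketch of why the NUL-delimited pipeline is safe. `list_files_z` is a hypothetical stand-in for `git ls-files -z`, and the inline `sh -c` script stands in for cloc by printing each argument it receives in brackets. Note that `xargs -0` is a GNU/BSD extension rather than strict POSIX.

```shell
# list_files_z: hypothetical stand-in for `git ls-files -z`;
# paths are terminated by NUL bytes instead of newlines.
list_files_z() { printf '%s\0' 'src/main.c' 'docs/user guide.md'; }

# xargs -0 splits its input only on NUL, so a space inside a path survives.
# The sh -c script (standing in for cloc) prints each argument in brackets.
nul_args=$(list_files_z | xargs -0 sh -c 'for f in "$@"; do printf "[%s]\n" "$f"; done' sh)

printf '%s\n' "$nul_args"
```

Each path arrives as exactly one argument, including the one containing a space.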

@zbeekman

(I do agree that adding support for .gitignore and other VCS ignore files is certainly worthwhile, however.)

@AlDanial AlDanial self-assigned this Jan 16, 2016
@AlDanial
Owner

I first started looking at obeying .gitignore in July 2015. The more I looked, the less fun it seemed to implement. Yes, it would be a nice feature. I'll certainly entertain pull requests. However, until I have a burst of enthusiasm about this problem, implementation is a ways off.

If anyone is interested in moving this along, we can tackle this independently of cloc; that is to say, given a stand-alone solution, I'll handle integrating it into cloc. The problem can be reduced to this: given a text file containing a sorted directory tree (e.g. the output of find . -type f | sort), read each .gitignore and apply its rules to the tree. Output would be a list of files which survive all .gitignore files.
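As a rough illustration of the reduced problem, the sketch below filters a sorted file list against a single ignore file. It is deliberately simplified and is not a faithful .gitignore implementation: it only matches plain glob patterns against individual path components, and it ignores negation (`!`), anchored or slash-containing patterns, `**`, nested .gitignore files, and global ignore files. The function names `matches_component` and `filter_tree` are made up for this example.

```shell
# matches_component PATTERN PATH: succeeds if any '/'-separated component
# of PATH matches the shell glob PATTERN.
matches_component() {
  mc_pat=$1
  mc_rest=$2
  while [ -n "$mc_rest" ]; do
    mc_comp=${mc_rest%%/*}
    # Unquoted $mc_pat is deliberately treated as a glob pattern here.
    case $mc_comp in
      $mc_pat) return 0 ;;
    esac
    case $mc_rest in
      */*) mc_rest=${mc_rest#*/} ;;
      *)   mc_rest= ;;
    esac
  done
  return 1
}

# filter_tree IGNORE_FILE: reads one path per line on stdin and prints the
# paths that match no pattern in IGNORE_FILE.
filter_tree() {
  ft_ignore=$1
  while IFS= read -r ft_path; do
    ft_keep=1
    while IFS= read -r ft_pat || [ -n "$ft_pat" ]; do
      case $ft_pat in
        ''|'#'*|'!'*) continue ;;   # skip blanks, comments, and negations
      esac
      ft_pat=${ft_pat%/}            # "build/" is treated like "build"
      if matches_component "$ft_pat" "$ft_path"; then
        ft_keep=0
        break
      fi
    done < "$ft_ignore"
    if [ "$ft_keep" -eq 1 ]; then
      printf '%s\n' "$ft_path"
    fi
  done
}

# Demo with a throwaway ignore file and a hand-written sorted tree.
demo_ignore=$(mktemp)
printf '%s\n' '# build output' '*.o' 'build/' > "$demo_ignore"
survivors=$(printf '%s\n' './build/gen.c' './src/main.c' './src/main.o' | filter_tree "$demo_ignore")
rm -f "$demo_ignore"
printf '%s\n' "$survivors"
```

Only `./src/main.c` survives in the demo. The gap between this sketch and real .gitignore semantics gives a sense of why a complete implementation is less fun than it first looks.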

@zbeekman

Language preference for said implementation?

(To be clear, I'm not necessarily volunteering, I am so slammed right now, but I think I can find a pretty straightforward way to do this in bash... Operative word here is "think", since I haven't dug into the implementation details yet...)

Also, would it make sense to rely on the VCS to determine what to include and what not to include? Rather than parsing .gitignore (and also please note users may have global ignore files), the VCS could return a list of files for the project (which would take into account .gitignore), and cloc could then use that to determine what to show. This would ease the implementation for SVN, CVS, hg, etc.


@AlDanial
Owner

Ideally the implementation would be in Python or Perl, but bash is plenty good enough.

Re: relying on an external VCS--that's what the "git ls-files" work-around earlier in the thread is all about. I'm not keen on making cloc do system calls unless there's no other way to do it (cloc currently does system calls to archive tools like tar and zip).

@zbeekman

I get the desire to avoid system calls, but that means much more code will be needed, and less duplication between VCSs. Also, it complicates the user's interaction: Is cloc expected to respect settings in global VCS config files? How does cloc find these? Otherwise cloc --vcs-files-only will potentially create different results from git ls-files -z | xargs -0 cloc which has the potential to be a source of great confusion.

@zbeekman

I guess an alternative implementation would be to explicitly pass the ignore file to cloc:

cloc --vcs-ignore-file=.gitignore and then it should be clear that cloc is only parsing the .ignore file. I think the syntax is pretty similar among VCSs for ignore files too...

@AlDanial
Owner

Or perhaps cloc --vcs="system call to VCS to list versioned files" which in git would amount to cloc --vcs="git ls-files" and in Subversion (I think) cloc --vcs="svn ls -R".

Again I'm not keen on the system calls but am coming around to the thinking that this may be the right solution here. The documentation will explicitly state that whatever the user puts in quotes will be invoked as a system call and the output treated as a list of files for cloc to consider. Subsequent filters like --match-d, --not-match-d, --match-f, --not-match-f, would still apply.
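The idea can be sketched in a few lines of shell. This is an illustration of the proposal only, not cloc's actual implementation: `list_candidates` is a hypothetical helper, and its second argument is a rough approximation of a directory filter, not the real `--not-match-d` semantics.

```shell
# list_candidates GENERATOR [EXCLUDE_DIR_REGEX]
# Runs GENERATOR through the shell, treats each output line as a file name,
# then drops paths with a directory component matching EXCLUDE_DIR_REGEX.
list_candidates() {
  lc_generator=$1
  lc_exclude=${2:-}
  if [ -n "$lc_exclude" ]; then
    # grep exits non-zero when nothing survives; that is not an error here.
    sh -c "$lc_generator" | grep -v -E "(^|/)($lc_exclude)/" || true
  else
    sh -c "$lc_generator"
  fi
}

# Demo: a printf command plays the role of `git ls-files` or `svn ls -R`.
candidates=$(list_candidates "printf '%s\n' src/app.c vendor/lib.c docs/readme.md" 'vendor|tmp')
printf '%s\n' "$candidates"
```

Because the generator string is executed verbatim, it can run any command at all, which is why the documentation needs to spell that out.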

@rstacruz
Author

I like that. I've hacked up a script that looks like this:

# git-cloc
git ls-files "$@" | \
  grep -v -E '(coverage|log|tmp|temp|vendor|fixture|fixtures|dist|cassettes)/' | \
  tr '\n' '\0' | \
  xargs -0 cloc ...

to be able to use --not-match-d would be great.

@controversial

The way I do this personally is by pushing to remote (git push respects .gitignore of course), then cloning the remote into a separate folder, and then running cloc on that.

Example (... is output I excluded for clarity):

$ git push origin master
... To https://github.com/The-Penultimate-Defenestrator/wikipedia-map.git ...
$ cd ~/Desktop
$ git clone https://github.com/The-Penultimate-Defenestrator/wikipedia-map.git
...
$ cloc wikipedia-map

It's not pretty, but it's certainly a reasonable workaround.

@AlDanial
Owner

That's a nice tip, thanks.

My planned implementation will hopefully be simpler (namely a single step), but I've been traveling lately and haven't had time to work on cloc. I'm shooting for a commit to the dev branch for this feature within two weeks.

@controversial

Cool, thanks.


@mbovel

mbovel commented Feb 21, 2016

Hi guys,

I had exactly this problem today (counting lines of code in a git repo) and came across this issue.

As said earlier in this discussion

cloc $(git ls-files)

works like a charm, except with files containing a space in their name.

However, that seems to work, even with spaces in names:

git ls-files > list.txt
cloc --list-file=list.txt

So, +1 for the idea of @rstacruz: just being able to pass the file list via stdin.

Parsing .gitignore, communicating with vcs or anything in this direction doesn't seem like the job of cloc to me.

@AlDanial
Owner

That is, in fact, how I plan to implement cloc --vcs=git. Under the hood it calls git ls-files and works with that file list. Similarly --vcs=svn will invoke svn ls -R to get a file list.

This coming Friday I'll have time to implement this.

AlDanial added a commit that referenced this issue Feb 26, 2016
any user provided file name generator; issue #49
@AlDanial
Owner

Git commit 55e616e on the master branch implements --vcs. Please give it a try with --vcs git, --vcs svn, or any file name generator such as --vcs 'find . -type f -name "*.c" -size +500k', for example, to count only C files greater than 500 kB in size.
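To see what such a generator would hand over, the find command can be run on its own first. The sketch below uses a throwaway directory with made-up file names (`small.c`, `big.c`, `big.bin`); only the find expression comes from the comment above.

```shell
demo_dir=$(mktemp -d)

# One tiny C file, one 600 KiB C file, and one 600 KiB non-C file.
printf 'int main(void) { return 0; }\n' > "$demo_dir/small.c"
dd if=/dev/zero of="$demo_dir/big.c" bs=1024 count=600 2>/dev/null
dd if=/dev/zero of="$demo_dir/big.bin" bs=1024 count=600 2>/dev/null

# Same generator as in the comment, run from inside the demo directory.
big_c_files=$(cd "$demo_dir" && find . -type f -name "*.c" -size +500k)
rm -rf "$demo_dir"

printf '%s\n' "$big_c_files"
```

Only `./big.c` is listed: `small.c` fails the size test and `big.bin` fails the name test.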

@AlDanial
Owner

Also: if you crank the verbose level to 2 (with -v 2) or more, you'll see exactly which files the --vcs XX command has generated.

I don't know of git repos that have files with spaces in them but I don't expect these to be an issue.

@mbovel

mbovel commented Feb 26, 2016

Tested with --vcs=git and --vcs='find . -name *.js', with and without spaces in file names.

Works great, thank you very much!
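One caveat about the unquoted `*.js` in the second test command, assuming the generator string is handed to a shell (the thread does not say exactly how cloc invokes it): if the current directory itself contains a matching file, the shell expands the pattern before find ever sees it. A minimal sketch with hypothetical file names:

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/lib"
: > "$demo_dir/app.js"
: > "$demo_dir/lib/util.js"

# Unquoted: the inner shell expands *.js to app.js before find runs,
# so find only looks for files literally named app.js.
unquoted=$(cd "$demo_dir" && sh -c 'find . -name *.js' | sort)

# Quoted: find receives the pattern itself and matches at every depth.
quoted=$(cd "$demo_dir" && sh -c 'find . -name "*.js"' | sort)
rm -rf "$demo_dir"

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
```

Quoting the pattern inside the generator string, as in --vcs='find . -name "*.js"', avoids the difference.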

@rstacruz
Author

neat! curious:

I'm not keen on making cloc do system calls unless there's no other way to do it (cloc currently does system calls to archive tools like tar and zip).

considering --vcs=git uses git ls-files under the hood, what made you change your mind on the point above? there actually /is/ another way (manually do a gitignore-aware directory traversal), though cumbersome.

not that i suggest you take that route (imho outsourcing the work to git ls-files is preferable), just curious on your thought process.

@AlDanial
Owner

It's a matter of practicality. The time I have to work on cloc is quite limited. I'd weighed different approaches to parsing .gitignore for six months (July-Dec. 2015) without progress. Other requests for new language support and bug fixes naturally keep coming in and must be attended to. Bottom line was that I saw no feasible alternative to doing the system calls.

@mbovel -- thanks for testing!

@AlDanial
Owner

The latest release of cloc is 1.70; I haven't gotten to 2.0 yet.
If you run cloc like so

  cloc --version

it will tell you which version you're using.

Get the latest release from https://github.com/AlDanial/cloc/releases

On Fri, Jul 15, 2016 at 11:16 AM, Fernando Montoya wrote:

I have installed cloc 2.0, and I am getting this error:

👉 cloc --vcs git
Unknown option: vcs



@suweller

suweller commented Jul 20, 2016

First off, @AlDanial thanks for making and maintaining cloc.
Secondly, if you'd like to make cloc respect git, run it from git using:

# ~/.gitconfig
[alias]
  cloc = !cloc $(git ls-files)

Run git cloc from a repository root and voila.

@AlDanial
Owner

Good tip, thanks; as a git novice I continue to be surprised by git's power and flexibility.

If you haven't tried it already, cloc --vcs git should do the same thing.

@bernardoadc

@suweller I've tried with no success

# ~/.gitconfig
[alias]
  cloc = !cloc --vcs=git

Any idea why? It throws:

Can't create unknown regex: $RE{comment}{C++} at (..)/cloc/lib/cloc line 9619.
...propagated at (..)/cloc/lib/cloc line 4789.

Running it directly does work, though.

@suweller

Both methods work for me now so I can't reproduce your error.
You could try the way I suggested -using ls-files-, maybe that solves your issue.

@sohailsomani

This is sufficient. Just putting it in for the next time I need to search for it ;-)

git ls-tree -r master --name-only -- path/you/want | grep -v anything | grep -v ignored | xargs cloc
