Skip to content

Framework for calculating statistics on illumos-related repositories and data sources.

License

Notifications You must be signed in to change notification settings

nickziv/illumetrics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This contains ALL of the illumos repositories and related projects. It is
essentially read-only and not intended for building and/or compilation. It is
to be used to analysis of commit activity and so forth.

This is a framework for downloading illumos-related repositories from git-hub,
and doing data analysis on the commit-logs of those repositories. There is a
surprising amount of information in those logs.

To use for first time:


cd tools
./get_repos.ksh
# This generates elaborate git-logs in JSON format in the ../data directory
# You'll figure it out.
./commit_jlogs.ksh


To update the repos:

cd tools
./pull_repos.ksh
./commit_jlogs.ksh

To get some stats:

cd tools
./repo_stats.ksh

You'll then want to go get a cup of coffee, and maybe play some video-games --
it will take 15 minutes or so get the stats over all of the repositories. If you
think that's too long, keep in mind that it used to take 51 minutes -- for
_one_ repository. This will distill the JSON commit logs into independent
commit logs for each author, to facilitate easier (and more parallelizable)
per-author analysis.  Also creates an author summary. All output is in JSON, so
it can be consumed by additional custom scripts. The output is stored in:

	data/commit_logs/<project type>/<repo name>_authors/

The repo_stats.ksh script and supporting code are still in the works, and will
be getting support for more advanced statistics. Also, ideas welcome!

If something doesn't work, please open an issue on github, or email me at
zivkovic.nick@gmail.com

If you are the owner of an Illumos-related repo, please send me an email or
open an issue to have your project's repo included in the data. Illumos-related
repos are anything. Even userland code that depends on or leverages
Illumos-specific facilities (for example the DTraceToolkit or even Joyent's
Manta). The more data the better.

Future work may include fetching the mailing list data for every
illumos-related mailing-list and doing some kind of (as of yet unknown)
analysis on this data. For example, I want to see if there is a correlation
between commit-activity and email-activity. Also, I suspect, but would like to
verify/falsify that the most political and opinionated mailing-list posts are
more likely to be written by people that have zero or close to zero commits in
the kernel repos (and other related repos perhaps).

Special thanks to Mike Dimitropoulos (sdimitro on github) for suggesting this
side-project which was a breath of fresh air, from the more sophisticated stuff
I work on.

About

Framework for calculating statistics on illumos-related repositories and data sources.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published