
Author stats over multiple repos #70

Open: wants to merge 5 commits into master


PriitParmakson

I wrote a script that merges the authors.json files produced by git-of-theseus-analyze, so that an authors chart can be produced across multiple repos.

@erikbern (Owner) commented Jan 7, 2020

Thanks for the addition. I think this would be cleaner if the stack plotting script took multiple files on the command line. What do you think?
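A minimal sketch of what accepting multiple files could look like with argparse's `nargs='+'`, assuming the single positional `input_fn` argument is replaced by a list-valued one. The names `build_parser` and `input_fns` are hypothetical, and only `--max-n` from the existing options is reproduced; this is not the code in the PR.

```python
import argparse


def build_parser():
    # Hypothetical parser shaped like stack_plot_cmdline(); only the options
    # relevant to multi-file input are shown here.
    parser = argparse.ArgumentParser()
    parser.add_argument('--max-n', default=20, type=int,
                        help='Max number of dataseries (default: %(default)s)')
    # nargs='+' collects one or more positional paths into a list and makes
    # argparse report an error if none are given.
    parser.add_argument('input_fns', nargs='+', metavar='input_fn',
                        help='One or more JSON files produced by git-of-theseus-analyze')
    return parser
```

With a single file the result is a one-element list, so the existing single-repo invocation keeps working while the plotting code loops over (or merges) `args.input_fns`.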

@PriitParmakson (Author)

It felt safer to write it separately, since it's my first program in Python. I wrote the first version in Go to satisfy my immediate need. Of course, the integrated approach is cleaner. I'll look into it, perhaps in a week.

@PriitParmakson (Author)

Now there seems to be some side effect. I don't know Travis.

The idea behind the merging algorithm:

Each authors.json file defines a function LOC(r, a, t), where:
- r is the repo name,
- a ∈ A(r) is an author (from the set of authors present in repo r), and
- t ∈ T(r) is a time point (from the set of time points in repo r's file).

For the plot, we need a fully defined function LOC(r, a, t), where:
- r ∈ R (the set of all repos),
- a ∈ A (the union of the authors of all repos), and
- t ∈ T (the union of the time points of all repos).

However, the authors.json files provide data only for a partially defined LOC(r, a, t). A fully defined function can be obtained by extrapolation, carrying the last known value forward: if there is no data point for (r, a, t) in the authors.json file of r, define LOC(r, a, t) = LOC(r, a, t1), where t1 < t is the latest time point for which the file of r has a data point for a; if there is no such time point, LOC(r, a, t) = 0.
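Written as formulas, under my reading of the rule above: using t' ≤ t covers both the case where r has a data point at t and the case where it does not, and the final sum over repos is the step that turns the per-repo values into the single combined chart. The hatted symbol is introduced here only to distinguish the filled-in function from the partial one.

```latex
\widehat{\mathrm{LOC}}(r, a, t) =
\begin{cases}
\mathrm{LOC}(r, a, t_1), & \text{if } a \in A(r) \text{ and } t_1 = \max\{\, t' \in T(r) : t' \le t \,\} \text{ exists}, \\
0, & \text{otherwise},
\end{cases}
\qquad
\mathrm{LOC}(a, t) = \sum_{r \in R} \widehat{\mathrm{LOC}}(r, a, t), \quad a \in A,\ t \in T.
```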

@erikbern (Owner)

This is a better approach. The code seems pretty convoluted, though; it feels like it shouldn't take more than 5-10 lines to accomplish what you want. You just need to add up the stats and account for the fact that the timestamps are irregular, right?
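A compact sketch of that idea, assuming each authors.json has the shape `{"ts": [...], "labels": [...], "y": [[...], ...]}` (one row of line counts per label, one column per timestamp), that each file's `ts` list is sorted, and that `ts` values from different files are comparable (for example ISO 8601 strings in one consistent format). `merge_authors` is a hypothetical name and this is not the code in the PR; it applies the carry-forward rule described earlier in the thread and then sums across repos.

```python
import bisect


def merge_authors(datasets):
    """Merge several authors.json-style dicts over the union of timestamps.

    Assumes every dataset's 'ts' list is sorted ascending. Each repo's series
    is carried forward to timestamps it lacks; before a repo's first timestamp,
    and for authors absent from a repo, its contribution is 0.
    """
    all_ts = sorted({t for d in datasets for t in d['ts']})
    labels = sorted({label for d in datasets for label in d['labels']})
    row_of = {label: i for i, label in enumerate(labels)}
    y = [[0] * len(all_ts) for _ in labels]
    for d in datasets:
        # For each merged timestamp: index of this repo's latest ts <= it, or -1.
        ks = [bisect.bisect_right(d['ts'], t) - 1 for t in all_ts]
        for label, series in zip(d['labels'], d['y']):
            row = y[row_of[label]]
            for j, k in enumerate(ks):
                if k >= 0:
                    row[j] += series[k]  # sum this repo's carried-forward value
    return {'ts': all_ts, 'labels': labels, 'y': y}
```

For example, merging `{'ts': ['2020-01-01', '2020-03-01'], 'labels': ['alice'], 'y': [[10, 20]]}` with `{'ts': ['2020-02-01'], 'labels': ['alice', 'bob'], 'y': [[5], [7]]}` yields rows `[10, 15, 25]` for alice and `[0, 7, 7]` for bob. If the format assumption holds, loading the inputs with `json.load` and writing the result with `json.dump` should give a single file for the existing plot.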

@leonid-shevtsov

I don't know if it's too convoluted or not (also not a pythonist), but this PR sure helped me produce a chart for all of our repos together 👍

@drew2a commented Nov 20, 2023

> I don't know if it's too convoluted or not (also not a pythonist), but this PR sure helped me produce a chart for all of our repos together 👍

It was helpful for me as well. Thank you! @PriitParmakson

@@ -74,7 +126,8 @@ def stack_plot_cmdline():
parser.add_argument('--max-n', default=20, type=int, help='Max number of dataseries (will roll everything else into "other") (default: %(default)s)')
parser.add_argument('--normalize', action='store_true', help='Normalize the plot to 100%%')
parser.add_argument('--dont-stack', action='store_true', help='Don\'t stack plot')
parser.add_argument('input_fn')


Test

before_script: # configure a headless display to test plot generation
- "export DISPLAY=:99.0"
- "sh -e /etc/init.d/xvfb start"
- sleep 3 # give xvfb some time to start

