stop abusing git for timing data storage #36
Comments
What about sticking it in some cloud NoSQL store (DynamoDB, Bigtable, etc.)?
How about producing CSV and pushing it to S3?
The data's not well structured for CSV (AFAICT you'd have to have many, many smallish files), but yeah, any simple storage solution should work fine for this to start.
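For concreteness, here's a minimal sketch of the "just push the blob to S3" idea using AWS.jl/AWSS3.jl; the bucket name, key layout, and payload are all made up for illustration:

```julia
using AWS, AWSS3, JSON

aws = global_aws_config()  # credentials and region come from the environment

# Hypothetical bucket and key layout; nothing here is Nanosoldier policy.
bucket = "nanosoldier-timings"
key    = "by_date/2021-01-01/data.json"

# Stand-in for a run's (benchmark key => timing) pairs.
payload = JSON.json(Dict("sum_indexing" => 1234.5))
s3_put(aws, bucket, key, payload)

# Fetching it back is a single GET.
roundtrip = JSON.parse(String(s3_get(aws, bucket, key)))
```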
The JSON files compress well (e.g. JuliaCI/BenchmarkTools.jl#79), so while we have generated a lot of data, and may want to add a TSDB for other reasons, the current rate of growth isn't terrible.
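As a rough sketch of that compression step (CodecXz.jl is my stand-in here; the actual archives are produced elsewhere in the pipeline):

```julia
using JSON, CodecXz  # CodecXz hooks the xz codec into Base.transcode

# Hypothetical payload standing in for one run's timing data.
data = Dict("benchmark/key" => rand(1_000))

raw        = Vector{UInt8}(JSON.json(data))
compressed = transcode(XzCompressor, raw)
println("raw: $(length(raw)) bytes, xz: $(length(compressed)) bytes")

# Decompressing recovers the original JSON byte-for-byte.
@assert transcode(XzDecompressor, compressed) == raw
```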
What might make the most difference is doing each `by_date` run as an update to …
I kind of doubt the value of saving the timing data for every trial; saving the summary statistics should be enough. Sure, people said some years ago that someone might want to run some analysis on the raw data, but looking at the activity in these reports, I doubt that will happen, or that it would yield anything actionable for old runs.
The `isdaily` reports are generated from the `data.tar.xz` files.
Yes, there is a ~500 MB JSON file in there that contains the timing of every sample for every benchmark. I don't think anything has been done with those trial timings other than computing their minimum. So I am saying to store just, e.g., the minimum, and reduce the file size by a couple of orders of magnitude.
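As a sketch of what "store only the statistics" could look like with BenchmarkTools (the particular summary fields here are my choice, not an agreed-on schema):

```julia
using BenchmarkTools, JSON, Statistics

# Run one benchmark; `trial.times` holds the per-sample timings (in ns)
# that dominate the current on-disk size.
trial = @benchmark sum(x) setup=(x = rand(1000))

# Reduce the full sample vector to a handful of summary statistics.
stats = Dict(
    "minimum" => minimum(trial.times),
    "median"  => median(trial.times),
    "mean"    => mean(trial.times),
    "memory"  => trial.memory,
    "allocs"  => trial.allocs,
)

JSON.print(stdout, stats)
```

A few hundred bytes per benchmark instead of thousands of raw samples is where the orders-of-magnitude reduction comes from.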
Should've made this issue a long time ago.
Each Nanosoldier run generates a fair amount of timing data, which is currently stored in https://github.com/JuliaCI/BaseBenchmarkReports. As was discussed way back in the early days of Nanosoldier, this is a pretty gross abuse of git/GitHub.
We could instead just dump the data on a publicly accessible filesystem (and eventually, let the data be ingested by a more granularly queryable database).
While not directly tied to this issue, it'd also be nice to tackle the old lightweight/stable/portable serialization issue at the same time. It should be simple enough to write a JSON (de)serializer for the list of (benchmark key, `BenchmarkTools.Trial`) pairs you'd need to store.
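A sketch of what that (de)serializer could look like. It assumes `Trial`'s field layout (`params`, `times`, `gctimes`, `memory`, `allocs`) and invents a trivial wire format; reconstructing with default `Parameters` also drops the run's tuning settings:

```julia
using BenchmarkTools, JSON

# Illustrative wire format: one JSON object per (benchmark key, Trial) pair.
trial_to_dict(t::BenchmarkTools.Trial) = Dict(
    "times"   => t.times,    # per-sample timings in ns
    "gctimes" => t.gctimes,  # per-sample GC time in ns
    "memory"  => t.memory,
    "allocs"  => t.allocs,
)

serialize_trials(io::IO, pairs) =
    JSON.print(io, Dict(string(k) => trial_to_dict(t) for (k, t) in pairs))

function deserialize_trials(io::IO)
    raw = JSON.parse(io)
    Dict(k => BenchmarkTools.Trial(
             BenchmarkTools.Parameters(),  # default params; run settings are lost
             Float64.(v["times"]),
             Float64.(v["gctimes"]),
             v["memory"],
             v["allocs"],
         ) for (k, v) in raw)
end
```

(For what it's worth, BenchmarkTools.jl later grew `BenchmarkTools.save`/`BenchmarkTools.load`, which round-trip benchmark results through JSON.)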