Project Proposal: Weighted Audit Contest score and other warden performance stats #28
adamavenir started this conversation in Masonry
Issue
C4 relies on a mixture of established talent and the development of emerging talent in order to continue producing results that are on par with, and even surpass, top audit firms.
We know that warden growth has dramatically outpaced sponsor growth, but balancing both sides of the community is essential.
We intend to keep growing the number of contests we run concurrently. Right now we have only rudimentary methods for ensuring contest coverage and for helping wardens choose the contests where they can add the most value, and therefore benefit the most from participating.
We have a ton of data about how wardens have performed in contests, but it's not very contextualized, and our current approach to evaluating talent is deeply biased toward "I've seen that name at the top of a lot of contests", which becomes a less useful signal as more people compete.
C4 needs a comparative performance metric
Because of the elegant design of the C4 award mechanism, individual contest leaderboards do a nice job of comparing performance within a single contest. And on the overall leaderboard, we can get a sense of how player performance compares based on aggregate totals of earnings and issues found. (We could do an even better job of this by showing average performance stats with a per-contest divisor.)
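To make the idea concrete, here's a minimal sketch of that per-contest divisor; the field names (`total_earnings`, `total_findings`, `contests_entered`) are hypothetical, not C4's actual leaderboard schema:

```python
# Hypothetical per-contest averages derived from aggregate leaderboard totals.
def per_contest_averages(warden: dict) -> dict:
    contests = max(warden["contests_entered"], 1)  # guard against division by zero
    return {
        "avg_earnings": warden["total_earnings"] / contests,
        "avg_findings": warden["total_findings"] / contests,
    }

print(per_contest_averages(
    {"total_earnings": 12000, "total_findings": 18, "contests_entered": 6}
))
# -> {'avg_earnings': 2000.0, 'avg_findings': 3.0}
```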
But we know that there is a big difference between contests. Contracts written by junior developers produce more vulnerabilities. Some contests feature code that’s already been audited once or even twice.
And some competitors are so new to auditing and/or smart contracts that it would be nice to give them a way to view their performance compared to others closer to their presumed skill level. Providing more meaningful statistical feedback to wardens on their performance based on their zone of proximal development will help people to stay motivated in the journey of leveling up their skills.
Further, because C4 is also a talent aggregator and has played a part in folks landing jobs as smart contract developers or auditors, a way of measuring performance will help C4 play this critical role better, both for individuals and for the overall ecosystem.
North star: C4 is an esport
Sports excel at tracking numbers and analyzing performance in order to compare players’ contributions and output. We can look to examples from well-established sports as a way to develop our own stat.
I’m a baseball geek—yes, a rarity in our global C4 community—but follow me here, as I think there is a useful analogy in baseball stats that we might be able to use in how we think about a comparative performance stat.
RC (Runs Created) is a metric that attempts to describe a player's offensive output. The highest impact thing you can do is hit a home run (hit the ball out of the field, which means advancing four bases). Hitting a 'double' (two bases) or a 'triple' (three bases) is more impactful than a standard hit (a 'single'). RC is a counting stat, so one nice thing about it is that you can see a player's cumulative impact over time. (See the all-time leaders in runs created, for example.)
But baseball is weird. All baseball fields are different sizes, some dramatically different from others. And different eras of the game have been more prone to home runs than others (for lots of reasons: bounciness of the ball, pitching rules that made it harder or easier for hitters, the prevalence of steroid use). These things obviously impact the likelihood of hitting a home run.
So wRC+ (Weighted Runs Created Plus) also adjusts for different size fields and eras of the game, and it is specifically scoped to players who play at the same level of competition. (A minor league player could have a 120 wRC+ in the lower leagues but a 92 wRC+ after coming up to the top league.) In addition, wRC+ removes from the average set players who cannot be expected to perform at the same level (namely pitchers).
To make things more user-friendly, wRC+ normalizes player stats so that 100 is the league average, with each point above/below 100 meaning 1 percentage point above/below average. This lets you quickly see where someone's performance fits on the overall curve: a 105 wRC+ is 5% above average; a 95 wRC+ is 5% below average.
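For the normalization step alone, here is a minimal sketch (the real wRC+ formula also folds in park and league adjustments, which are omitted here):

```python
# wRC+-style normalization: 100 = league average; each point above or
# below 100 is one percentage point above or below average.
def plus_stat(raw_score: float, league_average: float) -> float:
    return 100 * raw_score / league_average

print(plus_stat(raw_score=6.3, league_average=6.0))  # 105.0 -> 5% above average
print(plus_stat(raw_score=5.7, league_average=6.0))  # 95.0  -> 5% below average
```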
What are our RC and wRC+ equivalents?
I find that RC and wRC+ make a nice analogy for how we might think about C4 performance.
We have several types of results that scale in impact: hitting a home run is a bit like finding a solo high severity issue in a C4 contest :) But we also have shared high severity findings, solo medium findings, shared medium findings, plus the graded performance of a low/non-critical report or gas optimization report.
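As a thought experiment, a raw RC-style counting stat for C4 might look something like the sketch below. The weights are placeholders for illustration, not proposed values:

```python
# Hypothetical RC-style counting stat: weight each finding type by impact.
# Weight values are placeholders, not a proposal for actual constants.
WEIGHTS = {
    "high_solo": 10,
    "high_shared": 5,
    "medium_solo": 3,
    "medium_shared": 1.5,
    "qa_report": 1,   # graded low/non-critical report
    "gas_report": 1,  # graded gas optimization report
}

def contest_score(findings: dict[str, int]) -> float:
    """Sum weighted finding counts for one warden in one contest."""
    return sum(WEIGHTS[kind] * count for kind, count in findings.items())

print(contest_score({"high_solo": 1, "medium_shared": 2}))  # 13.0
```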
We actually already have a couple of Audit Contest score methods, in their most basic form, readily available to us in our findings.csv data, though we have not yet used them:
The adjusted version is cool because it tells us how hard the bug was to find given the competition, with each bug's value decreasing based on the number of times it was found.
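For reference, the shape of that adjustment looks roughly like the sketch below. The base weight of 10 for a high severity finding and the 0.9 decay follow C4's published award calculation, but treat the exact constants here as assumptions:

```python
# Duplicate-count decay in the spirit of C4's award formula: a finding's
# weight shrinks as more wardens report the same bug.
def adjusted_weight(base: float, times_found: int) -> float:
    return base * (0.9 ** (times_found - 1)) / times_found

print(adjusted_weight(10, 1))  # 10.0  -> a solo high keeps full value
print(adjusted_weight(10, 4))  # ~1.82 -> a high found by 4 wardens is worth far less
```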
Unfortunately, using that kind of weighting isn't useful from contest to contest, because we don't know what the competition looked like:
There’s also some other data which would be extremely useful to incorporate:
So there is value in having a few statistical models which can tell us:
It should ultimately be possible to split these numbers out by filters like:
tl;dr
hey like what if we had advanced stats
Proposed Solution: stats contest