zizmor currently makes a lot of GitHub API calls in online mode, the overwhelming majority of which are redundant or universal across different users.
For example, impostor-commit needs to fetch the entire branch/tag history for each repo, which means in practice that actions/checkout gets hit, over and over, by hundreds of users. This isn't a good use of anybody's time or API quota 🙂
Separately, zizmor currently hardcodes a lot of "coordinates of interest," e.g. use-trusted-publishing hardcodes the rubygems/release-gem and pypa/gh-action-pypi-publish actions (among others). This isn't maintainable/ideal long term, since changes to the list of actions require a new release of zizmor that everybody has to re-download.
The solution to both of these problems is the same: a static, non-quota'd API that zizmor (the CLI client) can hit to retrieve batched information and timely updates relevant to specific audits.
The easiest place for us to serve this static API is probably on https://woodruffw.github.io/zizmor/, since it's already a static website and should be able to easily handle a sub-hierarchy for API routes.
Here's a rough sketch of what I'm thinking (these routes can be relative to whatever):
    /data-api
      /v1                  # everything goes under v1 for now
        /last-update.dat   # returns the most recent update to the data files
        /common-refs.dat   # map of slug -> set[ref]
        /known-vulns.dat   # GHSA known vulnerabilities
...and so forth. I'm using .dat as the suffix because I'm not sure how we want to serialize these yet (JSON would be the easy choice, but these files might get large, so something like bincode or maybe rkyv might make sense).
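For concreteness, here's a minimal sketch of what parsing one of these files could look like on the client side, assuming a JSON-over-serde starting point. The DataFile wrapper, its field names, and the parse_common_refs helper are all hypothetical placeholders, not a committed format:

```rust
use std::collections::{HashMap, HashSet};

use serde::{Deserialize, Serialize};

/// Hypothetical wrapper shared by every data file, so clients can check
/// staleness without understanding the payload itself.
#[derive(Serialize, Deserialize)]
struct DataFile<T> {
    /// Unix timestamp of the last regeneration, mirroring last-update.dat.
    generated_at: u64,
    payload: T,
}

/// Sketch of common-refs.dat: repo slug ("actions/checkout") -> known refs.
type CommonRefs = HashMap<String, HashSet<String>>;

fn parse_common_refs(raw: &[u8]) -> serde_json::Result<DataFile<CommonRefs>> {
    // JSON for now; swapping to bincode or rkyv would only change this call.
    serde_json::from_slice(raw)
}
```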
We'd also want to cache these locally, so that repeated invocations of zizmor don't have to re-fetch them. This is a current limitation of our GitHub API usage as well.
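As a sketch of the caching side (the directory layout and env-var lookup are illustrative, not decided; a real implementation would likely use a crate like dirs or etcetera to resolve the base directory):

```rust
use std::path::PathBuf;

// Hypothetical cache layout for fetched data files, keyed by file name.
// The base directory resolution here is a stand-in for a proper platform
// cache-dir lookup.
fn cached_data_path(name: &str) -> Option<PathBuf> {
    std::env::var_os("XDG_CACHE_HOME")
        .map(PathBuf::from)
        .map(|base| base.join("zizmor").join("data-api").join(name))
}
```

Freshness could then be as simple as comparing the cached file's mtime against a TTL, which the resolution sketch further down assumes.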
In addition to these being served over HTTP, a copy of them (or some of them?) would also be baked into the zizmor builds themselves. This would ensure graceful degradation in offline mode.
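Embedding a snapshot is cheap on the Rust side; a sketch, assuming we check a generated copy of the data files into the repo (the path is illustrative):

```rust
// Hypothetical: a snapshot of the data files generated at release time and
// compiled into the binary, used as the lowest-priority fallback.
static EMBEDDED_COMMON_REFS: &[u8] = include_bytes!("../data/common-refs.dat");
```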
In sum, the order of operations with these data files would become (roughly sketched in code after this list):

1. If offline, use only the embedded copy and/or previously cached copy (and maybe emit a warning if it's more than 24 hours old).
2. If online:
    1. Check the age of the cached data, if it exists. If it's fresh, use it.
    2. Check the age of the embedded data. If it's fresh, use it.
    3. If neither is fresh, attempt to hit the static API. Fail gracefully if the request fails for any reason, and fall back.
    4. If the static API's last-update isn't new, fall back.
    5. Otherwise, use the fresh response and cache it.
3. Finally, if the static data doesn't have what we're looking for, use the GitHub APIs.
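Expressed as code, the resolution order above might look roughly like this. Everything here is a sketch: the names, the 24-hour TTL, and the fetch closure are placeholders, and error handling, the staleness warning, and writing the fresh response back to the cache are all elided.

```rust
use std::time::{Duration, SystemTime};

/// Placeholder freshness window, matching the 24-hour warning mentioned above.
const MAX_AGE: Duration = Duration::from_secs(24 * 60 * 60);

/// A copy of one data file, from whichever source (cache, embed, network).
struct DataSource {
    fetched_at: SystemTime,
    bytes: Vec<u8>,
}

impl DataSource {
    fn is_fresh(&self) -> bool {
        self.fetched_at
            .elapsed()
            .map(|age| age <= MAX_AGE)
            .unwrap_or(false)
    }
}

/// Hypothetical resolution order for a single data file.
fn resolve(
    offline: bool,
    cached: Option<DataSource>,
    embedded: DataSource,
    fetch_from_static_api: impl Fn() -> Option<DataSource>,
) -> DataSource {
    if offline {
        // Offline mode: only the cached and/or embedded copies are considered.
        return cached.unwrap_or(embedded);
    }
    if let Some(cached) = cached.filter(|c| c.is_fresh()) {
        return cached;
    }
    if embedded.is_fresh() {
        return embedded;
    }
    // Neither local copy is fresh: try the static API, falling back to the
    // embedded copy if the request fails or last-update hasn't advanced.
    fetch_from_static_api().unwrap_or(embedded)
}
```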
For the hot path (repos like actions/checkout), this should make things way faster. For the slow path, it'll at least be no slower than it was before.
CC @ubiratansoares for thoughts on the rough sketch above, since you arrived at the same idea 🙂 -- I'm not strongly bound to any of the design pieces above, so I'm curious if you have alternative ideas!