BigQuery scorecard-v2 table doesn't have new partitions #4443

Open
mpace965 opened this issue Dec 10, 2024 · 4 comments
Labels
kind/bug Something isn't working

@mpace965

Describe the bug
BigQuery scorecard-v2 dataset doesn't have new partitions. The last available partition is 20241125.

Reproduction steps
Use the following query to check for new partitions:

SELECT partition_id
FROM openssf.scorecardcron.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = "scorecard-v2"
  AND partition_id != "__NULL__"
ORDER BY partition_id DESC
LIMIT 1;

The result is 20241125 instead of something more recent like 20241202 or 20241209.
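
The query's logic, and the gap it exposes, can be sketched offline in Python. This is a minimal sketch, assuming partition IDs are `YYYYMMDD` strings on a weekly cadence (as the IDs 20241125, 20241202, and 20241209 suggest). The function names `latest_partition` and `expected_after` are hypothetical and not part of any Scorecard tooling.

```python
from datetime import date, datetime, timedelta

NULL_PARTITION = "__NULL__"  # sentinel partition ID excluded by the query


def latest_partition(partition_ids):
    """Return the newest YYYYMMDD partition ID, ignoring __NULL__, or None."""
    dated = [p for p in partition_ids if p != NULL_PARTITION]
    # YYYYMMDD strings sort lexicographically in date order.
    return max(dated, default=None)


def expected_after(latest_id, today, step_days=7):
    """List the weekly partition IDs one would expect after latest_id, up to today.

    Assumes a fixed 7-day cadence; that is an inference from the IDs in
    this issue, not a documented guarantee.
    """
    current = datetime.strptime(latest_id, "%Y%m%d").date() + timedelta(days=step_days)
    expected = []
    while current <= today:
        expected.append(current.strftime("%Y%m%d"))
        current += timedelta(days=step_days)
    return expected


if __name__ == "__main__":
    ids = ["__NULL__", "20241118", "20241125"]  # illustrative input, not real query output
    newest = latest_partition(ids)
    print(newest, expected_after(newest, date(2024, 12, 10)))
```

With the latest partition at 20241125 and the issue opened on Dec 10, 2024, the sketch lists 20241202 and 20241209 as the partitions one might expect to see.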

Expected behavior
The 20241202 and 20241209 partitions exist and contain scorecard data.

Additional context
I did not see any issues or discussion about this - apologies if this has already been reported. Is this still the right place to check for scorecard data? Or maybe there is an ongoing issue and new partitions are not being generated at the moment.

@spencerschrock
Member

> instead of something more recent like 20241202 or 20241209.

Results are batch uploaded weekly. The 20241209 analysis started only ~1 day ago and won't be complete until 20241216, so this is working as intended.
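
This timing reduces to simple date arithmetic. A minimal sketch, assuming the partition ID is the run's start date and that a run takes about seven days before its results are uploaded (both read off the comment above, not documented behavior); `expected_upload_date` and `is_overdue` are hypothetical helper names.

```python
from datetime import date, datetime, timedelta

RUN_LENGTH = timedelta(days=7)  # assumption: one weekly run per partition


def expected_upload_date(partition_id):
    """Date on which a partition's batch upload would be expected."""
    start = datetime.strptime(partition_id, "%Y%m%d").date()
    return start + RUN_LENGTH


def is_overdue(partition_id, today):
    """True if the partition should already have been uploaded by `today`."""
    return today > expected_upload_date(partition_id)


if __name__ == "__main__":
    today = date(2024, 12, 10)  # the day the issue was opened
    for pid in ("20241202", "20241209"):
        print(pid, expected_upload_date(pid), is_overdue(pid, today))
```

Under these assumptions, 20241209 is not yet due on Dec 10, 2024, while 20241202 is overdue.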

As for 20241202, the weekly run had trouble getting through all repos, and in that case the partial results don't get uploaded. There were quite a few GitHub incidents this past week (example), which are likely to blame.

> Is this still the right place to check for scorecard data

For aggregated results, it's your best bet. You can also check https://api.scorecard.dev/ for any individual projects.
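
For a single project, a lookup against the API might look like the sketch below. The `/projects/{platform}/{org}/{repo}` path is an assumption here (it does not appear in this thread), as are the helper names, so check the API's own documentation before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.scorecard.dev"  # base URL from the comment above


def project_url(org, repo, platform="github.com"):
    """Build the lookup URL for one project (path layout is an assumption)."""
    return f"{API_BASE}/projects/{platform}/{quote(org, safe='')}/{quote(repo, safe='')}"


def fetch_project(org, repo, timeout=10):
    """Fetch and decode the JSON result for one project. Requires network access."""
    with urlopen(project_url(org, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; call fetch_project() to actually query the API.
    print(project_url("ossf", "scorecard"))
```

Keeping URL construction separate from the network call makes the path easy to adjust if the assumed layout turns out to differ.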

I'm going to close this as completed/answered, as both missing partitions are working as intended, but feel free to discuss further or re-open as needed.

@spencerschrock
Member

The cron also failed the upload process this past week, leaving many repos unprocessed. I'm using this as the tracking issue for the slowdown investigation.

@spencerschrock
Member

spencerschrock commented Dec 16, 2024

Profiling shows 20% of the time is spent in one hotspot, which is referenced indirectly via osv-scanner.

Will need to investigate further to understand the cause.

@spencerschrock
Member

Reverting to osv-scanner v1.9.0 has fixed our throughput, and I'm working with them upstream to fix the issue. I'm not sure if this week's run will finish in time, but it will hopefully be fixed going forward.

Projects
Status: In Progress