fix: add pagination to job_collector task #8233
Merged
Summary
Add pagination to the job collector task in github_graphql.
Does this close any open issues?
Closes #8028
Screenshots
We needed to extract data from a large, complex repository with over 30,000 workflow runs, some of which have more than 200 job runs.
Whenever the Collect Job task started, it simply could not finish the query in time and entered the retry flow:
We tried reducing the API timeout, but that only increased the number of unsuccessful retries:
After implementing the solution, our case was solved: after 17 hours, the task was able to collect all the data:
(Ignore the log line I added locally for debugging.)
For comparison, we extracted data from a much less complex repository, which took the following time before the change:
After applying the solution (rerunning in hard refresh mode, of course), there was no change in the overall pipeline time: