fix: add pagination to job_collector task #8233

Merged: 1 commit merged into apache:main on Dec 12, 2024

Conversation

ClaudioMascaro (Contributor) commented Dec 6, 2024

⚠️ Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • I have read through the Contributing Documentation.
  • I have added relevant tests.
  • I have added relevant documentation.
  • I will add labels to the PR, such as pr-type/bug-fix, pr-type/feature-development, etc.

Summary

Add pagination to the github_graphql job collector task.
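
For readers unfamiliar with the technique, here is a minimal sketch of cursor-based pagination as GitHub's GraphQL connections expose it (`first`, `after`, `pageInfo.hasNextPage`, `pageInfo.endCursor`). It is not the code from this PR: `collect_jobs`, `make_fake_api`, and the page size are hypothetical names and values chosen for illustration, and the API is simulated in memory so the example runs on its own.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape, modeled on a GraphQL connection:
# {"nodes": [...], "pageInfo": {"hasNextPage": bool, "endCursor": str}}
Page = dict


def collect_jobs(
    fetch_page: Callable[[int, Optional[str]], Page],
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield every job of one workflow run, requesting one bounded page at a time."""
    cursor: Optional[str] = None  # None means "start from the first page"
    while True:
        page = fetch_page(page_size, cursor)
        yield from page["nodes"]
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        # Resume after the last item of the page we just received.
        cursor = info["endCursor"]


def make_fake_api(total_jobs: int):
    """Hypothetical in-memory stand-in for the remote API (illustration only)."""
    jobs = [{"id": i} for i in range(total_jobs)]
    calls = []  # records (first, after) for every request made

    def fetch_page(first: int, after: Optional[str]) -> Page:
        calls.append((first, after))
        start = 0 if after is None else int(after)
        end = min(start + first, len(jobs))
        return {
            "nodes": jobs[start:end],
            "pageInfo": {"hasNextPage": end < len(jobs), "endCursor": str(end)},
        }

    return fetch_page, calls


if __name__ == "__main__":
    fetch, calls = make_fake_api(230)
    collected = list(collect_jobs(fetch, page_size=50))
    print(len(collected), "jobs collected in", len(calls), "requests")
```

The point of the pattern is that each request stays small and bounded no matter how many jobs a run has, so a single oversized query no longer has to finish within the API timeout.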

Does this close any open issues?

Closes #8028

Screenshots

We needed to extract data from a large, complex repository with over 30,000 workflow runs, some of which have more than 200 job runs.

Whenever the Collect Job task started, it simply couldn't finish the query in time and entered the retry flow:

Screenshot from 2024-12-09 16-56-00

We tried reducing the API timeout, but that only increased the number of unsuccessful retries:

Screenshot from 2024-12-06 15-38-14

With the solution implemented, our case was solved: after 17 hours, the task was able to collect all the data:

image
(don't mind the log I added locally to debug)

image

For comparison, we also extracted data from a much less complex repository. Before the change, it took the following time:

Screenshot from 2024-12-11 08-11-06

After applying the solution (rerunning in hard refresh mode, of course), there was no change in the overall pipeline time:

Screenshot from 2024-12-11 08-18-53

Other Information

Any other information that is important to this PR.

ClaudioMascaro marked this pull request as ready for review December 7, 2024 11:10
dosubot (bot) added labels Dec 7, 2024: size:M (This PR changes 30-99 lines, ignoring generated files), component/plugins (This issue or PR relates to plugins), pr-type/bug-fix (This PR fixes a bug)
klesh (Contributor) commented Dec 9, 2024

Thank you for your contribution!

Have you had a chance to test the code? If so, it would be great if you could share some screenshots to help us review. Thank you!

ClaudioMascaro force-pushed the feat/pagination-collect-jobs branch from 1d9685a to 83f3e97 December 10, 2024 16:47

ClaudioMascaro (Contributor, Author) commented

Hey @klesh

After some testing, I have arrived at a final solution. All the evidence is in the PR description. Thanks!

klesh (Contributor) left a comment

LGTM, thanks for your contribution.
Would you like to submit another PR to the release-v1.0 branch so it can be released to the community sooner?

klesh merged commit 47b4014 into apache:main Dec 12, 2024
10 checks passed

ClaudioMascaro (Contributor, Author) commented

@klesh Sure, here it is: #8240 🙌

Development

Successfully merging this pull request may close these issues.

[Bug][Github] GraphQL API requests will eventually fail forever collecting large repositories data
2 participants