
[Feature] cron.update_mirrors - LIMIT_SIZE config parameter #16982

Closed
somera opened this issue Sep 7, 2021 · 12 comments · Fixed by #17568
Labels
type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@somera

somera commented Sep 7, 2021

Description

The update_mirrors cron updates all mirrors (where updated_unix is ...) in one go. In my case I run it once per day, and the cron needs ~2.5h to update all >6000 mirrors. If Gitea updates too many repos in one go, GitHub blocks Gitea for a few minutes and Gitea gets an HTTP 503 error:

2021/09/07 04:09:27 ...irror/mirror_pull.go:176:runSync() [E] Failed to update mirror repository &{394 67 OpenAPITools <nil> openapi-generator-cli openapi-generator-cli   2 https://github.com/OpenAPITools/openapi-generator-cli.git master 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 false false false true <nil> [] 0 map[] map[] [] <nil> false 0 <nil> false 0 <nil> 54835310 <nil> <nil> false false [openapi openapi3 npm openapi-generator openapi2] default  1615665794 1630844437}:
        Stdout: Fetching origin

        Stderr: fatal: unable to access 'https://github.com/OpenAPITools/openapi-generator-cli.git/': The requested URL returned error: 503
        error: Could not fetch origin

        Err: exit status 1
        /source/services/mirror/mirror_pull.go:176 (0x2006e8a)
        /source/services/mirror/mirror_pull.go:276 (0x2008e9e)
        /source/services/mirror/mirror.go:79 (0x2004545)
        /source/modules/graceful/manager.go:139 (0xc5c565)
        /usr/local/go/src/runtime/asm_amd64.s:1371 (0x47aa40)

In this case I would prefer a config parameter that limits how many mirrors each update_mirrors cron run processes.

; Update mirrors
[cron.update_mirrors]
; SCHEDULE = @every 24h
SCHEDULE = 0 0 * * * *
; proposed new parameter
LIMIT_SIZE = 50

With this, the update_mirrors cron would run every hour and update the 50 oldest mirrors (select * from repository where is_mirror = true order by updated_unix asc limit 50). If LIMIT_SIZE is not set, it would fetch all mirrors, in the same order.
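As a rough illustration of the proposed LIMIT_SIZE semantics, here is a minimal Go sketch that does in memory what the SQL query above does in the database: sort by updated_unix ascending and keep at most N rows. `Mirror` and `oldestMirrors` are hypothetical names, and LIMIT_SIZE is the proposal in this issue, not an existing Gitea setting; treating a limit of 0 or less as "not set" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Mirror is a hypothetical, trimmed-down stand-in for a mirror repository row.
type Mirror struct {
	ID          int64
	UpdatedUnix int64 // last update time, as in repository.updated_unix
}

// oldestMirrors returns up to limit mirrors, least recently updated first.
// A limit <= 0 means "not set": all mirrors are returned, still oldest first.
// In-memory equivalent of:
//   select * from repository where is_mirror = true order by updated_unix asc limit N
func oldestMirrors(mirrors []Mirror, limit int) []Mirror {
	sorted := make([]Mirror, len(mirrors))
	copy(sorted, mirrors) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].UpdatedUnix < sorted[j].UpdatedUnix
	})
	if limit > 0 && limit < len(sorted) {
		sorted = sorted[:limit]
	}
	return sorted
}

func main() {
	mirrors := []Mirror{{1, 300}, {2, 100}, {3, 200}}
	for _, m := range oldestMirrors(mirrors, 2) {
		fmt.Println(m.ID) // prints 2, then 3
	}
}
```

Because each run picks the least recently updated mirrors, the mirrors updated in one run move to the back of the line, so repeated runs rotate through the whole set.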

Or is this already possible?

There is a MIRROR_QUEUE_LENGTH config parameter, but I couldn't find where it is used in the code.


@lunny lunny added the type/proposal The new feature has not been accepted yet but needs to be discussed first. label Sep 8, 2021
@lunny
Member

lunny commented Sep 8, 2021

The number of workers and the queue length should help control that, but only indirectly.

@zeripath
Contributor

zeripath commented Nov 5, 2021

In 1.16 the mirror queue will be a proper queue and we would recommend that you use TYPE=level or TYPE=redis queue if you have a lot of mirrors.

@somera
Author

somera commented Nov 5, 2021

> In 1.16 the mirror queue will be a proper queue and we would recommend that you use TYPE=level or TYPE=redis queue if you have a lot of mirrors.

Sounds good. Is there a config example? I can't find anything in

https://github.com/go-gitea/gitea/blob/main/custom/conf/app.example.ini

@zeripath
Contributor

zeripath commented Nov 5, 2021

If you're happy to run Redis for all your queues, it would be as simple as:

[queue]
TYPE=redis
CONN_STR=; as per docs

To make only the mirror queue and the pull request patch checker queue level queues, it's:

[queue.pr_patch_checker]
TYPE=level

[queue.mirror]
TYPE=level

That should do it.

@somera
Author

somera commented Nov 6, 2021

@zeripath I'm currently running Gitea on a mini server with 8GB RAM, together with PostgreSQL, Memcached, Nexus, ... . If I want to optimize my mirror process, I need Redis. In that case I could replace Memcached with Redis. That could work.

Perhaps it would be better to reduce the number of different tools that can be used with Gitea ;) since not every tool can be used for every operation, and development would be easier.

But if I set

[queue]
...
LENGTH = 2000

and my mirror cron runs every 24 hours, then only the 2000 oldest mirrors will be updated, right? That means that if I have 6000 mirrors, all of them will be up to date after 3 days, right?

@zeripath
Contributor

zeripath commented Nov 6, 2021

@somera if you don't want to use Redis, just use the level queue, which is built into Gitea itself.

The problem with using a persistable-channel queue for the mirror queue is if you have 2001 mirrors and 2000 are queued, the 2001st request to push to the mirror queue will block.

This blocking is likely the cause of your repeated issues with open or stuck processes. Not every call to queue.Push() is made asynchronously with go queue.Push(...).
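To illustrate the blocking behavior described above, here is a minimal Go sketch that uses a buffered channel as a stand-in for a bounded queue. It is not Gitea's queue code: `tryPush` and `pushAsync` are hypothetical helpers that contrast a plain send (which blocks when the queue is full), a non-blocking send with `select`/`default`, and a send moved into a goroutine, which is the idea behind `go queue.Push(...)`.

```go
package main

import "fmt"

// tryPush attempts a non-blocking send and reports false when the queue is full.
func tryPush(queue chan<- int64, id int64) bool {
	select {
	case queue <- id:
		return true
	default:
		return false
	}
}

// pushAsync sends from a new goroutine so the caller never blocks,
// analogous to `go queue.Push(...)`. The returned channel is closed
// once the send has completed.
func pushAsync(queue chan<- int64, id int64) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		queue <- id // blocks inside the goroutine, not in the caller
		close(done)
	}()
	return done
}

func main() {
	queue := make(chan int64, 2) // bounded queue with room for 2 items
	queue <- 1
	queue <- 2
	// A plain `queue <- 3` here would block forever: nothing is reading.
	fmt.Println(tryPush(queue, 3)) // false: the queue is full
	done := pushAsync(queue, 3)    // returns immediately
	fmt.Println(<-queue)           // a "worker" takes 1, freeing a slot
	<-done                         // the background push of 3 completes
	fmt.Println(len(queue))        // 2
}
```

The trade-off is that an async push hides back-pressure from the caller, so many pending goroutines can pile up when the workers cannot keep up.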

@somera
Author

somera commented Nov 6, 2021

OK. But what about this question? I'm trying to understand this new functionality. Is this what I "wanted" in my initial post?

If I set

[queue]
...
LENGTH = 2000

and my mirror cron runs every 24 hours, then only the 2000 oldest mirrors will be updated, right? That means that if I have 6000 mirrors, all of them will be up to date after 3 days, right?

@zeripath
Contributor

zeripath commented Nov 6, 2021

Ah I think I now understand what you mean - you'd prefer to limit the number of mirrors added to the queue by cron.update_mirrors.

OK let me take a look at that now.

@somera
Author

somera commented Nov 6, 2021

@zeripath Right. I don't want to update all 6000 mirrors in one go; that takes ~3h at the moment, and GitHub blocks me (too many requests in xxx minutes). I'd rather split them up.

On every update_mirrors cron run, Gitea should update only the xxxx least recently updated mirrors.

zeripath added a commit to zeripath/gitea that referenced this issue Nov 6, 2021
Add `PULL_LIMIT` and `PUSH_LIMIT` to cron.update_mirror task to limit
the number of mirrors added to the queue each time the cron task is run.

Fix go-gitea#16982

Signed-off-by: Andrew Thornton <art27@cantab.net>
@somera
Author

somera commented Nov 23, 2021

@zeripath Thanks. If I then set (perhaps in 1.16.0)

PULL_LIMIT=1000

will only the 1000 oldest mirrors be updated on each update_mirrors cron run?

@zeripath
Contributor

Each time the update_mirrors task runs, only the oldest PULL_LIMIT pull mirrors and the oldest PUSH_LIMIT push mirrors will be added to the queue.

If the mirror is already in the queue it will not count towards the limit.

So if the limit is, say, 3 and repos A to N are waiting to be updated, ordered from most to least stale, and A to E are already in the queue, then F, G and H will be added.

@somera
Author

somera commented Feb 5, 2022

@zeripath After upgrading to 1.16.0, the mirror update process no longer works as it did in 1.15.x. See #18607

And I don't understand the new process.

I made ~9000 curl API calls to Gitea:

curl -X 'POST' 'http://nuc-mini-celeron:3000/api/v1/repos/gaphor/in-app-notification-demo/mirror-sync' -H 'accept: application/json' -H 'Authorization: token xxxxx' -d ''
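For scripting many of these calls, the same request can be built in Go with the standard `net/http` package. This is a sketch: `newMirrorSyncRequest` is a hypothetical helper, the host, repository and `xxxxx` token are copied from the curl call above, and the request is only constructed here, not sent (passing it to `http.DefaultClient.Do` would send it).

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newMirrorSyncRequest builds the same request as the curl call:
// POST {base}/api/v1/repos/{owner}/{repo}/mirror-sync with a token header.
func newMirrorSyncRequest(base, owner, repo, token string) (*http.Request, error) {
	u := fmt.Sprintf("%s/api/v1/repos/%s/%s/mirror-sync",
		base, url.PathEscape(owner), url.PathEscape(repo))
	req, err := http.NewRequest(http.MethodPost, u, nil) // empty body, like -d ''
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("Authorization", "token "+token)
	return req, nil
}

func main() {
	req, err := newMirrorSyncRequest("http://nuc-mini-celeron:3000",
		"gaphor", "in-app-notification-demo", "xxxxx")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```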

If I repeat the curl calls, I see this

2022/02/05 01:04:08 ...ces/mirror/mirror.go:161:func1() [E] Unable to push sync request for to the queue for push mirror repo[6616]: Error: already in queue
        /source/services/mirror/mirror.go:161 (0x1cd87c9)
        /usr/local/go/src/runtime/asm_amd64.s:1581 (0x471520)

in the logs.

And I set this:

[cron.update_mirrors]
SCHEDULE = 0 0 4 * * 5
PULL_LIMIT = -1
PUSH_LIMIT = -1

But Gitea is not updating all the repos.

Why? When will Gitea update all the repos whose mirror.updated_unix is more than one day old?


@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022
3 participants