Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protocols/kad: accounting error for background job capacity #4191

Closed
iand opened this issue Jul 12, 2023 · 2 comments · Fixed by #5148
Closed

protocols/kad: accounting error for background job capacity #4191

iand opened this issue Jul 12, 2023 · 2 comments · Fixed by #5148
Labels
bug getting-started Issues that can be tackled if you don't know the internals of libp2p very well help wanted

Comments

@iand
Copy link

iand commented Jul 12, 2023

Summary

In the kad behaviour poll function the current query capacity is calculated. Then there is a loop over upto num possible provider announcement jobs, possibly starting any that are ready. However a soon as one returns pending the loop is exited via break. Then the current query capacity is decreased by num, however due to the break the number of queries started may be less than num. It seems to me that the capacity should be reduced by the iteration reached in the loop, not the upper bound. (here: https://github.com/libp2p/rust-libp2p/blob/master/protocols/kad/src/behaviour.rs#L2428)

Then the code proceeds to iterate over replication jobs, using the reduced capacity as a limit on the number that may be started.

Expected behaviour

jobs_query_capacity should reflect an accurate capacity for performing replication / publication jobs

Actual behaviour

jobs_query_capacity is always lower than the actual capacity so fewer replication / publication jobs can be run than expected

Would you like to work on fixing this bug?

Maybe, although I have no actual Rust skills 😁

@mxinden
Copy link
Member

mxinden commented Jul 13, 2023

Thanks Ian for the detailed report!

Indeed, there's a chance that jobs_query_capacity might be reduced by an extra unit in the event that job.poll returns Poll::Pending.

is always lower than the actual capacity

Nit pick. I don't think that is true. I.e. in case all num calls to job.poll return Poll::Ready jobs_query_capacity is accurate. Am I missing something?

Maybe, although I have no actual Rust skills

Contribution would be very much appreciated. Happy to help! Also happy to pair program on this.

On a higher level, I find the throttling mechanism (i.e. JOBS_MAX_QUERIES and JOBS_MAX_NEW_QUERIES) rather brittle. Having a limit per poll invocation does not make sense to me, given that poll invocations might happen on the order of microseconds whereas requests resolve on the order milliseconds, thus the per-poll limit is quickly reached through consecutive invocations.

@mxinden mxinden added bug help wanted getting-started Issues that can be tackled if you don't know the internals of libp2p very well labels Jul 13, 2023
@mxinden
Copy link
Member

mxinden commented Jul 13, 2023

//CC @dariusc93 in case you are seeing this issue on https://github.com/dariusc93/rust-ipfs.

@mergify mergify bot closed this as completed in #5148 Mar 27, 2024
mergify bot pushed a commit that referenced this issue Mar 27, 2024
guillaumemichel pushed a commit that referenced this issue Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug getting-started Issues that can be tackled if you don't know the internals of libp2p very well help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants