Conversation
- Use only one slice for shards and links.
- Protect the slice with a RWMutex to avoid race conditions.
- Remove all direct slice access from outside childer.
- Add concurrency test.

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
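To make the described change concrete, here is a minimal, hypothetical sketch of the pattern: a single slice of child-or-link entries guarded by a `sync.RWMutex`, with all access going through methods. The type and method names (`childer`, `childOrLink`, `Shard`, `Link`, `get`, `setChild`, `setLink`) are illustrative assumptions, not the actual go-unixfs code:

```go
package childer

import "sync"

// Shard and Link are stand-ins for the real types; their contents are
// irrelevant to the locking pattern being illustrated.
type Shard struct{}
type Link struct{}

// childOrLink holds exactly one of the two pointers; callers still need to
// check which one is non-nil.
type childOrLink struct {
	child *Shard
	link  *Link
}

// childer keeps a single slice for both shards and links, and every access
// goes through methods that take the RWMutex.
type childer struct {
	mu      sync.RWMutex
	entries []childOrLink
}

// get reads the entry at index i under the read lock.
func (c *childer) get(i int) childOrLink {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.entries[i]
}

// setChild stores a child shard at index i under the write lock.
func (c *childer) setChild(i int, s *Shard) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[i] = childOrLink{child: s}
}

// setLink stores a link at index i under the write lock.
func (c *childer) setLink(i int, l *Link) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[i] = childOrLink{link: l}
}
```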
Thank you for submitting this PR!
Getting other community members to do a review would be a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
LGTM, a few notes:

- We are conflating in the same commit/PR the actual fix (the lock) with the refactor into a new `childOrLink` structure (which isn't minor in line count nor in the impact on the patch).
- I'm fine adding a lock provided we're confident that:
  - it doesn't affect performance
  - this was actually causing the panic in the parallel walk
- I don't see what is being improved with `childOrLink` in terms of having both be `nil`. It certainly ~~doesn't make it worse~~ improves upon the old code by making a stronger coupling between child and link, having them in the same slice, so I'm fine landing this, but we still have two pointers we reference and need to explicitly check for `nil` (compare with an API that exposes callbacks to process child or link and actually enforces that only one will run; a sketch follows below).
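As a rough illustration of the callback-style API the reviewer compares against, reusing the hypothetical `childer`/`childOrLink` types from the sketch above: the method dispatches to exactly one callback per entry, so the two pointers never leave the struct and callers don't nil-check them. The name `each` is an assumption, not an existing API:

```go
// each visits every entry under the read lock and invokes exactly one of the
// two callbacks per entry, so callers never have to nil-check both pointers.
func (c *childer) each(onChild func(*Shard) error, onLink func(*Link) error) error {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, e := range c.entries {
		switch {
		case e.child != nil:
			if err := onChild(e.child); err != nil {
				return err
			}
		case e.link != nil:
			if err := onLink(e.link); err != nil {
				return err
			}
		}
	}
	return nil
}
```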
@schomatis
Locking for sure affects performance, but the behavior will be correct.
I added a minimal test to reproduce the error before starting to fix the bug; it was the easiest way to reproduce it. If you have a better idea for a test that reproduces the problem, just let me know!
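A minimal concurrency test of the kind mentioned could look like the sketch below, built on the hypothetical `childer` from the earlier sketch and intended to be run with `go test -race`. The test name and constants are illustrative, not the actual test in the PR:

```go
package childer

import (
	"sync"
	"testing"
)

// TestChilderConcurrentAccess hammers the same childer from many goroutines;
// without the RWMutex, `go test -race` reports a data race on the slice.
func TestChilderConcurrentAccess(t *testing.T) {
	c := &childer{entries: make([]childOrLink, 8)}
	n := len(c.entries)

	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.setChild(j%n, &Shard{})
				_ = c.get(j % n)
			}
		}()
	}
	wg.Wait()
}
```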
The mutex does that, not consolidating child and link (which is still a good thing, as it makes the code easier to understand). As mentioned before, I get the feeling we're conflating different issues here.
That's good, but the test introduces the idea that we're using shards in parallel, and I don't see where that is happening in the code.
Added more context here: ipfs/kubo#9063 (comment)
Closing because we think that when we attack ipfs/kubo#9063 in the future, we'll add locking at the MFS level.