
compaction: async deletion of files can result in many goroutines #2379

Closed
nicktrav opened this issue Mar 4, 2023 · 4 comments

Comments


nicktrav commented Mar 4, 2023

When using the deletion pacer, we kick off the deletion of each compaction's obsolete files in its own goroutine. This can result in many goroutines being spawned, only to be immediately put to sleep by the pacer.

Here's an example summary I saw, which showed 500+ goroutines all sleeping, waiting their turn to delete the files from their respective compactions. This accounted for ~5% of the total goroutine count of this process:

[Screenshot (2023-03-03): goroutine dump showing the sleeping deletion goroutines]

When using a deletion pacer, rather than spawning a goroutine per compaction, we could consider having a single goroutine that is responsible for deletion of files. When a compaction completes, it puts the files that need to be deleted in a queue (guarded by some mutex). Then a second goroutine could pull from this queue in order to satisfy the deletion pacing constraints.

Note that when running without a pacer, this problem isn't apparent, as the files are deleted synchronously as part of the compaction.


jbowens commented Mar 6, 2023

Was the high goroutine count present with and without encryption-at-rest? cockroachdb/cockroach#98051 has the potential to slow Remove FS calls and exacerbate the issue.


nicktrav commented Mar 6, 2023

> Was the high goroutine count present with and without encryption-at-rest?

Correct. I saw it in both. The test run using a master binary doesn't seem to have this issue. It may take some time to manifest though.


bananabrick commented Mar 6, 2023

> When a compaction completes, it puts the files that need to be deleted in a queue (guarded by some mutex). Then a second goroutine could pull from this queue in order to satisfy the deletion pacing constraints.

We already have this queue, versionSet.obsoleteTables. We're just not using it in the way proposed in this issue.


jbowens commented Dec 5, 2023

This was resolved in #2641.

@jbowens jbowens closed this as completed Dec 5, 2023