
[WIP] async cgroups #65

Closed · wants to merge 15 commits

Conversation

@tsturzl (Collaborator) commented Jun 6, 2021

I'm putting this up as a work in progress since I'm having trouble understanding some of the test failures. I'm running Void Linux (glibc, kernel 5.10.15) on my machine, and locally the integration tests pass; on CI, however, a lot of them fail. If I revert one of the subsystems to non-async filesystem interactions I don't see those failures on CI, but the problem seems to affect many of the controllers, if not all of them.

One interesting detail: when I was working on this locally, the integration tests were failing until I added an explicit sync_data call on the File. This makes me suspect there are caveats in the kernel's virtual filesystem that I'm not aware of. I hope to read some of the kernel source to better understand what's going on, but I decided to open this PR to get more eyes on it. Maybe someone has a better idea of what's happening here, or someone who can reproduce the failures on their machine can provide more insight?

For now I'm planning to start work on ticket #39, which could provide the tooling I need to dig deeper into this, and to spend some of my free time learning how the cgroups vfs works under the hood, since the current generation of async filesystem interfaces comes with caveats; for example, most async IO is actually direct rather than buffered, and some virtual filesystems do not support that well.
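For anyone trying to reproduce this, here is a minimal sketch of the write-then-sync pattern I'm describing, assuming smol's async-fs re-export; this is illustrative only, not the actual subsystem code:

```rust
use std::path::Path;
use smol::io::AsyncWriteExt;

// Illustrative helper: write a value into a cgroup control file asynchronously.
// Without the explicit sync_data below, the integration tests were failing for
// me locally, which is what makes me suspect a vfs caveat.
async fn write_cgroup_file(path: &Path, value: &str) -> std::io::Result<()> {
    let mut file = smol::fs::OpenOptions::new()
        .write(true)
        .open(path)
        .await?;
    file.write_all(value.as_bytes()).await?;
    // Flush file data down to the underlying (virtual) filesystem.
    file.sync_data().await?;
    Ok(())
}
```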

Also, please forgive the mess of commits here. I thought I could get CI working with a couple of quick changes, but alas, I'm stumped. This feature branch is also trailing quite a ways behind the main branch now; I will rebase onto main soon, but I'm hoping to wait until the underlying issue is resolved. I'm inclined to think this problem is addressable, since the tests pass for me locally. Let me know what you think.

@utam0k requested review from Furisto and utam0k on Jun 6, 2021, 03:41
@Furisto (Collaborator) commented Jun 6, 2021

I will test this on my machine this week. Have you benchmarked this, @tsturzl? What is the performance improvement?

@tsturzl (Collaborator, Author) commented Jun 7, 2021

@Furisto I have not set up any kind of profiling or benchmarking yet, as it's not currently passing the CI tests. I'm also now aware that most async frameworks, including the one I'm using (smol), actually push blocking IO off onto a threadpool. I naively assumed most of them relied on something like IOCP or AIO, but that isn't the case. So my assumption is that the threadpool startup costs will likely outweigh the performance gains of concurrent IO in our case. I think I'm going to revisit this and may have to do some of the low-level work myself. Even in Tokio, many of the file operations get pushed off onto a background threadpool.
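To make that concrete, this is roughly the pattern those runtimes use under the hood, sketched here with Tokio's spawn_blocking purely for illustration: the std::fs call still blocks, it just runs on a dedicated thread pool.

```rust
use std::path::PathBuf;

// Rough illustration of "async" file IO as most runtimes implement it today:
// the blocking read is shipped to a background thread pool rather than being
// driven by a readiness- or completion-based kernel interface.
async fn read_control_file(path: PathBuf) -> std::io::Result<String> {
    tokio::task::spawn_blocking(move || std::fs::read_to_string(path))
        .await
        .expect("blocking task panicked")
}
```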

I've done some work with rio, which relies on the io_uring features in kernel 5.1+, and I'm familiar with poll, aio, epoll, etc., but they come with a lot of caveats, mainly that they only support direct IO and not buffered IO, and I'm uncertain whether that will work with the cgroups vfs.
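As a quick way to check that last concern, something like this hypothetical probe (plain std plus the libc crate) could test whether a cgroup control file even accepts O_DIRECT, since many virtual filesystems reject the flag with EINVAL:

```rust
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

// Hypothetical probe: try to open a cgroup file with O_DIRECT and report
// whether the vfs accepts the flag. Many pseudo-filesystems return EINVAL.
fn accepts_o_direct(path: &Path) -> std::io::Result<bool> {
    match std::fs::OpenOptions::new()
        .write(true)
        .custom_flags(libc::O_DIRECT)
        .open(path)
    {
        Ok(_file) => Ok(true),
        Err(e) if e.raw_os_error() == Some(libc::EINVAL) => Ok(false),
        Err(e) => Err(e),
    }
}
```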

@tsturzl (Collaborator, Author) commented Jun 10, 2021

Closing this PR. I'm going to reattempt this with some more experimental approaches and will comment with details in #17.

@tsturzl closed this Jun 10, 2021