Asynchronous cgroup controller configuration #17
I think this is a wonderful thing to be able to achieve. I would love to verify and implement this.
I'll probably be building out some kind of profiling and benchmarking on this to see what kind of improvements this feature brings. I've got a lot of the initial work done for making the manager async. I need to do some work to get the unit tests working again. Hoping to have the integration tests running on this by tonight.
I was trying it out on my local machine, and I can clearly see that the cgroups configuration is making things run slower than before.
I'm going to reattempt this. I found out that almost every async runtime actually handles file operations by just moving file IO off into a thread pool; I naively assumed they chose between different kernel APIs depending on the build target. In our case that isn't great: the cost of starting a thread pool diminishes, and possibly outright negates, the potential performance improvements of concurrent cgroups configuration.

I'd like to approach this from a lower level. It would be an interesting experiment to write a small program to see whether AIO or epoll work well for interacting with a cgroup controller; otherwise io_uring might address all of the caveats and dead ends in the older async IO kernel features. As far as I'm aware, AIO and epoll are both incapable of doing the more typical buffered IO, which some virtual filesystems support exclusively. If the cgroup vfs handles direct IO well, then AIO or epoll might be preferable simply because older kernel versions support them, while io_uring is relatively new and only available in kernel 5.1 or later. However, io_uring is a much better way of doing async IO: the only system calls required are the ones that set up the ring buffers shared with the kernel. That means less context switching; you just add file operations to the submission ring and listen for results on the completion ring. I also have some recent experience hacking on io_uring, and it's probably a safer interface than AIO and a faster one than either epoll or AIO. None of these Linux APIs do us much good for supporting Windows, though, so it might be preferable to hide this feature behind a compile-time feature flag.

With all of the active development on cgroups v1 and v2, it might actually be better to break the async controllers out into their own half of the cgroup module, so the async controllers can trail behind the progress of the synchronous implementation.
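A minimal sketch of the kind of small experiment described above, using the rio crate (which comes up later in this thread) as the io_uring wrapper; the cgroup path and value are placeholders, and real use needs an existing cgroup and sufficient privileges:

```rust
// Experiment: write a value to a cgroup control file through io_uring via
// the `rio` crate instead of a blocking write(2). Path and value are
// illustrative only.
fn main() -> std::io::Result<()> {
    // Set up the submission/completion ring buffers shared with the kernel.
    let ring = rio::new()?;

    // The file descriptor must stay open until the completion has been reaped.
    let file = std::fs::OpenOptions::new()
        .write(true)
        .open("/sys/fs/cgroup/memory/test/memory.limit_in_bytes")?;

    let value: &[u8] = b"268435456";

    // Queue the write on the submission ring and block on its completion.
    let written = ring.write_at(&file, &value, 0).wait()?;
    println!("wrote {} bytes", written);
    Ok(())
}
```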
@tsturzl
Further development on this issue will be postponed and is effectively blocked by the work happening on the cgroups implementations in #9 and #78.

Some details and notes on the desired implementation: I think the ring buffers should be set up in the manager and passed through into each controller, which will likely mean some changes to the Controller trait. Additionally, the opening of file handles will need to be moved to the point at which we await the writes, likely in the Controller::apply function. This is because the file descriptor must remain open and outlive the IO operation until a completion is received, so opening the file earlier in the process and passing those file descriptors by reference will likely be required to prevent a use-after-free. For the most part rio provides a lot of tolerance against misuse, with the only caveat I'm aware of being the potential for a use-after-free. With the design outlined above we should be able to avoid this completely, and in cases where there are multiple writes to a single file it is better design anyway.
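A rough sketch of that shape, with hypothetical names (this is not youki's actual Controller trait): the manager owns the rio ring and lends it to each controller, and the controller opens its files inside apply so that both the file and the buffer stay alive until the completion is reaped:

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

// Hypothetical trait: the ring is created once by the manager and borrowed
// by every controller.
trait Controller {
    fn apply(&self, ring: &rio::Rio, cgroup_path: &Path) -> io::Result<()>;
}

struct MemoryController {
    limit_in_bytes: i64,
}

impl Controller for MemoryController {
    fn apply(&self, ring: &rio::Rio, cgroup_path: &Path) -> io::Result<()> {
        // Open the file here so it outlives the submitted IO operation.
        let file = OpenOptions::new()
            .write(true)
            .open(cgroup_path.join("memory.limit_in_bytes"))?;
        let buf = self.limit_in_bytes.to_string();

        // `file` and `buf` are only borrowed by the completion, and both live
        // until wait() returns, so the kernel never touches freed memory.
        ring.write_at(&file, &buf, 0).wait()?;
        Ok(())
    }
}

struct Manager {
    ring: rio::Rio,
    controllers: Vec<Box<dyn Controller>>,
}

impl Manager {
    fn apply_all(&self, cgroup_path: &Path) -> io::Result<()> {
        for controller in &self.controllers {
            controller.apply(&self.ring, cgroup_path)?;
        }
        Ok(())
    }
}
```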
@tsturzl Why not create a feature branch and work on it there?
@utam0k I expect things to change quite a bit with cgroups, and it seems like it would be hard to keep resolving the conflicts between the async and the baseline implementations. It wouldn't be able to merge until most of the cgroup work is done anyway, without disrupting other work in flight. I might decide to start before the cgroups are complete, but I feel it's more worth the effort once cgroups are close to completion. I might start working on some of the boilerplate in the meantime, since much of the common module is going to need some rework for cgroups anyway. Once cgroups are complete I don't expect the async implementation to take me much time; I assume most of my time will be spent testing. For that it would be nice to have good unit and integration testing for cgroups v2 as well, since I know runtime-tools doesn't yet support v2.
I'm providing a long overdue update on the progress here. As previously mentioned I had halted my attempts at this. To reiterate some of the previous hurdles: I found out that pretty much every async framework for Rust pushes file operations off into a thread pool instead of doing async IO for file operations. It turns out Linux has only recently gained a decent way of doing async file IO, io_uring, which basically shares two ring buffers with the kernel, allowing IO operations and their completions to be queued. I've long followed the rio project, which currently seems like the safest way to use this IO system from Rust.

I picked this effort up again recently and have been making progress with rio now that development on cgroups is mostly complete. Where I'm at now is that I have a majority of the cgroups v1 features implemented with io_uring. Unfortunately it's a bit slow moving, since the interactions with the filesystem have had to change: the behavior of uring differs from typical blocking buffered IO. There have been a few caveats. For example, instead of making several writes to the devices.allow and devices.deny files, I now write the entire contents in a single operation, since the different means of writing have inconsistent behavior with each other. In fact the devices files seem very odd: if you write to a normal file the same way you write to the devices files, you end up continuously overwriting the last entry, whereas when you write to the devices files each entry somehow ends up on a new line instead of overwriting. So in general the cgroup filesystem doesn't work the way a normal filesystem would, and there are lots of caveats in doing asynchronous IO on these files. With these changes I've also had to rework some tests, which has taken some time. I suspect by the time I get to integration testing v1 cgroups I'll have a lot of issues to resolve.

I have yet to set up profiling or benchmarks, but I plan to start benchmarking once I have cgroups v1 complete. My hope is to finish cgroups v1, bring it up to date with the main branch, and then run comparative benchmarks using criterion to see if there are any significant performance improvements. I also plan to set up flamegraph to hopefully get insightful performance profiling for youki and an actual understanding of the performance difference between async cgroups and blocking IO. If cgroups v1 is passing tests and shows notable performance gains, I will hopefully create a PR for it before the end of the month. I will then look at applying the same changes to cgroups v2, probably in a separate PR.
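A hedged illustration of the single-write workaround described above (the file name follows the cgroup v1 devices controller; the rules and cgroup path are placeholders):

```rust
fn main() -> std::io::Result<()> {
    let ring = rio::new()?;

    // All device rules are joined into one buffer so the whole list goes out
    // in a single uring submission, rather than one small write per rule.
    let rules = ["c 1:3 rwm", "c 1:5 rwm", "b 8:0 rm"];
    let buf = rules.join("\n");

    let file = std::fs::OpenOptions::new()
        .write(true)
        .open("/sys/fs/cgroup/devices/test/devices.allow")?;

    // One write, one completion to wait on.
    ring.write_at(&file, &buf, 0).wait()?;
    Ok(())
}
```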
@tsturzl Excellent progress. I'm very curious to see how this code turns out. Is it possible to create a draft PR, even if it doesn't work yet? I'm very interested in this experiment and excited to see how it turns out.
@tsturzl
@utam0k I'm avoiding threads since I think they'll have more overhead than benefit, so hopefully that won't be an issue, though that is very interesting. I'm currently working through some issues with my implementation using rio. I am, however, somewhat worried about some of the runtime behaviors and caveats. I've just seen that tokio released their io_uring implementation, and I'm curious to give that a try as well to see if it has fewer pitfalls. I believe I can get the rio implementation working, but one thing that worries me is maintainability. Tokio's uring implementation looks more like the standard Rust file IO API and handles a lot of the pitfalls. It is, however, a very first release, and rio is the more mature project. I think it's still worth investigating both solutions.
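For comparison, a small sketch of the tokio-uring style from its initial release (the path is a placeholder): the buffer is moved into the operation and handed back with the result, which is what makes the use-after-free class of mistakes harder to hit:

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // tokio-uring brings its own runtime entry point built around io_uring.
    tokio_uring::start(async {
        let file = tokio_uring::fs::File::create("/tmp/uring-demo").await?;

        // Ownership of the buffer moves into the operation and is returned
        // alongside the result, so it cannot be freed mid-flight.
        let (res, _buf) = file.write_at(b"hello uring".to_vec(), 0).await;
        res?;

        file.close().await?;
        Ok(())
    })
}
```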
For future reference,
We should be able to concurrently configure each of the cgroup subsystems and hopefully see some performance improvements. One thing to note is that write order within a controller is important to keep track of, since the kernel can validate values against each other. Initially it may be best to write the configs in each controller in series, have each controller execute concurrently with the others, and eventually move to more granular concurrency.
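A minimal sketch of that shape, assuming a rio ring and the futures crate; the paths and values are placeholders and the controller bodies are deliberately trivial:

```rust
use futures::executor::block_on;
use futures::future::try_join_all;
use std::fs::OpenOptions;

// Writes inside one controller stay ordered: each write is awaited before the
// next is submitted, so the kernel can validate related values in order.
async fn apply_controller(ring: &rio::Rio, writes: &[(&str, &[u8])]) -> std::io::Result<()> {
    for &(path, value) in writes {
        let file = OpenOptions::new().write(true).open(path)?;
        ring.write_at(&file, &value, 0).await?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let ring = rio::new()?;

    let memory: &[(&str, &[u8])] =
        &[("/sys/fs/cgroup/memory/test/memory.limit_in_bytes", b"268435456")];
    let cpu: &[(&str, &[u8])] = &[("/sys/fs/cgroup/cpu/test/cpu.shares", b"512")];

    // Each controller runs concurrently with the others.
    block_on(try_join_all(vec![
        apply_controller(&ring, memory),
        apply_controller(&ring, cpu),
    ]))?;
    Ok(())
}
```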