Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update nix to 0.27.1 #2369

Merged
merged 5 commits into from
Oct 9, 2023
Merged

update nix to 0.27.1 #2369

merged 5 commits into from
Oct 9, 2023

Conversation

anti-entropy123
Copy link
Contributor

@anti-entropy123 anti-entropy123 commented Sep 19, 2023

I'm currently working on issue #2317.

note: the following content is now outdated

In the code changes I've made, there are two things that might raise questions:

  1. The use of std::mem::forget().
  2. The use of an unsafe function OwnedFd::from_raw_fd().

I use forget() because some functions in the nix library now return an OwnedFd. This helps avoid accidentally closing the file descriptor too soon by preventing its automatic drop. Maybe fully relying on OwnedFd to manage the fd's lifetime instead of manually calling close on RawFd is a better approach, but I think this could require quite a bit of code changes.

I also use OwnedFd::from_raw_fd() because of the nix library. The nix::sched::setns() function now requires an Fd: AsFd as one of its parameters, and I couldn't find a better way to convert an i32 into Fd: AsFd.

Additionally, this is my first time contributing code to an open-source repository. If there are any important details I've missed, please let me know.

@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2023

Codecov Report

Merging #2369 (951e5c6) into main (fee4641) will increase coverage by 0.02%.
Report is 8 commits behind head on main.
The diff coverage is 91.83%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2369      +/-   ##
==========================================
+ Coverage   65.98%   66.01%   +0.02%     
==========================================
  Files         133      133              
  Lines       16814    16832      +18     
==========================================
+ Hits        11095    11111      +16     
- Misses       5719     5721       +2     

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Sep 20, 2023

Hey, thanks for the PR. I think there was someone else also assigned to this you should also ping them .
As there are a lot of changes, I'll need some more time to review this, probably over the weekend.

@YJDoc2 YJDoc2 added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Sep 20, 2023
@utam0k
Copy link
Member

utam0k commented Sep 23, 2023

I'll take a look at this PR tomorrow.

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your first contribution, and congrats your first OSS contribution! Awesome.
I left some comments.

Comment on lines 211 to 212
std::mem::forget(os_sender);
std::mem::forget(os_receiver);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use forget() because some functions in the nix library now return an OwnedFd. This helps avoid accidentally closing the file descriptor too soon by preventing its automatic drop. Maybe fully relying on OwnedFd to manage the fd's lifetime instead of manually calling close on RawFd is a better approach, but I think this could require quite a bit of code changes.

What should we do to support OwnedFd? I feel it is too dangerous for low-level container runtime because we must take care of the security risk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Sender and Receiver, the problem is relatively easy. I noticed that Sender::close() will closes the fd.

I don't have a solid plan for fully supporting OwnedFd. I can only say that I've tried to make my PR as compatible as possible with the previous implementation that used rawfd, and reuse the code that previously managed file descriptors.

The main difficulty right now is that if we changes entirely to OwnedFd, the console_socket in struct ContainerArgs (crates/libcontainer/src/process/args.rs:20) won't satisfy the Clone trait. Maybe need to introduce Rc<T>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I avoid forget() in this PR, or address it in a separate issue? I'd like to hear your opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I avoid forget() in this PR, or address it in a separate issue? I'd like to hear your opinion.

Thanks for your detailed investigation. This problem relates to the security issue. So I can't merge it before removing forget().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this cautious decision. I have some ideas for removing forget(), but it will take some time to implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for understanding 🙏

but it will take some time to implement.

NP 👍 Let's give it a try!

fd
}
None => -1,
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when it returns -1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, we can combine L90-L94 into L81-89. Is there a reason why you separate them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when it returns -1?

It doesn't seem to make any difference compared to the past.
The previous setup_console_socket() would return Ok(-1) under certain conditions. Now, under the same conditions, it returns Ok(None).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, we can combine L90-L94 into L81-89. Is there a reason why you separate them?

There's nothing special. I can make the modification as you suggested.

@@ -164,12 +164,12 @@ fn clone(cb: CloneCb, flags: u64, exit_signal: Option<u64>) -> Result<Pid, Clone
// do not use MAP_GROWSDOWN since it is not well supported.
// Ref: https://man7.org/linux/man-pages/man2/mmap.2.html
let child_stack = unsafe {
mman::mmap(
mman::mmap::<File>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask you to write a comment that File doesn't have any meaning because we won't use it.

Copy link
Contributor Author

@anti-entropy123 anti-entropy123 Sep 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's related to the change in the function signature of nix::sys::mman::mmap(). The current signature of the mmap() function is:

pub unsafe fn mmap<F: AsFd>(
    addr: Option<NonZeroUsize>, 
    length: NonZeroUsize, 
    prot: ProtFlags, 
    flags: MapFlags, 
    f: Option<F>, 
    offset: off_t) -> Result<*mut c_void>

If you remove ::<File>, it will result in a compilation error:

type annotations needed; cannot infer type of the type parameter F declared on the function mmap

Is there any other way to avoid the compilation error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood it, but it's unclear to the reader of this code. May I ask you to write a comment about it?

Comment on lines 120 to 125
let csocketfd = self.setup_tty_socket(&container_dir)?;
let csocketfd = csocketfd.map(|sockfd| {
let fd = sockfd.as_raw_fd();
forget(sockfd);
fd
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let csocketfd = self.setup_tty_socket(&container_dir)?;
let csocketfd = csocketfd.map(|sockfd| {
let fd = sockfd.as_raw_fd();
forget(sockfd);
fd
});
let csocketfd = self.setup_tty_socket(&container_dir)?.map(|sockfd| {
let fd = sockfd.as_raw_fd();
forget(sockfd);
fd
});

},
)?;
// The spec requires the listener socket to be closed immediately after sending.
let _ = unistd::close(socket);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we can remove close()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OwnedFd will automatically call libc::close() when it goes out of function body.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec requires the listener socket to be closed immediately after sending.

We need to close it as soon as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L106 is the return statement, so sendmsg() will be immediately followed by the drop of the socket.
However, considering that the spec emphasizes this point, I believe it's necessary to explicitly close the socket. I plan to call drop(socket) on L105. Do you think that's okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's necessary to explicitly close the socket. I plan to call drop(socket) on L105. Do you think that's okay?

I prefer it because there is a possibility that we might change the code around there.

@@ -47,14 +50,12 @@ pub fn recv_seccomp_listener(seccomp_listener: &Path) -> SeccompAgentResult {
Ok(msg) => msg,
Err(e) => {
let _ = unistd::close(conn);
let _ = unistd::close(socket);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove unisted::close because OwnedFd is automatically closed by Drop, can't we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to check if fd is leaking?

@yihuaf
Copy link
Collaborator

yihuaf commented Sep 24, 2023

I will take a look this week.

@utam0k
Copy link
Member

utam0k commented Sep 25, 2023

@yihuaf Thanks for taking your time 🙏 I want you to review it, too.

@anti-entropy123
Copy link
Contributor Author

Hi, I've resubmitted the new code, and I've removed all forget() (except for one in a test function, crates/libcontainer/src/tty.rs:211 ).
In my downstream repository, all CI tests have already passed.
Please review it again when you have time. Thank you very much. @utam0k

@utam0k
Copy link
Member

utam0k commented Sep 28, 2023

@anti-entropy123 May I ask you to resolve the conflict? 🙏

@anti-entropy123
Copy link
Contributor Author

May I ask you to resolve the conflict?

👌 Done. @utam0k

@utam0k
Copy link
Member

utam0k commented Oct 1, 2023

@anti-entropy123 Thanks. Please tell me why we can't use try_clone()?
https://doc.rust-lang.org/nightly/std/os/fd/struct.OwnedFd.html#method.try_clone

@anti-entropy123
Copy link
Contributor Author

anti-entropy123 commented Oct 1, 2023

Please tell me why we can't use try_clone()?

Hi, thanks for your advice.

Honestly, I didn't initially think of trying try_clone(). But I think using try_clone() might cause the file descriptor to be accidentally closed.

For example, in the container_main_process() (crates\libcontainer\src\process\container_main_process.rs:49), the channels are cloned into closures on the heap. For the parent process, this closure will be destructed after clone() finishes, causing all the channels within it to be closed.

Is my understanding correct?

@utam0k
Copy link
Member

utam0k commented Oct 3, 2023

@anti-entropy123 I assume that try_clone() creates a new file descriptor with DUPFD. so there is no problem I think...

@anti-entropy123
Copy link
Contributor Author

I assume that try_clone() creates a new file descriptor with DUPFD. so there is no problem I think...

@utam0k Indeed, you are right. So, do you think we shouldn't use Rc? 🤔

Although you might already be aware, I'd like to mention that try_clone() might also result in the need to use close() to cleanup those old FDs after the child process is created.

@utam0k
Copy link
Member

utam0k commented Oct 4, 2023

@anti-entropy123 I think Rc is one of a complex category. I would avoid using it if possible.

@utam0k
Copy link
Member

utam0k commented Oct 4, 2023

I wish I had noticed that first. I'm sorry it took me so long to notice.

@anti-entropy123
Copy link
Contributor Author

I would avoid using it if possible.

@utam0k Okey, I will update the relevant code.

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Oct 5, 2023

Hey, I took a look at the changes and the conversation that is happening, and

  • I don't think at this point youki can completely switch from RawFd to OwnedFd, because at several places we are opening sockets or connections, and then passing them around, or sharing them across forks, for which (as of now) we need them to be treated as i32
  • Given that till now we were using RawFds, we have taken care to make sure everything that was opened is closed properly. I think in the upgrade to nix 0.27.1, we should only update types so that they compile and propagate properly, and while doing that we can see where RawFds are being used, and check that they are properly being closed. Then in a separate PR, we can do the change from RawFd to (maybe) OwnedFd , and see how complex that becomes.
  • I agree that using Rc/Arc would not be a good idea, they have their own handling complexity, and I would prefer not to use ref-counted things in this context as much as possible due to potential overhead they have.
  • I think that instead of using std::mem::forget , using std::mem::ManuallyDrop would be a better idea, as that fits here better semantically. https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html
  • I have opened a PR on my fork Nix upgrade YJDoc2/youki#260 , where I have only done type changes and used ManuallyDrop, so you can see how that looks for reference.

@utam0k 👀

@utam0k
Copy link
Member

utam0k commented Oct 5, 2023

@anti-entropy123 Sorry for confusing you, but I agree with @YJDoc2 . We should take care of OwnedFd in another PR if possible.

Signed-off-by: anti-entropy123 <1348651580@qq.com>
@anti-entropy123
Copy link
Contributor Author

We should take care of OwnedFd in another PR if possible.

I understand. I referenced @YJDoc2 's fork, and using ManuallyDrop seems to be a better approach.
Please re-review again. @utam0k

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anti-entropy123 May I ask you to create the issue to fully support OwnedFd in another PR

Comment on lines +219 to +220
let f1 = std::mem::ManuallyDrop::new(f1);
let f2 = std::mem::ManuallyDrop::new(f2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: Could you leave a comment on why we need to ManuallyDrop?

Signed-off-by: anti-entropy123 <1348651580@qq.com>
Signed-off-by: anti-entropy123 <1348651580@qq.com>
Signed-off-by: anti-entropy123 <1348651580@qq.com>
@anti-entropy123
Copy link
Contributor Author

May I ask you to create the issue to fully support OwnedFd in another PR

No problem. 👌 @utam0k

Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, some minor comments, but nothing blocking. So if everything else is ok, we can go ahead and merge, and address these later

Signed-off-by: anti-entropy123 <1348651580@qq.com>
Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
nits by utam0k have been implemented, so going ahead and merging this.

@YJDoc2 YJDoc2 merged commit b968ac5 into youki-dev:main Oct 9, 2023
13 checks passed
@YJDoc2
Copy link
Collaborator

YJDoc2 commented Oct 9, 2023

@anti-entropy123 thank you for following on this and implementing 🙏 Apologies for the back-and-forth that was done from our side, and you had to change the PR 😓
Congrats on your first contribution to Youki ! 🎉 🎉

@anti-entropy123
Copy link
Contributor Author

@YJDoc2 You're welcome, and I greatly appreciate your reviews and suggestions. I've learned a lot from them 👍. I'm looking forward to making more contributions to Youki !

By the way, I have a question. I noticed that there are many individual commits in the main branch after merging this PR. Why not use 'Squash and merge'? Or should I aim to keep the number of commits as small as possible?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Oct 10, 2023

I'm looking forward to making more contributions to Youki !

🙌

By the way, I have a question. I noticed that there are many individual commits in the main branch after merging this PR. Why not use 'Squash and merge'? Or should I aim to keep the number of commits as small as possible?

So there is no hard-and-fast rule regarding this. We usually try to keep the main branch clean, and keep the commits in PR, so we can refer back to specific commit in history if needed. However, if there are lots of commits, or many commits which are just fixing lints / minor changes, we ask dev to rebase and squash, or merge the PR in squash-and-merge fashion. For this particular PR, 5 commits are not too much, and even though there are couple of lint/comment fix commits, I felt that would be fine.

Overall, if possible one should aim to keep number of commits small, so the main branch stays clean ; but if it is required to have large number of commits (eg step-wise changes, separating each in a commit would make review easier) then one can request on PR to sqash as well (or we'd do so when we merge).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants