Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --preserve-file-descriptors=N to create #1320

Merged
merged 1 commit into from
Feb 24, 2017

Conversation

ijc
Copy link
Contributor

@ijc ijc commented Feb 10, 2017

This preserves the given number of file descriptors on top of the 3 stdio, that
is, fds [3..3+N) will be passed through.

Since Socket Activation (via $LISTEN_FDS) also claims fd 3 onwards the use of
this new flag is incompatible with socket activation.

Signed-off-by: Ian Campbell ian.campbell@docker.com

utils_linux.go Outdated

if len(r.listenFDs) > 0 && r.preserveFDs > 0 {
return -1, fmt.Errorf("Preserve FDs and socket activation (LISTEN_FDS) are mutually incompatible")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a bit of a cop out, but I couldn't think of a better way. The only alternative I thought of was to specify that preserveFds preserves N over and above ${LISTEN_FDS:-0} but that seemed pretty fugly. It seems unlikely to me that one would write an application which uses both socket activation and expects other random open fds to be present upon exec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You an just start by using len(process.ExtraFiles), adding 3 stdio and always preserve from what is already in the ExtraFiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crosbymichael thanks, so to check I understand, you are suggesting that --preserve-fds should be cumulative over SD_LISTEN (or anything else which extends process.ExtraFiles in the future). IOW given LISTEN_FDS=3 and --preserve-fds=5 then we would preserve [3, 11) (in addition to stdio). That's 3, 4 & 5 from LISTEN_FDS=3 and 6, 7, 8, 9 & 10 from --preserve-fds=5.

I'll do that.

create.go Outdated
@@ -46,6 +46,10 @@ command(s) that get executed on start, edit the args parameter of the spec. See
Name: "no-new-keyring",
Usage: "do not create a new session keyring for the container. This will cause the container to inherit the calling processes session key",
},
cli.IntFlag{
Name: "preserve-file-descriptors",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty long flag name, Maybe go for something like --preserve-fd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I think I'll go for --preserve-fds to keep it plural. Maybe --nr-preserve-fds would be clearer though (but then the length creeps back up...). --nr-keep-fds?

utils_linux.go Outdated

if len(r.listenFDs) > 0 && r.preserveFDs > 0 {
return -1, fmt.Errorf("Preserve FDs and socket activation (LISTEN_FDS) are mutually incompatible")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You an just start by using len(process.ExtraFiles), adding 3 stdio and always preserve from what is already in the ExtraFiles

@crosbymichael
Copy link
Member

I still think the nr- is kinda verbose. I don't even know what nr means. ;)

@dqminh
Copy link
Contributor

dqminh commented Feb 15, 2017

@ijc25 i'm curious, do you have a usecase for this noted down somewhere ?

@ijc
Copy link
Contributor Author

ijc commented Feb 15, 2017

@crosbymichael nr is an abbreviation for number (maybe not as commonly used as I thought). I added it to indicate that the argument was the count of such things, rather than a list of them or whatever. I'll happily drop it if you prefer though.

@dqminh basically I'd like to pass an open socket or file fd to my application so I can reduce its privileges to not include opening sockets/files.

@rhatdan
Copy link
Contributor

rhatdan commented Feb 15, 2017

I also don't like -nr, have not seen this in a CLI before.

@crosbymichael
Copy link
Member

crosbymichael commented Feb 15, 2017

LGTM

Approved with PullApprove

@mrunalp
Copy link
Contributor

mrunalp commented Feb 16, 2017

Could you either squash or rework the commits?

cli.IntFlag{
Name: "preserve-fds",
Usage: "Pass N additional file descriptors to the container (stdio + $LISTEN_FDS + N in total)",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also add this to run.go? They almost have same options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were even similar enough that git show create.go | patch run.go did the right thing ;-)

@ijc
Copy link
Contributor Author

ijc commented Feb 17, 2017

edited: I can't see ...

I can't see how adding the option to run.go has caused the continuous-integration/travis-ci/pr failure. Failure is:

--- FAIL: TestExecIn (0.21s)

	utils_test.go:51: execin_test.go:42: unexpected error: container_linux.go:247: starting container process caused "process_linux.go:270: running exec setns process for init caused \"exit status 26\""

I see similar messages in some other recent master failures e.g.
https://travis-ci.org/opencontainers/runc/jobs/198764647#L2632
https://travis-ci.org/opencontainers/runc/jobs/201936653#L2642
Is there some flakiness in CI or shall I continue to investigate?

utils_linux.go Outdated
if len(r.listenFDs) > 0 {
process.Env = append(process.Env, fmt.Sprintf("LISTEN_FDS=%d", len(r.listenFDs)), "LISTEN_PID=1")
process.ExtraFiles = append(process.ExtraFiles, r.listenFDs...)
}

baseFd := 3 + len(process.ExtraFiles)
for i := baseFd; i < baseFd+r.preserveFDs; i++ {
process.ExtraFiles = append(process.ExtraFiles, os.NewFile(uintptr(i), "PreserveFD"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the name be different for each fd instead of "PreserveFD"? Maybe "PreserveFD:i" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer for debugging to do that, It does not really make a difference functionality wise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This preserves the given number of file descriptors on top of the 3 stdio and
the socket activation ($LISTEN_FDS=M) fds.

If LISTEN_FDS is not set then [3..3+N) would be preserved by --preserve-fds=N.

Given LISTEN_FDS=3 and --preserve-fds=5 then we would preserve fds [3, 11) (in
addition to stdio).  That's 3, 4 & 5 from LISTEN_FDS=3 and 6, 7, 8, 9 & 10 from
--preserve-fds=5.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>
@cyphar
Copy link
Member

cyphar commented Feb 21, 2017

@ijc25 It's not a flakiness in the CI, it was a race condition that we've been hitting a lot recently. It's been fixed in #1237.

@@ -57,6 +57,10 @@ command(s) that get executed on start, edit the args parameter of the spec. See
Name: "no-new-keyring",
Usage: "do not create a new session keyring for the container. This will cause the container to inherit the calling processes session key",
},
cli.IntFlag{
Name: "preserve-fds",
Usage: "Pass N additional file descriptors to the container (stdio + $LISTEN_FDS + N in total)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to have this sort of API, I would prefer that we make it more explicit than this. For example, make it so that you have to specify which file descriptors you want to pass (IntSliceFlag). This would also mean that cases where you don't want to use low file descriptors (see this SQLite option for more reasons why that might matter).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it would look like

runc run --preserve-fd 512 --preserve-fd 1337 ctr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current interface doesn't allow you to pass through fds 0, 1, or 2 in this way so I think the SQLlite reasoning for avoiding those fds doesn't really apply here.

I can take a look switching things to behave like you suggest but it would have to be contingent on Go's os/exec Cmd.ExtraFiles supporting the necessary semantics (i.e. a bug slice full of nils apart from at offset 512 and 1337 in your example). Personally I don't think it is terribly useful to be able to pass arbitrary fds, it's pretty easy (and arguably more controlled) for the caller to dup2 things into the right place after fork but before exec of runc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this i'm not sure defining individual fds is optimal. If you have a large amount of fds to pass you would run out of space for all the --preserve-fd 3 --preserve-fd 4 --preserve-fd 5...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I was a bit surprised that IntSliceFlag didn't support comma separated lists or even N-M spans (or if it does that the syntax wasn't more obvious...).

FWIW I did confirm that Cmd.ExtraFiles looks on first glance to DTRT with nil in the list, although finalizeNamespace and CloseExecFrom would probably need extending to handle a sparse fd space to be absolutely safe.

@dqminh
Copy link
Contributor

dqminh commented Feb 24, 2017

@dqminh basically I'd like to pass an open socket or file fd to my application so I can reduce its privileges to not include opening sockets/files.

Thanks the the clarification ! I was wondering why reusing the listen_* interface was not reasonable enough, but i guess that setting thelisten_* env variable is also abit more complex than just passing them as CLI arg, especially when we know the command to be run.

@crosbymichael
Copy link
Member

@dqminh ya, and it would be overloading the systemd functionality that people expect with listen_

@crosbymichael
Copy link
Member

crosbymichael commented Feb 24, 2017

Thanks for squashing

LGTM

Approved with PullApprove

@dqminh
Copy link
Contributor

dqminh commented Feb 24, 2017

LGTM.

Approved with PullApprove

@dqminh dqminh merged commit 17966ce into opencontainers:master Feb 24, 2017
@ijc ijc deleted the preserve-fds branch February 27, 2017 09:43
ijc pushed a commit to ijc/go-runc that referenced this pull request Mar 1, 2017
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
ijc pushed a commit to ijc/go-runc that referenced this pull request Mar 1, 2017
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
ijc pushed a commit to ijc/go-runc that referenced this pull request Mar 2, 2017
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
crosbymichael added a commit to containerd/go-runc that referenced this pull request Mar 2, 2017
ijc pushed a commit to ijc/containerd that referenced this pull request Mar 3, 2017
This allows extra open fds to be passed through into the container, relying on
opencontainers/runc#1320 to actually pass them to the init process.

These fds are opened in the order they appear in the ExtraFDs array and begin
at fd 3 after stdio.

Initially only two kinds of extra fd are supported, /dev/null (mostly as a PoC
to have more than one, its not actually very useful) and a named tap device.
Other useful types which could be considered might include anything which could
be returnable by Go's net.Dial or Fds received as ancillary data over a Unix
domain socket.

The fds are opened in the main containerd process and passed through to the
shim as a simple count. This slightly complicates passing the open gRPC socket
fd through to the shim since inserting the extra fds before it means it is no
longer fixed at fd 3. The number is now communicated through an environment
variable, this seemed to have less potential for confusion than retaining the
fd at 3 and then requiring containerd-shim to shuffle the extra fds down one as
it does the exec of runc.

NB revendoring crosbymichael/go-runc also pulls in
db0c87b Remove - from temp process name
c098a32 Remove process options from ExecOpts
as well as the changes from containerd/go-runc#8.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants