Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading language in image-layout.md #173

Closed
runcom opened this issue Jul 21, 2016 · 28 comments
Closed

Misleading language in image-layout.md #173

runcom opened this issue Jul 21, 2016 · 28 comments
Assignees
Milestone

Comments

@runcom
Copy link
Member

runcom commented Jul 21, 2016

Three issues:

First:

Object names in the refs and blobs subdirectories MUST NOT include characters outside of the set of "A" to "Z", "a" to "z", "0" to "9", the hyphen -, the dot ., and the underscore _.

This sentence is wrong because I can have colons : into the refs subdirectory to say ./refs/vbatts-app:latest

Second:

Hash algorithm identifiers containing the colon : will be converted to the hyphen -. For example sha256:5b will map to the layout blobs/sha256-5b.

these two sentences don't play nice together. What are we talking about? refs naming? both? I can have ./refs/sha256:5b58496 but what does that mean that the name will map to blobs/sha256-5b58496? AFAICT the content of that refs which is a descriptor will give us the location in the blobs dirs, not the ref file name.

Third:

$ cd example.com/app/
$ find .

The above (with app/) isn't making clear that an image-layout can carry more than just one app with:

./refs/vbatts-app:latest
./refs/alpine:1.0
./refs/ubuntu:sha256-5bfdjsldf

/cc @vbatts @stevvooe @philips

@runcom
Copy link
Member Author

runcom commented Jul 21, 2016

I believe we need to split the refs and blobs naming paragraph into two pieces - one for refs and one for blobs because clearly the two have different naming restrictions.

@wking
Copy link
Contributor

wking commented Jul 21, 2016

On Thu, Jul 21, 2016 at 06:44:19AM -0700, Antonio Murdaca wrote:

This sentence is wrong because I can have colons : into the refs
subdirectory to say ./refs/vbatts-app:latest

Colons are not portable, and we map them to hyphens 1. If folks
want more characters in the ref names, I suggest we have Unicode names
that are encoded to UTF-8 and then to filesystem-safe base64 2.

Hash algorithm identifiers containing the colon : will be converted to the hyphen -. For example sha256:5b will map to the layout blobs/sha256-5b.

these two sentences don't play nice together. What are we talking
about? refs naming? both?

This is just about the blob filename. Splitting it into its own
paragraph would help clarify that.

AFAICT the content of that refs which is a descriptor will give us
the location in the blobs dirs, not the ref file name.

This is correct.

$ cd example.com/app/
$ find .

The above (with app/) isn't making clear that an image-layout can carry more than just one app with:

./refs/vbatts-app:latest
./refs/alpine:1.0
./refs/ubuntu:sha256-5bfdjsldf

I'm fine adding different app examples here, although I'd recommend
either avoiding colons or base64-encoding the ref names.

@runcom
Copy link
Member Author

runcom commented Jul 21, 2016

 Hash algorithm identifiers containing the colon : will be converted to the hyphen -. For example sha256:5b will map to the layout blobs/sha256-5b.

these two sentences don't play nice together. What are we talking
about? refs naming? both?
This is just about the blob filename. Splitting it into its own
paragraph would help clarify that.

it's still misleading - what does that actually mean? what are we talking about? blobs naming? who does the conversion? is it really needed to say that? can't we just say blobs are named ALGO-SUM?
The second sentence is also misleading, where is the case someone come up with `sha256:5b? I guess from a manifest blob, yes, correct, but that sentence right there isn't saying it explicitly, I'm just guessing it.

Not sure about the colon story and base64 - I'd love maintainers to chime in

@wking
Copy link
Contributor

wking commented Jul 21, 2016

On Thu, Jul 21, 2016 at 08:34:36AM -0700, Antonio Murdaca wrote:

Hash algorithm identifiers containing the colon : will be
converted to the hyphen -. For example sha256:5b will map to the
layout blobs/sha256-5b.

these two sentences don't play nice together. What are we
talking about? refs naming? both?

This is just about the blob filename. Splitting it into its own
paragraph would help clarify that.

it's still misleading - what does that actually mean? what are we
talking about? blobs naming? who does the conversion?

The CAS consumer says “get me ‘sha256:5b…’”, and the CAS engine says
“with image-layout 1.0.0, ‘sha256:5b…’ will be stored in
‘blobs/sha256-5b…’” and fetches and returns that content 1.

is it really needed to say that? can't we just say blobs are named
ALGO-SUM?

You could do that too. I don't think it really matters.

The second sentence is also misleading, where is the case someone
come up with `sha256:5b? I guess from a manifest blob, yes, correct,
but that sentence right there isn't saying it explicitly, I'm just
guessing it.

Who cares where sha256:5b… comes from? Both sentences are just “If
you have a blob who's SHA-256 hash has a hex digest 5b…, the name for
that blob is sha256:5b… and the image-layout 1.0.0 path for that blob
is blobs/sha256-5b…”. It doesn't matter if it's a manifest, or a
layer, or a PNG, or whatever.

Consumers will know what sha256:5b… should be from the context of
wherever they got the name (e.g. from a descriptor that also carries
mediaType, or from an email saying “I've just cut a new release…” or
“Here's a cute cat picture”).

@runcom
Copy link
Member Author

runcom commented Jul 21, 2016

What's your saying is ok with with as I understand most of the concept and stuff you've highlighted - still I think that's not really clear in those sentences

@wking
Copy link
Contributor

wking commented Jul 21, 2016

On Thu, Jul 21, 2016 at 09:28:49AM -0700, Antonio Murdaca wrote:

… still I think that's not really clear in those sentences

I expect PRs are welcome ;).

@stevvooe
Copy link
Contributor

Colons should not be allowed.

The example refs are completely invalid. The following:

./refs/vbatts-app:latest
./refs/alpine:1.0
./refs/ubuntu:sha256-5bfdjsldf

Should be:

./refs/latest
./refs/1.0
./refs/sha256-5bfdjsldf

Otherwise, we completely break windows compatibility, as colons are used for alternate data streams.

@runcom
Copy link
Member Author

runcom commented Jul 21, 2016

So there's no way to have multiple images (images as in Ubuntu, Alpine, Debian, Fedora) in a single image layout?!

@wking
Copy link
Contributor

wking commented Jul 21, 2016

On Thu, Jul 21, 2016 at 03:44:51PM -0700, Stephen Day wrote:

Colons should not be allowed.

Otherwise, we completely break windows compatibility, as colons are
used for alternate data streams.

This particular bit (or at least, one solution to it) has been spun
off into #174.

@stevvooe
Copy link
Contributor

So there's no way to have multiple images (images as in Ubuntu, Alpine, Debian, Fedora) in a single image layout?!

Rather extreme conclusion and tone. Maybe try taking a deep breath before replying? 🐹

The refs simply aren't a tool for naming. They work more like tags and are separated in a similar manner to git. refs provides root references, from which an a set of blobs can be accessed. I'm not even sure that refs will have a one to one mapping with tags (the above example was great).

The path name of a ref must be compatible with as many filesystem and archive formats as possible.

Let's look at alternative example to better understand the future role of refs (not a proposal):

refs/images -> sha256:xxxx, type=some naming thing
blobs/sha:xxxx ->
  {
    "images": [
       { # a descriptor!
         "name": "alpine:latest",
         "digest": "..."
       }
    ]
  }

This part of the specification still needs to be worked out carefully. There are several considerations:

  1. Names need to be compatible across image layout, dns and other discovery systems.
  2. Image Layout needs to support several images from possibly different repositories.
  3. The trust model for naming must be clarified (how does one trust the names in a tar file?). File names aren't a very good authority if you can't verify the integrity of the underlying storage.

@wking
Copy link
Contributor

wking commented Jul 22, 2016

On Thu, Jul 21, 2016 at 04:38:55PM -0700, Stephen Day wrote:

  1. Image Layout needs to support several images from possibly
    different repositories.

This seems trivial, since you can drop all the blobs into the same
image-layout bucket, and then pick whatever you like as the refs
pointing into those blobs. The only possible issue I see is if you
want to specify a standard ref-name template for a particular image
name, and I think #174 covers any possible concerns there.

@wking
Copy link
Contributor

wking commented Jul 22, 2016

On Thu, Jul 21, 2016 at 04:38:55PM -0700, Stephen Day wrote:

  1. The trust model for naming must be clarified…

Spun off into #176.

@runcom
Copy link
Member Author

runcom commented Jul 22, 2016

Rather extreme conclusion and tone. Maybe try taking a deep breath before replying? 🐹

Ha sorry Stephen, that didn't sound that bad when I wrote it and thanks for explaining that. I was sure that was something image-spec already had and I was just surprised to learn it doesn't fully support that.

@runcom
Copy link
Member Author

runcom commented Jul 22, 2016

Image Layout needs to support several images from possibly different repositories.

and this is what I was looking for, thanks

@stevvooe
Copy link
Contributor

@runcom No worries.

This is an active area that we need to figure out.

Another possibility is to support a hierarchy in refs. Following from the example above, we'd have something like this:

./refs/vbatts-app/latest
./refs/alpine/1.0
./refs/ubuntu/sha256-5bfdjsldf

This opens the question of whether or not we specify structure in the naming standard.

@runcom
Copy link
Member Author

runcom commented Jul 22, 2016

@stevvooe what about having an file which indexes names?

$  cat ./names #ugly
{"busybox":{"refs":"latest"}, "alpine":{"refs":"v1.0"}}

@wking
Copy link
Contributor

wking commented Jul 22, 2016

On Fri, Jul 22, 2016 at 11:34:14AM -0700, Stephen Day wrote:

Another possibility is to support a hierarchy in refs. Following
from the example above, we'd have something like this:

./refs/vbatts-app/latest
./refs/alpine/1.0
./refs/ubuntu/sha256-5bfdjsldf

This is going to be difficult to support portably if we don't have
#174 or special-case path-separator conversion rules to patch around /
on POSIX vs. Windows filesystems.

This opens the question of whether or not we specify structure in
the naming standard.

If “the naming standard” is for signed name-assertions (#176) it seems
completely orthogonal to refs.

@stevvooe
Copy link
Contributor

@runcom The makes merging more complicated. Let's say you have two directories with an image layout in them. With the json index approach, it would require special code to merge the json files. If we can define path semantics, the merge can be done with a naive cp -r command.

Albeit, the approach of having a file with the name index provides much better security guarantees, since the set can be verified.

@wking
Copy link
Contributor

wking commented Jul 22, 2016

On Fri, Jul 22, 2016 at 11:58:55AM -0700, Stephen Day wrote:

Albeit, the approach of having a file with the name index provides
much better security guarantees, since the set can be verified.

I'm still missing the need for (or ability to have) security on
mutable refs. If you have a name index that's getting signed, isn't
that a CAS blob?

@stevvooe
Copy link
Contributor

@wking Users interact with names. They need to be able to trust that when they say foo, they actually get foo and not something an attacker wants for foo.

@wking
Copy link
Contributor

wking commented Jul 22, 2016

On Fri, Jul 22, 2016 at 12:31:55PM -0700, Stephen Day wrote:

@wking Users interact with names

Agreed, but names != refs. You say “please unpack foo”, and
oci-image-tool (or whatever) says (#88):

  1. Let me look for mutable refs claiming to be foo (this might be
    refsEngine.Get("foo"), but could be other things too) and get a
    descriptor.
  2. Let me walk the CAS tree that descriptor references looking for a
    signed name assertion (Signed name-assertion objects #176).
  3. Does that named.blob assert the name is ‘foo’?
  4. Do I trust any of the signatures in the signed.blob referencing
    that named.blob?
  5. Unpack.

The ‘foo’ ref is just a hint for finding the CAS root blob, not part
of the security. It's vulnerable to denial-of-service sorts of
attacks if your ref engine hosts misleading ‘foo’ instances pointing
to CAS trees that you don't end up trusting, so some write security
is in order for public ref stores. But code interacting with
image-layout is the ref store, so I don't think DoS attacks are a
concern there. And I don't see how you'd address DoS ref attacks with
only refs and blobs to work with anyway :p.

@stevvooe
Copy link
Contributor

@wking Naive code generally isn't that careful. Drawing the parallel here is dangerous, as programmers will make assumptions.

We can find a solution that is both simple and secure, as long as we define the requirements carefully.

@wking
Copy link
Contributor

wking commented Jul 22, 2016

On Fri, Jul 22, 2016 at 01:15:01PM -0700, Stephen Day wrote:

We can find a solution that is both simple and secure, as long as we
define the requirements carefully.

I guess I'll wait and see when you file that proposal^ ;).

@philips philips added this to the v0.5.0 milestone Aug 24, 2016
@philips
Copy link
Contributor

philips commented Aug 24, 2016

I am going to take this one and clarify that refs should not include an image name but simply the "tags".

@philips philips self-assigned this Aug 24, 2016
@runcom
Copy link
Member Author

runcom commented Aug 24, 2016

I am going to take this one and clarify that refs should not include an image name but simply the "tags".

is this just temporary or what? did we move from refs contains different (unrelated) images also?

@vbatts
Copy link
Member

vbatts commented Aug 29, 2016

@runcom it's more a clarification. After in-person conversation, it was clarified that the refs were intended to be most correlated to the tag current docker image format.
While it could be possible to shove desperate images into a single image-layout, it leads to the kind of confusion you encountered. So perhaps there is a super structure needed for a multi-image layout. (to that end hard-links and/or symlinks ought to be supported for deduplicating the blobs/objects).

@runcom
Copy link
Member Author

runcom commented Aug 29, 2016

@runcom it's more a clarification. After in-person conversation, it was clarified that the refs were intended to be most correlated to the tag current docker image format.
While it could be possible to shove desperate images into a single image-layout, it leads to the kind of confusion you encountered. So perhaps there is a super structure needed for a multi-image layout. (to that end hard-links and/or symlinks ought to be supported for deduplicating the blobs/objects).

makes total sense, thx for the explanation @vbatts

philips pushed a commit to philips/image-spec that referenced this issue Aug 29, 2016
Based on the confusion from Antonio in the image-layout in
opencontainers#173 we are going to
move everything that was in `refs` to `refs/tags`. This does a few
things:

1. It makes it clearer that this layout is for a single "image name" instead of allowing for lots of different refs.
2. It matches the layout of git so it likely matches existing expectations.

This was also discussed during the OCI F2F and decided to be a
reasonable idea.

Signed-off-by: Brandon Philips <brandon.philips@coreos.com>
@philips
Copy link
Contributor

philips commented Aug 31, 2016

Trying again with this: #225

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants