manifest: layer federation #169

runcom · 2016-07-06T17:37:40Z

Bringing up discussion on layer federation..

The docker v2s2 schema also has them https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-2.md#image-manifest-field-descriptions and docker itself it's already using this feature for Windows layer.

This will allow people to implement layer federation functionality in their own registry. ISVs who want to host their layers somewhere else are now free to do so. It will be up to whatever client/server implementation to fully support them. The basic idea still remains and OCI would be fine by having this standardized.

Signed-off-by: Antonio Murdaca runcom@redhat.com

wking · 2016-07-06T17:41:38Z

manifest.md

@@ -191,6 +191,10 @@ Unlike the [Manifest List](#manifest-list), which contains information about a s

        This REQUIRED property is the digest of the content, as defined by the [Descriptor](descriptor.md) digest format.

+    - **`urls`** *array*
+
+       For an ordinary layer, this is empty, and the layer contents can be retrieved directly from the registry. For a layer with mediatype of application/vnd.docker.image.rootfs.foreign.diff.tar.gzip, this contains a non-empty list of URLs from which this object can be downloaded.


The application/vnd.docker.image.rootfs.foreign.diff.tar.gzip media type is not defined in this spec, so we probably want to either define it or remove the restriction. Is there a reason for the media-type restriction or for making this layer-specific? I think “when I wrote this, you could get the blob from {urls}” is useful information for descriptor-targets in general (in which case this field should go here).

that's a copy-paste error :)

you can see urls is there https://github.com/opencontainers/image-spec/blob/master/descriptor.md#reserved I just wanted to bring up the discussion here for a future version of the spec probably

On Wed, Jul 06, 2016 at 10:41:39AM -0700, W. Trevor King wrote:

For an ordinary layer, this is empty, and the layer contents can be retrieved directly from the registry. For a layer with mediatype of application/vnd.docker.image.rootfs.foreign.diff.tar.gzip, this contains a non-empty list of URLs from which this object can be downloaded.

The application/vnd.docker.image.rootfs.foreign.diff.tar.gzip
media type is not defined in this spec…

Now that I have a clearer idea of what this media type is for 1, I
think a better approach is to make ‘foreign’ (or some such) an
explicit property of wherever the mutable mirror data is stored 2.
Then the publisher is saying “you need these blobs to build my image”,
but not telling you how to get them. And the distributor (managing
the mutable mirror data) is saying “you can get that blob from the
usual place and also these mirrors” or “you can't get that blob from
the usual place and have to get it from one of these mirrors”.

Clients who choose not to hit the mutable mirror engine and go to the
usual place for foreign blobs will get the usual key-not-found error
(and potentially a hint pointing at a particular mutable mirror
engine), but that's an acceptable workflow to my eyes.

wking · 2016-07-06T17:45:34Z

On Wed, Jul 06, 2016 at 10:37:41AM -0700, Antonio Murdaca wrote:

This will allow people to implement layer federation functionality
in their own registry.

Putting this on or next to the descriptor means that only the
publisher is free to add mirrors. I think we probably want to relax
that and put mirror lookup in mutable data (see also #129). That way
you can keep the mirror list up to date without resigning the image,
and third-party mirrors with access to the mutable-companion-data
engine can help out without needing to coordinate with either the
publisher or consumer.

runcom · 2016-07-06T17:47:28Z

Putting this on or next to the descriptor means that only the
publisher is free to add mirrors. I think we probably want to relax
that and put mirror lookup in mutable data (see also #129). That way
you can keep the mirror list up to date without resigning the image,
and third-party mirrors with access to the mutable-compantion-data
engine can help out without needing to coordinate with either the
publisher or consumer.

makes a lot of sense, but I'm not sure ppl other than the publisher needs to do edit mirrors

wking · 2016-07-06T17:54:50Z

On Wed, Jul 06, 2016 at 10:47:29AM -0700, Antonio Murdaca wrote:

makes a lot of sense, but I'm not sure ppl other than the publisher
needs to do edit mirrors

Why not? For example, Gentoo mirrors lots of upstream package source
(“distfiles”) without coordinating with the package authors who
published the original tarballs 1. I don't see a reason to make
that ability exclusive to publishers.

runcom · 2016-07-06T18:04:35Z

Why not? For example, Gentoo mirrors lots of upstream package source
(“distfiles”) without coordinating with the package authors who
published the original tarballs [1]. I don't see a reason to make
that ability exclusive to publishers.

I'm all for mirrors and being able for clients to specify their own :O I just said I'm not sure and want more people to chime in :)

wking · 2016-07-06T18:10:32Z

On Wed, Jul 06, 2016 at 11:04:36AM -0700, Antonio Murdaca wrote:

Why not? For example, Gentoo mirrors lots of upstream package source
(“distfiles”) without coordinating with the package authors who
published the original tarballs [1]. I don't see a reason to make
that ability exclusive to publishers.

I'm all for mirrors and being able for clients to specify their own…

I think this is different from clients specifying their own (they
don't need to read from image-spec stuff at all if they want to fetch
blobs from alternative sources). This change is about allowing
publishers (and possibly third party mirrors) to advertise their
existence. I expect there will be clients that:

Don't bother fetching the mirror information and always go to the
original publisher,
Don't bother fetching the mirror information and always use their
own locally-configured mirror/proxy.
Fetch the mirror information and look for whitelisted hosts, falling
back to the original publisher if they don't find any.
Fetch the mirror information and exclude blacklisted hosts, falling
back to the original publisher if that leaves no mirrors.

I just said I'm not sure and want more people to chime in :)

This would be good too ;).

stevvooe · 2016-07-06T18:56:07Z

The plan was to make sure this field worked well from a UX perspective before bringing into OCI. Docker is going to be the guinea pig here.

Also, this needs to go to the descriptor specification.

runcom · 2016-07-06T18:59:20Z

I'm interested into defining what we call layer federation also which could or could not be strictly tied to the urls field (like additional data as wking highlighted above)

wking · 2016-07-06T20:22:16Z

Having a URL-based mirror system would also allow folks to use IPFS
(or other protocols that don't use the raw content hash as the CAS ID
1) to distribute blobs. For compatibility with naive clients, you
can lean on the public gateway 2. For clients that understand IPFS,
you could use [3,4] and let them pick their own node/gateway.

The only downside is a rehashing step as you confirm both the native
IPFS protobuf hash and the image-spec raw-content hash, and
double-hashing is probably going to be cheaper than WAN transport ;).

 Subject: Re: [oci-tob] Proposal for an OCI Distribution Format Spec
 Date: Wed, 9 Mar 2016 21:56:40 -0800
 Message-ID: <20160310055640.GA10073@odin.tremily.us>

stevvooe · 2016-07-06T20:29:02Z

Having a URL-based mirror system would also allow folks to use IPFS
(or other protocols that don't use the raw content hash as the CAS ID
[1]) to distribute blobs. For compatibility with naive clients, you
can lean on the public gateway [2]. For clients that understand IPFS,
you could use [3,4] and let them pick their own node/gateway.

I'm not sure if we should consider this a good place to make that transition. The use case of this field is for assets that must come from a specific source.

I would think IPFS would work directly with an image layout.

wking · 2016-07-06T20:42:33Z

On Wed, Jul 06, 2016 at 01:29:02PM -0700, Stephen Day wrote:

Having a URL-based mirror system would also allow folks to use
IPFS (or other protocols that don't use the raw content hash as
the CAS ID [1]) to distribute blobs. For compatibility with naive
clients, you can lean on the public gateway [2]. For clients that
understand IPFS, you could use [3,4] and let them pick their own
node/gateway.

I'm not sure if we should consider this a good place to make that
transition. The use case of this field is for assets that must
come from a specific source.

Who cares where the blob comes from? You have the digest/size so you
can validate a blob sourced from an untrusted supplier.

I would think IPFS would work directly with an image layout.

Possibly. I'm guessing that choice will come down to how easy it will
be to plug that alternate source into your toolchain and how many
publishers are pushing images to IPFS. With a toolchain that has been
designed around image-spec (and didn't consider IPFS and similar) or a
body of publishers that are pushing to image-spec formats (and not to
IPFS and similar), this blob mirror hook may be an easy way to graft
on IPFS transport.

stevvooe · 2016-07-06T20:49:53Z

Who cares where the blob comes from? You have the digest/size so you
can validate a blob sourced from an untrusted supplier.

You're missing my point: the field was designed for a particular purpose. It allows distributors of software to host a layer outside of the main distribution flow. If these layers are in IPFS, they're already distributed in that channel and the use of the urls field is pointless.

wking · 2016-07-06T21:05:49Z

On Wed, Jul 06, 2016 at 01:49:54PM -0700, Stephen Day wrote:

Who cares where the blob comes from? You have the digest/size so
you can validate a blob sourced from an untrusted supplier.

You're missing my point: the field was designed for a particular
purpose. It allows distributors of software to host a layer outside
of the main distribution flow.

Ah, searching around turned up distribution/distribution#1725, where the new
media type is for “must come from one of the mirror URLs” 1. I
think “the usual channel will not work” is well covered by a …foreign…
media type. But “these are locations that will work” is meaningful
independent of that setting. Having both a set of mirror URLs and a
choice of media types, you can say each of the following things:

“You can get this from the usual place, but may also be able to get
it from these mirrors” (non-foreign media type and a populated
‘urls’).
“You cannot get this from the usual place, and have to get it from
one of these mirrors” (foreign media type and a populated ‘urls’).
“You can only get this from the usual place” (non-foreign media type
and unset ‘urls’).
“You're out of luck” (foreign media type and unset ‘urls’).

If these layers are in IPFS, they're already distributed in that
channel and the use of the urls field is pointless.

This assumes that “the usual place” is IPFS, and with the current
image-spec trajectory I think that is unlikely for most OCI-based
tooling. Putting an IPFS URL in here is an easy way get an
IPFS-ignorant toolchain using IPFS (with images provided by non-IPFS
publishers if a third party is mirroring the blobs in IPFS).

stevvooe · 2016-07-06T21:30:07Z

I think “the usual channel will not work” is well covered by a …foreign…
media type.

A foreign media type only signals to the implementation how to process the Windows layer. The presence of urls is independent of whether or not it is a special windows base layer. The urls field simply signifies there is an alternative fetch location.

Putting an IPFS URL in here is an easy way get an
IPFS-ignorant toolchain using IPFS (with images provided by non-IPFS
publishers if a third party is mirroring the blobs in IPFS).

If it works over HTTP/S, then fine. We will not, however, require clients to implement IPFS. This is probably not the right model and will lead to unnecessary complexity.

wking · 2016-07-06T21:40:30Z

On Wed, Jul 06, 2016 at 02:30:08PM -0700, Stephen Day wrote:

I think “the usual channel will not work” is well covered by a
…foreign… media type.

A foreign media type only signals to the implementation how to
process the Windows layer. The presence of urls is independent of
whether or not it is a special windows base layer. The urls field
simply signifies there is an alternative fetch location.

Yup, I think we're on the same page here 1.

Putting an IPFS URL in here is an easy way get an IPFS-ignorant
toolchain using IPFS (with images provided by non-IPFS publishers
if a third party is mirroring the blobs in IPFS).

If it works over HTTP/S, then fine. We will not, however, require
clients to implement IPFS.

And I think we're on the same page here 2.

This is probably not the right model and will lead to unnecessary
complexity.

Maybe. Whether I agree here depends on how the ecosystem evolves 3.

vbatts · 2016-07-18T15:05:30Z

I'm all for this. Especially given the nature of the descriptor, it has the mimetype to be expected. And the URL(s) could just be a redirector to mirrored content. It seems the change would not be exclusive to this single spot, rather a change to descriptors in general.
The field could be omitted if empty.

philips · 2016-08-03T20:16:48Z

+1 on adding this feature.

I am confused on why we need the new mime type though, that is just a hint/optimization? It seems like the language would just say that you SHOULD try to fetch from the URL instead if it exists and use that mimetype in the fetch. It is super confusing having the mimetype be "foreign" when the object you are getting back from the URL is really a "non-foreign".

wking · 2016-08-03T20:33:59Z

On Wed, Aug 03, 2016 at 01:16:51PM -0700, Brandon Philips wrote:

I am confused on why we need the new mime type though, that is just
a hint/optimization? It seems like the language would just say that
you SHOULD try to fetch from the URL instead if it exists and use
that mimetype in the fetch. It is super confusing having the
mimetype be "foreign" when the object you are getting back from the
URL is really a "non-foreign".

I'm not sure what you mean by “non-foreign”, but the idea behind the
‘foreign’ media type(s) is that the object is guaranteed to not be
available via “the usual channel” 1. Maybe “the usual channel” is
too vague to be something that you can guarantee at publication-time?
Attempting either the provided URL or the usual channel and falling
back to the other if the first attempt fails doesn't seem so
inefficient that I'm opposed to SHOULD though.

On the other hand, if we're ok with failed attempts, I don't see a
need to put the hint in CAS at all. Can't we just keep all of this
information (foreign hint and mirror URLs) in mutable refs [2,3,4]?

stevvooe · 2016-08-03T21:23:33Z

I am confused on why we need the new mime type though, that is just a hint/optimization? It seems like the language would just say that you SHOULD try to fetch from the URL instead if it exists and use that mimetype in the fetch. It is super confusing having the mimetype be "foreign" when the object you are getting back from the URL is really a "non-foreign".

The mimetype is incorrect. That is to mark whether or not the implementation should push or pull the image regularly. It is out of scope for urls support and should be considered separately.

philips · 2016-08-10T16:48:38Z

@runcom Do you want to clean this up and remove the new mime-type? Where do you want to go with this?

vbatts · 2016-08-24T13:26:58Z

@runcom bump

stevvooe · 2016-08-24T13:29:14Z

cc @jstarks @RobDolinMS @JLB13

philips · 2016-08-24T13:29:48Z

The discussion in the OCI f2f is that this is important for Solaris and Windows support. And both Microsoft and Oracle would like to see it in by v1.0.0. @JLB13 is taking point and if it makes it in in the next two weeks it will make 1.0.

cc @RobDolinMS @JLB13

runcom · 2016-08-24T13:49:37Z

@philips @vbatts et all - rebased and modified

wking · 2016-08-24T17:16:47Z

descriptor.md


-The following are field keys that MUST NOT be used in descriptors specified in other OCI specifications:
+  This OPTIONAL property specifies a list of URLs from which this object can be downloaded.


“object” → “blob” (see this and later).

we use object here as well https://github.com/opencontainers/image-spec/pull/169/files#diff-d1f5919405acb9727b345dc7b1c68785R19

not sure what to do, @stevvooe @vbatts wdyt

In this case, we are referring to a specific object, since we have type and size.

are you saying it's fine as object here?

On Thu, Aug 25, 2016 at 07:48:50AM -0700, Stephen Day wrote:

-The following are field keys that MUST NOT be used in descriptors specified in other OCI specifications:

This OPTIONAL property specifies a list of URLs from which this object can be downloaded.

In this case, we are referring to a specific object, since we have type and size.

The caller may know the type and size, but it's only downloading the
blob. And that only meets your “free-form binary data” criterion 1.

Nit: "may be downloaded."

We should use may instead of can

wking · 2016-08-24T17:23:01Z

descriptor.md


-The following are field keys that MUST NOT be used in descriptors specified in other OCI specifications:
+  This OPTIONAL property specifies a list of URLs from which this object can be downloaded.
+  For an ordinary layer, this property is empty and the layer contents can be retrieved directly from the registry.


Can we drop this line? I have a few concerns with it:

This is the descriptor spec and “layer” (i.e. application/vnd.oci.image.serialization.rootfs.tar.gzip) is a higher level idea. You could rename “layer” → “blob”, but…

Who knows if you can get the content from “the registry”? What registry are we talking about anyway? The descriptor spec doesn't currently give you a way to say “and you can fetch this blob from this registry” (that's sort of what urls is for).

The previous line clearly declares urls as OPTIONAL, so I think it will be pretty clear to implementers that in its absence, they should be asking for the blob in any CAS engines they know about, as configured by the caller. I don't think we gain any additional clarity by waving our hands about “the registry” here.

runcom · 2016-08-25T13:19:54Z

@stevvooe @philips @vbatts PTAL

stevvooe · 2016-08-26T23:03:14Z

@runcom Do you also want to handle specification of the "un-pushable" layer media type?

runcom · 2016-08-27T00:18:15Z

@stevvooe yes, I think we should say something like "a not empty list of urls means the object isn't supposed to be uploaded to a registry". Note I'm not sure about the language there because we shouldn't talk about registries. Any advice?

stevvooe · 2016-08-27T00:21:42Z

@stevvooe yes, I think we should say something like "a not empty list of urls means the object isn't supposed to be uploaded to a registry". Note I'm not sure about the language there because we shouldn't talk about registries. Any advice?

That is not the condition we use to make that decision. The existence of urls or not just means there are or are not urls at which to fetch the layer.

The media type of the target dictates whether or not the layer should not be distributed by an end user. This means it won't be pushed or added to an image layout file.

Does that make sense?

runcom · 2016-08-29T09:38:13Z

@stevvooe opened #216 to take care of the un-pushable media type (as per #169 (comment)).

This should be good to review and merge @vbatts @philips

stevvooe · 2016-09-07T18:30:42Z

@runcom Please address https://github.com/opencontainers/image-spec/pull/169/files#r77850606. Need to use "may" instead of "can".

runcom · 2016-09-07T18:35:09Z

Yup, currently afk though.

runcom · 2016-09-08T07:49:10Z

@stevvooe @philips @vbatts fixed and rebased - should be good to go

wking · 2016-09-08T16:02:19Z

descriptor.md


-The following field keys MUST NOT be used in descriptors specified in other OCI specifications:
+  This OPTIONAL property specifies a list of URLs from which this object may be downloaded.


“may” → “MAY”? This would allow a compliant implementation to ignore urls if it didn't want to support them (or didn't want to follow a particular URL, or whatever). In that case the “OPTIONAL” would mean “descriptor authors don't have to set this field” and the “MAY” would mean “implementations don't have to use this field”.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>

philips · 2016-09-09T05:23:59Z

LGTM

jonboulle · 2016-09-09T13:47:50Z

LGTM

stevvooe · 2016-09-09T20:30:40Z

LGTM

wking reviewed Jul 6, 2016
View reviewed changes

runcom changed the title ~~manifest: support urls for layers~~ manifest: layer federation Jul 6, 2016

philips added this to the post-v1.0.0 milestone Aug 24, 2016

philips modified the milestones: v1.0.0, post-v1.0.0 Aug 24, 2016

runcom force-pushed the layer-urls branch from dc87ccd to a3bf898 Compare August 24, 2016 13:49

wking reviewed Aug 24, 2016
View reviewed changes

runcom force-pushed the layer-urls branch from a3bf898 to 5b42064 Compare August 25, 2016 13:19

runcom mentioned this pull request Aug 29, 2016

media-types.md: add foreign layer media type #216

Closed

stevvooe mentioned this pull request Sep 1, 2016

layers: describe policy enforcement for foreign layer media type #233

Merged

runcom force-pushed the layer-urls branch from 5b42064 to 944f707 Compare September 8, 2016 07:48

wking reviewed Sep 8, 2016
View reviewed changes

descriptor: add urls property

64e7a30

Signed-off-by: Antonio Murdaca <runcom@redhat.com>

runcom force-pushed the layer-urls branch from 944f707 to 64e7a30 Compare September 8, 2016 16:50

stevvooe merged commit 5a0447a into opencontainers:master Sep 9, 2016

runcom deleted the layer-urls branch September 9, 2016 20:47

wking mentioned this pull request Nov 22, 2016

descriptor.md: Add the urls attribute instance in the Examples #437

Merged

wking mentioned this pull request Sep 1, 2017

Proposals of Policies of OCI Image Discovery xiekeyang/oci-discovery#1

Closed


		The following are field keys that MUST NOT be used in descriptors specified in other OCI specifications:
		This OPTIONAL property specifies a list of URLs from which this object can be downloaded.


		The following field keys MUST NOT be used in descriptors specified in other OCI specifications:
		This OPTIONAL property specifies a list of URLs from which this object may be downloaded.

manifest: layer federation #169

manifest: layer federation #169

Conversation

runcom commented Jul 6, 2016 • edited Loading

wking Jul 6, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wking commented Jul 6, 2016 • edited Loading

runcom commented Jul 6, 2016 • edited Loading

wking commented Jul 6, 2016 • edited Loading

runcom commented Jul 6, 2016

wking commented Jul 6, 2016

stevvooe commented Jul 6, 2016

runcom commented Jul 6, 2016

wking commented Jul 6, 2016

stevvooe commented Jul 6, 2016

wking commented Jul 6, 2016

stevvooe commented Jul 6, 2016

wking commented Jul 6, 2016

stevvooe commented Jul 6, 2016

wking commented Jul 6, 2016

vbatts commented Jul 18, 2016

philips commented Aug 3, 2016

wking commented Aug 3, 2016

stevvooe commented Aug 3, 2016

philips commented Aug 10, 2016

vbatts commented Aug 24, 2016

stevvooe commented Aug 24, 2016

philips commented Aug 24, 2016

runcom commented Aug 24, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

runcom commented Aug 25, 2016

stevvooe commented Aug 26, 2016

runcom commented Aug 27, 2016

stevvooe commented Aug 27, 2016

runcom commented Aug 29, 2016 • edited Loading

stevvooe commented Sep 7, 2016

runcom commented Sep 7, 2016

runcom commented Sep 8, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

philips commented Sep 9, 2016 • edited by caniszczyk Loading

jonboulle commented Sep 9, 2016 • edited by caniszczyk Loading

stevvooe commented Sep 9, 2016 • edited by caniszczyk Loading

runcom commented Jul 6, 2016 •

edited

Loading

wking Jul 6, 2016 •

edited

Loading

wking commented Jul 6, 2016 •

edited

Loading

runcom commented Jul 6, 2016 •

edited

Loading

wking commented Jul 6, 2016 •

edited

Loading

runcom commented Aug 29, 2016 •

edited

Loading

philips commented Sep 9, 2016 •

edited by caniszczyk

Loading

jonboulle commented Sep 9, 2016 •

edited by caniszczyk

Loading

stevvooe commented Sep 9, 2016 •

edited by caniszczyk

Loading