
Add layer discovery/location to the image spec #15

Closed
brendandburns opened this issue Apr 9, 2016 · 24 comments

Comments

@brendandburns

If a client has a URL/URI/file for an image spec. manifest, how does the client then know how to download/obtain the relevant layers?

This seems like it's an important part of the image spec that is currently not defined anywhere. Given the image manifest as it's currently defined, I don't think that anyone will actually be able to distribute image manifests in a way where multiple different client implementations can figure out how to get the layers.

If we enter a world where each client does it differently, then the whole point of having a standardized specification will be lost, so I think we need to have some sort of guidance for this in the image spec.

I don't know that we have to require every implementation of the specification to implement this part of the spec, but we should at least make it optional so that we don't end up with N different implementations.

It could be something as simple as:

https://my-server.com/layers/

Or we could actually add fields to the image spec that give a URL, but regardless we need something in order to begin implementation.
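
As a minimal sketch of the "well-known base URL" idea above: a client that agreed on such a convention could derive layer locations directly from the digests in the manifest. Nothing here is defined by the spec; base_url and the manifest field names are assumptions for illustration only.

# Hypothetical sketch: derive layer URLs from an agreed base URL plus the
# layer digests listed in the manifest.
import os
import urllib.request

def layer_url(base_url, digest):
    # e.g. https://my-server.com/layers/sha256:abcd...
    return base_url.rstrip("/") + "/" + digest

def fetch_layers(manifest, base_url, dest_dir="."):
    for layer in manifest.get("layers", []):
        digest = layer["digest"]
        with urllib.request.urlopen(layer_url(base_url, digest)) as resp:
            data = resp.read()
        with open(os.path.join(dest_dir, digest.replace(":", "_")), "wb") as f:
            f.write(data)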

@wking
Contributor

wking commented Apr 9, 2016

On Fri, Apr 08, 2016 at 10:10:06PM -0700, Brendan Burns wrote:

If a client has a URL/URI/file for an image spec. manifest, how does
the client then know how to download/obtain the relevant layers?

This seems like it's an important part of the image spec.

That's an important part of a complete distribution system, but it's
pretty independent of the image-format itself. For previous
discussion, see the required-HTTPS and CAS API discussion in this
thread [1] and the “Why doesn't this project mention distribution?”
FAQ [2].

If we enter a world where each client does it differently, then the
whole point of having a standardized specification will be lost, so I
think we need to have some sort of guidance for this in the image
spec.

If different clients retrieve CAS objects differently but still share
the same image format, it allows for multiple CAS systems to share the
same pool of images.

I don't know that we have to require every implementation of
the specification to implement this part of the spec, but we should at
least make it optional so that we don't end up with N different
implementations.

I think an optional CAS-over-HTTPS spec is part of the plan; it just
needs to go through some TOB review first. More on this in the “Why
doesn't this project mention distribution?” FAQ [2].

 [1] Subject: Proposal for a new project: OCI Image Format Spec
     Date: Fri, 26 Feb 2016 00:33:07 +0000
     Message-ID: <CAD2oYtPCZRUXOER-Fz7WKsnNwYA-PoPmCG59jD1i3=xgYnGLZw@mail.gmail.com>

 [2] Subject: ACTION: Proposal for a new project: OCI Image Format Spec (v3)
     Date: Sun, 20 Mar 2016 01:41:27 +0000
     Message-ID: <CAD2oYtNiOgCgLgZk--4bfrdywTmwP=bqSxvkSJ-rv=PTEoaTag@mail.gmail.com>

@philips
Contributor

philips commented Apr 9, 2016

@wking @brendandburns Yes, I agree this is something we need to sort out in the TOB as soon as possible. Having the spec only contain a manifest describing CAS objects and a way to serialize a filesystem to a CAS object doesn't help the 99% use case today of downloading the manifest and then knowing where to find the CAS objects.

I will raise this on the TOB mailing list again so we can start the discussion and make it a layer of this project. Thank you.

@jonboulle
Contributor

+1, an optional but well-known mechanism for locating these layers seems
critical for making the spec usable.


@vbatts
Member

vbatts commented Apr 11, 2016

Glad it was not just me. This exact issue was brought up on the initial call.
+1 from me

@philips
Contributor

philips commented Apr 11, 2016

I have raised this discussion to the TOB to put it into scope for this project. I will provide an update to this issue once that discussion is done. We can discuss the technical details here in the meantime.

@stevvooe
Contributor

-1

The fact that we have the potential to resolve images in N different ways is a feature of OCI. By going down this route, we risk irrecoverably coupling identity and locality in the specification. When you define a target transport, it becomes very easy to have the specification grow towards that transport, limiting the possibilities with other transports.

When we start looking at the existing formats, the commonality is that their transport, naming, format and runtime are all tightly coupled. While vertical integration of implementations may make sense when solving a specific problem, doing so in specifications leads to inflexible and incompatible systems. This needs to be avoided in OCI.

We can look at this by taking a small bit of pseudocode. Resolving download locations of layers is the easiest problem here:

for layer in manifest.layers:
  location = resolve(layer.digest)
  fetcher.add(layer, location)

Above, we have a resolve function, which provides a location, and a fetcher, which manages the actual retrieval. This can be done today with the current version of the registry, and it can already be used in developing this specification. The part of the specification "that is currently not defined anywhere" just doesn't need to be defined for an image specification to add value.

Such pseudocode can also be adapted to SSH, IPFS, BitTorrent and myriad other patterns for image storage. Ideally, one would be able to adapt OCI to their specific environment, rather than be tied to the limitations of distribution implementations today or only those defined in OCI.
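
As a hedged sketch (none of these schemes, hosts, or URL layouts come from the image spec), the resolve/fetcher split above could be kept transport-agnostic with a small registry of per-scheme resolvers:

# Hypothetical sketch: per-transport resolvers mapping a content digest to a
# fetchable location; the schemes and URL shapes are made up for illustration.
RESOLVERS = {
    "https": lambda digest: "https://registry.example.com/blobs/" + digest,
    "ssh":   lambda digest: "ssh://mirror.example.com/cas/" + digest,
    "ipfs":  lambda digest: "ipfs://" + digest,  # assumes some digest-to-IPFS mapping
}

def resolve(digest, scheme="https"):
    return RESOLVERS[scheme](digest)

def queue_layers(manifest, fetcher, scheme="https"):
    # Mirrors the pseudocode above: resolve each layer, hand it to a fetcher.
    for layer in manifest["layers"]:
        location = resolve(layer["digest"], scheme)
        fetcher.add(layer, location)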

We also run the risk of the scope of such a specification growing into defining the behavior of a CAS system. There aren't yet enough CAS systems in existence for us to generically define their expected behavior.

@wking
Contributor

wking commented Apr 12, 2016

On Tue, Apr 12, 2016 at 11:34:27AM -0700, Stephen Day wrote:

The fact that we have the potential to resolve images in N different
ways is a feature of OCI.

I agree with this, which is why I'm advocating a CAS API for testing
[1,2].

When you define a target transport, it becomes very easy to have the
specification grow towards that transport, limiting the
possibilities with other transports.

Agreed, which is another reason it makes sense to put any
OCI-maintained CAS protocol specs in other repositories (besides the
versioning complications [3]). And again, that doesn't mean that the
OCI shouldn't develop a CAS protocol or two if it wants. It just
means that they'd be independent repositories (and potentially
different OCI Projects [4]).

Such pseudocode can also be adapted to SSH, IPFS, BitTorrent and
myriad other patterns for image storage.

I'm not clear on how you'd use IPFS to back a generic CAS API, because
IPFS hashes the (protobuf) Merkle objects, not the payload hashes used
in the Docker manifest [5]. Folks distributing over IPFS would
probably just use IPFS's unixfs to distribute their bundles, once IPFS
gets more robust handling for file metadata.

 Subject: Re: [oci-tob] Proposal for a new project: OCI Image Format Spec
 Date: Wed, 2 Mar 2016 20:24:48 -0800
 Message-ID: <2016030304...@odin.tremily.us>

 Subject: Re: [oci-tob] Discussion: Addition of Optional Transport
   layer to Scope Table and OCI Image Project
 Date: Mon, 11 Apr 2016 14:46:19 -0700
 Message-ID: <20160411214619.GK22888@odin.tremily.us>



 Subject: Re: [oci-tob] Proposal for an OCI Distribution Format Spec
 Date: Wed, 9 Mar 2016 21:56:40 -0800
 Message-ID: <20160310055640.GA10073@odin.tremily.us>

@stevvooe
Contributor

@wking Thanks for the response!

I'm not clear on how you'd use IPFS to back a generic CAS API, because
IPFS hashes the (protobuf) Merkle objects, not the payload hashes used
in the Docker manifest [5].

This lends somewhat to my larger point: how can we interface with CAS systems that have disparate hash algorithms? It is doable, but seems a little out of scope.

Folks distributing over IPFS would
probably just use IPFS's unixfs to distribute their bundles, once IPFS
gets more robust handling for file metadata.

There would have to be a mapping into IPFS's CAS system, which may force the implementation to interact oddly with IPFS. IPFS is probably just the beginning of a set of second-generation CAS systems.

@caniszczyk
Contributor

A couple of comments from an LF perspective:

  1. Ensure there are at least two weeks to see if consensus can be attained; this is a complex topic and there should be input from all members (see section 6(l), https://www.opencontainers.org/governance).

  2. We need clarity on how to amend the scope table; there was confusion last time on the process, so we may need to update it to make sure it's clear and well defined. I'll put this in my queue to figure out: "The appropriate mechanism for adding, removing or modifying rows to this table (e.g. creating a proposal for an additional optional layer) is to bring it before the TDC. The TOB can be a source of appeal and/or can discuss if there isn’t a clear consensus in the TDC."

@wking
Contributor

wking commented Apr 12, 2016

On Tue, Apr 12, 2016 at 01:00:42PM -0700, Stephen Day wrote:

I'm not clear on how you'd use IPFS to back a generic CAS API,
because IPFS hashes the (protobuf) Merkle objects, not the payload
hashes used in the Docker manifest [5].

This lends somewhat to my larger point: how can we interface with
CAS systems that have disparate hash algorithms? It is doable, but
seems a little out of scope.

I agree with both “doable” (IPFS-hash hints? Then a client can fetch
the hinted file, hash the content, and compare with the canonical hash
in the manifest) and “out of scope” (you probably don't want to bake
hash-hints into the manifest).
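
A sketch of that hint-then-verify step (the hint plumbing and helper names are invented; only the digest comparison is essential):

import hashlib

# Hypothetical sketch: fetch content through any location hint (HTTPS mirror,
# IPFS path, ...) and verify it against the canonical digest in the manifest.
def verify_against_manifest(content, canonical_digest):
    # canonical_digest is assumed to look like "sha256:<hex digest>"
    algo, expected = canonical_digest.split(":", 1)
    h = hashlib.new(algo)
    h.update(content)
    if h.hexdigest() != expected:
        raise ValueError("fetched content does not match the manifest digest")

def fetch_via_hint(hint_location, canonical_digest, fetch):
    # "fetch" is any transport-specific callable returning bytes.
    content = fetch(hint_location)
    verify_against_manifest(content, canonical_digest)
    return content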

Folks distributing over IPFS would
probably just use IPFS's unixfs to distribute their bundles, once IPFS
gets more robust handling for file metadata.

There would either have to be a mapping into IPFS's CAS system,
which may force the implementation to interact oddly with IPFS. IPFS
is probably just the beginning in a set of second-generation CAS
systems.

The manifest format in this repo is just another way to map a
filesystem bundle into CAS. You seem interested in figuring out how
to use IPFS, etc. to resolve manifest hashes, where a client does
something like:

  1. Fetch manifest hash from a registry.
  2. Fetch signatures from Notary [1], etc. to validate the manifest
     hash.
  3. Fetch manifest from generic CAS (CAS-over-HTTPS, IPFS, …).
  4. Interpret manifest following this spec.
  5. Fetch layers, config, etc. from generic CAS.
  6. Construct local filesystem bundle from fetched objects.

Let's call that the “pluggable CAS” approach.
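
Roughly, in code (registry, notary, cas, and build_bundle below are hypothetical stand-ins, not anything defined by OCI):

import json

# Sketch of the "pluggable CAS" flow; the injected objects stand in for
# whatever name-resolution, signing, and CAS implementations a client uses.
def pull_pluggable_cas(name, registry, notary, cas, build_bundle):
    manifest_hash = registry.lookup(name)          # 1. manifest hash from a registry
    notary.verify(name, manifest_hash)             # 2. validate the hash via signatures
    manifest = json.loads(cas.get(manifest_hash))  # 3./4. fetch the manifest from CAS and interpret it
    digests = [layer["digest"] for layer in manifest["layers"]]
    digests.append(manifest["config"]["digest"])
    blobs = {d: cas.get(d) for d in digests}       # 5. layers and config from generic CAS
    return build_bundle(manifest, blobs)           # 6. construct the filesystem bundle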

I think we (also?) want switching at the registry level [2], so
instead of mirroring individual CAS objects in the alternative
protocol, you just mirror the whole bundle (possibly with a different
serialization). Then the client would do something like:

  1. Fetch available protocol://hash values from a registry.
  2. Fetch signatures from Notary [1], etc. to validate those
    protocol://hashes.
  3. Of the valid protocol://hashes, pick your favorite protocol.
  4. Construct local filesystem bundle referenced by {hash} using
    {protocol}.

Let's call this the “registered protocol” approach.
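
And the "registered protocol" flow, in the same hypothetical style (registry, notary, and the per-protocol handlers are again stand-ins):

# Sketch: the registry hands back protocol://hash candidates and the client
# builds the bundle with the first protocol it supports.
def pull_registered_protocol(name, registry, notary, handlers):
    candidates = registry.lookup_all(name)                     # 1. protocol://hash values
    valid = [c for c in candidates if notary.verify(name, c)]  # 2. keep the signed ones
    for candidate in valid:                                    # 3. pick a protocol you speak
        protocol, _, ref = candidate.partition("://")
        if protocol in handlers:
            return handlers[protocol](ref)                     # 4. build the bundle via it
    raise RuntimeError("no supported protocol among the signed candidates")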

The registered protocol approach makes it easy to use IPFS, or the
manifest specified in this repository, or cURLing a single big
tarball, etc., etc., to get your filesystem bundle, while still
sharing the same name-resolution and verification framework with
everyone else. The downside is that you'd need new signatures if the
protocol serialized the image differently (“I assert that
ipfs://QmVc6zuAneK… is the same as
https+dockerv2://sha-256:01ba4719…”), but I'm fine with that. If you
don't have a reliable signer handy, you can still use the original
signatures with the pluggable CAS approach.

 Subject: Re: [oci-tob] Proposal for an OCI Distribution Format Spec
 Date: Wed, 9 Mar 2016 21:56:40 -0800
 Message-ID: <20160310055640.GA10073@odin.tremily.us>

@wking
Contributor

wking commented Apr 12, 2016

On Tue, Apr 12, 2016 at 01:14:44PM -0700, Chris Aniszczyk wrote:

“… bring it before the TDC…”

For this to work, I think we need clarity on whether there is one TDC
or multiple TDCs (e.g. see [1]). If there is a single TDC that
manages all OCI Projects, it probably needs its own repo listing
maintainers/members and contact information (analogous to
opencontainers/tob).

@jonboulle
Contributor

When we start looking at the existing formats, the commonality is that their transport, naming, format and runtime are all tightly coupled.

I am not sure what existing formats you're referring to, but to take one example: in appc there's no tight coupling between naming and transport, or with runtimes (of which there are multiple). There's also an attempt at a next-generation discovery (naming) spec in abd which decouples transport/naming/format entirely. I don't think either of these specs is perfect, but they should serve as evidence that it's possible to define these different aspects of the container image in a way that is pluggable, optional, and composable.

This can be done today, with the current version of the registry and can already be used in developing this specification. The part of the specification "that is currently not defined anywhere" just doesn't need to be to for an image specification to add value.

Hmm? How can it be done today in a known way if it's not defined anywhere? You seem to be suggesting that implementers/developers of the spec should just assume they should use the current version of the registry? Maybe I don't understand.

Ideally, one would be able to adapt OCI to their specific environment, rather than be tied to the limitations of distribution implementations today or only those defined in OCI.

+100 on being able to adapt OCI to different environments, but I really don't follow your leap to the second point there about limitations - where is there any hint of this proposal being restrictive or prescriptive? The OP is pretty clear on it being entirely optional for implementers, but provided as a means of avoiding unnecessary fragmentation and making the spec usable.

@wking
Contributor

wking commented Apr 13, 2016

On Wed, Apr 13, 2016 at 01:45:06AM -0700, Jonathan Boulle wrote:

… in appc there's no tight coupling between naming and
transport

That has:

… for example, with different scheme names representing different
transport mechanisms…

Which sounds like the “registered protocol” approach [1] :).

Ideally, one would be able to adapt OCI to their specific
environment, rather than be tied to the limitations of
distribution implementations today or only those defined in OCI.

+100 on being able to adapt OCI to different environments, but I
really don't follow your leap to the second point there about
limitations - where is there any hint of this proposal being
restrictive or prescriptive? The OP is pretty clear on it being
entirely optional for implementers, but provided as a means of
avoiding unnecessary fragmentation and making the spec usable.

I think the connection is [2]:

When you define a target transport, it becomes very easy to have the
specification grow towards that transport, limiting the
possibilities with other transports.

That's “it will be tempting to leak abstractions”, not “it will be
impossible to preserve abstractions”. If defining the optional
CAS-over-HTTPS transport happens in a different repository, with
independent versioning [3] (and maybe a test suite that allows you to
certify an implementation against this spec without supporting that
optional layer), I think everyone will be happy. Folks who feel like
that transport is out-of-scope can just ignore the transport
repository.

@jonboulle
Contributor

Folks who feel that transport is out-of-scope can also just ignore the optional parts of the spec, or use another spec with which they're fully comfortable. Separate repositories seem like an implementation detail.

@philips
Contributor

philips commented Apr 13, 2016

Agreed that a repo is an implementation detail. As a reminder we have other
optional layers in this project that people can ignore: federated naming
and signatures.


@wking
Contributor

wking commented Apr 13, 2016

On Wed, Apr 13, 2016 at 12:21:28PM -0700, Brandon Philips wrote:

Agreed that a repo is an implementation detail. As a reminder we
have other optional layers in this project that people can ignore:
federated naming and signatures.

Sure, but it's not trivial to pack that into one repository. Folks
have to subscribe/unsubscribe on a per-issue/-PR level, and versioning
is complicated (tags like image-v0.4.0, signatures-v0.1.0, …?). If
everyone is on board with these ideas (the image schema, the CAS
protocol(s), name-to-image-hash lookup, …) being related but separable
concerns, it seems odd to wedge them all into one repo. If you expect
frequent cross-cutting issues (e.g. bumping the CAS protocol to
require image-schema changes), then you probably have the leaky
abstractions @stevvooe and I are worried about.

@stevvooe
Contributor

It is an effort that would involve defining a CAS system, which is less than straightforward. I really think this would be a distraction that could unnecessarily complicate the image specification process.

I am not asking us to assume the current registry API version, but it is a sufficient target for getting something up and running (not sure why this isn't listed under "two independent experimental implementations from OCI members"). I'm sure, and hope, that others will have targets that are just as sufficient.

The whole point is that we need to firewall the image specification process from making assumptions about the transport. Even an optional specification will taint this process. Let's define an image spec and see what the community at large does with it.

This doesn't preclude us from building up knowledge to inform a transport specification, but let's get the image specification out the door first.

@philips
Contributor

philips commented Apr 13, 2016

On Tue, Apr 12, 2016 at 2:34 PM Stephen Day notifications@github.com
wrote:

The fact that we have the potential to resolve images in N different ways
is a feature of OCI. By going down this route, we risk irrecoverably
coupling identity and locality in the specification. When you define a
target transport, it becomes very easy to have the specification grow
towards that transport, limiting the possibilities with other transports.

People will transport these images over HTTPS. 99% of images today are
transported this way. To not provide an optional recommendation on how this
is done will lead to incompatibility and confusion.

We have other optional layers in this project too. The reason they are
optional is so they don't irrevocably get coupled.

When we start looking at the existing formats, the commonality is that
their transport, naming, format and runtime are all tightly coupled. While
vertical integration of implementations may make sense when solving a
specific problem, doing so in specifications leads to inflexible and
incompatible systems. This needs to be avoided in OCI.

Totally agreed. This is why we make it an optional layer of the
specification. Someone is going to specify how to download these things
over HTTPS and it should be this project.

Such pseudocode can also be adapted to SSH, IPFS, BitTorrent and myriad
other patterns for image storage. Ideally, one would be able to adapt OCI
to their specific environment, rather than be tied to the limitations of
distribution implementations today or only those defined in OCI.

Totally agreed, and we should standardize, as appropriate, that little bit
of pseudocode for those protocols.

In any case I am +1 on making this an optional layer here.

@stevvooe
Contributor

@philips And that may continue to be the case. The difference will be in how the image is transported over HTTPS.

Why is it so important that this optional layer be defined at this point in time? Why not at least wait till we have a version of the image specification out before leaping? We already have ways to store images that can be used as guidance to build out the image specification.

@qianzhangxa
Contributor

@philips, any updates on this topic? I see this issue has been put into the "post-v1.0.0" milestone, so does that mean image distribution method will be part of OCI image spec after the 1.0.0 release? Thanks.

@vbatts
Member

vbatts commented Oct 6, 2016

@qianzhangxa this is certainly an important requirement for folks. It came up numerous times this week at LinuxCon EU. Getting the distribution of images in for v1.0 seems somewhat tangled up with deciding between releasing aspects of the Docker distribution API and creating an entirely new alternative.
This is the main reason we've been pushing for it to be a post-v1.0.0 definition. It is something that I am very interested in seeing defined as well.

@Fak3

Fak3 commented Oct 18, 2018

Any progress on this?

@cyphar
Member

cyphar commented Nov 14, 2018

OCI has now imported Docker's distribution specification into distribution-spec. I'm not sure what the plan is for releases of that.

@vbatts
Member

vbatts commented Sep 30, 2021

I think this is now substantially covered by https://github.com/opencontainers/distribution-spec,
and while folks can and are working on additional ways to serve/fetch images, artifacts, and their corresponding blobs, that is ongoing research and trial.

Closing this issue. Feel free to open a new issue describing any additional refinement needed.

@vbatts vbatts closed this as completed Sep 30, 2021
sudo-bmitch pushed a commit to sudo-bmitch/image-spec that referenced this issue Aug 16, 2022
Signed-off-by: Sajay Antony <sajaya@microsoft.com>