Listing API Requirements #229

Closed



@SteveLasker SteveLasker commented Jan 30, 2021

This PR is intended to capture the details we should consider for implementing a list API as discussed here: #222

Signed-off-by: Steve Lasker <stevenlasker@hotmail.com>
1. A user can get a list of repositories within a given registry, within a specified org and/or namespace.
- `oci-reg list acme-rockets.io/`
- `oci-reg list acme-rockets.io/org1/`
2. A user can get a list of tags within a given registry/namespace.

@joaodrp joaodrp Feb 2, 2021


It would be useful if the list could (optionally) contain not only the tag names (as-is) but also other useful details, such as the underlying manifest digest and media type, and possibly the tag creation timestamp.

It is common to need such details (e.g., showing a tag list in a UI). Looping over a name list and doing multiple requests (one for the manifest and another for the configuration, if any) requires too much time and resources.
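To make the cost of looping concrete, here is a minimal sketch. The `TagEntry` shape and the `requests_needed` helper are hypothetical illustrations, not part of any spec, and the count ignores pagination of the list itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagEntry:
    """One item in a hypothetical detailed tag list response."""
    name: str
    digest: Optional[str] = None      # manifest digest, e.g. "sha256:..."
    media_type: Optional[str] = None  # manifest media type
    created: Optional[str] = None     # tag creation time (format is an assumption)


def requests_needed(tag_count: int, detailed_list: bool, has_config: bool = True) -> int:
    """Count the HTTP requests a client makes to render a tag list with details."""
    if detailed_list:
        return 1  # the details arrive with the list itself
    per_tag = 2 if has_config else 1  # one manifest fetch, plus one config fetch if any
    return 1 + tag_count * per_tag    # the name-only list, then the per-tag fetches


print(requests_needed(1000, detailed_list=False))  # 2001
print(requests_needed(1000, detailed_list=True))   # 1
```

The optional fields default to `None` so that a registry returning names only would still fit the same shape.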


@SteveLasker SteveLasker Feb 2, 2021


@joaodrp, this is a great topic. I've created another PR for the show/get-info requirements. I split them out to try and get a consensus on one at a time. But, your point of having to loop through a list to get details on each may be problematic as well. Open to ideas on how we split or merge a subset?


@SteveLasker I'm a little confused. @joaodrp seems to be asking whether we could include a format for registry items that allows metadata to go along with each listed item, and you point to a show/get-info requirement, which in my understanding doesn't speak to his request at all, since the current PR is about the listing requirements. Should we take that to mean you're not open to specifying listed-item metadata in this requirements document?


@joaodrp joaodrp Feb 6, 2021


> I've created another PR for the show/get-info requirements. I split them out to try and get a consensus on one at a time.

@SteveLasker, thanks. I'll have a look at it and share my opinion there.

> But, your point of having to loop through a list to get details on each may be problematic as well

I should clarify: Some of the API features that I'm looking forward to are not feasible to implement on registries that rely on filesystem metadata (like the distribution registry does), at least not in a way that would allow them to be usable. For example, this is why the distribution registry doesn't support pagination on the tag list endpoint.

Doing a recursive walk on a remote blob storage backend and inspecting file by file is not only slow but expensive. This is especially true for large repositories, such as those with thousands of tags, where a single tag list request (with no additional details, just the tag names) can take many seconds to complete.

At some point, these limitations become a major blocker for providers. They can't ask users to wait this much time to see a list of tags in the UI (or get it through the API), nor can they fulfill most feature requests because the implementation would be extremely slow and costly.

As a result, most major registries have to adopt some database to store metadata, such as Harbor with PostgreSQL and Docker Trusted Registry with RethinkDB. This enables them to use fast and flexible query languages and attach additional metadata to each artifact, metadata which is not part of the image spec (e.g., image push date, push/pull count, repository size, etc.). And this is when they start implementing their own custom APIs.

This brings me to a key question: Is the intention to limit the distribution API spec to the information provided by the image spec and contained, e.g., in image manifests? Or do we want to look beyond that (to cover the additional data and features that each provider will otherwise implement in their own way)?
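As a rough illustration of why a sorted metadata index makes pagination cheap, here is a minimal sketch of cursor-style paging. The `n` and `last` parameter names are borrowed from the existing `_catalog` pagination convention; the function name and the in-memory sorted list, standing in for a database index, are assumptions made for this example.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def list_tags_page(
    sorted_tags: List[str], n: int, last: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return up to n tags that sort after `last`, plus the cursor for the next page."""
    # Binary search for the cursor position instead of walking every entry.
    start = bisect_right(sorted_tags, last) if last is not None else 0
    page = sorted_tags[start:start + n]
    more = start + n < len(sorted_tags)
    next_last = page[-1] if page and more else None  # None means no further pages
    return page, next_last


tags = ["1.0", "1.1", "2.0", "latest"]  # already sorted, as an index would keep them
page, cursor = list_tags_page(tags, n=2)
print(page, cursor)  # ['1.0', '1.1'] 1.1
page, cursor = list_tags_page(tags, n=2, last=cursor)
print(page, cursor)  # ['2.0', 'latest'] None
```

Each page costs a lookup plus a slice, whereas a filesystem-backed registry would have to walk the whole repository before it could return even the first page.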

Signed-off-by: Steve Lasker <stevenlasker@hotmail.com>
@SteveLasker SteveLasker changed the title Initial structure for a set of Listing API Requirements Listing API Requirements Feb 2, 2021

@waynr waynr left a comment


I think I can see what you're aiming for with this document in terms of specifying use cases and making sure there is a clear link between requirements gathered from providers/users and the API specs themselves.

Furthermore, my limited understanding from reviewing this and other discussions is that this document isn't meant to be specific to individual listing APIs but generic across listing APIs in a way that ensures consistency between them (ie so we don't end up with subtly different data structure formats, response types, approaches to pagination, etc).

So then the idea would be that we establish these requirements/guidelines, then we can start collaborating on the new API specs themselves? Does this sound right or am I missing something?

I do feel like I might be missing something, but if I'm not, I wonder how much benefit this provides over beginning to work on the specs sooner rather than later. For each type of API (list, get, create, delete) we would establish best practices in an initial implementation, then either use that as a prototypical example we point to when reviewing subsequent spec PRs, or extract a template for future similar API types. For example, we might go through the exercise of collaboratively writing up the tags listing spec, then extract a template with all the generalizable parts left in and TODO markers to be replaced when proceeding with subsequent listing API specs.

The advantages I see in moving on to focus directly on the API spec PRs themselves:

  • less opportunity for confusion about the purpose of an API type requirement vs a spec
  • get closer sooner to enabling providers to begin implementing the new specs

Anyway, I'm not feeling too strongly about one process over another. I'm sure there are just as many advantages to starting with top-level requirements docs; I just thought I'd share an idea for moving forward, given that it looks like some of the discussion in #222 is getting contentious and potentially unproductive.

Other than this and a couple questions in-line, it's not clear to me what else I can add here as a representative of a cloud provider and maintainer of distribution since I think the input I really want to give would be more appropriate on an actual manifest or tag listing PR.

By the way, thanks in advance for your consideration of and patience with my admittedly naive questions and input -- this is my first time participating in API spec design for an open source standard.

4. Provide filtering by annotation
5. Provide sorting

## Prioritization

Do these priorities relate to planned spec versions? Or are they more about required vs optional features? Are we expecting a phased delivery of features in registry implementations that are intended to move in lockstep with updates to the spec or are the phases merely a planning detail for designing the spec and we would expect most registries to implement the requirements roughly at the same time?


Great questions. I was hearing some folks say, "I need this now," while others might take more time to complete the design. I was just suggesting a way to break up the design.

House example: You know you want to add a 2-car garage and a deck that has that great city view.
You don't know or care if the garage will have a second story, but you know you want it as a front entry.
You don't know if the deck will have a hot-tub and grill. But, you know the city is in the back of the house, so you need to leave room in the back.

Let's get enough figured out to create the blueprint, and then we can add over time.


waynr commented Feb 4, 2021

Something else occurred to me while writing up my previous comment. I know it's kind of off-topic for this PR, but has there been any consideration for moving toward graphql as the/a query language for the spec? I think it's obviously not appropriate for the v1 spec, but it seems to me like it would be a good candidate for a next major version bump and would largely resolve a number of issues around consistency between endpoints and discoverability of provider-specific functionality (via graphql introspection).

Again, I am pretty naive when it comes to API design considerations, but if anyone else thinks this might be a good idea I would be willing to help drive discussion about what it might look like in a way that doesn't block a v1 spec.


SteveLasker commented Feb 5, 2021

> So then the idea would be that we establish these requirements/guidelines, then we can start collaborating on the new API specs themselves? Does this sound right or am I missing something?

Yes, a framework to agree on our goals and approaches. Think of it as a blueprint. Once the blueprint is done, we can go off and work on different parts "of the house", knowing how they all interrelate.

> @SteveLasker i'm a little confused -- @joaodrp seems to be asking if we could include a format for registry items that allows for metadata to go along with each listed item and you point to a show/get-info requirement

I think your general question might summarize this. How do we want to define list APIs? Then, as the list APIs return results, how do we define a similar pattern across the APIs for returning results?
I might have split these too granularly, as I was just trying to tease out two problems to debate independently. Perhaps these are more conjoined.

Maybe the design question is this:
In the OCI Distribution spec, how do we feel about returning a hierarchy of results? Do we want to support a list, get-info design, or a list with some set of results on each listed object?

Concern: How do registries manage permissions? If you can see the name of a namespace/repo, can you also get/read attributes on that repository?

> I know it's kind of off-topic for this PR, but has there been any consideration for moving toward graphql as the/a query language for the spec?

Actually, that's exactly the kind of question I'm asking. Since we've effectively hit the reset button for listing APIs, how do we want to design a consistent pattern for listing (with filter parameters) and results?
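One possible shape for such a consistent pattern, sketched under assumptions: every list API (repositories, tags, and so on) returns the same item type and accepts the same filter and sort parameters. `ListItem`, `list_items`, and the `team` annotation are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ListItem:
    """A listed object (repository, tag, ...) with optional annotations."""
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)


def list_items(
    items: List[ListItem],
    annotation: Optional[Tuple[str, str]] = None,
    sort_by: str = "name",
    descending: bool = False,
) -> List[ListItem]:
    """Filter by one annotation key/value pair, then sort by name or by an annotation."""
    if annotation is not None:
        key, value = annotation
        items = [i for i in items if i.annotations.get(key) == value]

    def sort_key(item: ListItem) -> str:
        if sort_by == "name":
            return item.name
        # Items without the annotation sort first (empty string).
        return item.annotations.get(sort_by, "")

    return sorted(items, key=sort_key, reverse=descending)


CREATED = "org.opencontainers.image.created"
items = [
    ListItem("b", {"team": "x", CREATED: "2021-02-01T00:00:00Z"}),
    ListItem("a", {"team": "y"}),
    ListItem("c", {"team": "x", CREATED: "2021-01-01T00:00:00Z"}),
]
print([i.name for i in list_items(items)])  # ['a', 'b', 'c']
print([i.name for i in list_items(items, annotation=("team", "x"), sort_by=CREATED)])  # ['c', 'b']
```

Whether filtering and sorting belong in the core API or in an optional extension is exactly the open question in this thread; the sketch only shows that one item shape can serve several list endpoints.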

@jonjohnsonjr

> graphql

@stevvooe suggested this a while ago, and I think it would be a big improvement, but I'm worried about broad support for v1, as you mention. I don't have much experience with graphql, but if it's not a huge departure from what currently exists (and doesn't impose too much implementation burden), I'd be interested in seeing what that might look like.

> Yes, a framework to agree on our goals and approaches. Think of it as a blueprint. Once the blueprint is done, we can go off and work on different parts "of the house", knowing how they all interrelate.

?

> Since we've effectively hit the reset button for listing APIs

I don't think we've really hit the reset button. My proposal is a natural extension/adaptation of what already exists and doesn't introduce any new concepts or technology.

If we could punt all the non-essential requirements (filters, sorting, etc) into an optional graphql thing, that seems nice and tidy? Again, I'd be interested in seeing what that looks like and if it meshes well with existing APIs.


SteveLasker commented Feb 5, 2021

Looks like we have two big discussions here:

  • How should listing payloads return data? Should they return just the item for a subsequent query for details? Or, should they return a json collection, which could be a collection of one item in the simple scenarios?
  • How do we enable a core set of results, with extensibility?

@jonjohnsonjr, you asked what would block the adoption of these new APIs by registry providers.
We would need to express the data that we support today so we can transition to the new APIs. Maintaining two APIs for the same scenario will likely not get adoption, as it would be frustrating for developers to have to query two APIs: the standard one and the registry's proprietary one.

For the results APIs, happy to merge that conversation here, rather than have two PRs for requirements.

On the extensibility model, and to address the question of which properties MUST vs. MAY be returned:
Consider that one of the requirements for these list APIs is to build a browsing experience that spans multiple registries. A few properties would be interesting, even if not supported by all registries.
To keep a browsing experience from needing registry-specific requests, can we agree on a set of names that registries MAY implement? If they don't, they're just empty. Consider annotations. You MAY add org.opencontainers.image.created, but it's not required.
For the results, you MAY add org.opencontainers.repo.pullCount, or not. The exact name isn't important. I'm asking whether we can come up with a standard set of properties, which registry operators could add to. For instance, vnd.azure.repo.teleportEnabled would be something we would add, and users could get this value from the same OCI distribution-spec API.
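A minimal sketch of how a cross-registry browsing client might consume such results, assuming a flat property map per listed item. The helper names are hypothetical, and `org.opencontainers.repo.pullCount` is only the illustrative name used above, not a defined annotation.

```python
from typing import Any, Dict, Iterable, Optional

# Names a registry MAY return. Only the first is a defined OCI annotation;
# the second is the illustrative name from this discussion.
OPTIONAL_PROPERTIES = (
    "org.opencontainers.image.created",
    "org.opencontainers.repo.pullCount",
)


def normalize(
    result: Dict[str, Any], wanted: Iterable[str] = OPTIONAL_PROPERTIES
) -> Dict[str, Optional[Any]]:
    """Return the agreed-upon properties, with None for any the registry omitted."""
    return {name: result.get(name) for name in wanted}


def vendor_properties(result: Dict[str, Any], prefix: str = "vnd.") -> Dict[str, Any]:
    """Return registry-specific extensions, which a generic client can ignore."""
    return {k: v for k, v in result.items() if k.startswith(prefix)}


result = {
    "name": "org1/app",
    "org.opencontainers.repo.pullCount": 42,
    "vnd.azure.repo.teleportEnabled": True,
}
print(normalize(result))
# {'org.opencontainers.image.created': None, 'org.opencontainers.repo.pullCount': 42}
print(vendor_properties(result))
# {'vnd.azure.repo.teleportEnabled': True}
```

The client code stays identical across registries: a property that a registry does not support simply comes back empty.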

The graphql topic is a great discussion for how we support these requirements. I don't have a strong opinion here, other than that we support a set of core needs.

@SteveLasker

There was a set of discussions on the Artifact Manifest PR relating to a listing API for finding linked artifacts. The discussion felt more appropriate for our Listing and Result requirement discussions.

perhaps as an alternative the API should return just the digest(s)
For a variety of reasons, we should probably return an entire descriptor instead of just the digest, so clients know how to handle the digest.

When working through Artifact scenarios, we realized the limitations of just the descriptor. Even in the image-only scenario, you didn't know if the result was for a specific platform.
In the artifacts scenarios, we didn't know if the result was a helm chart, a signature, or others. Depending on the list API, you'd really want to filter on a specific type.
I had played with the idea of adding artifactType to the descriptor. In addition to the immediate resistance, it would provide some challenges with changing a descriptor.
The other thing we realized is depending on the artifact type, there was additional information different clients would need. For the notary scenario, there may be several keys associated with an artifact. While the client could page through each descriptor reference, and pull each returned artifact, it would then have to pull a config or blob/layer to understand if it's the ACME Rockets signature or another signature that doesn't match the requirements for a deployment.
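To illustrate the filtering problem, here is a sketch that assumes each returned descriptor carried a hypothetical `artifactType` field alongside the standard `mediaType`, `digest`, and `size`. The type strings and digests are placeholders invented for the example.

```python
from typing import Any, Dict, List


def filter_by_artifact_type(
    descriptors: List[Dict[str, Any]], artifact_type: str
) -> List[Dict[str, Any]]:
    """Keep only descriptors whose (hypothetical) artifactType matches."""
    return [d for d in descriptors if d.get("artifactType") == artifact_type]


# Placeholder data: both entries share a manifest media type, so mediaType
# alone cannot tell a signature from a chart.
linked = [
    {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "digest": "sha256:aaaa",  # placeholder digest
        "size": 512,
        "artifactType": "application/vnd.example.signature",  # hypothetical
    },
    {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "digest": "sha256:bbbb",  # placeholder digest
        "size": 2048,
        "artifactType": "application/vnd.example.helm-chart",  # hypothetical
    },
]

signatures = filter_by_artifact_type(linked, "application/vnd.example.signature")
print([d["digest"] for d in signatures])  # ['sha256:aaaa']
```

Even with such a field, the client still could not tell whose signature it is without pulling the artifact, which is the second limitation described above.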

There's always a balance of how many round trips a client should have to make for a particular scenario.

  • how much data must be processed on the server to return the result, both data access and compute processing time
  • size of the payload
  • number of API calls required to get actionable details

The challenge we face is that the descriptor is not actionable on its own, so every result must be pulled to get the info, and the total processing time is longer.
Trying to find an appropriate subset is challenging.
The thought behind returning the manifest is that it provides the data needed to figure out which artifact to actually pull.
