Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a "Porchfile" to kpt packages that stores porch-specific metadata #783

Open
kispaljr opened this issue Jul 31, 2024 · 13 comments
Open
Assignees
Labels
area/platform area/porch Porch related issues design Feature or component design task / documentation

Comments

@kispaljr
Copy link
Contributor

kispaljr commented Jul 31, 2024

TL;DR

Add a YAML file named Porchfile to kpt packages that are managed by porch to persistently store porch-specific metadata.

Overview

Porch handles all kinds of metadata that is related to a kpt package, but that is not strictly the content of the package. Historically porch used the Kptfile to persist this kind of metadata together with kpt's metadata. This was natural, since kpt and porch were parts of the same project. Recently porch was moved out from kpt, so we, the porch maintainers no longer have ownership over the Kptfile schema, but some of the new features we are trying to implement needs us to attach new kinds of metadata to the package. Hence this proposal to add our own, porch-specific metadata file, called Porchfile, to porch-managed kpt packages to store porch-specific metadata. This would keep compatibility with kpt, but allow us to add new kinds of porch-specific metadata, at the same time.

Porchfile should contain a valid KRM object, with the local-config annotation. Here is a possible example Porchfile content:

apiVersion: porch.kpt.dev/v1
kind: Porchfile
metadata:
  annotations:
    config.kubernetes.io/local-config: "true"
  name: my-kpt-package
spec:
  readinessGates:
    - req.nephio.org/v1alpha1.Interface.n3
    - workload.nephio.org/v1alpha1.NFDeployment.upf-cluster01
status:
  conditions:
  - message: update condition for initial resource
    reason: workload.nephio.org/v1alpha1.NFDeployment.upf-cluster01
    status: "False"
    type: req.nephio.org/v1alpha1.Interface.n3
  - message: update for condition
    status: "False"
    type: workload.nephio.org/v1alpha1.NFDeployment.upf-cluster01
  lastRender:
    error: ""
    result:
      exitCode: 0  

What kind of metadata should be stored in Porchfile?

Right now porch handles two kinds of package metadata:

  • "persistent metadata" is persisted in git/oci alongside the package. Right now these are mostly stored in the Kptfile (upstream package reference, mutation pipeline, readiness gates, conditions), but some are encoded in the name of a git branch (how workspace names for draft/proposed packages are stored), or in git annotations (how workspace names for published packages are stored).
    When a new Repository is registered in porch and the package revisions are automatically detected, "persistent metadata" parameters of the package already contain their latest values that was set by the last update operation. Furthermore if the same Repository is registered in another porch in another Kubernetes cluster, then the "persistent metadata" values of the same package will eventually be the same in the two separate porch instances, in the two separate PackageRevision objects. Also, changing a persistent metadata value in one porch instance will be visible in all other porch instances that have the same Repository registered in them.
    All-in-all the lifecycle of "persistent metadata" is bound to the lifecycle of the package itself, and their scope spans across all porch instances.

  • "ephemeral metadata" is stored in the etcd of porch's Kubernetes cluster. Some examples would be owner references, labels and annotations of a PackageRevision.
    When a new Repository is registered in porch and the package revisions are automatically detected "ephemeral metadata" parameters are empty, and if these values are changed, the changes are lost when the Repository is unregistered. Furthermore if the same Repository is registered in another porch in another Kubernetes cluster, then changing an ephemral metadata value in one porch instance won't be visible in others.
    So the lifecycle of "ephemeral metadata" is bound to the lifecycle of a single Repository registration, and its scope is limited to one porch instance.

The proposal is to store new kinds of "persitent metadata" in the Porchfile, and maybe migrate the exisisting porch-owned metadata from the Kptfile to the Porchfile to allow future schema changes.

Alternatives considered

Kptfile

A natural alternative to adding a new Porchfile would be to keep adding new fields to the Kptfile itself. However, we can only modify our own version of the Kptfile schema, that is stored in the porch codebase, and that would make porch-managed kpt packages incompatible with the kpt CLI tool. This is a problem for porch developers and users, since we still heavily utilize the kpt command for development and troubleshooting of kpt packages, e.g. to run kpt fn render on a local copy of a kpt package, or deploying a kpt package by kpt live apply.

Metadata store

Besides git/oci, porch currently has another storage backend, the so-called metadata store. But that is designed to store "ephemeral metadata", and this proposal is about the "persistent type metadata" only.

@kispaljr kispaljr added design Feature or component design task / documentation area/porch Porch related issues area/platform labels Jul 31, 2024
@kispaljr
Copy link
Contributor Author

@henderiw
Copy link
Contributor

Ok lets firsts try to highlight the problem we want to solve as this is not clear from this issue.

@kispaljr
Copy link
Contributor Author

kispaljr commented Jul 31, 2024

I will try to rephrase it:
We want to store extra information that is related to a particular kpt package, but that is not strictly its content. All such information/metadata is now stored in the Kptfile (ref to the upstream package, readinessGates, conditions, etc.). Now we encountered a situation, when we want to store a new kind of metadata needed for porch to do its job properly, but we cannot just add a new field to the Kptfile, since kpt is a separate project now and they own the schema of the Kptfile.

So, according to this proposal, porch will to create its own file (Porchfile) that is similar to Kptfile, but separate from it, and whose schema is fully owned by the porch project. This will maintain compatibility between kpt packages managed by porch and the kpt CLI command, and still allow us a place to introduce new kind of data that we want to store in a package.

@henderiw
Copy link
Contributor

henderiw commented Aug 1, 2024

Why does it have to be a file and not just metadata to the PR CRD?

@kispaljr
Copy link
Contributor Author

kispaljr commented Aug 1, 2024

That is what I tried to justify in the original wording of the PR, by mentioning the use case when e.g. a vendor's blueprint repo is registered in multiple customers' porch instances.
But another good example would be if you unregister a repository from porch (by deleting the Repository object) and re-register it. All labels and annotation added to the PackageRevision CRs will be lost after re-registering the repo. And this is fine, but there is types of package metadata that you don't want to lose when you unregister the repo from porch, for example: the reference to the upstream package, or the readinessGates. This kind of metadata should be stored alongside the package in git/oci. And the easiest way to add this information to git/oci is to create a file. I would say that this is reasoning behind having a Kptfile in the first place.

@henderiw
Copy link
Contributor

henderiw commented Aug 1, 2024

There is some conflicting info here:

  1. you are talking about a blueprint, you don't want specific information stored in a common blueprint repo.

and some info specific to porch implementation.

  1. the fact you loose metadata if you deregister a repo it is a bug in porch. The PR is a way to convey information to the user about the result of things that happened. I don't believe it is good practice to store metadata in git. The fact we use the kptfile to convey information like readiness is also not the right way.

@kispaljr
Copy link
Contributor Author

kispaljr commented Aug 1, 2024

Yes. I agree, but the backend storage behind PackageRevision objects is git, isn't it? Similarly how normal CRs are stored in etcd.

As far as I can see, all persistent data presented via PackageRevision objects has to be somehow stored in git. Either encoded in the name of a git branch (how we store workspace names for draft/proposed packages), or in git annotations (like how store workspace names for published packages), or in files in git (that's what we do with readinessGates in Kptfile), or maybe using some other git concept, but definitely in git.

Am I missing something?

[In my understanding the metadata store in porch was designed for ephemeral data, that is known to be lost on unregistering the repo]

@kispaljr
Copy link
Contributor Author

kispaljr commented Aug 26, 2024

For the record: I completely rewrote the original proposal text in the first comment, based on the discussion with @henderiw . I hope the new version is a more concise and digestible description of what we want to add and why.

@tliron
Copy link

tliron commented Aug 26, 2024

I support this idea and indeed implemented it in the TKO experiment, where it's used to keep track of bookkeeping metadata (URL of the original template used) as well as allowing for some optional user-controlled knobs for changing default behavior, such as disabling auto-approval (example).

I don't think "Porchfile" is the best name, though, as hopefully it can be agnostic to any specific implementation.

If we accept the basic idea we can move on to discussing its structure and what should be included.

@kispaljr
Copy link
Contributor Author

By all means, let's find a better name for this file :)

@henderiw
Copy link
Contributor

this is the proper way to solve this, rather than keep hacking. My 2 cents.
#598

@kispaljr
Copy link
Contributor Author

kispaljr commented Aug 27, 2024

I see this as a different, independent problem.
I assume even if PackageRevisions are CRs, we want to keep registering Repositories and auto-discovering the packages in them (i.e. there will be a PackageRevision controller that creates/updates/deletes PackageRevisions based on what is in git). If this is true, then we still have to decide where the "persistent metadata" attached to a package is stored between Repository unregistration and re-registration. In other words we have to decide where the PackageRevision controller will read the persistent metadata values from, during auto-discovery.
Porchfile tries to solve the problem of persisting the kind of metadata whose lifecycle is bound to the lifecycle of a package in git (that I called "persistent metadata"), and not the kind of metadata whose lifecycle is bound to the lifecycle of a PackageRevision KRM object (that I called "ephemeral metadata").

@henderiw
Copy link
Contributor

Ok I reread your new text, but the way to solve this is a bit different in my view and I am of the opinion porch should just care about access to git, not about any specialization, etc. As such a workspace name is a local matter, so it does not need to be persisted and it does not need to be coordinated across clusters.
Wrt condiitons, etc that work is triggered from a PV and the corresponding technologies should report their status, so the reproducibility should come from that chain not porch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform area/porch Porch related issues design Feature or component design task / documentation
Projects
Status: In Progress
Development

No branches or pull requests

4 participants