Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0075] Propose object (dictionary) param and result types 🤓 #479

Merged
merged 1 commit into from
Feb 14, 2022

Conversation

bobcatfish
Copy link
Contributor

This TEP proposes adding support for more structured params and results
by introducing objects (aka dictionaries), and introducing a (small)
subset of json schema syntax for declaring keys in the object.

This refers to TEP-0074 which will be added in a separate commit and
proposes deprecating PipelineResources.

This is based on @skaegi's proposals in tektoncd/pipeline#1393

/type tep

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 16, 2021
last-updated: '2021-07-14'
authors:
- '@bobcatfish'
- '@skaegi' # I didn't check with @skaegi before adding his name here, but I wanted to credit him b/c he proposed most of this in https://github.com/tektoncd/pipeline/issues/1393 1.5 years ago XD
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skaegi let me know if you'd rather not be associated with this XD

@bobcatfish bobcatfish added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jul 19, 2021
@bobcatfish bobcatfish changed the title [TEP-0075] Propose object (dicitonary) param and result types 🤓 [TEP-0075] Propose object (dictionary) param and result types 🤓 Jul 19, 2021
@ghost
Copy link

ghost commented Jul 19, 2021

/assign @sbwsg

@tekton-robot tekton-robot assigned ghost Jul 19, 2021
@bobcatfish
Copy link
Contributor Author

/assign sbwsg afrittoli

teps/0075-object-param-and-result-types.md Outdated Show resolved Hide resolved
teps/0075-object-param-and-result-types.md Show resolved Hide resolved
teps/0075-object-param-and-result-types.md Outdated Show resolved Hide resolved
teps/0075-object-param-and-result-types.md Outdated Show resolved Hide resolved
- name: notify-slack
params:
- name: message
value: "about to clone $(params.gitrepo.url) at $(params.gitrepo.commitish)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One wrinkle wrt dictionary results. Given the variable $(results.foo.path) and the following schema:

results:
- name: foo
  schema:
    type: object
    properties:
      path: {}

What does my variable resolve to? Would we introduce validation to ensure a schema doesn't trample Tekton's built-in variables? Or allow users to override Tekton's vars with their own schema?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing bracket notation ['path'] for dictionary params and results instead would allow us to walk this line without collision. e.g. results.foo.path could be the tekton built-in and results.foo['path'] could be the dictionary entry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we introduce validation to ensure a schema doesn't trample Tekton's built-in variables?

hm this is a weird thing, but maybe not the worst thing, as long as the behavior was clear. i think you identified the best two options:
a. don't allow it
b. choose a precedence order, probably deciding the Task's param overrides the built in

Introducing bracket notation ['path'] for dictionary params and results instead would allow us to walk this line without collision.

It looks like jsonpath supports both:

. operator is the dot-child operator, which you use to denote a child element of the current element.
[ ] is the subscript operator, which you use to denote a child element of the current element (by name or index)

I'm thinking if they are both valid jsonpath, we might end up supporting both eventually? but maybe not, maybe we pick one now and that's good enough - in that case, i think i'm leaning toward [] slightly b/c that would be consistent with array indexing syntax (but maybe that will be dropped from #477 anyway haha)

If we DO want to support both eventually, then im not sure that using []will sidestep the problem you pointed out above, so I think we still need to choose (a) or (b) above - (a) would have less chance of unexpected behavior but (b) would be the most flexible. Maybe we go with (a) and if someone really needs it we could switch to (b)? (harder to go the other way)

One more idea: we could view our current syntax as already allowing dictionary/object replacement, e.g. $(results.foo.path) perhaps this syntax is saying we have a dictionary called results with a value called foo which is also a dictionary with a key called path? In that case using . syntax now would be the most consistent.

Anyway I'll wait to hear your thoughts before updating the TEP @sbwsg !

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to your idea of redefining Tekton's own built-in variables in terms of a dictionary schema, I really like that concept.

The problem I'm now realizing with disallowing a user's schema from overriding tekton's built-ins is that we could accidentally break folks with new releases if we added any new variables: Imagine a user defines a schema with foo: properties: bar: {} and in an update Tekton introduces a new built-in variable of foo.bar. If we took the (a) approach we'd start rejecting the user's dictionary because we've got this new built-in now.

I think precedence order (user's schema wins) is probably the only reasonable solution after all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely this would me we should lean toward prefixing our variable with something that would be reserved (tekton.* or builtin.*) and state as soon as it starts that those are not allowed to be set by the user.

Copy link
Contributor Author

@bobcatfish bobcatfish Aug 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to your idea of redefining Tekton's own built-in variables in terms of a dictionary schema, I really like that concept.

tangent: actually i wonder if this could be an alternative way of handling variable replacement we could pursue in the future (esp if we start supporting more pipeline features via pipeline in a pod) - we could provide a dictionary of values to the executing pod and the entrypoint binary could be in control of the actual replacement 🤔

The problem I'm now realizing with disallowing a user's schema from overriding tekton's built-ins is that we could accidentally break folks with new releases if we added any new variables:

ahhh that's a great point - i agree, ill update the proposal with (b)

Most likely this would me we should lean toward prefixing our variable with something that would be reserved (tekton.* or builtin.*) and state as soon as it starts that those are not allowed to be set by the user.

That's a great idea too 🤔 im trying to understand if that would address the specific problem @sbwsg was pointing out, which seems to be specific to results (params afaik dont have any extra fields, but maybe if we had optional params one day there might be something like params.foo.bound, anyway I think the example @sbwsg is the only one we would run into today, which would be someone defining results.someobject.path - prefixing that whole thing (tekton.results.someobject.path) would have the same problem, the prefix would only help if it came right before path, e.g. results.someobject.tekton.path ? but then we'd have the same collision problem with tekton 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbwsg while writing this out I've realized that I don't think the current collision would actually cause a problem.

For example (I added an extra key somethingelse):

results:
- name: foo
  schema:
    type: object
    properties:
      path: {}
      somethingelse: {}

$(results.foo.path) would only be used in the context of the Task accessing the path to the file where it would write the structured result - there would be no reason to support variable replacement for accessing the keys in the Task writing the result, e.g. what would $(results.foo.somethingelse) resolve to? There'd be no reason to use it within the Task writing the result - it would only be useful in another Task consuming the result via $(tasks.previousTask.results.foo.somethingelse) or $(tasks.previousTask.results.foo.path).

Anyway I think all of this is worth articulating in detail in the proposal - and even though there are no collisions now I think it makes sense to propose a policy for how to handle them in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely this would me we should lean toward prefixing our variable with something that would be reserved (tekton.* or builtin.*) and state as soon as it starts that those are not allowed to be set by the user.

As a shameless plug, I'm working on turning tektoncd/pipeline#3590 into a TEP now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, very good point!

@ghost
Copy link

ghost commented Jul 23, 2021

I left a bunch of comments but given we're looking at proposal rather than implementation stage I'm really happy to see this get merged!

/approve

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2021
@jerop
Copy link
Member

jerop commented Aug 2, 2021

/assign

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 2, 2021
teps/0075-object-param-and-result-types.md Show resolved Hide resolved
teps/0075-object-param-and-result-types.md Outdated Show resolved Hide resolved
teps/0075-object-param-and-result-types.md Outdated Show resolved Hide resolved
teps/0075-object-param-and-result-types.md Outdated Show resolved Hide resolved
Comment on lines 218 to 219
* What if a Task tries to write more keys to a result than the result declares it contains?
* The TaskRun would fail (alternatively we could decide to ignore the extra keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why would we want them to fail ? allowing "extra" keys means it would be more easy to integrate existing tools into tasks, or use a similar tool (that generate the same output) for different task with different results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll flip this around!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah and actually this lines up with the additionalProperties config in json objects - which looks like it's true by default, i.e. allowing the inclusion of extra fields

Comment on lines 220 to 221
* What if a Task tries to write less keys to a result than the result declares it contains?
* The TaskRun would fail
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have the notion of "optional" keys ? does it make sense ? otherwise, agreeing less keys should probably be make it fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i tried to figure out how we'd express this with json schema and was surprised to see that by default all fields are optional unless they are explicitly required 🤔

I'm a bit torn because i dont want to augment json schema, but at the same time i feel like defaulting to "required" is more consistent with our existing interfaces. we could allow them all to be optional by default but fail once variable replacement tries to use them? or we could imply a required section that includes all the values when its not present but that feels like a slippery slope 🤔

seems like other ppl have run into similar concerns: https://stackoverflow.com/questions/16204871/what-is-the-difference-between-required-vs-optional-in-json-schema

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that's nice about defaulting to optional is it makes it makes the default behavior safer for adding new fields. This will be particularly useful if we allow reusable schema definitions, since this would allow you to add a new field to a shared schema, then update the Tasks that use the field to now accept it.

e.g. if we wanted to add a new field to a Git resource type, optional fields would let phase the field in over time:

  1. Add the field to the schema
  2. Have producers / results start populating the field
  3. Consumers start using the field once it's populated.

we could allow them all to be optional by default but fail once variable replacement tries to use them?

+1. It's a bit strange to have required optional fields, but I think this makes sense if we support shared schema types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh that's a great point @wlynch , I hadn't thought of that

[pipelines#1393](https://github.com/tektoncd/pipeline/issues/1393#issuecomment-558833526))?

**Proposal**: if possible (can find out with a proof of concept), we deprecated `type:`. We would still default to string
when `schema` is not present, but when [array] or dict/object types are used, we use `schema`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently i'm leaning toward continuing to support "type" and never actually deprecating it - ill add a section on this

@mattmoor
Copy link
Member

Hey I came across this looking for existing TEPs that might cover what I'm after, and this is close.

As an example, I might want to let users specify a step for syncing their source:

  params:
    - name: source-step
      description: The step to execute to fetch source
      schema:
        type: dev.tekton.taskruns.Step # This is a strawperson
  steps:
  - $(params.source-step)
  ... # More literal steps

Or a slightly more sophisticated version might blend these proposals to allow a wrapper that lets the param include volumes (for mounting secrets):

  params:
    - name: sync-source
      description: The step to execute to fetch source
      schema:
        type: object # see question below
        properties:
          step: {
            type: dev.tekton.taskruns.Step
          }
          volumes: {
            type: []dev.tekton.taskruns.Volume
          }
  steps:
  - $(params.sync-source.step)
  ... # More literal steps

  volumes:
  - $(params.sync-source.volumes)

Needing to support substitutions of structured things will make things a bit more complicated, but I think it's doable (and I'm ofc happy to chip in to help make this happen).

Copy link
Member

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally forgot to hit submit on this a few days ago... 🤦

Comment on lines 230 to 231
* Using the json schema syntax to describe the structure of the object is a bit verbose and might not be intuitive
to users; see [Alternatives](#alternatives) for discussion of an option where we create our own syntax.
Copy link
Member

@wlynch wlynch Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be out of scope for this TEP, but something that would be very useful is if there was a mechanism for Task authors to reuse existing type definitions instead of defining their own schemas / copy pasting. It might be useful to pull this into a CRD that can be referenced by params. e.g.

apiversion: tekton.dev/v1
kind: ParamSchema
metadata:
  name: GitRepo
spec:
  type: object # see question below
  properties:
    url: {
      type: string # for now, all values are strings, so we could imply it
    }
    path: {} # example of implying type: string

which then could be used in Tasks:

params:
  - name: pull_remote
    schemaRef: GitRepo

This could be the way how we provide well-known types.

Going a step further (but also definitely out of scope for this TEP), this might be a mechanism we could use to support schema refs for composability.

edit: I think this is inline with what @mattmoor is suggesting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was thinking is sort of inline with the idea of SchemeRefs, but built-in because we need to know the type/scheme to support substitution. This feels like a nice way to test the waters with schema refs for fairly well defined things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pursuing schema refs and having known schema types sounds like a great next step after this TEP - I'll include it in a "future work" section in the TEP. Having well known types like GitRepo that ship with Tekton would be really useful (esp for tools built on Tekton, like chains). If these types are built into Tekton I think the next question is what to do when people want to provide their own (also a problem we had with PipelineResources - adding your own PipelineResources required changing the Tekton Controller) - maybe people can make their own ParamSchema instances - and/or maybe we could use something as simple as a ConfigMap?

I think @mattmoor 's idea above goes a bit further than this and suggests being able to use these schemas to define the body of the Tasks themselves, e.g reusable steps which sounds to me like another proposal entirely (which maybe again could leverage this one?) and reminds me of TEP-0054 step reuse - and it also sounds like we're starting to talk about going beyond just "variable replacement" and into the realms of templating 🤔

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 21, 2021
@bobcatfish
Copy link
Contributor Author

bobcatfish commented Aug 21, 2021

Thanks for all the feedback so far! There was a lot that was under specified XD It's ready for another look @sbwsg @vdemeester @mattmoor

Questions I'd like to ask you as you review:

  • What do think of the suggested approach in "required vs optional keys" - do you prefer a different approach or have any other ideas?
  • Do you think we should deprecate the type: array syntax or maintain that and the new schema syntax? (see "what to do with the type field")
  • I kind of like Alternative 1 where we introduce a small amount of shorthand on top of json schema (which would actually answer my previous question too) - do you agree or do you think we should just be verbose? (Or even define our own syntax?)

If we wanted to make our adoption of the json schema syntax a less verbose, we could make a couple of changes:

- We could imply `type:string` for keys by default (especially since this is the only initially supported type)
- We could remove the top level schema section and embed the details in the param/result definition.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: ive just realized that having a top level schema section is actually something we added and not a json schema specific thing so i want to update the proposal above to not use the top level schema section, and add it as an alternative instead

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 1, 2022
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 2, 2022
Copy link
Member

@jerop jerop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @bobcatfish, this is exciting 🎉

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2022
@jerop jerop mentioned this pull request Feb 7, 2022
Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @bobcatfish - I didn't manage to finish my review today but I wanted to share my feedback so far. The TEP looks great up - read up to L349.
I have a few comments / ideas that I'd like to discuss further, possibly for future iterations.
I will complete the review tomorrow.

these Pipelines cannot reliably tell when images are being built, or git repos are being used. Current ways around this
are:

* [PipelineResources](https://github.com/tektoncd/pipeline/blob/main/docs/resources.md) (proposed to be deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecated now 🎉


* [PipelineResources](https://github.com/tektoncd/pipeline/blob/main/docs/resources.md) (proposed to be deprecated
in [TEP-0074](https://github.com/tektoncd/community/pull/480))
* Defining param names with special meaning, for example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps results are more relevant for chains? 🤔

Comment on lines +96 to +97
* Adding support for nesting, e.g. object params where the values are themselves objects or arrays
(no reason we can't add that later but trying to keep it simple for now)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Just to clarify, when using params / results of type object, keys and values must be strings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor concern is that doing non-nested first and nested later might create more churn in the interfaces of the Tasks and Pipelines. But I think it still makes sense.

Comment on lines +113 to +117
Tekton could also define known interfaces, to make it easier to build these tools. For example:
* Images - Information such as the URL and digest
* Wheels and other package formats (see below)
* Git metadata - e.g. the state of a git repo after a clone, pull, checkout, etc.
* Git repo configuration - e.g. the info you need to connect to a git repo including url, proxy configuration, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you envision those images will be defined in a machine consumable format?
Tasks may need to refer to such formats by some kind of name / reference, or alternatively tests may need to verify that a task complies with the given known interface, because else external tools that rely on it won't work (like chains).

Comment on lines +124 to +133
results:
- name: sdist_sha
description: sha256 (and filename) of the sdist package
- name: bdist_sha
description: sha256 (and filename) of the bdist package
- name: package_name
description: name of the uploaded package
- name: package_version
description: version of the uploaded package
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

I think this goes in the direction of saying that a task produces some artefact, which is described by an object / dict result. This is the kind of metadata / information that other tools need to handle these results nicely with less need for custom integration and annotations.

Tekton could emit higher level of abstraction events when these kind of results are produced - see the CDEvents project.


## Requirements

* Must be possible to programmatically determine the structure of the object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean at authoring time, i.e. provide a schema?
I think it makes sense, but still I'm a bit torn on this.

On one side, I understand the need to validate inputs and match interfaces, so it definitely should be possible to provide a schema. Not having a schema would be similar in a way to supporting dynamic results, which do not need to be declared in the Task or Pipeline definition.

On the other side, I think that leaving this more open might give our users / pipeline authors a bit more flexibility and make it easier to write tasks. Writing a schema feels like an high bar of entry, even though with the current limitation of dict of strings it's not that complex.

To make an example from another system, GitHub actions allow one to define the outputs in the action metadata (like our results in Tasks), however outputs can be added dynamically as part of the script too - see https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-output-parameter.

used and supported. Instead of being an early adopter of CUE we could wait until it is more popular and then consider
using it (we can also delay this decision until we want to add more schema support).

### Matching interfaces and extra keys
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice!! :)

* [Matching interfaces and extra keys](#matching-interfaces-and-extra-keys)
* [Required vs optional keys](#required-vs-optional-keys)

### Defaulting to string types for values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +271 to +273
This proposal suggests adding a new `properties` section to param and result definition. If we later support more json
schema attribute such as `additionalProperties` and `required`, we'd also support them at the same level as the
`properties` field here. (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be something we add on a second pass, but I really would see value in having these small schemas re-usable / sharable, being able to build a library of them - perhaps in the catalog? - that can be used across tasks and contributed to by third parties.

Reusing these schemas introduces the extra complexity of versioning them, discovering them etc but I think it's still an idea worth exploring as a follow up

**When the Task writes more keys than it declares in its results:**

- The TaskRun will succeed
- The additional keys will not be included in the TaskRun status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could reconsider this one, as I mentioned earlier I would go against a strict interface but it might make it easier to use Tekton. I'd be happy to hear others' ideas about this.

@afrittoli
Copy link
Member

/test pull-community-teps-lint

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @bobcatfish - this looks great!!

I have left a few comments around the possibility of supporting schema-less objects (in parallel to schema ones) and about reusability of schemas (about which you already have a section in the future-work part).

I'm also a bit wondering how writing a JSON file to produce the result will look like for task authors. A lot of tools can produce JSON files as output, but those outputs are usually nested, so it will require in most cases having access to a shell of some kind.

/approve

We have several options for how we define and handle required and optional keys:

1. Embrace the optional by default behavior of JSON Schema; instead of validating the presences of "required" keys via
jsonschema, we could do the validation based on the declared variable replacement, e.g. say a Task declares it needs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand correctly 🙏

Does this mean that if a Task declares an object input foo with an optional field bar, and then in the code if tries to access $(params.foo.bar) we would then consider bar mandatory, i.e. fail at runtime if the input param does not have a bar field?

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: optional-fields
spec:
  params:
    - name: foo
      type: object
      properties:
        bar: {}
        needed: {}
      required: ["needed"]
  steps:
    - name: first-step
      image: alpine
      env:
      - name: FOO_BAR
        value: $(params.foo.bar)
      script: |
        #!/bin/sh
        set -ex
        set -o pipefail
        if [ -z "$FOO_BAR ]; then
          echo "foo does not have a bar, or it's empty ¯\_(ツ)_/¯"
        fi

To clarify, I think the example above should run successfully if the input foo does not have a bar field defined. We can fail validation at runtime if foo.needed is not passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if a Task declares an object input foo with an optional field bar, and then in the code if tries to access $(params.foo.bar) we would then consider bar mandatory, i.e. fail at runtime if the input param does not have a bar field?

yes

To clarify, I think the example above should run successfully if the input foo does not have a bar field defined.

i disagree - what value would be provided for $(params.foo.bar) in that case? (and what about in future if bar is a more complex object?) (feels like this is better resolved via supporting some kind of default, either via https://github.com/tektoncd/community/blob/main/teps/0048-task-results-without-results.md or

the optional vs required by default discussion has been one of the hardest parts of this proposal - @jerop and i have also been discussing this in the context of tektoncd/pipeline#3497

Comment on lines +353 to +359
1. Embrace the optional by default behavior of JSON Schema; instead of validating the presences of "required" keys via
jsonschema, we could do the validation based on the declared variable replacement, e.g. say a Task declares it needs
an object called `foo` with an attribute called `bar`. At runtime if the Task is provided with an object
`foo` that doesn't have an attribute called `bar` this will be okay unless the Task contains variable replacement
that uses it (`$(params.foo.bar)`). Con: weird because effectively we're implying `required` (assuming the fields are
used, and if they aren't used, why are they there) even though the JSON Schema declaration says the fields are
optional.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @bobcatfish comment in tektoncd/pipeline#3497 (comment) that we should distinguish between results and fields within a result.
For fields within an object result, we should rely on the schema (if one exists). If we follow jsonschema approach of optional by default, we should only enforce fields that are marked as required.

The "optional by default" approach is consistent with jsonschema, and it seems more fitting with reusability of object schemas/types. On the down it may push validation into the user steps and it seems a bit inconsistent with the rest of the Tekton API - which may be confusing to users. Still I think the pros surpass the cons.

Having an option to change the default behaviour sounds interesting, but I'm afraid it would reduce reusability, as tasks and schemas developed with on behaviour in mind may stop working if a different behaviour is configured.

Comment on lines +405 to +407
* Using the JSON Schema syntax to describe the structure of the object is a bit verbose but looking at our
[Alternatives](#alternatives) this seems like the best option assuming we want to support more complex types
in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be an option where we allow not having a schema.

Comment on lines +439 to +440
url: https://github.com/somerepo
path: ./my/directory/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are default applied individually or as a whole?
I.e. is the param is specified with url but path is not set, will path be taken from the default?

My expectation would be no, so defaults are only applied to the whole object when it's not specified at all.
That might imply that we have to validate default values too. For instance if path was a required field, and the default did not include it, we should fail validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i agree! i'll add this in the implementable PR

cat /place/with/actual/cloned/gitrepo.json > $(results.cloned-gitrepo.path)
```

`$(results.cloned-gitrepo.path)` refers to the path at which to write the `clone-gitrepo` object as json (tt does not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: s/tt/it ?

coupled
with [TEP-0044](https://github.com/tektoncd/community/blob/main/teps/0044-decouple-task-composition-from-scheduling.md))
* [Simplicity](https://github.com/tektoncd/community/blob/main/design-principles.md#simplicity)
* Pro: Support for emitting object results builds on [TEP-0076](https://github.com/tektoncd/community/pull/477)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe more about flexibility, however for a user to be able to pass a json dict as a result they'll have to define a schema for the result and the same schema for input params that consume that result (in any).
I think this is perfectly acceptable in cases where users want to have extra validation and observability into the pipelines, but it might be too much for the more casual author. I wonder if we should support schema-less as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting idea but I'm not 100% sure what this would look like, could you provide an example?


[See also the TEP-0076 Array results and indexing](https://github.com/tektoncd/community/pull/477) design evaluation.

* [Reusability](https://github.com/tektoncd/community/blob/main/design-principles.md#reusability):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another improvement in reusability, if we have a schema, would be allowing for schemas to be reused and shared

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes 💯 i think that would be a very likely next step after this

In the above example:

```bash
cat /place/with/actual/cloned/gitrepo.json > $(results.cloned-gitrepo.path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I think this is the only place where it's describe how object results will be produced by users - perhaps it would be worth having a small paragraph about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's also an example at https://github.com/tektoncd/community/blob/main/teps/0075-object-param-and-result-types.md#emitting-object-results but happy to add more detail there and/or here (ill at least add a link from here back to that section) anything else in particular you'd like to see covered?

Comment on lines +884 to +886
1. Allow definition reuse instead of having to copy and paste them around
2. If we ship a known set of these with the Tekton Pipelines controller, they could define interfaces that tools such
as [Tekton Chains](https://github.com/tektoncd/chains/blob/main/docs/config.md) could rely on and build around.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100

I didn't realise this was already part of the TEP :)
I agree it can be future work, but I think it would be an important feature to have

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 - @wlynch and his team are planning to create a TEP to dive into this in more detail

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, jerop, sbwsg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pritidesai
Copy link
Member

API WG - ready to merge
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 14, 2022
This TEP proposes adding support for more structured params and results
by introducing objects (aka dictionaries), and introducing a (small)
subset of json schema syntax for declaring keys in the object.

This refers to TEP-0074 which will be added in a separate commit and
proposes deprecating PipelineResources.

This is based on @skaegi's proposals in tektoncd/pipeline#1393
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 14, 2022
@bobcatfish
Copy link
Contributor Author

(thanks for the detailed review comments @afrittoli !! if this is merged before they are addressed I'll make sure to address them in the follow-up PR going from proposed to implementable 🙏 )

@afrittoli
Copy link
Member

(thanks for the detailed review comments @afrittoli !! if this is merged before they are addressed I'll make sure to address them in the follow-up PR going from proposed to implementable 🙏 )

Thanks @bobcatfish - it sounds good.
Ready to re-lgtm then?

@jerop
Copy link
Member

jerop commented Feb 14, 2022

re-lgtming

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 14, 2022
@tekton-robot tekton-robot merged commit a1c3003 into tektoncd:main Feb 14, 2022
@xchapter7x
Copy link
Contributor

/area s3c

@tekton-robot tekton-robot added the area/s3c Issues or PRs that are related to Secure Software Supply Chain (S3C) label Mar 4, 2022
bobcatfish added a commit to bobcatfish/pipeline that referenced this pull request Mar 18, 2022
While working on (finally) updating the array result and object
param/result TEPs (tektoncd/community#479
tektoncd/community#477) I realized I hadn't
included an example of how to specify defaults for the new format, so I
looked for an example of how we currently do this for arrays, but we had
none! Hopefully now we do :D
@bobcatfish
Copy link
Contributor Author

@afrittoli finally addressing your feedback:

I'm also a bit wondering how writing a JSON file to produce the result will look like for task authors. A lot of tools can produce JSON files as output, but those outputs are usually nested, so it will require in most cases having access to a shell of some kind.

that's probably true - writing simple JSON by hand isn't too bad (e.g. https://github.com/tektoncd/community/blob/main/teps/0075-object-param-and-result-types.md#emitting-object-results). are there any alternatives youd like to see supported that would make this easier? in #477 @mogsie suggested json text sequences as a simpler way to emit array types - but that wouldn't help much with objects

bobcatfish added a commit to bobcatfish/community that referenced this pull request Mar 18, 2022
Update the TEPs about array and object support (TEP-0075 and TEP-0076)
to be implementable.

Also addressing feedback from @afrittoli in tektoncd#479
tekton-robot pushed a commit to tektoncd/pipeline that referenced this pull request Mar 23, 2022
While working on (finally) updating the array result and object
param/result TEPs (tektoncd/community#479
tektoncd/community#477) I realized I hadn't
included an example of how to specify defaults for the new format, so I
looked for an example of how we currently do this for arrays, but we had
none! Hopefully now we do :D
tekton-robot pushed a commit that referenced this pull request Apr 11, 2022
Update the TEPs about array and object support (TEP-0075 and TEP-0076)
to be implementable.

Also addressing feedback from @afrittoli in #479
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/s3c Issues or PRs that are related to Secure Software Supply Chain (S3C) kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: UnAssigned
Status: Implementable
Development

Successfully merging this pull request may close these issues.

10 participants