New Epic: Parameter metadata #772

jmcook1186 · 2024-06-05T11:03:25Z

jmcook1186
Jun 5, 2024
Maintainer

Hi folks,

This epic is a little less settled than the others I described earlier on this forum, which means there's a lot of opportunity to influence the direction by engaging with this post. We're developing this in the open so you can comment and help us refine the thinking.

Background

As we have developed IF we have explored many methods for standardizing the units and associated metadata for plugins, but we still haven't settled on a good general purpose solution.

It is important to have some way to verify plugin metadata. The metadata includes at least the list of parameters and return values and their units. The units are critical because two plugins that both return carbon might use different units: lbs C / kWh, gCO2eq, kg C, etc. A plugin later in the pipeline that does some additional transformation on that carbon value needs to know it is getting the value in the expected unit. A plugin expecting gCO2 that receives kgCO2 will return a value with a 1000x error.
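To make the 1000x problem concrete, here is a minimal sketch of the kind of check that declared units would enable. The `ParamMeta` shape, the `TO_GRAMS_CO2EQ` table and the `toGramsCO2eq` helper are all hypothetical names invented for illustration. None of them is part of IF.

```typescript
// Hypothetical shape: each parameter declares the unit it is expressed in.
type ParamMeta = {description: string; unit: string};

// Illustrative conversion factors into gCO2eq; not an official IF table.
const TO_GRAMS_CO2EQ: Record<string, number> = {
  gCO2eq: 1,
  kgCO2eq: 1000,
};

// Convert a carbon value to gCO2eq, failing loudly on an unknown unit
// instead of silently passing a wrongly scaled number downstream.
function toGramsCO2eq(value: number, meta: ParamMeta): number {
  const factor = TO_GRAMS_CO2EQ[meta.unit];
  if (factor === undefined) {
    throw new Error(`Unknown carbon unit: ${meta.unit}`);
  }
  return value * factor;
}

// An upstream plugin that reports kilograms is rescaled before use.
const upstreamMeta: ParamMeta = {description: 'operational carbon', unit: 'kgCO2eq'};
console.log(toGramsCO2eq(2.5, upstreamMeta)); // 2500
```

Without the declared unit, the downstream plugin would have treated 2.5 as grams.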

Reading the plugin documentation is one simple solution, but there's no guarantee that a plugin's documentation and code develop at the same pace. It could also be difficult to determine what metadata was valid for a past plugin run if that information is only available in documentation that might since have been updated or removed. For verifiability, auditability and re-executability we need an in-code solution.

Our initial solution was to include a list of parameters that can be used by plugins in a file, params.ts, that comes bundled with IF. Plugin builders can add to the list by appending to it at runtime or providing a totally new parameter file that overrides our default one. The rationale is that everyone is clear about the units for specific named parameters, i.e. if you want to name a parameter carbon it has to have the units we define in params.ts (unless you provide a new params file, but this is deliberately an advanced feature).

However, this has turned out to be a fairly poor solution as it adds too much friction for plugin developers and it constrains people's ability to build freely on top of IF and build complex pipelines. It also forced us as the IF core team to impose certain standards that turned out to be very difficult to reason about while still keeping IF as general purpose as possible. For example, for the carbon intensity of electricity, do we choose grid/carbon-intensity? Plugins that do things with the carbon intensity of electricity might not need to just work with grid/carbon-intensity, they might also want to work with other names like electric/carbon-intensity, or perhaps a cloud has its own computed carbon intensity factoring in the on-site generation, which they might want to name cloud/carbon-intensity. There is no way to decide on the one name for every “thing” we want to compute.

This means we need a more flexible solution that balances the need for auditable metadata with the ability to build freely and expressively on IF with the minimum of developer friction.

Proposed solution

There are several parts to the proposed solution. First is to remove the params.ts file and the IF logic that checks the params file for parameter metadata. Instead, we can make use of the metadata field we already expose in our plugin interface and move the metadata definitions into the plugins themselves rather than an external file.

export const MyPlugin = (globalConfig: Config): Plugin => {
  // Metadata now lives in the plugin itself rather than in params.ts.
  const metadata = {
    kind: 'execute',
    outputs: {
      energy: {
        description: 'amount of energy utilised by the component',
        unit: 'kWh',
      },
    },
  };

  . . .

  return {execute, metadata};
};

The params file is also used to grab the aggregation method that the aggregate feature should use to aggregate the values for a given parameter across time or across components. This can be moved into the aggregate feature config instead and completely removed from the plugin metadata.
This means you only have to provide the information when it is actually needed. e.g.

aggregation:
  metrics:
    - 'cpu/utilization'
  method: sum
  type: 'both'
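As a rough sketch of what reading the method from the aggregate config could look like: the `AggregationConfig` shape below is my assumption about how the config might be typed, `'avg'` is an assumed second method alongside the `sum` shown above, and `aggregate` is a hypothetical helper, not the existing IF implementation.

```typescript
// Hypothetical config shape mirroring the aggregation config above.
// 'avg' and the 'horizontal' | 'vertical' values are assumptions.
type AggregationMethod = 'sum' | 'avg';
type AggregationConfig = {
  metrics: string[];
  method: AggregationMethod;
  type: 'horizontal' | 'vertical' | 'both';
};
type Observation = Record<string, unknown>;

// Aggregate each configured metric across a list of observations, taking
// the method from the aggregation config rather than from a params file.
// The `type` field is carried in the config but not used by this sketch.
function aggregate(
  observations: Observation[],
  config: AggregationConfig
): Record<string, number> {
  const result: Record<string, number> = {};
  for (const metric of config.metrics) {
    const values = observations
      .map(observation => observation[metric])
      .filter((value): value is number => typeof value === 'number');
    if (values.length === 0) continue; // metric absent from every observation
    const total = values.reduce((sum, value) => sum + value, 0);
    result[metric] = config.method === 'avg' ? total / values.length : total;
  }
  return result;
}

const exampleConfig: AggregationConfig = {
  metrics: ['cpu/utilization'],
  method: 'sum',
  type: 'both',
};
console.log(aggregate([{'cpu/utilization': 10}, {'cpu/utilization': 30}], exampleConfig));
```

The point is only that the method travels with the aggregation request, so plugin metadata no longer has to carry it.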

Once this is done, we can develop an explainer feature that collates the metadata from all the plugins in a given pipeline and outputs it as a node in the manifest. This means users can always see what units were generated by each plugin in an execution pipeline and check that the units fed from one plugin to the next are consistent. It also opens the door to a future static-analysis feature that auto-audits unit propagation through a pipeline.

The explainer block should look similar to:

explain: 
  - name: energy-amd
    plugin: amd-chip-energy-computer 
    description: blah blah 
    unit: 'kWh' 
  - name: energy-intel 
    plugin: intel-chip-energy-computer 
    description: blah blah 
    unit: 'kWh'
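A minimal sketch of how the collation step might work, assuming each plugin exposes a `metadata.outputs` object keyed by parameter name. The `explain` function and the type names are hypothetical and only illustrate the idea.

```typescript
// Assumed metadata shape: outputs keyed by parameter name.
type ParamMeta = {description: string; unit: string};
type PluginMetadata = {kind: 'execute'; outputs: Record<string, ParamMeta>};
type ExplainEntry = {name: string; plugin: string; description: string; unit: string};

// Walk every plugin in the pipeline and flatten its declared outputs
// into one list that could be written to the manifest as `explain`.
function explain(pipeline: Record<string, PluginMetadata>): ExplainEntry[] {
  const entries: ExplainEntry[] = [];
  for (const [plugin, metadata] of Object.entries(pipeline)) {
    for (const [name, meta] of Object.entries(metadata.outputs)) {
      entries.push({name, plugin, description: meta.description, unit: meta.unit});
    }
  }
  return entries;
}

const examplePipeline: Record<string, PluginMetadata> = {
  'amd-chip-energy-computer': {
    kind: 'execute',
    outputs: {'energy-amd': {description: 'blah blah', unit: 'kWh'}},
  },
  'intel-chip-energy-computer': {
    kind: 'execute',
    outputs: {'energy-intel': {description: 'blah blah', unit: 'kWh'}},
  },
};
console.log(explain(examplePipeline));
```

The same traversal could instead write each plugin's metadata back into its own initialize entry, which is the alternative raised in the question below.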

❓ Q: An alternative to consider is whether it is actually better for explainer to enrich the existing initialize block for each plugin to create a leaner manifest file. I probably lean towards enriching the initialize block personally, just to keep all the plugin metadata and config in one place in the manifest. It might also be good to rename initialize to plugins or similar to remove any naming confusion.

Finally, parameter mapping should be implemented so that we can automatically add the return values of one plugin to the inputs array under the name required by another plugin.
For example, let's say one of our plugins returned cloud/vendor by default, and another plugin required the same information but expected to receive cloud-vendor-name. We could provide a mapping field in the config for the first plugin so that the cloud/vendor data is actually appended to the manifest data as cloud-vendor-name. There are already ways to do this, but the mapping reduces the amount of redundant data in the final manifest.
It could look as follows:

initialize:
  plugins:
    cloud-metadata:
      method: CloudMetadata
      path: "@grnsft/if-plugins"
      mapping:
        cloud/vendor: cloud-vendor-name
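One way the mapping could be applied, sketched under the assumption that it is a simple rename of output keys before the plugin's results are merged into the manifest. `applyMapping` is a hypothetical helper, and the output value used below is made up for illustration.

```typescript
// Hypothetical: mapping is "name the plugin returns" -> "name to write out".
type Mapping = Record<string, string>;
type PluginOutput = Record<string, unknown>;

// Rename any mapped keys; keys with no mapping entry pass through unchanged.
function applyMapping(output: PluginOutput, mapping: Mapping): PluginOutput {
  const mapped: PluginOutput = {};
  for (const [key, value] of Object.entries(output)) {
    mapped[mapping[key] ?? key] = value;
  }
  return mapped;
}

const pluginOutput: PluginOutput = {'cloud/vendor': 'example-vendor', duration: 3600};
console.log(applyMapping(pluginOutput, {'cloud/vendor': 'cloud-vendor-name'}));
```

Because the value is written once under the new name, the manifest does not end up holding both cloud/vendor and cloud-vendor-name.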

❓ Q: will we REQUIRE metadata for each plugin - will this be a breaking change that makes existing plugins obsolete?

Tasks

Remove params.ts and related logic from IF
Add metadata to all builtins and alert community to add the same to their existing plugins
Add explainer feature that collates metadata from all the plugins in a pipeline and either enriches the initialize info or adds a new block.
Add mapping feature
Add docs for all new features and update tutorials

How you can help

You can read through this post and give feedback in comments, especially if you are a plugin developer affected by these changes. Later, when the specific tasks are available as tickets on our issue board you can let us know if you want to work on one. There may be some that are reserved for core developers, but in general we are keen to open up IF development to the community.

@jawache @zanete @narekhovhannisyan @MariamKhalatova @manushak

jawache · 2024-06-05T11:47:28Z

jawache
Jun 5, 2024
Maintainer

Thanks @jmcook1186, I like the approach of moving the aggregation method to the aggregate feature. It makes a lot of sense and it's good to be explicit.

Dynamic outputs

Something else to consider is plugins which return a dynamic list of outputs such as the mock-observations and csv-lookup plugins.

They both should know the parameters they will return once they get to read their global config, so they should be able to at least return the names of the parameters in the metadata object.
But they won't have information like description and units :/

We could just leave it as it is: they just don't have description or units, so if you want to do some automatic unit conversion it's not possible. I think that's ok, and we should allow users the ability to be succinct in a manifest file. So maybe these plugins need additional (optional) config where you can provide the units and description values? I think it really should be optional.

initialize:
  plugins:
    csv-lookup:
      method: XXXX
      path: "@grnsft/if-plugins"
      global-config:
         metadata:
           cloud/vendor: 
             units: string
             description: blah blah
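A rough sketch of how a dynamic plugin could build its metadata at instantiation time from that optional config. The `output` field (the list of parameter names the plugin will return) and the `buildMetadata` helper are assumptions made for illustration. They are not the actual csv-lookup config.

```typescript
// Both fields are optional so a manifest can stay succinct.
type ParamMeta = {description?: string; units?: string};

// Hypothetical global config: `output` lists the names the plugin will
// return, `metadata` optionally describes some or all of them.
type GlobalConfig = {
  output: string[];
  metadata?: Record<string, ParamMeta>;
};

// Build the metadata object when the plugin is instantiated, so the
// parameter names are always present even if units/description are not.
function buildMetadata(config: GlobalConfig) {
  const outputs: Record<string, ParamMeta> = {};
  for (const name of config.output) {
    outputs[name] = config.metadata?.[name] ?? {};
  }
  return {kind: 'execute' as const, outputs};
}

console.log(
  buildMetadata({
    output: ['cloud/vendor', 'cloud/region'],
    metadata: {'cloud/vendor': {units: 'string', description: 'blah blah'}},
  })
);
```

Here cloud/region is still listed, just with an empty description and units entry.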

Units -> Type

We've had criticism before because the word unit doesn't quite describe what we are explaining, e.g. some units would be "string". What if we renamed it as type? Does that make more sense? I'm unsure, but there is something not quite right with calling it a unit.

Inputs as well as outputs

I was thinking through this and had a realisation there was a really good reason to document inputs as well as outputs, but i've lost it now 🤦🏽‍♂️ Might come to me later, was a really really good reason to document inputs!

8 replies

jawache Jun 5, 2024
Maintainer

re dynamic return values good point re csv lookup, i hadn't considered that. but agree with your suggestion that the units can be prescribed in the global config. I guess in this case the plugin can be adapted to dynamically set its own metadata in response to the global config so that explainer can always look for metadata in the same place in each plugin regardless of whether the params are known in advance. Actually, maybe that's quite nice. One potential drawback is that the explainer would probably have to run after compute/induce so that the plugin has a chance to self-populate its metadata.

I think they will all have enough information in the main instantiation call, the MyPlugin(globalConfig) call. So when the plugins are loaded they should all have returned a metadata object that has their inputs/outputs, i.e. before the compute call.

jawache Jun 5, 2024
Maintainer

re units -> type Agree with the unit issue, but there's a similar issue with type. The information we really want to surface is the unit the value is expressed in, which will often be nonexistent (for a name, ratio, etc) but will sometimes be critical (e.g. energy in kWh, carbon in gCO2eq). Naming the field type suggests that users should declare the parameter's variant from the language's type system (string, number, boolean etc), which serves the majority of unimportant cases but does not serve the minority of very important cases (energy, carbon...).

What about having two fields, one for the type and one for the unit. Type is required, unit is optional. Type is the expected parameter type (string, number, boolean, etc) and the unit is a string representing the unit of measurement. That way, each field has an unambiguous purpose, but if unit doesn't make sense for a parameter, it can just be null.

Interesting idea. One thought I had though: can there be a value to declaring the type? I'm worried that if it doesn't have a purpose then people will either not declare it (unless it's a number) OR will declare a wrong value, and since it's not used anywhere nothing will break. Perhaps if you return a type of DateTime we will parse your response into a DateTime object, and that way we don't have to panic so much about what exact format a DateTime will be returned in, we'll parse it into a common format. I.e. is there useful functionality we can attach to the act of declaring types other than number, to encourage people to do it? Or maybe it just doesn't matter for now.

josh-swerdlow Jun 8, 2024

re units -> type Agree with the unit issue, but there's a similar issue with type. The information we really want to surface is the unit the value is expressed in, which will often be nonexistent (for a name, ratio, etc) but will sometimes be critical (e.g. energy in kWh, carbon in gCO2eq). Naming the field type suggests that users should declare the parameter's variant from the language's type system (string, number, boolean etc), which serves the majority of unimportant cases but does not serve the minority of very important cases (energy, carbon...).

What about having two fields, one for the type and one for the unit. Type is required, unit is optional. Type is the expected parameter type (string, number, boolean, etc) and the unit is a string representing the unit of measurement. That way, each field has an unambiguous purpose, but if unit doesn't make sense for a parameter, it can just be null.

I think being precise with the common-sense understanding of a word should be our guide. Units should only refer to the unit of measurement. I was not aware that there are cases that declare the Unit field as 'string', but it makes sense given how the field was being used. If that is a functionality that must be maintained, I think a separate Type field would be appropriate.

I worry that using null for ratios, scaling factors, etc is confusing. I think null is only good for non-numeric types that can't have a meaningful unit of measurement. Otherwise, it may be ambiguous whether someone forgot to assign a unit or the quantity is unitless. For example:

value: 5.42
type: 'float'
unit: null <-- "is this an error?", "did someone make a mistake in the plugin/manifest/etc?", "is this unitless?"

Is there a 'unitless' or dimensionless unit? If not, there should be.

Furthermore, for dimensional analysis checks, dimensionless or unitless would be a bit more understandable and would probably work more seamlessly with libraries that do unit conversion. I had looked into this when Andrew and I were thinking about doing this type of work for the hackathon (https://github.com/gentooboontoo/js-quantities).

Perhaps instead of letting Unit be optional it can follow this logic: for a numeric type, the default Unit assignment should be unitless or dimensionless; for a non-numeric type, it can be null.
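In code, that defaulting rule could look something like the sketch below. `ParamType` and `resolveUnit` are hypothetical names, and the set of types is just the string/number/boolean trio mentioned earlier in the thread.

```typescript
// Assumed set of declared parameter types.
type ParamType = 'number' | 'string' | 'boolean';

// An explicit unit always wins. Otherwise numeric parameters default to
// 'dimensionless' and non-numeric parameters default to null.
function resolveUnit(type: ParamType, unit?: string): string | null {
  if (unit !== undefined) return unit;
  return type === 'number' ? 'dimensionless' : null;
}

console.log(resolveUnit('number', 'kWh')); // 'kWh'
console.log(resolveUnit('number'));        // 'dimensionless'
console.log(resolveUnit('string'));        // null
```

With this rule a null unit on a number can never occur, so it can no longer be mistaken for a forgotten unit.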

I'm not sure if using dimensionless or unitless makes sense for non-numeric types; however, I don't think it's actively confusing, just a bit quirky.

value: 'dc-south'
type: 'string'
unit: 'dimensionless'

Thoughts?

jmcook1186 Jun 10, 2024
Maintainer Author

Thanks - this is great feedback. On reflection I agree that null is too ambiguous, and I like simply asserting dimensionless as a valid value for the unit field.

jawache Jun 10, 2024
Maintainer

I agree, dimensionless makes sense, I prefer choices that reduce ambiguity.

andrew-woosnam · 2024-06-07T15:52:30Z

andrew-woosnam
Jun 7, 2024

Love the explainer block & explicit units/descriptions!

Q: will we REQUIRE metadata for each plugin - will this be a breaking change that makes existing plugins obsolete?

@josh-swerdlow and I agree that this wouldn't be the end of the world to do now, especially as the number of existing plugins is relatively low (compared to where we hope it'll be in the near future). Lots more to think through -- we'll comment more individually soon

2 replies

jawache Jun 10, 2024
Maintainer

IMO forcing everyone to update their plugins will be a heavy ask, esp. since we're not sure if this will be the last one. The conversation we've been having internally is that up to 1.0 we need to stay agile, but after 1.0 we'll implement a proper interface notification and deprecation policy with a timeline for deprecation etc...

My 2c is that perhaps we do this for the 1.0 release: batch up this change and any other changes from now to 1.0, and if you want your plugin to work for 1.0 then you MUST update it, but until then we'll just let it slide.

josh-swerdlow Jun 12, 2024

That makes sense. If IF may continue to have updates, it would stink as a developer to be forced to upgrade now and possibly again. I think a proper system to manage the plugins (i.e. a deprecation policy) takes a lot of pressure off the devs and lets the community have a better experience.

This suggestion sounds good to me.
