Build a generic `csv-lookup` plugin #674

zanete · 2024-04-29T10:48:36Z

Sub of: #656

What
Create a new generic csv-lookup plugin in `if/builtins`

Why

We want to support as many pipelines as possible using generic plugins that can be adapted to many use cases. We currently have sum, multiply and coefficient. We also need a csv-lookup plugin that enables a user to grab arbitrary data from a given csv file. This would also allow us to deprecate some of our current plugins that grab data from a csv file: the file can be hosted somewhere and the data accessed using the generic csv-lookup instead.

Prerequisites/resources
None

SoW (scope of work)

plugin code is added to builtins
documentation is updated in builtins
documentation is added to if.greensoftware.foundation
unit tests added, giving 100% coverage and passing
manifests are added to if repo demonstrating usage

Plugin details

Config:
The following elements should be included in global config
- filepath: path to a csv file, either on the local filesystem or on the internet
- output: the columns to grab data from and add to output data - should support wildcard (*) or multiple values.
- query: an array of key/value pairs where the key is a column name in the target csv and the value is a parameter from inputs, so that the following configs should work provided cloud/provider, region and instance-type are available in defaults or inputs:

cloud-instance-metadata:
  method: CsvLookup
  path: "builtins"
  global-config:
    filepath: https://some-file.xyz
    query:
      cloud-provider: cloud/provider
      region: cloud/region
      instance-type: cloud/instance-type
    output: "*"

This config contains the location of the target file in filepath. It should download the file if it is not already in the local filesystem. Then the plugin should query the CSV data using the query parameters. The resulting data should be added to the output data as an array using the field name defined in output-parameter.
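
A minimal sketch of the file-location step described above, in TypeScript. The names `isRemoteFile`, `loadCsvText`, `readLocal` and `download` are hypothetical and not part of IF. The two readers are injected so the sketch does not depend on any particular HTTP or filesystem API, and caching a downloaded copy is left out.

```typescript
// Hypothetical helper: a filepath counts as remote when it starts with http:// or https://.
function isRemoteFile(filepath: string): boolean {
  return /^https?:\/\//i.test(filepath);
}

// A reader takes a location and resolves to the text content of the file.
type Reader = (location: string) => Promise<string>;

// Pick the reader that matches the filepath. Both readers are supplied by the caller.
async function loadCsvText(
  filepath: string,
  readLocal: Reader,
  download: Reader
): Promise<string> {
  return isRemoteFile(filepath) ? download(filepath) : readLocal(filepath);
}
```

In Node, the caller could pass thin wrappers around the filesystem and HTTP clients it already uses.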

Plugin logic

The plugin should apply the following logic:

Grabs value from output column where values in query conditions are satisfied
We do NOT need to implement < > AND OR logic in this version - strict matches only are sufficient.
Supports multiple query terms by adding to query section of global config.
Supports wildcard or multiple entries for target column, e.g. outputs: [*] returns values from all columns other than the query parameters.
Uses target column names as keys in output data by default
Supports renaming data in output data by providing an array of names in output field in config, e.g. to grab data from processor-name column in csv and name it processor-model-id in output data:

cloud-instance-metadata:
  method: CsvLookup
  path: "builtins"
  global-config:
    filepath: https://some-file.xyz
    query:
      cloud-provider: cloud/provider
      region: cloud/region
      instance-type: cloud/instance-type
    output: [["processor-name", "processor-model-id"]]

This renaming should also work with multiple entries, such as:

cloud-instance-metadata:
  method: CsvLookup
  path: "builtins"
  global-config:
    filepath: https://some-file.xyz
    query:
      cloud-provider: cloud/provider
      region: cloud/region
      instance-type: cloud/instance-type
    output: [["processor-name", "processor-model-id"], ["tdp","thermal-design-power"]]

or even wildcards, erroring out if the number of columns accessed using the wildcard does not match the number of names provided on the RHS of the colon.

returns NaN for missing data
if there are multiple valid responses for a query, return the FIRST
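
The logic above can be sketched roughly as follows for the wildcard case. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, the csv is assumed to be already parsed into one object per row keyed by column header, numeric-looking cells are assumed to be converted to numbers, and throwing when no row matches is an assumption because the issue does not specify that case.

```typescript
type Row = Record<string, string>;
type Observation = Record<string, string | number>;

// Assumption: empty cells become NaN and numeric text becomes a number.
function parseCell(cell: string): string | number {
  if (cell.trim() === "") return NaN;
  const asNumber = Number(cell);
  return Number.isNaN(asNumber) ? cell : asNumber;
}

// query maps a csv column name to the name of an input parameter.
function lookupFirstMatch(
  rows: Row[],
  query: Record<string, string>,
  input: Observation
): Observation {
  const conditions = Object.entries(query);
  // Strict equality only; Array.prototype.find returns the FIRST matching row.
  const match = rows.find(row =>
    conditions.every(([column, parameter]) => row[column] === String(input[parameter]))
  );
  if (match === undefined) {
    // Assumption: the issue does not say what to do when nothing matches.
    throw new Error("No csv row satisfies the query");
  }
  const queryColumns = new Set(Object.keys(query));
  const output: Observation = { ...input };
  for (const [column, cell] of Object.entries(match)) {
    // Wildcard behaviour: every column except the ones used in the query.
    if (!queryColumns.has(column)) output[column] = parseCell(cell);
  }
  return output;
}
```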

Acceptance criteria

A plugin called csv-lookup exists in the if-plugins repository
Given (Setup): the csv-lookup plugin exists
When (Action): a user has downloaded and installed if and if-plugins
Then (Assertion): the user should be able to query data from csv files stored locally or online.

Example 1: cloud-metadata

Running this manifest should replicate the functionality of the current cloud-metadata plugin:

csv file has following structure:


year	cloud-provider	cloud-region	cfe-region	em-zone-id	wt-region-id	location	geolocation	cfe-hourly	cfe-annual	power-usage-efficiency	net-carbon	grid-carbon-intensity-24x7	grid-carbon-intensity-consumption	grid-carbon-intensity-marginal	grid-carbon-intensity-production	grid-carbon-intensity
2022	Google Cloud	asia-east1	Taiwan	TW	TW	Taiwan	25.0375,121.5625	0.18			0	453				453

manifest looks like this:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider"
          region: "cloud/region"
          instance-type: "cloud/instance-type"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # some regexp plugin magic?
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east

IF should return this output:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider"
          region: "cloud/region"
          instance-type: "cloud/instance-type"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # some regexp plugin magic?
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

Note this example should also work identically if filepath is a url directing to a file on the internet rather than the local filesystem, in which case the filepath would be a url such as https://some-website.xyz/data.csv.

Example 2: tdp-finder
CSV file looks like this:

name	tdp
AMD A10-9700	65.0

Running this manifest should replicate the functionality of the current tdp-finder plugin:

name: csv-demo
description:
tags:
initialize:
  plugins:
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          name: "instance-id"
        output: "tdp"
tree:
  children:
    child:
      pipeline:
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"

Running ie -m manifest.yml -o output.yml should yield the following:

name: csv-demo
description:
tags:
initialize:
  plugins:
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          name: "instance-id"
        output: "tdp"
tree:
  children:
    child:
      pipeline:
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
          tdp: 65.0

Example 3: region-metadata

CSV looks like this:

year	cloud-provider	cloud-region	cfe-region	em-zone-id	wt-region-id	location	geolocation	cfe-hourly	cfe-annual	power-usage-efficiency	net-carbon	grid-carbon-intensity-24x7	grid-carbon-intensity-consumption	grid-carbon-intensity-marginal	grid-carbon-intensity-production	grid-carbon-intensity
2022	Google Cloud	asia-east1	Taiwan	TW	TW	Taiwan	25.0375,121.5625	0.18			0	453				453

Running the following manifest achieves the expected behaviour of our cloud-region-metadata plugin:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider" 
          region: "cloud/region"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
          cloud-provider: "gcp"
          cloud-region: "asia-east"

This should yield:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider" 
          region: "cloud/region"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
          cloud-provider: "gcp"
          cloud-region: "asia-east"
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

Example 4: composite csv-lookups

We should be able to chain the preceding examples together as follows:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
        output: "*"
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
          instance-type: cloud/instance-type
        output: "*"
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          processor-name: cpu/processor-name
        output:
          tdp: cpu/thermal-design-power
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # might require regex to isolate first processor from list returned by instance-metadata
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east

This should return the following (assuming some regex was applied to isolate one processor from the returned list and assign it to `cpu/processor-name`):

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
        output: "*"
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
          instance-type: cloud/instance-type
        output: "*"
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          processor-name: cpu/processor-name
        output:
          tdp: cpu/thermal-design-power
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # regex to isolate first processor from list returned by instance-metadata
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cpu-cores-available: 52
          cpu-cores-utilized: 1
          cpu-manufacturer: Intel
          cpu-model-name: Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz
          cpu/processor-name: Intel® Xeon® Platinum 8272CL
          cpu-tdp: 205
          gpu-count: nan
          gpu-model-name: nan
          gpu-tdp: nan
          memory-available: 2.0
          cloud-provider: "gcp"
          cloud-region: "asia-east"
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

Unit tests exist with 100% coverage over csv-lookup
Given (Setup): a user has downloaded and installed if-plugins
When (Action): a user runs npx jest --coverage
Then (Assertion): the coverage report should show that csv-lookup is 100% covered and passing
Documentation exists in plugin readme
Given (Setup): the user visits the if-plugins repository
When the user navigates to src/lib/csv-lookup
Then the user sees a README containing documentation describing the csv-lookup plugin, copying the format from the other plugin readmes.
Link to README documentation exists in if.greensoftware.foundation
Given: the user is on if.greensoftware.foundation
When (Action): they navigate to reference/plugins and find the csv-lookup plugin section
Then (Assertion): they see a link to the plugin readme for the csv-lookup plugin
Example manifests exist
Given: the user has downloaded and installed if
When (Action): the user navigates to if/manifests/plugins
Then (Assertion): they see manifests that include the csv-lookup plugin


jmcook1186 · 2024-04-29T15:39:18Z

@pazbardanl drafted a ticket - would be grateful for feedback on the design.

zanete · 2024-05-09T10:56:42Z

putting this back in design for @jmcook1186 to review and specify the AC more concretely (as per our call with @jawache )

jmcook1186 · 2024-05-09T15:22:06Z

@jawache should we support more complex queries than the one in the example above? My suggested config syntax only supports single conditions or multiple conditions linked by AND logic, but not OR.

You could do the equivalent of SELECT * FROM processor-names WHERE tdp > 200 AND manufacturer = Intel but NOT the equivalent of SELECT * FROM processor-names WHERE tdp > 200 OR tgp > 300 for example

EDIT: added a more detailed version of this question in the issue description - see big red question mark!

jawache · 2024-05-14T14:46:25Z

Hey @jmcook1186 so given that the driver of this plugin is to replace the cloud-metadata plugin (as well as the tdp-finder), let's use those as the drivers of this AC to make sure that afterwards, we really do end up with something that can replace all of those.

The key thing is to map this out as a pipeline, the inputs to this plugin will be the things it uses to query the CSV, it won't be statically configured normally I imagine.

filepath: https://some-file.xyz or a local file path (very important it should be able to load a file over the internet with a HTTP://)
query: 
   name-of-column-in-csv: name-of-input-parameter
   name-of-column-in-csv: name-of-input-parameter
outputs:
   name-of-column-in-csv: name-of-output-parameter
   name-of-column-in-csv: name-of-output-parameter

If no outputs are specified then it outputs all the columns in the csv as output params with the same name
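
As a rough sketch of that default, assuming the matched csv row is already available as an object keyed by column name (the `selectOutputs` name and the empty-string fallback for a column missing from the row are assumptions for illustration):

```typescript
type Row = Record<string, string>;

// outputs maps name-of-column-in-csv -> name-of-output-parameter.
// When outputs is omitted, every column is returned under its own name.
function selectOutputs(row: Row, outputs?: Record<string, string>): Row {
  if (outputs === undefined) {
    return { ...row };
  }
  const selected: Row = {};
  for (const [column, outputName] of Object.entries(outputs)) {
    // Assumption: a column that is absent from the row yields an empty value.
    selected[outputName] = row[column] ?? "";
  }
  return selected;
}
```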

I do like the idea of the sqlite approach to give us some nice SQL queries, but also I do think we need a simpler UX for the very simple use cases (it also makes the config a lot easier to manage)

To refine this one @jmcook1186 I'd prefer if you:

Extracted the cloud metadata csv into their own files (for testing)
Created a config for cloud-instance-metadata (looking up cloud instance data from the cloud metadata csvs)
Created a config for cloud-region-metadata (looking up cloud regional data from the cloud metadata csvs)
Created a config for cpu-tdp-metadata (looking up tdp data from processor name from the cloud metadata csvs)

Then let's see what's required to marry all three of those plugins together to mirror the existing cloud metadata functionality.

jmcook1186 · 2024-05-15T10:13:39Z

Ok @jawache - here's a minimal spec for a csv lookup plugin that can replace cloud-metadata, tdp-finder and instance-metadata with the simplest interface I can come up with. It has to support multiple target and multiple selector columns to replace cloud metadata and region metadata plugins.

Inputs

Makes sense to instantiate without global config and re-use per component by passing in new file and query params at the node level, in my opinion.

node-level-config:
  filepath: https://some-file.xyz 
  target-column: ''
  selector-column: ''
  selector-value: ''

Logic

The plugin should apply the following logic:

grabs value from target column where values in selector-column are equal to selector-value

supports multiple selector-column and selector-value by passing arrays, e.g.

 selector-column: ['cloud-provider', 'region'] 
 selector-value: ['gcp', 'asia-east']

Supports wildcard or multiple entries for target column, e.g. target-column: ['*'] returns values from all columns other than the selector-column
uses target column names as keys in output data
returns NaN for missing data
if there are multiple valid responses for a query, return the FIRST
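
A small sketch of the paired selector arrays described above, assuming rows are already parsed into objects keyed by column name. The function names are hypothetical, and rejecting arrays of different lengths is an assumption rather than something stated in this proposal.

```typescript
type Row = Record<string, string>;

// Accept either a single value or an array, as in the config above.
function toArray(value: string | string[]): string[] {
  return Array.isArray(value) ? value : [value];
}

// Return the first row in which every selector column equals its paired selector value.
function findBySelectors(
  rows: Row[],
  selectorColumn: string | string[],
  selectorValue: string | string[]
): Row | undefined {
  const columns = toArray(selectorColumn);
  const values = toArray(selectorValue);
  if (columns.length !== values.length) {
    // Assumption: mismatched array lengths are treated as a config error.
    throw new Error("selector-column and selector-value must have the same length");
  }
  return rows.find(row => columns.every((column, i) => row[column] === values[i]));
}
```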

Examples

cloud instance metadata

The csv file looks like this:

cpu-cores-available	cpu-cores-utilized	cpu-manufacturer	cpu-model-name	cpu-tdp	gpu-count	gpu-model-name	gpu-tdp	instance-class	memory-available
52	1	Intel	Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz	205				Standard_A1_v2	2.0

A manifest that executes a lookup for instance metadata looks as follows (uses wildcard to retrieve all columns where target column/value match):

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: '*'
          selector-column: 'instance-class'
          selector-value: 'Standard_A1_v2'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001

should yield the following output:

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
execution:
 ...
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: 'instance-class'
          selector-value: 'Standard_A1_v2'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          cpu-cores-available: 52
          cpu-cores-utilized: 1
          cpu-manufacturer: Intel
          cpu-model-name: Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz
          cpu-tdp: 205
          gpu-count: nan
          gpu-model-name: nan
          gpu-tdp: nan
          memory-available: 2.0

tdp-finder

The csv looks as follows:

name	tdp
AMD A10-9700	65.0

A manifest that executes a lookup for tdp looks as follows (uses single target and selector column - simplest case):

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: 'tdp'
          selector-column: 'name'
          selector-value: 'AMD A10-9700'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001

This should yield the following outputs:

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['tdp']
          selector-column: 'name'
          selector-value: 'AMD A10-9700'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          tdp: 65.0

cloud region metadata

The csv data looks as follows:


year	cloud-provider	cloud-region	cfe-region	em-zone-id	wt-region-id	location	geolocation	cfe-hourly	cfe-annual	power-usage-efficiency	net-carbon	grid-carbon-intensity-24x7	grid-carbon-intensity-consumption	grid-carbon-intensity-marginal	grid-carbon-intensity-production	grid-carbon-intensity
2022	Google Cloud	asia-east1	Taiwan	TW	TW	Taiwan	25.0375,121.5625	0.18			0	453				453

A manifest that executes a lookup for region metadata is as follows (uses wildcard to select all columns, uses multiple selector columns):

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: ['cloud-provider', 'region']
          selector-value: ['gcp', 'asia-east']
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001

This should yield the following outputs:

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: ['cloud-provider', 'region']
          selector-value: ['gcp', 'asia-east']
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

If you agree, I'll work this up more thoroughly in the issue description.

jawache · 2024-05-15T13:23:01Z

Hey @jmcook1186 somewhat agree but the key thing is that the inputs to the query will come in as observation inputs, not static configuration.

So to mirror the existing functionality it will be something like so:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
        output: "*"
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
          instance-type: cloud/instance-type
        output: "*"
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          processor-name: cpu/processor-name
        output:
          tdp: cpu/thermal-design-power
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # some regexp plugin magic?
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east

And I'd suggest a slightly different interface to handle the situation where the input parameter names don't match the column headings in the CSV.

      global-config:
        filepath: https://some-file.xyz
        query:
          <column-name-in-csv>: <input-parameter-name>
          <column-name-in-csv>: <input-parameter-name>
        output:
          <column-name-in-csv>: <output-parameter-name>
          <column-name-in-csv>: <output-parameter-name>

jmcook1186 · 2024-05-15T13:29:49Z

Aaah of course - yes, obviously they need to come as inputs so the lookups can chain together. I'm on your page now @jawache - thanks.
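
To make the chaining concrete, here is a sketch of how a `query` block of the form `<column-name-in-csv>: <input-parameter-name>` could be resolved against one observation. The `resolveQuery` name is hypothetical, and throwing when an input parameter is absent is an assumption for illustration.

```typescript
type Observation = Record<string, string | number>;

// query maps <column-name-in-csv> -> <input-parameter-name>.
// The result maps <column-name-in-csv> -> the value read from the observation.
function resolveQuery(
  query: Record<string, string>,
  input: Observation
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [column, parameter] of Object.entries(query)) {
    const value = input[parameter];
    if (value === undefined) {
      // Assumption: a missing input parameter is reported as an error.
      throw new Error(`Input parameter "${parameter}" required by query is missing`);
    }
    resolved[column] = String(value);
  }
  return resolved;
}
```

Because the values come from the observation, anything an earlier plugin in the pipeline writes to its outputs (for example `cpu/processor-name`) can be used by a later lookup.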

zanete · 2024-05-16T15:39:20Z

@manushak please review the AC to check that it makes sense and can be sized (in hours)
EDIT: Please disregard, as we discussed in planning, this should be done by @narekhovhannisyan

narekhovhannisyan · 2024-05-27T05:48:18Z

@zanete I forgot the t-shirt sizing table, however, in ordinary units, it will take approx. 2 days (including unit tests, and documentation updates)

zanete · 2024-05-28T09:54:42Z

@jmcook1186 please update the description to say global so there's no discrepancy between the comments and issue description. @narekhovhannisyan and you should have a 10 min sync to make sure it's all interpreted correctly

narekhovhannisyan · 2024-05-28T21:25:41Z

@jawache @jmcook1186 I'm unable to use output: "processor-name": "processor-model-id" or output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]. js-yaml can't parse either of them, and VS Code flags both as errors too.

console.log(jsYaml.load('output: "processor-name": "processor-model-id"'));

YAMLException: bad indentation of a mapping entry (1:25)

 1 | output: "processor-name": "processor-model-id"
-----------------------------^
    at generateError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:183:10)
    at throwError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:187:9)
    at readBlockMapping (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1182:7)
    at composeNode (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1441:12)
    at readDocument (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1625:3)
    at loadDocuments (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1688:5)
    at Object.load (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1714:19)
    at Object.<anonymous> (/Users/admin/Projects/uk/if/_test.ts:30:20)
    at Module._compile (node:internal/modules/cjs/loader:1356:14)
    at Module.m._compile (/Users/admin/.nvm/versions/node/v18.19.0/lib/node_modules/ts-node/src/index.ts:1618:23) {
  reason: 'bad indentation of a mapping entry',
  mark: {
    name: null,
    buffer: 'output: "processor-name": "processor-model-id"\n',
    position: 24,
    line: 0,
    column: 24,
    snippet: ' 1 | output: "processor-name": "processor-model-id"\n' +
      '-----------------------------^'
  }
}

AND

console.log(
  jsYaml.load(
    'output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]'
  )
);


YAMLException: bad indentation of a mapping entry (1:34)

 1 | output: ["processor-name", "tdp"]: ["processor-model-id","thermal ...
--------------------------------------^
    at generateError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:183:10)
    at throwError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:187:9)
    at readBlockMapping (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1182:7)
    at composeNode (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1441:12)
    at readDocument (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1625:3)
    at loadDocuments (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1688:5)
    at Object.load (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1714:19)
    at Object.<anonymous> (/Users/admin/Projects/uk/if/_test.ts:31:10)
    at Module._compile (node:internal/modules/cjs/loader:1356:14)
    at Module.m._compile (/Users/admin/.nvm/versions/node/v18.19.0/lib/node_modules/ts-node/src/index.ts:1618:23) {
  reason: 'bad indentation of a mapping entry',
  mark: {
    name: null,
    buffer: 'output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]\n',
    position: 33,
    line: 0,
    column: 33,
    snippet: ' 1 | output: ["processor-name", "tdp"]: ["processor-model-id","thermal ...\n' +
      '--------------------------------------^'
  }
}

So my advice is to use an array, or an array of arrays (a matrix), for the same purpose:

output: ["processor-name", "processor-model-id"] // first case

output: [["processor-name", "processor-model-id"],["tdp","thermal-design-power"]]

jmcook1186 · 2024-05-29T12:36:30Z

ok yes, the colon separator is going to be a nightmare for yaml parsing.
Your way is better - provide sub-arrays where the first element is the initial name, the second element is the replacement.
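
A sketch of how the different `output` shapes could be normalised into `[column, new-name]` pairs before the lookup runs. The function names are hypothetical, and treating a flat array of strings as a single rename pair is an assumption based on the sub-array convention described above.

```typescript
type RenamePair = [string, string];

// Accepts "*", a single column name, one [column, new-name] pair, or an array of such pairs.
function normalizeOutput(output: string | Array<string | string[]>): RenamePair[] | "*" {
  if (typeof output === "string") {
    // A plain column name keeps its own name in the output data.
    return output === "*" ? "*" : [[output, output]];
  }
  // Assumption: a flat array of strings is one pair; otherwise each element is a pair.
  const candidates: unknown[] = output.every(item => typeof item === "string")
    ? [output]
    : output;
  return candidates.map(toPair);
}

// Validate that a candidate is exactly [string, string].
function toPair(candidate: unknown): RenamePair {
  if (
    Array.isArray(candidate) &&
    candidate.length === 2 &&
    typeof candidate[0] === "string" &&
    typeof candidate[1] === "string"
  ) {
    return [candidate[0], candidate[1]];
  }
  throw new Error(`Expected a [column, new-name] pair, got ${JSON.stringify(candidate)}`);
}
```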

zanete · 2024-05-30T15:39:46Z

Expecting a PR by end of Friday

zanete · 2024-06-03T12:18:03Z

@MariamKhalatova please QA :)

zanete · 2024-06-04T15:36:32Z

@narekhovhannisyan currently fixing issues raised by @MariamKhalatova

zanete · 2024-06-05T15:31:14Z

@MariamKhalatova could you review the bug fixes? 🙏

github-project-automation bot added this to IF Apr 29, 2024

zanete assigned jmcook1186 Apr 29, 2024

zanete added the epic: QA label Apr 29, 2024

zanete moved this to In Design in IF Apr 29, 2024

zanete mentioned this issue Apr 29, 2024

Audit existing plugins to determine what can be replaced with generics #640

Closed

3 tasks

zanete added this to the Plugin Registry milestone Apr 29, 2024

zanete mentioned this issue Apr 29, 2024

Epic - Plugin Cleanup #656

Closed

28 tasks

jmcook1186 moved this from In Design to In Refinement in IF Apr 29, 2024

jmcook1186 mentioned this issue May 1, 2024

Impact Framework Project Updates 2024-05-01 #689

Open

zanete assigned pazbardanl May 7, 2024

zanete moved this from In Refinement to Ready in IF May 7, 2024

zanete unassigned jmcook1186 May 7, 2024

zanete moved this from Ready to In Design in IF May 9, 2024

zanete assigned jmcook1186 and unassigned pazbardanl May 9, 2024

jmcook1186 changed the title ~~Design a generic csv-lookup plugin~~ [WIP]: Design a generic csv-lookup plugin May 10, 2024

zanete added the needs-response label May 13, 2024

zanete assigned jawache May 13, 2024

jmcook1186 mentioned this issue May 16, 2024

Impact Framework Project Updates 2024-05-16 #714

Open

jmcook1186 moved this from In Design to In Refinement in IF May 16, 2024

zanete assigned manushak and unassigned jawache May 16, 2024

jmcook1186 assigned narekhovhannisyan and unassigned manushak May 20, 2024

jmcook1186 changed the title ~~[WIP]: Design a generic csv-lookup plugin~~ Design a generic csv-lookup plugin May 20, 2024

jmcook1186 changed the title ~~Design a generic csv-lookup plugin~~ Build a generic csv-lookup plugin May 20, 2024

jmcook1186 mentioned this issue May 23, 2024

Move tdp-finder csvs to its own repo #728

Closed

5 tasks

zanete mentioned this issue May 23, 2024

Summary - Impact Framework #692

Open

5 tasks

zanete unassigned jmcook1186 May 24, 2024

zanete moved this from In Refinement to In Progress in IF May 28, 2024

zanete removed the needs-response label May 28, 2024

narekhovhannisyan linked a pull request May 31, 2024 that will close this issue

CSV Lookup builtin plugin #754

Merged

9 tasks

narekhovhannisyan moved this from In Progress to Pending Review in IF Jun 2, 2024

zanete assigned MariamKhalatova Jun 3, 2024

zanete moved this from Pending Review to Testing in IF Jun 3, 2024

zanete moved this from Testing to In Progress in IF Jun 4, 2024

zanete unassigned MariamKhalatova Jun 4, 2024

narekhovhannisyan moved this from In Progress to Pending Review in IF Jun 5, 2024

zanete removed the epic: QA label Jun 5, 2024

jmcook1186 mentioned this issue Jun 5, 2024

Impact Framework Project Updates 2024-06-05 #780

Open

zanete moved this from Pending Review to Testing in IF Jun 5, 2024

zanete assigned MariamKhalatova Jun 5, 2024

MariamKhalatova closed this as completed in #754 Jun 5, 2024

github-project-automation bot moved this from Testing to Done in IF Jun 5, 2024
