Build a generic csv-lookup plugin #674

Closed
10 tasks
Tracked by #656
zanete opened this issue Apr 29, 2024 · 16 comments · Fixed by #754
@zanete

zanete commented Apr 29, 2024

Sub of: #656

What
Create a new generic csv-lookup plugin in `if/builtins`

Why

We want to support as many pipelines as possible using generic plugins that can be adapted to many use cases. We currently have sum, multiply and coefficient. We also need a csv-lookup plugin that enables a user to grab arbitrary data from a given csv file. This would also allow us to deprecate some of our current plugins that grab data from a csv file: the file can be hosted somewhere and the data accessed using this generic csv-lookup instead.

Prerequisites/resources
None

SoW (scope of work)

  • plugin code is added to builtins
  • documentation is added to builtins
  • documentation is added to if.greensoftware.foundation
  • unit tests added, giving 100% coverage and passing
  • manifests are added to if repo demonstrating usage

Plugin details

  • Config:
    The following elements should be included in global config
    • filepath: path to a csv file, either on the local filesystem or on the internet
    • output: the columns to grab data from and add to output data - should support wildcard (*) or multiple values.
    • query: an array of key/value pairs where the key is a column name in the target csv and the value is a parameter from inputs, so that the following configs should work provided cloud/provider, region and instance-type are available in defaults or inputs:
cloud-instance-metadata:
  method: CsvLookup
  path: "builtins"
  global-config:
    filepath: https://some-file.xyz
    query:
      cloud-provider: cloud/provider
      region: cloud/region
      instance-type: cloud/instance-type
    output: "*"

This config contains the location of the target file in filepath. The plugin should download the file if it is not already in the local filesystem, then query the CSV data using the query parameters. The resulting data should be added to the output data as an array using the field name defined in output.
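
As a rough illustration of the filepath handling described above, here is a minimal sketch that decides whether filepath is a URL or a local path and returns the raw CSV text. The helper names isRemote and readCsvText are hypothetical and not part of IF, and the sketch assumes Node 18+ for the global fetch. It does not cache downloaded files.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical helper: anything starting with http:// or https:// is treated as remote.
const isRemote = (filepath: string): boolean => /^https?:\/\//i.test(filepath);

// Hypothetical helper: return the raw CSV text from a URL or from the local filesystem.
const readCsvText = async (filepath: string): Promise<string> => {
  if (isRemote(filepath)) {
    const response = await fetch(filepath); // global fetch, assumed Node 18+
    if (!response.ok) {
      throw new Error(`Failed to fetch ${filepath}: HTTP ${response.status}`);
    }
    return response.text();
  }
  return readFile(filepath, "utf8");
};
```

Keeping the remote check separate from the read makes it easy to unit test the branching without touching the network.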

Plugin logic

The plugin should apply the following logic:

  • Grabs value from output column where values in query conditions are satisfied
  • We do NOT need to implement < > AND OR logic in this version - strict matches only are sufficient.
  • Supports multiple query terms by adding to query section of global config.
  • Supports wildcard or multiple entries for target column, e.g. output: "*" returns values from all columns other than the query parameters.
  • Uses target column names as keys in output data by default
  • Supports renaming data in output data by providing an array of names in output field in config, e.g. to grab data from processor-name column in csv and name it processor-model-id in output data:
cloud-instance-metadata:
  method: CsvLookup
  path: "builtins"
  global-config:
    filepath: https://some-file.xyz
    query:
      cloud-provider: cloud/provider
      region: cloud/region
      instance-type: cloud/instance-type
    output: [["processor-name", "processor-model-id"]]

This renaming should also work with multiple entries, such as:

cloud-instance-metadata:
  method: CsvLookup
  path: "builtins"
  global-config:
    filepath: https://some-file.xyz
    query:
      cloud-provider: cloud/provider
      region: cloud/region
      instance-type: cloud/instance-type
    output: [["processor-name", "processor-model-id"], ["tdp","thermal-design-power"]]

or even wildcards, erroring out if the number of columns accessed using the wildcard does not match the number of replacement names provided.

  • returns NaN for missing data
  • if there are multiple valid responses for a query, return the FIRST
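
The matching rules in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin implementation: rows are assumed to be already parsed into objects keyed by column name, the function names are hypothetical, the second sample row is invented to show the first-match rule, and numeric conversion of cell values is left out.

```typescript
type Row = Record<string, string>;
type Criteria = Record<string, string | number>;

// Return the FIRST row whose columns strictly equal every criterion (compared as strings).
const findFirstMatch = (rows: Row[], criteria: Criteria): Row | undefined =>
  rows.find(row =>
    Object.entries(criteria).every(
      ([column, value]) => row[column] === String(value)
    )
  );

// Pick output columns; "*" means every column that is not part of the query.
// Empty or absent cells become NaN.
const selectOutput = (
  row: Row,
  criteria: Criteria,
  output: "*" | string[]
): Record<string, string | number> => {
  const columns =
    output === "*"
      ? Object.keys(row).filter(column => !(column in criteria))
      : output;
  const result: Record<string, string | number> = {};
  for (const column of columns) {
    const cell = row[column];
    result[column] = cell === undefined || cell === "" ? NaN : cell;
  }
  return result;
};

// Hypothetical data: the second row exists only to show that the first match wins.
const rows: Row[] = [
  { name: "AMD A10-9700", tdp: "65.0", "gpu-count": "" },
  { name: "AMD A10-9700", tdp: "95.0", "gpu-count": "1" },
];
const criteria: Criteria = { name: "AMD A10-9700" };
const match = findFirstMatch(rows, criteria);
const picked: Record<string, string | number> = match
  ? selectOutput(match, criteria, "*")
  : {};
console.log(picked); // contains tdp and gpu-count from the first matching row only
```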

Acceptance criteria

  • A plugin called csv-lookup exists in the if-plugins repository
    Given (Setup): the csv-lookup plugin exists
    When (Action): a user has downloaded and installed if and if-plugins
    Then (Assertion): the user should be able to query data from csv files stored locally or online.

Example 1: cloud-metadata

Running this manifest should replicate the functionality of the current cloud-metadata plugin:

csv file has following structure:

year cloud-provider cloud-region cfe-region em-zone-id wt-region-id location geolocation cfe-hourly cfe-annual power-usage-efficiency net-carbon grid-carbon-intensity-24x7 grid-carbon-intensity-consumption grid-carbon-intensity-marginal grid-carbon-intensity-production grid-carbon-intensity
2022 Google Cloud asia-east1 Taiwan TW TW Taiwan 25.0375,121.5625 0.18 0 453 453

manifest looks like this:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider"
          region: "cloud/region"
          instance-type: "cloud/instance-type"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # some regexp plugin magic?
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east

IF should return this output:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider"
          region: "cloud/region"
          instance-type: "cloud/instance-type"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # some regexp plugin magic?
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

Note that this example should work identically whether filepath points to a file on the local filesystem or to a file on the internet, for example https://some-website.xyz/data.csv.

Example 2: tdp-finder
CSV file looks like this:

name tdp
AMD A10-9700 65.0

Running this manifest should replicate the functionality of the current tdp-finder plugin:

name: csv-demo
description:
tags:
initialize:
  plugins:
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          name: "instance-id"
        output: "tdp"
tree:
  children:
    child:
      pipeline:
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"

Running ie -m manifest.yml -o output.yml should yield the following:

name: csv-demo
description:
tags:
initialize:
  plugins:
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          name: "instance-id"
        output: "tdp"
tree:
  children:
    child:
      pipeline:
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
          tdp: 65.0

Example 3: region-metadata

CSV looks like this:

year cloud-provider cloud-region cfe-region em-zone-id wt-region-id location geolocation cfe-hourly cfe-annual power-usage-efficiency net-carbon grid-carbon-intensity-24x7 grid-carbon-intensity-consumption grid-carbon-intensity-marginal grid-carbon-intensity-production grid-carbon-intensity
2022 Google Cloud asia-east1 Taiwan TW TW Taiwan 25.0375,121.5625 0.18 0 453 453

Running the following manifest achieves the expected behaviour of our cloud-region-metadata plugin:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider" 
          region: "cloud/region"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
          cloud-provider: "gcp"
          cloud-region: "asia-east"

This should yield:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: "cloud/provider" 
          region: "cloud/region"
        output: "*"
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          instance-id: "AMD A10-9700"
          cloud-provider: "gcp"
          cloud-region: "asia-east"
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

Example 4: composite csv-lookups

We should be able to chain the preceding examples together as follows:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
        output: "*"
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
          instance-type: cloud/instance-type
        output: "*"
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          processor-name: cpu/processor-name
        output:
          tdp: cpu/thermal-design-power
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # might require regex to isolate first processor from list returned by instance-metadata
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east

This should return the following (assuming some regex was done to isolate one processor from the returned list and assign it to `cpu/processor-name`):

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
        output: "*"
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
          instance-type: cloud/instance-type
        output: "*"
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          processor-name: cpu/processor-name
        output:
          tdp: cpu/thermal-design-power
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # regex to isolate first processor from list returned by instance-metadata
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cpu-cores-available: 52
          cpu-cores-utilized: 1
          cpu-manufacturer: Intel
          cpu-model-name: Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz
          cpu/processor-name: Intel® Xeon® Platinum 8272CL
          cpu-tdp: 205
          gpu-count: nan
          gpu-model-name: nan
          gpu-tdp: nan
          memory-available: 2.0
          cloud-provider: "gcp"
          cloud-region: "asia-east"
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453
  • Unit tests exist with 100% coverage over csv-lookup
    Given (Setup): a user has downloaded and installed if-plugins
    When (Action): a user runs npx jest --coverage
    Then (Assertion): the coverage report should show that csv-lookup is 100% covered and passing

  • Documentation exists in plugin readme
    Given (Setup): the user visits the if-plugins repository
    When the user navigates to src/lib/csv-lookup
    Then the user sees a README containing documentation describing the csv-lookup plugin, copying the format from the other plugin readmes.

  • Link to README documentation exists in if.greensoftware.foundation
    Given: the user is on if.greensoftware.foundation
    When (Action): they navigate to reference/plugins and find the csv-lookup plugin section
    Then (Assertion): they see a link to the plugin readme for the csv-lookup plugin

  • Example manifests exist
    Given: the user has downloaded and installed if
    When (Action): the user navigates to if/manifests/plugins
    Then (Assertion): they see manifests that include the csv-lookup plugin

@jmcook1186
Contributor

@pazbardanl drafted a ticket - would be grateful for feedback on the design.

@jmcook1186 jmcook1186 moved this from In Design to In Refinement in IF Apr 29, 2024
@zanete zanete moved this from In Refinement to Ready in IF May 7, 2024
@zanete zanete moved this from Ready to In Design in IF May 9, 2024
@zanete zanete assigned jmcook1186 and unassigned pazbardanl May 9, 2024
@zanete
Author

zanete commented May 9, 2024

putting this back in design for @jmcook1186 to review and specify the AC more concretely (as per our call with @jawache )

@jmcook1186
Contributor

jmcook1186 commented May 9, 2024

@jawache should we support more complex queries than the one in the example above? My suggested config syntax only supports single conditions or multiple conditions linked by AND logic, but not OR.

You could do the equivalent of SELECT * FROM processor-names WHERE tdp > 200 AND manufacturer = Intel but NOT the equivalent of SELECT * FROM processor-names WHERE tdp > 200 OR tgp > 300 for example

EDIT: added a more detailed version of this question in the issue description - see big red question mark!
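
To make the AND-only limitation concrete, here is a small sketch in which every condition in a list must hold for a row to match. The Condition shape, the operator set and the sample rows are hypothetical and only mirror the SQL examples above; they are not a proposed config syntax.

```typescript
type Cell = string | number;
type Operator = "=" | ">" | "<";

interface Condition {
  column: string;
  operator: Operator;
  value: Cell;
}

// Evaluate one condition against one row.
const holds = (row: Record<string, Cell>, condition: Condition): boolean => {
  const cell = row[condition.column];
  if (cell === undefined) return false;
  if (condition.operator === "=") return cell === condition.value;
  if (condition.operator === ">") return Number(cell) > Number(condition.value);
  return Number(cell) < Number(condition.value);
};

// A flat list of conditions can only express AND: every condition must hold.
const matchesAll = (row: Record<string, Cell>, conditions: Condition[]): boolean =>
  conditions.every(condition => holds(row, condition));

// Hypothetical rows standing in for a processor-names table.
const processors: Array<Record<string, Cell>> = [
  { model: "A", tdp: 205, tgp: 0, manufacturer: "Intel" },
  { model: "B", tdp: 150, tgp: 350, manufacturer: "AMD" },
];

// Equivalent of: WHERE tdp > 200 AND manufacturer = Intel
const andQuery: Condition[] = [
  { column: "tdp", operator: ">", value: 200 },
  { column: "manufacturer", operator: "=", value: "Intel" },
];
const andMatches = processors
  .filter(processor => matchesAll(processor, andQuery))
  .map(processor => processor.model);
console.log(andMatches); // [ 'A' ]
```

The OR query (tdp > 200 OR tgp > 300) would need to return both rows, which a single flat list combined with every cannot express.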

@jmcook1186 jmcook1186 changed the title Design a generic csv-lookup plugin [WIP]: Design a generic csv-lookup plugin May 10, 2024
@zanete zanete added the needs-response The issue has stalled because someone isn’t responding. label May 13, 2024
@jawache
Contributor

jawache commented May 14, 2024

Hey @jmcook1186 so given that the driver of this plugin is to replace the cloud-metadata plugin (as well as the tdp-finder), let's use those as the drivers of this AC to make sure that afterwards, we really do end up with something that can replace all of those.

The key thing is to map this out as a pipeline, the inputs to this plugin will be the things it uses to query the CSV, it won't be statically configured normally I imagine.

filepath: https://some-file.xyz or a local file path (very important it should be able to load a file over the internet via http://)
query: 
   name-of-column-in-csv: name-of-input-parameter
   name-of-column-in-csv: name-of-input-parameter
outputs:
   name-of-column-in-csv: name-of-output-parameter
   name-of-column-in-csv: name-of-output-parameter
  • If no outputs are specified then it outputs all the columns in the csv as output params with the same name

I do like the idea of the sqlite approach to give us some nice SQL queries, but also I do think we need a simpler UX for the very simple use cases (it also makes the config a lot easier to manage)

To refine this one @jmcook1186 I'd prefer if you:

  • Extracted the cloud metadata csv into their own files (for testing)
  • Created a config for cloud-instance-metadata (looking up cloud instance data from the cloud metadata csvs)
  • Created a config for cloud-region-metadata (looking up cloud regional data from the cloud metadata csvs)
  • Created a config for cpu-tdp-metadata (looking up tdp data from processor name from the cloud metadata csvs)

Then let's see what's required to marry all three of those plugins together to mirror the existing cloud metadata functionality.

@jmcook1186
Contributor

jmcook1186 commented May 15, 2024

Ok @jawache - here's a minimal spec for a csv lookup plugin that can replace cloud-metadata, tdp-finder and instance-metadata with the simplest interface I can come up with. It has to support multiple target and multiple selector columns to replace cloud metadata and region metadata plugins.

Inputs

Makes sense to instantiate without global config and re-use per component by passing in new file and query params at the node level, in my opinion.

node-level-config:
  filepath: https://some-file.xyz 
  target-column: ''
  selector-column: ''
  selector-value: '' 

Logic

The plugin should apply the following logic:

  • grabs value from target column where values in selector-column are equal to values in selector-value
  • supports multiple selector-column and selector-value by passing arrays, e.g.
     selector-column: ['cloud-provider', 'region'] 
     selector-value: ['gcp', 'asia-east']
    
  • Supports wildcard or multiple entries for target column, e.g. target-column: ['*'] returns values from all columns other than the selector-column
  • uses target column names as keys in output data
  • returns NaN for missing data
  • if there are multiple valid responses for a query, return the FIRST
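
One way to handle the single-value and array forms of selector-column and selector-value uniformly is to normalise both to arrays and pair them up by position. The sketch below assumes that approach; zipSelectors and toArray are hypothetical names.

```typescript
type OneOrMany = string | string[];

// Accept either a single string or an array of strings.
const toArray = (value: OneOrMany): string[] =>
  Array.isArray(value) ? value : [value];

// Pair each selector column with its value; both lists must have the same length.
const zipSelectors = (
  columns: OneOrMany,
  values: OneOrMany
): Array<[string, string]> => {
  const cols = toArray(columns);
  const vals = toArray(values);
  if (cols.length !== vals.length) {
    throw new Error(
      `selector-column has ${cols.length} entries but selector-value has ${vals.length}`
    );
  }
  return cols.map((column, i) => [column, vals[i]] as [string, string]);
};

const selectorPairs = zipSelectors(
  ["cloud-provider", "region"],
  ["gcp", "asia-east"]
);
console.log(selectorPairs);
```

Failing fast on a length mismatch avoids silently matching against an undefined value.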

Examples

cloud instance metadata

The csv file looks like this:

cpu-cores-available cpu-cores-utilized cpu-manufacturer cpu-model-name cpu-tdp gpu-count gpu-model-name gpu-tdp instance-class memory-available
52 1 Intel Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz 205 Standard_A1_v2 2.0

A manifest that executes a lookup for instance metadata looks as follows (uses wildcard to retrieve all columns where selector column/value match):

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: 'instance-class'
          selector-value: 'Standard_A1_v2'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001

should yield the following output:

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
execution:
 ...
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: 'instance-class'
          selector-value: 'Standard_A1_v2'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          cpu-cores-available: 52
          cpu-cores-utilized: 1
          cpu-manufacturer: Intel
          cpu-model-name: Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz
          cpu-tdp: 205
          gpu-count: nan
          gpu-model-name: nan
          gpu-tdp: nan
          memory-available: 2.0

tdp-finder

The csv looks as follows:

name tdp
AMD A10-9700 65.0

A manifest that executes a lookup for tdp looks as follows (uses single target and selector column - simplest case):

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: 'tdp'
          selector-column: 'name'
          selector-value: 'AMD A10-9700'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001

This should yield the following outputs:

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['tdp']
          selector-column: 'name'
          selector-value: 'AMD A10-9700'
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          tdp: 65.0

cloud region metadata

The csv data looks as follows:

year cloud-provider cloud-region cfe-region em-zone-id wt-region-id location geolocation cfe-hourly cfe-annual power-usage-efficiency net-carbon grid-carbon-intensity-24x7 grid-carbon-intensity-consumption grid-carbon-intensity-marginal grid-carbon-intensity-production grid-carbon-intensity
2022 Google Cloud asia-east1 Taiwan TW TW Taiwan 25.0375,121.5625 0.18 0 453 453

A manifest that executes a lookup for region metadata is as follows (uses wildcard to select all columns, uses multiple selector columns):

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: ['cloud-provider', 'region']
          selector-value: ['gcp', 'asia-east']
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001

This should yield the following outputs:

name: csv-demo
description:
tags:
initialize:
  plugins:
    csv-lookup:
      method: CsvLookup
      path: "builtins"
tree:
  children:
    child:
      pipeline:
        - csv-lookup
      config:
        csv-lookup:
          filepath: https://some-file.xyz
          target-column: ['*']
          selector-column: ['cloud-provider', 'region']
          selector-value: ['gcp', 'asia-east']
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          cfe-region: Taiwan
          em-zone-id: TW
          wt-region-id: TW
          location: Taiwan
          geolocation: 25.0375,121.5625
          cfe-hourly: 0.18
          cfe-annual: nan
          power-usage-efficiency: nan
          net-carbon: 0
          grid-carbon-intensity-24x7: 453
          grid-carbon-intensity-consumption: nan
          grid-carbon-intensity-marginal: nan
          grid-carbon-intensity-production: nan
          grid-carbon-intensity: 453

If you agree, I'll work this up more thoroughly in the issue description.

@jawache
Contributor

jawache commented May 15, 2024

Hey @jmcook1186 somewhat agree but the key thing is that the inputs to the query will come in as observation inputs, not static configuration.

So to mirror the existing functionality it will be something like so:

name: csv-demo
description:
tags:
initialize:
  plugins:
    cloud-region-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
        output: "*"
    cloud-instance-metadata:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          cloud-provider: cloud/provider
          region: cloud/region
          instance-type: cloud/instance-type
        output: "*"
    tdp-finder:
      method: CsvLookup
      path: "builtins"
      global-config:
        filepath: https://some-file.xyz
        query:
          processor-name: cpu/processor-name
        output:
          tdp: cpu/thermal-design-power
tree:
  children:
    child:
      pipeline:
        - cloud-region-metadata
        - cloud-instance-metadata
        - extract-processor-name # some regexp plugin magic?
        - tdp-finder
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          cloud/provider: gcp
          cloud/region: asia-east

And I'd suggest a slightly different interface to handle the situation when the input parameter names won't match the column headings in the CSV.

      global-config:
        filepath: https://some-file.xyz
        query:
          <column-name-in-csv>: <input-parameter-name>
          <column-name-in-csv>: <input-parameter-name>
        output:
          <column-name-in-csv>: <output-parameter-name>
          <column-name-in-csv>: <output-parameter-name>
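
The column-to-input mapping above could be resolved per observation roughly like this. It is a sketch assuming the query section maps csv column names to input parameter names; resolveQuery is a hypothetical helper, and throwing on a missing input parameter is an assumption rather than agreed behaviour.

```typescript
type Observation = Record<string, string | number>;
type QueryConfig = Record<string, string>; // csv column name -> input parameter name

// For each csv column in the query, read the value of the mapped input parameter.
const resolveQuery = (
  queryConfig: QueryConfig,
  observation: Observation
): Record<string, string | number> => {
  const resolved: Record<string, string | number> = {};
  for (const [column, parameter] of Object.entries(queryConfig)) {
    const value = observation[parameter];
    if (value === undefined) {
      throw new Error(`Input parameter "${parameter}" required by query is missing`);
    }
    resolved[column] = value;
  }
  return resolved;
};

const queryConfig: QueryConfig = {
  "cloud-provider": "cloud/provider",
  region: "cloud/region",
};
const observation: Observation = {
  timestamp: "2023-08-06T00:00",
  duration: 3600,
  "cloud/provider": "gcp",
  "cloud/region": "asia-east",
};
const resolvedQuery = resolveQuery(queryConfig, observation);
console.log(resolvedQuery); // { 'cloud-provider': 'gcp', region: 'asia-east' }
```

Because the values come from the observation rather than static config, a plugin earlier in the pipeline can write a parameter that a later lookup uses as its query value.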

@jmcook1186
Contributor

Aaah of course - yes, obviously they need to come as inputs so the lookups can chain together. I'm on your page now @jawache - thanks.

@jmcook1186 jmcook1186 moved this from In Design to In Refinement in IF May 16, 2024
@zanete zanete assigned manushak and unassigned jawache May 16, 2024
@zanete
Author

zanete commented May 16, 2024

@manushak please review the AC to check that it makes sense and can be sized (in hours)
EDIT: Please disregard, as we discussed in planning, this should be done by @narekhovhannisyan

@jmcook1186 jmcook1186 changed the title [WIP]: Design a generic csv-lookup plugin Design a generic csv-lookup plugin May 20, 2024
@jmcook1186 jmcook1186 changed the title Design a generic csv-lookup plugin Build a generic csv-lookup plugin May 20, 2024
@narekhovhannisyan
Member

@zanete I forgot the t-shirt sizing table, however, in ordinary units, it will take approx. 2 days (including unit tests, and documentation updates)

@zanete
Author

zanete commented May 28, 2024

@jmcook1186 please update the description to say global so there's no discrepancy between the comments and issue description. @narekhovhannisyan and you should have a 10 min sync to make sure it's all interpreted correctly

@zanete zanete moved this from In Refinement to In Progress in IF May 28, 2024
@zanete zanete removed the needs-response The issue has stalled because someone isn’t responding. label May 28, 2024
@narekhovhannisyan
Member

narekhovhannisyan commented May 28, 2024

@jawache @jmcook1186 I'm unable to use output: "processor-name": "processor-model-id" or output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]. js-yaml can't parse either of them, and VS Code flags both as errors too.

console.log(jsYaml.load('output: "processor-name": "processor-model-id"'));

YAMLException: bad indentation of a mapping entry (1:25)

 1 | output: "processor-name": "processor-model-id"
-----------------------------^
    at generateError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:183:10)
    at throwError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:187:9)
    at readBlockMapping (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1182:7)
    at composeNode (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1441:12)
    at readDocument (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1625:3)
    at loadDocuments (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1688:5)
    at Object.load (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1714:19)
    at Object.<anonymous> (/Users/admin/Projects/uk/if/_test.ts:30:20)
    at Module._compile (node:internal/modules/cjs/loader:1356:14)
    at Module.m._compile (/Users/admin/.nvm/versions/node/v18.19.0/lib/node_modules/ts-node/src/index.ts:1618:23) {
  reason: 'bad indentation of a mapping entry',
  mark: {
    name: null,
    buffer: 'output: "processor-name": "processor-model-id"\n',
    position: 24,
    line: 0,
    column: 24,
    snippet: ' 1 | output: "processor-name": "processor-model-id"\n' +
      '-----------------------------^'
  }
}

AND

console.log(
  jsYaml.load(
    'output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]'
  )
);


YAMLException: bad indentation of a mapping entry (1:34)

 1 | output: ["processor-name", "tdp"]: ["processor-model-id","thermal ...
--------------------------------------^
    at generateError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:183:10)
    at throwError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:187:9)
    at readBlockMapping (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1182:7)
    at composeNode (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1441:12)
    at readDocument (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1625:3)
    at loadDocuments (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1688:5)
    at Object.load (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1714:19)
    at Object.<anonymous> (/Users/admin/Projects/uk/if/_test.ts:31:10)
    at Module._compile (node:internal/modules/cjs/loader:1356:14)
    at Module.m._compile (/Users/admin/.nvm/versions/node/v18.19.0/lib/node_modules/ts-node/src/index.ts:1618:23) {
  reason: 'bad indentation of a mapping entry',
  mark: {
    name: null,
    buffer: 'output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]\n',
    position: 33,
    line: 0,
    column: 33,
    snippet: ' 1 | output: ["processor-name", "tdp"]: ["processor-model-id","thermal ...\n' +
      '--------------------------------------^'
  }
}

So my advice is to use arrays and matrix for the same purpose:

output: ["processor-name": "processor-model-id"] // first case

output: [["processor-name", "processor-model-id"],["tdp","thermal-design-power"]]
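
Once the YAML is parsed, the nested-array form could be normalised along these lines. This is only a sketch: normalizeOutput and the Rename shape are hypothetical, it handles a plain string (a column name or "*") and an array of [column, new-name] pairs, and it leaves wildcard expansion to the caller.

```typescript
interface Rename {
  column: string; // column name in the csv
  alias: string; // key to use in the output data
}

// Turn the parsed `output` value into a list of { column, alias } entries.
// A plain string (including "*") maps to itself.
const normalizeOutput = (output: unknown): Rename[] => {
  if (typeof output === "string") {
    return [{ column: output, alias: output }];
  }
  if (!Array.isArray(output)) {
    throw new Error("output must be a string or an array of pairs");
  }
  return output.map((pair: unknown) => {
    if (
      !Array.isArray(pair) ||
      pair.length !== 2 ||
      typeof pair[0] !== "string" ||
      typeof pair[1] !== "string"
    ) {
      throw new Error("Each output entry must be a [column, new-name] pair of strings");
    }
    return { column: pair[0], alias: pair[1] };
  });
};

const renames = normalizeOutput([
  ["processor-name", "processor-model-id"],
  ["tdp", "thermal-design-power"],
]);
console.log(renames);
```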

@jmcook1186
Contributor

ok yes, the colon separator is going to be a nightmare for yaml parsing.
Your way is better - provide sub-arrays where the first element is the initial name, the second element is the replacement.

@zanete
Author

zanete commented May 30, 2024

Expecting a PR by end of Friday

@narekhovhannisyan narekhovhannisyan linked a pull request May 31, 2024 that will close this issue
9 tasks
@narekhovhannisyan narekhovhannisyan moved this from In Progress to Pending Review in IF Jun 2, 2024
@zanete zanete moved this from Pending Review to Testing in IF Jun 3, 2024
@zanete
Author

zanete commented Jun 3, 2024

@MariamKhalatova please QA :)

@zanete
Author

zanete commented Jun 4, 2024

@narekhovhannisyan currently fixing issues raised by @MariamKhalatova

@zanete zanete moved this from Testing to In Progress in IF Jun 4, 2024
@narekhovhannisyan narekhovhannisyan moved this from In Progress to Pending Review in IF Jun 5, 2024
@zanete zanete removed the epic: QA label Jun 5, 2024
@zanete
Author

zanete commented Jun 5, 2024

@MariamKhalatova could you review the bug fixes? 🙏

@zanete zanete moved this from Pending Review to Testing in IF Jun 5, 2024
@github-project-automation github-project-automation bot moved this from Testing to Done in IF Jun 5, 2024
Projects
Status: Done
7 participants