-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build a generic csv-lookup
plugin
#674
Comments
@pazbardanl drafted a ticket - would be grateful for feedback on the design. |
putting this back in design for @jmcook1186 to review and specify the AC more concretely (as per our call with @jawache ) |
@jawache should we support more complex queries than the one in the example above? My suggested config syntax only supports single conditions or multiple conditions linked by AND logic, but not OR. You could do the equivalent of EDIT: added a more detailed version of this question in the issue description - see big red question mark! |
csv-lookup
plugincsv-lookup
plugin
Hey @jmcook1186 so given that the driver of this plugin is to replace the cloud-metadata plugin (as well as the tdp-finder), let's use those as the drivers of this AC to make sure that afterwards, we really do end up with something that can replace all of those. The key thing is to map this out as a pipeline, the inputs to this plugin will be the things it uses to query the CSV, it won't be statically configured normally I imagine.
I do like the idea of the sqlite approach to give us some nice SQL queries, but also I do think we need a simpler UX for the very simple use cases (it also makes the config a lot easier to manage) To refine this one @jmcook1186 I'd prefer if you:
Then let's see what's required to marry all three of those plugins together to mirror the existing cloud metadata functionality. |
Ok @jawache - here's a minimal spec for a csv lookup plugin that can replace InputsMakes sense to instantiate without global config and re-use per component by passing in new file and query params at the node level, in my opinion.
LogicThe plugin should apply the following logic:
Examplescloud instance metadataThe csv file looks like this:
A manifest that executes a lookup for instance metadata looks as follows (uses wildcard to retrieve all columns where target column/value match): name: csv-demo
description:
tags:
initialize:
plugins:
csv-lookup:
method: CsvLookup
path: "builtins"
tree:
children:
child:
pipeline:
- sum
config:
csv-lookup:
filepath: https://some-file.xyz
target-column: *
selector-column: 'instance-class'
selector-value: 'Standard_A1_v2'
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001 should yield the following output: name: csv-demo
description:
tags:
initialize:
plugins:
csv-lookup:
method: CsvLookup
path: "builtins"
execution:
...
tree:
children:
child:
pipeline:
- sum
config:
csv-lookup:
filepath: https://some-file.xyz
target-column: [*]
selector-column: 'instance-class'
selector-value: 'Standard_A1_v2'
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001
outputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001
cpu-cores-available': 52
cpu-cores-utilized: 1
cpu-manufacturer: Intel
cpu-model-name: Intel® Xeon® Platinum 8272CL,Intel® Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz,Intel® Xeon® E5-2673 v3 2.4 GHz
cpu-tdp: 205
gpu-count: nan
gpu-model-name: nan
gpu-tdp: nan
memory-available: 2.0 tdp-finderThe csv looks as follows:
A manifest that executes a lookup for tdp looks as follows (uses single target and selector column - simplest case): name: csv-demo
description:
tags:
initialize:
plugins:
csv-lookup:
method: CsvLookup
path: "builtins"
tree:
children:
child:
pipeline:
- sum
config:
csv-lookup:
filepath: https://some-file.xyz
target-column: 'tdp'
selector-column: 'name'
selector-value: 'AMD A10-9700'
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001 This should yield the following outputs: name: csv-demo
description:
tags:
initialize:
plugins:
csv-lookup:
method: CsvLookup
path: "builtins"
tree:
children:
child:
pipeline:
- sum
config:
csv-lookup:
filepath: https://some-file.xyz
target-column: ['tdp']
selector-column: 'name'
selector-value: 'AMD A10-9700'
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001
tdp: 65.0 cloud region metadataThe csv data looks as follows:
A manifest that executes a lookup for region metadata is as follows (uses wildcard to select all columns, uses multiple selector columns): name: csv-demo
description:
tags:
initialize:
plugins:
csv-lookup:
method: CsvLookup
path: "builtins"
tree:
children:
child:
pipeline:
- sum
config:
csv-lookup:
filepath: https://some-file.xyz
target-column: ['*']
selector-column: ['cloud-provider', 'region']
selector-value: ['gcp', asia-east']
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001 This should yield the following outputs: name: csv-demo
description:
tags:
initialize:
plugins:
csv-lookup:
method: CsvLookup
path: "builtins"
tree:
children:
child:
pipeline:
- sum
config:
csv-lookup:
filepath: https://some-file.xyz
target-column: ['*']
selector-column: ['cloud-provider', 'region']
selector-value: ['gcp', asia-east']
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001
outputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
network/energy: 0.001
cfe-region: Taiwan
em-zone-id: TW
wt-region-id: TW
location: Taiwan
geolocation: 25.0375,121.5625
cfe-hourly: 0.18
cfe-annual: nan
power-usage-efficiency: nan
net-carbon: 0
grid-carbon-intensity-24x7: 453
grid-carbon-intensity-consumption: nan
grid-carbon-intensity-marginal: nan
grid-carbon-intensity-production: nan
grid-carbon-intensity: 453 If you agree, I'll work this up more thoroughly int he issue description. |
Hey @jmcook1186 somewhat agree but the key thing is that the inputs to the query will come in as observation inputs, not static configuration. So to mirror the existing functionality it will be something like so: name: csv-demo
description:
tags:
initialize:
plugins:
cloud-region-metadata:
method: CsvLookup
path: "builtins"
global-config:
filepath: https://some-file.xyz
query:
cloud-provider: cloud/provider
region: cloud/region
output: "*"
cloud-instance-metadata:
method: CsvLookup
path: "builtins"
global-config:
filepath: https://some-file.xyz
query:
cloud-provider: cloud/provider
region: cloud/region
instance-type: cloud/instance-type
output: "*"
tdp-finder:
method: CsvLookup
path: "builtins"
global-config:
filepath: https://some-file.xyz
query:
processor-name: cpu/processor-name
output:
tdp: cpu/thermal-design-power
tree:
children:
child:
pipeline:
- cloud-region-metadata
- cloud-instance-metadata
- extract-processor-name # some regexp plugin magic?
- tdp-finder
inputs:
- timestamp: 2023-08-06T00:00
duration: 3600
cpu/energy: 0.001
cloud/provider: gcp
cloud/region: asia-east And i'd suggest a slightly different interface to handle the situation when the input parameter names won't match the column headings in the CSV.
|
Aaah of course - yes, obviously they need to come as inputs so the lookups can chain together. I'm on your page now @jawache - thanks. |
@manushak please review the AC if it makes sense and if can be sized (in hours) |
csv-lookup
plugincsv-lookup
plugin
csv-lookup
plugincsv-lookup
plugin
@zanete I forgot the |
@jmcook1186 please update the description to say global so there's no discrepancy between the comments and issue description. @narekhovhannisyan and you should have a 10 min sync to make sure it's all interpreted correctly |
@jawache @jmcook1186 I'm unable to use console.log(jsYaml.load('output: "processor-name": "processor-model-id"'));
YAMLException: bad indentation of a mapping entry (1:25)
1 | output: "processor-name": "processor-model-id"
-----------------------------^
at generateError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:183:10)
at throwError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:187:9)
at readBlockMapping (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1182:7)
at composeNode (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1441:12)
at readDocument (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1625:3)
at loadDocuments (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1688:5)
at Object.load (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1714:19)
at Object.<anonymous> (/Users/admin/Projects/uk/if/_test.ts:30:20)
at Module._compile (node:internal/modules/cjs/loader:1356:14)
at Module.m._compile (/Users/admin/.nvm/versions/node/v18.19.0/lib/node_modules/ts-node/src/index.ts:1618:23) {
reason: 'bad indentation of a mapping entry',
mark: {
name: null,
buffer: 'output: "processor-name": "processor-model-id"\n',
position: 24,
line: 0,
column: 24,
snippet: ' 1 | output: "processor-name": "processor-model-id"\n' +
'-----------------------------^'
}
} AND console.log(
jsYaml.load(
'output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]'
)
);
YAMLException: bad indentation of a mapping entry (1:34)
1 | output: ["processor-name", "tdp"]: ["processor-model-id","thermal ...
--------------------------------------^
at generateError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:183:10)
at throwError (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:187:9)
at readBlockMapping (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1182:7)
at composeNode (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1441:12)
at readDocument (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1625:3)
at loadDocuments (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1688:5)
at Object.load (/Users/admin/Projects/uk/if/node_modules/js-yaml/lib/loader.js:1714:19)
at Object.<anonymous> (/Users/admin/Projects/uk/if/_test.ts:31:10)
at Module._compile (node:internal/modules/cjs/loader:1356:14)
at Module.m._compile (/Users/admin/.nvm/versions/node/v18.19.0/lib/node_modules/ts-node/src/index.ts:1618:23) {
reason: 'bad indentation of a mapping entry',
mark: {
name: null,
buffer: 'output: ["processor-name", "tdp"]: ["processor-model-id","thermal-design-power"]\n',
position: 33,
line: 0,
column: 33,
snippet: ' 1 | output: ["processor-name", "tdp"]: ["processor-model-id","thermal ...\n' +
'--------------------------------------^'
}
} So my advice is to use arrays and matrix for the same purpose: output: ["processor-name": "processor-model-id"] // first case
output: [["processor-name", "processor-model-id"],["tdp","thermal-design-power"]] |
ok yes, the colon separator is going to be a nightmare for yaml parsing. |
Expecting a PR by end of Friday |
@MariamKhalatova please QA :) |
@narekhovhannisyan currently fixing issues raised by @MariamKhalatova |
@MariamKhalatova could you review the bug fixes? 🙏 |
Sub of: #656
What
Create a new generic
csv-lookup
plugin in 'if/builtins`Why
We want to support as many pipelines as possible using generic plugins that can be adapted to many use cases. We currently have
sum
,multiply
,coefficient
. We also need to supportcsv-lookup
that enable a user to grab arbitrary data from a given csv file. This would also allow us to deprecate some of our current plugins that grab data from a csv file - instead the file can be hosted somewhere and the data accessed using this generic csv-lookup instead.Prerequisites/resources
None
SoW (scope of work)
builtins
builtins
if.greensoftware.foundation
if
repo demonstrating usagePlugin details
The following elements should be included in global config
filepath
: path to a csv file, either on the local filesystem or on the internetoutput
: the columns to grab data from and add tooutput
data - should support wildcard (*
) or multiple values.query
: an array of key/value pairs where the key is a column name in the target csv and the value is a parameter frominputs
, so that the following configs should work providedcloud/provider
,region
andinstance-type
are available indefaults
orinputs
:This config contains the location of the target file in
filepath
. It should download the file if it is not already in the local filesystem. Then the plugin should query the CSV data using the query parameters. The resulting data should be added to the output data as an array using the field name defined inoutput-parameter
.Plugin logic
The plugin should apply the following logic:
output
column where values inquery
conditions are satisfiedquery
section of global config.outputs: [*]
returns values from all columns other than the query parameters.output
field in config, e.g. to grab data fromprocessor-name
column in csv and name itprocessor-model-id
in output data:This renaming should also work with multiple entries, such as:
or even wildcards, erroring out if the number of columns accessed using the wildcard does not match the number of names provided on the RHS of the colon.
Acceptance criteria
csv-lookup
exists in theif-plugins
repositoryGiven (Setup): the
csv-lookup
plugin existsWhen (Action): a user has downloaded and installed
if
andif-plugins
Then (Assertion): the user should be able to query data from csv files stored locally or online.
Example 1: cloud-metadata
Running this manifest should replicate the functionality of the current
cloud-metadata
plugin:csv file has following structure:
manifest looks like this:
IF should return this output:
Note this example should also work identically if filepath is a url directing to a file on the internet rather than the local filesystem, in which case the filepath would be a url such as
https://some-website.xyz/data.csv
.Example 2: tdp-finder
CSV file looks like this:
Running this manifest should replicate the functionality of the current
tdp-finder
plugin:Running
ie -m manifest.yml -o output.yml
should yield the following:Example 3: region-metadata
CSV looks like this:
Running the following manifest achieves the expected behaviour of our
cloud-region-metadata
plugin:This should yield:
Example 4: composite csv-lookups
We should be able to chain the preceding examples together as follows:
This should return the following (assuming some regex was done top isolate one processor from the returned list and assign it to `cpu/processor-name)
Unit tests exists with 100% coverage over
csv-lookup
Given (Setup): a user has downloaded and installed
if-plugins
When (Action): a user runs
npx jest --coverage
Then (Assertion): the coverage report should show that
csv-lookup
is 100% covered and passingDocumentation exists in plugin readme
Given (Setup): the user visits the
if-plugins
repositoryWhen the user navigates to
src/lib/csv-lookup
Then the user sees a README containing documentation describing the
csv-lookup
plugin, copying the format from the other plugin readmes.Link to README documentation exists in if.greensoftware.foundation
Given: the user is on if.greensoftware.foundation
When (Action): they navigate to
reference/plugins
and find thecsv-lookup
plugin sectionThen (Assertion): they see a link to the plugin readme for the exponent plugin
Example manifests exists
Given: the user has downloaded and installed
if
When (Action): the user navigates to
if/manifests/plugins
Then (Assertion): they see manifests that include the
csv-lookup
pluginThe text was updated successfully, but these errors were encountered: