Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(forge): Add internal metrics capability #3607

Open
lucas-manuel opened this issue Nov 3, 2022 · 15 comments
Open

feat(forge): Add internal metrics capability #3607

lucas-manuel opened this issue Nov 3, 2022 · 15 comments
Labels
A-testing Area: testing C-forge Command: forge T-feature Type: feature
Milestone

Comments

@lucas-manuel
Copy link

lucas-manuel commented Nov 3, 2022

Component

Forge

Describe the feature you would like

When running invariant tests with an actor-based pattern, there is currently a lack of visibility for:

  1. How many times a function is called
  2. What code paths within the function are reached
  3. Other general metrics (e.g., how many LPs deposit total)

Having a cheatcode that would allow storing this information with arbitrary keys would be useful to be able to render this info in a summary table at the end of a fuzzing campaign.

@gakonst mentioned [prometheus] metrics as a good reference for this.

Examples:

vm.counter("numberOfLps", numberOfLps++);

vm.counter(string.concat("loan_", vm.toString(loanNumber), "_payments"), payments[loan]);

Additional context

No response

@lucas-manuel lucas-manuel added the T-feature Type: feature label Nov 3, 2022
@gakonst
Copy link
Member

gakonst commented Nov 4, 2022

cc @onbjerg @mattsse I'm still a bit abstract on this but I was thinking of exposing an API via cheatcodes similar to https://docs.rs/metrics/latest/metrics/, and on the CLI we would collect all these metrics and custom log them in a table

@gakonst
Copy link
Member

gakonst commented Nov 4, 2022

@lucas-manuel can you give an example of how your ideal reporting would look like? a table? something else? maybe there should be plots like this https://docs.rs/tui/latest/tui/widgets/struct.Chart.html for stuff on how it evolves over time (e.g. price as volatility happens)? makes me think that this may allow creating automatic simulation / stress test reports like Gauntlet Network does.

@lucas-manuel
Copy link
Author

lucas-manuel commented Nov 4, 2022

@gakonst Yeah personally I think the lowest hanging fruit would be to log in a table similar to --gas-report. I'd mainly be using this for debugging purposes so it'd be useful as a numerical log output.

Going forward though for the more sophisticated use cases we discussed it would be interesting to export to some sort of JSON that could be used to generate more visual reporting (could be added to CI as an artifact for example).

@gakonst
Copy link
Member

gakonst commented Nov 4, 2022

@FrankieIsLost @transmissions11 had some thoughts on this which I'd love if they shared in the thread :)

@transmissions11
Copy link
Contributor

transmissions11 commented Nov 5, 2022

Great idea to give users programmable insight into their invariant campaigns, been a big advocate of this for a small while now.

IMO giving devs a better understanding of the coverage of their invariant tests and tools to effectively debug them is far more valuable than building smarter fuzzers after a certain point, because once a weakness is identified its not too hard to guide the fuzzer towards it, as opposed to a genius fuzzer thats a total black box from a dev's perspective, which offers little insight into how secure a piece of code is and whether a dev can be confident in the coverage of the run. Humans and fuzzers should work tandem!

In terms of actual design:

  • A cheatcode to enter data into basically a time series database a la prometheus seems like a great way to go
  • Ideally we export tables by default, but also provide the option for timestamped JSON, etc.
    • Timestampped JSON export would enable auditors and such to generate charts like this (ref: Gearbox Fuzzing): Screen Shot 2022-11-04 at 9 12 42 PM
  • Would be nice to have metric types beyond just counter. For inspiration, prometheus offers 3 core types:
    • Gauge
    • Counter
    • Histogram
  • For tracking things like number of lps, I think it would be easiest on devs if we added a postCall hook of sorts (imagine setUp but called after every random call in an invariant run), where devs can just bulk query and update all their storage variables
  • However, there are still situations where calling the cheatcode dynamically in a test would be preferred (like tracking details about a specific mapping key which is hard to identify retroactively), so good to have the option to call dynamically in tests and contracts as well
  • One other thing we might do to make it easier on devs to track a bunch of data on their invariant runs would be some sort of cheatcode or flag to auto assign a counter/gauge/histogram to a function, state variable, etc
    • That way if you wanted to check how many times a loan was originated, etc, you wouldn't necessarily have to write a separate contract instrumented will calls to vm.counter()
    • Maybe we should just pass the address called on and the calldata passed to the postCall hook and that would enable something like this? I think there's some interesting design space here
    • Some will certainly still choose to instrument, or do echinda style wrappers, but I think its nice to provide the quick and dirty option for devs who dont require such granularity
  • Finally, for the use case where devs want to instrument their contracts with calls to these counters and cannot read storage vars from test files:
    • We need to provide a way to get the current value of a guage/counter/etc
    • API could look something like this:
      • vm.incrementCounter("numberOfLps", 1)
      • vm.setCounter("numberOfLps", vm.readCounter("numberOfLps") + 1)
    • IMO this API is also just generally less of a PITA than maintaining your own storage variables.
    • Furthermore, isn't test state wiped from time to time during invariant campaigns? This has the benefit of enabling persistent state throughout a long campaign.

WDYT?

@rkrasiuk rkrasiuk added A-cheatcodes Area: cheatcodes C-forge Command: forge labels Nov 7, 2022
@horsefacts
Copy link

In addition to these more complex metrics, it would be very helpful to see a breakdown of calls/reverts by target contract + selector. For example, something like:

╭───────────────────────────────────────────┬─────────╮─────────╮
│ test/actors/Swapper.sol:Swapper contract  ┆         │         │
╞═══════════════════════════════════════════╪═════════╡═════════╡
│ Function Name                             ┆ calls   │ reverts │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGooForGobblers                        ┆ 1024    │ 612     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGobblersForGoo                        ┆ 1024    │ 12      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGobblersForGobblers                   ┆ 1024    │ 64      │
╰───────────────────────────────────────────┴─────────╯─────────╯

This would be very helpful for writing and debugging new actor contracts.

@lucas-manuel
Copy link
Author

lucas-manuel commented Nov 15, 2022

@transmissions11

Love that Gearbox chart 👀

Yeah I agree that it would be better to design this without the need to persist storage within the contracts anywhere for this purpose. I like the suggestions of incrementCounter and setCounter. I also like the idea of a postCall hook, we've been hacking that together with modifiers inside the actors contracts but would definitely be better to have an official framework for it haha.

What are our next steps here?

I also completely agree with the summary table idea @horsefacts

@odyslam
Copy link
Contributor

odyslam commented Nov 16, 2022

Second gakonst for using Prometheus as the engine/standard and then we can either:

  • output charts using some charting library
  • serve them via an endpoint that can be scraped by Prometheus and visualize them/analyze them in Grafana.

@PaulRBerg
Copy link
Contributor

@lucas-manuel Until this issue gets implemented, do you know any way to output the call summary only at the end of the invariant tests run?

I saw that in your invariants-example repo, you have defined this function:

function invariant_call_summary() external view {
    console.log("\nCall Summary\n");
    ///
}

But, if I understand this correctly, this test function will be executed at the end of each run, which will slow down test runs.

Is there any way to output the call summary at the end of the invariant tests run? As ar as I know, there is no "after all" hook in Foundry (a function opposite to setUp), but maybe there is another way?

@gakonst
Copy link
Member

gakonst commented Jan 27, 2023

But, if I understand this correctly, this test function will be executed at the end of each run, which will slow down test runs.

Yeah it'll slow it down, but a bit only I would assume. We should probably add that kind of function invariant_summary() hook as an "after all".

@PaulRBerg
Copy link
Contributor

It depends on how many console.log statements there are .. we have quite a few and the run time is noticeably slower with the invariant summary function defined. Ended up renaming it to start with something other than invariant_ so that we can selectively activate it when we want to get a report.

hook as an "after all"

Would be super helpful.

@grandizzy
Copy link
Collaborator

grandizzy commented Mar 27, 2024

considering @transmissions11 comment -

I think a nice and easy way to have such metrics is by using OpenTelemetry (OTLP) open source standard for logs, metrics, and traces as it already provides crates to facilitate such integration, see rust bindings and crates

A big pro of this is that we can easily integrate with forge and support not only Prometheus but other tools at the same time by

  • using tracing_opentelemetry crate to collect custom forge test events (metrics layer export OTLP counter and histogram metrics from specific test events)
  • use opentelemetry-otlp exporter to push metrics to a collector (that is scraped after by tools like Prometheus), or directly to receivers like Jaeger / Prometheus / others

Didn't put too much thoughts on UX but at a first call there'll be

  • a cheatcote vm.createTestEvent() to record test data (which use event! macro to publish a new forge test event)
  • the metrics subscriber which receives the event and convert to an opentelemetry metric
  • the periodic metrics reader which exports in bulk to OTLP endpoint through metrics exporter

lmk what you guys think about this approach, think a PoC with such can be done quite easy but wouldn't spend time on it if not of interest. thanks

@grandizzy
Copy link
Collaborator

grandizzy commented Mar 30, 2024

I made a quick PoC, see https://github.com/grandizzy/foundry/tree/metrics/crates/metrics/examples#metrics-demo for reference (Code adds a metrics crate, there's no config yet, cheatcodes not in their own metrics group and better UX needed - for a quick view of code changes pls see grandizzy@919e84b)
The PoC use OTEL collector to record metrics simultaneously in a file and by sending them to a local carbon endpoint, then showed in Grafana dashboard

  • metrics recorded are: number of times a selector was called, count of reverts (per revert type) during campaign, addresses fuzzed in campaign and number of times they were used (per selector) using vm.incrementMetrics(label) cheatcode
  • statistics are displayed in Grafana dashboard as

grafana

List of exporters that can be used by otel config file can be found here

I see three use cases / dimensions that can be accomplished by having such

  1. Basic, understand how campaigns are performing
  • case 1: dumping metrics in file for campaign review
  • case 2: writing to a backend and visualize campaign statistics (carbon/Grafana, Prometheus, others, see exporters)
  1. Actions performed based on collected test metrics:
  • case 3: integrate in a CI pipeline and stop/restart campaign if it violates certain thresholds (number of unique values less than threshold, number of reverts greater than threshold, times a selector was hit less/greater than threshold). Ex: export to kafka, AWS Kinesis, others, see exporters
  1. Adapt fuzzing campaigns based on collected metrics:
  • if PR feat(fuzz): ability to declare fuzz test fixtures #7428 accepted it will introduce the concept of test fixtures / data sets used in fuzzing, in first phase as inline forge-config: fixture config
  • next step for fixtures would be to have cheatcodes for loading test data sets from file/URL
  • then metrics can be used to provide custom input for fuzzing campaigns

For example: metrics are collected and exported to AWS Kinesis, or Apache Kafka, etc. then campaign metrics are processed and

  1. saved in persistent layer (S3, etc) for historical analysis
  2. campaign metrics are used to generate new datasets that will be used by further campaigns. Rules to apply could be to ensure values are unique across several campaigns or repetable to some extent or new datasets derived from already used values, etc.

any feedback appreciated, thank you

@grandizzy
Copy link
Collaborator

grandizzy commented Apr 3, 2024

@Evalir we can continue discussion here, here a quick overview of what the changes to have such metrics would be master...grandizzy:foundry:metrics see also prev comments re sample and other use cases this could be used at.
Wanted to get your thoughts first before polishing code and issuing a PR as it introduce some new deps (if not comfortable with due to supply chain attack events, it can be build / cfged on demand). If all good I can do the PR + update foundry book with how can be used / examples.

@grandizzy
Copy link
Collaborator

Adding on OpenTelemetry adoption / usage - Grafana just announced their open source collector (Grafana Alloy) https://grafana.com/blog/2024/04/09/grafana-alloy-opentelemetry-collector-with-prometheus-pipelines/

@zerosnacks zerosnacks added the A-testing Area: testing label Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Area: testing C-forge Command: forge T-feature Type: feature
Projects
Status: Todo
Development

No branches or pull requests

9 participants