feat(forge): Add internal metrics capability #3607

lucas-manuel · 2022-11-03T20:57:49Z

Component

Forge

Describe the feature you would like

When running invariant tests with an actor-based pattern, there is currently a lack of visibility into:

- How many times each function is called
- Which code paths within a function are reached
- Other general metrics (e.g., how many LPs deposit in total)

A cheatcode that stores this information under arbitrary keys would make it possible to render it in a summary table at the end of a fuzzing campaign.

@gakonst mentioned Prometheus metrics as a good reference for this.

Examples:

vm.counter("numberOfLps", ++numberOfLps);

vm.counter(string.concat("loan_", vm.toString(loanNumber), "_payments"), payments[loan]);
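To make the idea concrete, here is a minimal, std-only Rust sketch of what the runner side might keep: a string-keyed store that a `vm.counter(key, value)`-style cheatcode would write into, plus a table rendered once the campaign ends. `MetricsStore`, `set_counter` and `render_table` are hypothetical names, not Forge APIs, and the table layout is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical runner-side store behind a `vm.counter(key, value)`-style
/// cheatcode. Keys are arbitrary strings; a BTreeMap keeps the table sorted.
#[derive(Default)]
struct MetricsStore {
    counters: BTreeMap<String, u64>,
}

impl MetricsStore {
    /// Overwrites the value stored under `key`, like `vm.counter(key, value)`.
    fn set_counter(&mut self, key: &str, value: u64) {
        self.counters.insert(key.to_string(), value);
    }

    /// Renders a plain-text summary table once the campaign has finished.
    fn render_table(&self) -> String {
        // Width of the first column: the longest key, or the header itself.
        let width = self
            .counters
            .keys()
            .map(|k| k.len())
            .chain(std::iter::once("metric".len()))
            .max()
            .unwrap_or(0);
        let mut out = format!("{:<width$} | value\n", "metric", width = width);
        for (key, value) in &self.counters {
            out.push_str(&format!("{:<width$} | {}\n", key, value, width = width));
        }
        out
    }
}

fn main() {
    let mut store = MetricsStore::default();
    store.set_counter("numberOfLps", 3);
    // Dynamic keys, as in the loan example, are just formatted strings.
    let loan_number = 7;
    store.set_counter(&format!("loan_{}_payments", loan_number), 12);
    print!("{}", store.render_table());
}
```

Because keys are plain strings, per-loan keys need no special support; the trade-off is that a typo in a key silently creates a new row.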

Additional context

No response

gakonst · 2022-11-04T01:07:25Z

cc @onbjerg @mattsse I'm still a bit vague on this, but I was thinking of exposing an API via cheatcodes similar to https://docs.rs/metrics/latest/metrics/; on the CLI we would then collect all these metrics and log them in a custom table.

gakonst · 2022-11-04T01:09:25Z

@lucas-manuel can you give an example of what your ideal reporting would look like? A table? Something else? Maybe there should be plots like https://docs.rs/tui/latest/tui/widgets/struct.Chart.html for values that evolve over time (e.g. price as volatility happens)? It makes me think this could enable automatic simulation / stress-test reports like the ones Gauntlet Network produces.

lucas-manuel · 2022-11-04T02:47:01Z

@gakonst Yeah, personally I think the lowest-hanging fruit would be to log a table similar to --gas-report. I'd mainly be using this for debugging, so it'd be useful as numerical log output.

Going forward, though, for the more sophisticated use cases we discussed, it would be interesting to export to some sort of JSON that could be used to generate more visual reports (it could be added to CI as an artifact, for example).
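Once the runner owns the collected values, a JSON export is mostly a serialization question. The std-only Rust sketch below hand-escapes keys to avoid a serde dependency; the `{"metrics": {...}}` shape and the `to_json` name are assumptions, not an existing Forge output format.

```rust
use std::collections::BTreeMap;

/// Escapes the characters JSON requires to be escaped inside a string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serializes a metrics snapshot into a JSON object (hypothetical layout).
fn to_json(metrics: &BTreeMap<String, u64>) -> String {
    let fields: Vec<String> = metrics
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", escape_json(k), v))
        .collect();
    format!("{{\"metrics\":{{{}}}}}", fields.join(","))
}

fn main() {
    let mut snapshot = BTreeMap::new();
    snapshot.insert("numberOfLps".to_string(), 3);
    // Could be written to a file and uploaded as a CI artifact.
    println!("{}", to_json(&snapshot));
}
```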

gakonst · 2022-11-04T19:34:31Z

@FrankieIsLost @transmissions11 had some thoughts on this, which I'd love for them to share in the thread :)

transmissions11 · 2022-11-05T04:25:50Z

Great idea to give users programmable insight into their invariant campaigns; I've been a big advocate of this for a while now.

IMO, beyond a certain point, giving devs a better understanding of the coverage of their invariant tests, and tools to debug them effectively, is far more valuable than building smarter fuzzers. Once a weakness is identified, it's not too hard to guide the fuzzer towards it, whereas a genius fuzzer that's a total black box from a dev's perspective offers little insight into how secure a piece of code is or whether a dev can be confident in the coverage of the run. Humans and fuzzers should work in tandem!

In terms of actual design:

- A cheatcode to enter data into what is basically a time-series database, a la Prometheus, seems like a great way to go.
- Ideally we export tables by default, but also provide the option for timestamped JSON, etc.
  - Timestamped JSON export would let auditors and others generate time-series charts (ref: Gearbox Fuzzing).
- It would be nice to have metric types beyond just counters. For inspiration, Prometheus's core metric types include:
  - Gauge
  - Counter
  - Histogram
- For tracking things like the number of LPs, I think it would be easiest on devs if we added a postCall hook of sorts (imagine setUp, but called after every random call in an invariant run), where devs can just bulk-query and update all their storage variables.
- However, there are still situations where calling the cheatcode dynamically in a test would be preferred (like tracking details about a specific mapping key, which is hard to identify retroactively), so it's good to have the option to call it dynamically in tests and contracts as well.
- One other thing we might do to make it easier for devs to track a lot of data in their invariant runs would be some sort of cheatcode or flag to auto-assign a counter/gauge/histogram to a function, state variable, etc.
  - That way, if you wanted to check how many times a loan was originated, etc., you wouldn't necessarily have to write a separate contract instrumented with calls to vm.counter().
  - Maybe we should just pass the called address and the calldata to the postCall hook, which would enable something like this? I think there's some interesting design space here.
  - Some will certainly still choose to instrument, or use Echidna-style wrappers, but I think it's nice to provide the quick-and-dirty option for devs who don't require that granularity.
- Finally, for the use case where devs want to instrument their contracts with calls to these counters and cannot read storage variables from test files:
  - We need to provide a way to get the current value of a gauge/counter/etc.
  - The API could look something like this:
    - vm.incrementCounter("numberOfLps", 1)
    - vm.setCounter("numberOfLps", vm.readCounter("numberOfLps") + 1)
  - IMO this API is also just generally less of a PITA than maintaining your own storage variables.
  - Furthermore, isn't test state wiped from time to time during invariant campaigns? This would have the benefit of persisting state throughout a long campaign.
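The counter/gauge/histogram split and the `incrementCounter` / `setCounter` / `readCounter` trio can be sketched as a registry owned by the runner rather than by contract storage. This is a std-only Rust illustration under several assumptions: the type and method names are hypothetical, the "timestamp" is a call index rather than wall-clock time, and histogram buckets are cumulative upper bounds as in Prometheus.

```rust
use std::collections::HashMap;

/// Metric kinds modeled on Prometheus' counter, gauge and histogram.
#[derive(Debug, Clone, PartialEq)]
enum Metric {
    /// Incremented or explicitly set by the test.
    Counter(u64),
    /// Set to an arbitrary value that can go up or down.
    Gauge(i64),
    /// Cumulative buckets: counts[i] = observations <= bounds[i].
    Histogram { bounds: Vec<u64>, counts: Vec<u64>, total: u64 },
}

/// Hypothetical campaign-wide registry kept by the runner rather than in
/// contract storage, so values survive the state resets between runs.
#[derive(Default)]
struct Registry {
    metrics: HashMap<String, Metric>,
    /// (call index, metric name) for every update: a crude time series.
    samples: Vec<(u64, String)>,
}

impl Registry {
    /// Analogue of the proposed `vm.incrementCounter(name, by)`.
    fn increment_counter(&mut self, at: u64, name: &str, by: u64) {
        let metric = self.metrics.entry(name.to_string()).or_insert(Metric::Counter(0));
        // A name already registered as another kind is left untouched here.
        if let Metric::Counter(v) = metric {
            *v += by;
        }
        self.samples.push((at, name.to_string()));
    }

    /// Analogue of the proposed `vm.setCounter(name, value)`.
    fn set_counter(&mut self, at: u64, name: &str, value: u64) {
        self.metrics.insert(name.to_string(), Metric::Counter(value));
        self.samples.push((at, name.to_string()));
    }

    /// Analogue of the proposed `vm.readCounter(name)`; unknown names read as 0.
    fn read_counter(&self, name: &str) -> u64 {
        match self.metrics.get(name) {
            Some(Metric::Counter(v)) => *v,
            _ => 0,
        }
    }

    fn set_gauge(&mut self, at: u64, name: &str, value: i64) {
        self.metrics.insert(name.to_string(), Metric::Gauge(value));
        self.samples.push((at, name.to_string()));
    }

    /// Records one observation; bucket bounds are fixed on first use.
    fn observe(&mut self, at: u64, name: &str, bounds: &[u64], value: u64) {
        let metric = self.metrics.entry(name.to_string()).or_insert_with(|| Metric::Histogram {
            bounds: bounds.to_vec(),
            counts: vec![0; bounds.len()],
            total: 0,
        });
        if let Metric::Histogram { bounds, counts, total } = metric {
            for (bound, count) in bounds.iter().zip(counts.iter_mut()) {
                if value <= *bound {
                    *count += 1;
                }
            }
            *total += 1;
        }
        self.samples.push((at, name.to_string()));
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.increment_counter(1, "numberOfLps", 1);
    let next = reg.read_counter("numberOfLps") + 1;
    reg.set_counter(2, "numberOfLps", next);
    reg.set_gauge(3, "totalAssets", 1_000);
    reg.observe(4, "paymentSize", &[10, 100, 1_000], 250);
    println!("numberOfLps = {}", reg.read_counter("numberOfLps"));
    println!("{} samples recorded", reg.samples.len());
}
```

Keeping the registry outside the EVM is what would let values persist across the state resets between runs.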

WDYT?

horsefacts · 2022-11-13T23:07:46Z

In addition to these more complex metrics, it would be very helpful to see a breakdown of calls/reverts by target contract + selector. For example, something like:

╭───────────────────────────────────────────┬─────────╮─────────╮
│ test/actors/Swapper.sol:Swapper contract  ┆         │         │
╞═══════════════════════════════════════════╪═════════╡═════════╡
│ Function Name                             ┆ calls   │ reverts │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGooForGobblers                        ┆ 1024    │ 612     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGobblersForGoo                        ┆ 1024    │ 12      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGobblersForGobblers                   ┆ 1024    │ 64      │
╰───────────────────────────────────────────┴─────────╯─────────╯

This would be very helpful for writing and debugging new actor contracts.
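A breakdown like this falls out naturally from the postCall-hook idea if the runner passes the target, the calldata and a revert flag. Below is a std-only Rust sketch, assuming a selector-to-name lookup built from the ABI (here a hand-filled map with a made-up selector); `CallSummary` and `post_call` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// (calls, reverts) per (target, function), like the table above.
#[derive(Default)]
struct CallSummary {
    rows: BTreeMap<(String, String), (u64, u64)>,
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

impl CallSummary {
    /// Hypothetical post-call hook: the runner would call it after every
    /// fuzzed call with the target, the raw calldata and the revert flag.
    fn post_call(&mut self, target: &str, calldata: &[u8], reverted: bool, names: &BTreeMap<[u8; 4], &str>) {
        let function = match calldata.get(..4) {
            // The first 4 bytes of calldata are the function selector.
            Some(s) => {
                let selector = [s[0], s[1], s[2], s[3]];
                match names.get(&selector) {
                    Some(name) => name.to_string(),
                    None => format!("0x{}", hex(&selector)),
                }
            }
            // Shorter calldata would hit fallback/receive.
            None => "<fallback>".to_string(),
        };
        let row = self.rows.entry((target.to_string(), function)).or_insert((0, 0));
        row.0 += 1;
        if reverted {
            row.1 += 1;
        }
    }

    fn render(&self) -> String {
        let mut out = String::from("target | function | calls | reverts\n");
        for ((target, function), (calls, reverts)) in &self.rows {
            out.push_str(&format!("{} | {} | {} | {}\n", target, function, calls, reverts));
        }
        out
    }
}

fn main() {
    let mut names = BTreeMap::new();
    names.insert([0xde, 0xad, 0xbe, 0xef], "swapGooForGobblers"); // made-up selector
    let mut summary = CallSummary::default();
    summary.post_call("Swapper", &[0xde, 0xad, 0xbe, 0xef, 0x00], true, &names);
    summary.post_call("Swapper", &[0xde, 0xad, 0xbe, 0xef], false, &names);
    print!("{}", summary.render());
}
```

Unknown selectors fall back to their hex form, so the summary still works for targets whose ABI was not resolved.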

lucas-manuel · 2022-11-15T20:55:03Z

@transmissions11

Love that Gearbox chart 👀

Yeah, I agree that it would be better to design this without the need to persist storage anywhere within the contracts for this purpose. I like the suggestions of incrementCounter and setCounter. I also like the idea of a postCall hook; we've been hacking that together with modifiers inside the actor contracts, but it would definitely be better to have an official framework for it haha.

What are our next steps here?

I also completely agree with the summary table idea @horsefacts

odyslam · 2022-11-16T18:17:18Z

I second @gakonst on using Prometheus as the engine/standard; then we can either:

- output charts using some charting library, or
- serve them via an endpoint that can be scraped by Prometheus, and visualize/analyze them in Grafana.
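Serving metrics for Prometheus to scrape would mainly require emitting its text exposition format (`# TYPE` lines followed by `name value` samples). A std-only Rust sketch for counters only; the `forge_` prefix is a made-up naming convention, and invalid characters are replaced so arbitrary keys become legal metric names.

```rust
/// Maps an arbitrary key to a valid Prometheus metric name
/// ([a-zA-Z_:][a-zA-Z0-9_:]*) by replacing invalid characters with '_'.
fn sanitize(name: &str) -> String {
    let mut out: String = name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == ':' { c } else { '_' })
        .collect();
    // Names may not be empty or start with a digit.
    if out.chars().next().map_or(true, |c| c.is_ascii_digit()) {
        out.insert(0, '_');
    }
    out
}

/// Renders counters in the Prometheus text exposition format.
fn render_counters(counters: &[(&str, u64)]) -> String {
    let mut out = String::new();
    for (name, value) in counters {
        let name = format!("forge_{}", sanitize(name));
        out.push_str(&format!("# TYPE {} counter\n{} {}\n", name, name, value));
    }
    out
}

fn main() {
    // An HTTP endpoint would return this body to the Prometheus scraper.
    print!("{}", render_counters(&[("numberOfLps", 3), ("loan_7_payments", 12)]));
}
```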

PaulRBerg · 2023-01-26T19:55:53Z

@lucas-manuel Until this issue gets implemented, do you know any way to output the call summary only at the end of the invariant test run?

I saw that in your invariants-example repo, you have defined this function:

function invariant_call_summary() external view {
    console.log("\nCall Summary\n");
    ///
}

But, if I understand this correctly, this test function will be executed at the end of each run, which will slow down test runs.

Is there any way to output the call summary at the end of the invariant test run? As far as I know, there is no "after all" hook in Foundry (a function opposite to setUp), but maybe there is another way?

gakonst · 2023-01-27T22:44:34Z

But, if I understand this correctly, this test function will be executed at the end of each run, which will slow down test runs.

Yeah, it'll slow it down, but only a bit, I would assume. We should probably add that kind of invariant_summary() function as an "after all" hook.
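The cost difference discussed here comes down to where the runner invokes the summary. Below is a toy Rust runner loop, assuming for simplicity that invariants are checked after every call of every run; `Hooks`, `after_all` and `run_campaign` are hypothetical and do not reflect Forge's actual runner.

```rust
/// Hypothetical hooks a runner could expose.
trait Hooks {
    /// Called after every fuzzed call (where `invariant_*` functions run).
    fn check_invariants(&mut self);
    /// Called exactly once, after the last run of the campaign.
    fn after_all(&mut self);
}

fn run_campaign<H: Hooks>(hooks: &mut H, runs: u32, depth: u32) {
    for _run in 0..runs {
        for _call in 0..depth {
            // ... execute one fuzzed call here ...
            hooks.check_invariants();
        }
    }
    hooks.after_all();
}

/// Counts invocations to compare the two placements.
#[derive(Default)]
struct CountingHooks {
    invariant_checks: u32,
    summaries: u32,
}

impl Hooks for CountingHooks {
    fn check_invariants(&mut self) {
        self.invariant_checks += 1;
    }
    fn after_all(&mut self) {
        self.summaries += 1;
    }
}

fn main() {
    let mut hooks = CountingHooks::default();
    run_campaign(&mut hooks, 256, 15);
    println!("invariant checks: {}, summaries: {}", hooks.invariant_checks, hooks.summaries);
}
```

Under this model, with 256 runs at depth 15, a summary written as an `invariant_` function would execute 3,840 times, whereas an after-all hook executes once.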

PaulRBerg · 2023-01-28T10:15:50Z

It depends on how many console.log statements there are; we have quite a few, and the run time is noticeably slower with the invariant summary function defined. We ended up renaming it to start with something other than invariant_ so that we can selectively activate it when we want a report.

hook as an "after all"

Would be super helpful.

grandizzy · 2024-03-27T12:20:33Z

Considering @transmissions11's comment:

I think a nice and easy way to get such metrics is to use OpenTelemetry, the open-source standard for logs, metrics, and traces (exported over its OTLP protocol), as it already provides crates that facilitate such an integration (see the Rust bindings and crates).

A big pro of this is that we can easily integrate it with forge and support not only Prometheus but other tools at the same time by:

- using the tracing_opentelemetry crate to collect custom forge test events (its metrics layer exports OTLP counter and histogram metrics from specific test events)
- using the opentelemetry-otlp exporter to push metrics to a collector (which is then scraped by tools like Prometheus), or directly to receivers like Jaeger / Prometheus / others

I didn't put too much thought into the UX yet, but as a first take there would be:

- a cheatcode vm.createTestEvent() to record test data (which uses the event! macro to publish a new forge test event)
- the metrics subscriber, which receives the event and converts it to an OpenTelemetry metric
- the periodic metrics reader, which exports in bulk to an OTLP endpoint through the metrics exporter

lmk what you guys think about this approach. I think a PoC could be done quite easily, but I wouldn't spend time on it if it's not of interest. Thanks
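The three pieces listed above (a cheatcode emitting a test event, a subscriber turning events into metrics, and a periodic reader exporting in bulk) can be illustrated with a plain std channel standing in for `tracing` and the OpenTelemetry SDK. None of the types below come from those crates: `TestEvent` and `run_subscriber` are hypothetical, and "periodic" is approximated by exporting a snapshot every `batch` events rather than on a timer.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// What a hypothetical `vm.createTestEvent(label)` cheatcode would publish.
struct TestEvent {
    label: String,
    value: u64,
}

/// Subscriber side: aggregates events into counters and exports a snapshot
/// every `batch` events (must be > 0), plus a final one when the channel closes.
fn run_subscriber(rx: mpsc::Receiver<TestEvent>, batch: usize) -> Vec<BTreeMap<String, u64>> {
    let mut counters: BTreeMap<String, u64> = BTreeMap::new();
    let mut exports = Vec::new();
    let mut seen = 0;
    for event in rx {
        *counters.entry(event.label).or_insert(0) += event.value;
        seen += 1;
        if seen % batch == 0 {
            // "Export" = hand a snapshot to whatever exporter is configured.
            exports.push(counters.clone());
        }
    }
    exports.push(counters); // final flush at the end of the campaign
    exports
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let subscriber = thread::spawn(move || run_subscriber(rx, 2));
    for label in &["swap", "swap", "deposit"] {
        tx.send(TestEvent { label: label.to_string(), value: 1 }).unwrap();
    }
    drop(tx); // closing the channel ends the subscriber loop
    let exports = subscriber.join().unwrap();
    println!("{} exports, last = {:?}", exports.len(), exports.last());
}
```

Decoupling the cheatcode from the exporter through a channel is what keeps the EVM side cheap: recording an event is a send, and the aggregation and I/O happen elsewhere.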

grandizzy · 2024-03-30T15:17:38Z

I made a quick PoC; see https://github.com/grandizzy/foundry/tree/metrics/crates/metrics/examples#metrics-demo for reference. (The code adds a metrics crate; there's no config yet, the cheatcodes are not in their own metrics group, and the UX needs work. For a quick view of the code changes, see grandizzy@919e84b.)
The PoC uses the OTel Collector to record metrics both in a file and by sending them to a local Carbon endpoint; they are then shown in a Grafana dashboard.

The metrics recorded are: the number of times each selector was called, the count of reverts (per revert type) during the campaign, and the addresses fuzzed in the campaign along with the number of times each was used (per selector), using a vm.incrementMetrics(label) cheatcode.
The statistics are displayed in a Grafana dashboard.

The list of exporters that can be used in the OTel config file can be found here.

I see three use cases / dimensions that such metrics would enable:

Basic understanding of how campaigns are performing:

- case 1: dumping metrics to a file for campaign review
- case 2: writing to a backend and visualizing campaign statistics (Carbon/Grafana, Prometheus, others; see exporters)

Actions performed based on collected test metrics:

- case 3: integrating into a CI pipeline and stopping/restarting a campaign if it violates certain thresholds (number of unique values below a threshold, number of reverts above a threshold, number of times a selector was hit below/above a threshold), e.g. by exporting to Kafka, AWS Kinesis, or others (see exporters)
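Case 3 boils down to comparing a few aggregates against configured limits. A std-only Rust sketch of such a check; the metric and threshold field names are hypothetical, and only a lower bound on selector hits is shown.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-campaign aggregates, similar to the metrics listed above.
#[derive(Default)]
struct CampaignMetrics {
    selector_calls: HashMap<String, u64>,
    reverts: u64,
    unique_senders: HashSet<String>,
}

/// Hypothetical CI thresholds.
struct Thresholds {
    min_unique_senders: usize,
    max_reverts: u64,
    min_calls_per_selector: u64,
}

/// Returns one message per violated threshold; an empty Vec means the
/// campaign looks healthy. Selectors never called are absent from the map,
/// so a real check would start from the full list of target selectors.
fn check(m: &CampaignMetrics, t: &Thresholds) -> Vec<String> {
    let mut violations = Vec::new();
    if m.unique_senders.len() < t.min_unique_senders {
        violations.push(format!("only {} unique senders (min {})", m.unique_senders.len(), t.min_unique_senders));
    }
    if m.reverts > t.max_reverts {
        violations.push(format!("{} reverts (max {})", m.reverts, t.max_reverts));
    }
    let mut selectors: Vec<_> = m.selector_calls.iter().collect();
    selectors.sort(); // deterministic order for reporting
    for (selector, calls) in selectors {
        if *calls < t.min_calls_per_selector {
            violations.push(format!("{} hit {} times (min {})", selector, calls, t.min_calls_per_selector));
        }
    }
    violations
}

fn main() {
    let mut m = CampaignMetrics::default();
    m.selector_calls.insert("deposit".to_string(), 500);
    m.selector_calls.insert("withdraw".to_string(), 3);
    m.reverts = 900;
    m.unique_senders.insert("0x01".to_string());
    let t = Thresholds { min_unique_senders: 2, max_reverts: 100, min_calls_per_selector: 10 };
    // A CI job could stop or restart the campaign when this is non-empty.
    for v in check(&m, &t) {
        println!("threshold violated: {}", v);
    }
}
```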

Adapt fuzzing campaigns based on collected metrics:

- if PR feat(fuzz): ability to declare fuzz test fixtures #7428 is accepted, it will introduce the concept of test fixtures / data sets used in fuzzing, initially as an inline forge-config: fixture config
- the next step for fixtures would be cheatcodes for loading test data sets from a file/URL
- metrics could then be used to provide custom input for fuzzing campaigns

For example, metrics are collected and exported to AWS Kinesis, Apache Kafka, etc.; the campaign metrics are then processed and:

- saved in a persistent layer (S3, etc.) for historical analysis
- used to generate new datasets for subsequent campaigns. Rules to apply could be ensuring values are unique across several campaigns, allowing them to repeat to some extent, or deriving new datasets from already used values, etc.
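The "unique across several campaigns" rule amounts to filtering candidate values against a persisted history. A std-only Rust sketch; `next_fixture` is a hypothetical name, and loading/storing the history (S3 or similar) is left out.

```rust
use std::collections::BTreeSet;

/// Picks up to `n` candidate values that no earlier campaign has used and
/// records them in `history`, so the next campaign will skip them too.
fn next_fixture(history: &mut BTreeSet<u64>, candidates: &[u64], n: usize) -> Vec<u64> {
    let mut picked = Vec::new();
    for &value in candidates {
        if picked.len() == n {
            break;
        }
        // `insert` returns false when the value was already in the history,
        // which also filters duplicates within `candidates`.
        if history.insert(value) {
            picked.push(value);
        }
    }
    picked
}

fn main() {
    // Values used by previous campaigns (would be loaded from storage).
    let mut history: BTreeSet<u64> = [1, 2, 3].iter().cloned().collect();
    let fixture = next_fixture(&mut history, &[2, 4, 5, 6], 2);
    println!("next fixture: {:?}", fixture);
}
```

The "repeatable to some extent" variant would replace the set with a per-value use count and accept values below a configured limit.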

any feedback appreciated, thank you

grandizzy · 2024-04-03T14:34:54Z

@Evalir we can continue the discussion here. Here's a quick overview of the changes needed for such metrics: master...grandizzy:foundry:metrics. See also the previous comments re the sample and other use cases this could be used for.
I wanted to get your thoughts before polishing the code and opening a PR, as it introduces some new deps (if you're not comfortable with that given supply-chain attack events, it can be built / cfg-gated on demand). If all good, I can do the PR and update the Foundry Book with usage docs / examples.

grandizzy · 2024-04-09T19:18:27Z

Adding to the OpenTelemetry adoption / usage point: Grafana just announced their open-source collector (Grafana Alloy): https://grafana.com/blog/2024/04/09/grafana-alloy-opentelemetry-collector-with-prometheus-pipelines/

lucas-manuel added the T-feature Type: feature label Nov 3, 2022

lucas-manuel mentioned this issue Nov 3, 2022 in meta: invariant testing features #3412 (Closed, 26 tasks)

rkrasiuk added A-cheatcodes Area: cheatcodes C-forge Command: forge labels Nov 7, 2022

mds1 mentioned this issue Jan 26, 2023 in feat(forge): invariant tests instrumentation with function call summaries #4186 (Closed)

PaulRBerg mentioned this issue Feb 3, 2023 in feat: Add invariant testing foundry-rs/book#760 (Merged)

lucas-manuel mentioned this issue Feb 7, 2023 in feat: Add "after-all" hook for testing #4300 (Closed)

PaulRBerg mentioned this issue Feb 14, 2023 in Lower the invariant test run times sablier-labs/v2-core#338 (Closed)

mds1 mentioned this issue Feb 28, 2023 in meta(invariants): tracking issue for improvements to invariant testing #4438 (Open)

PaulRBerg mentioned this issue May 23, 2023 in Invariant tests function breakdown #5011 (Closed)

grandizzy mentioned this issue Mar 5, 2024 in feat(invariants): real-time runs counter #7302 (Closed)

grandizzy mentioned this issue Apr 3, 2024 in fix: do not flood dictionary with data dependent on fuzz inputs #7552 (Merged)

grandizzy mentioned this issue Jun 8, 2024 in meta(fuzzer): tracking issue for fuzzer improvements #8076 (Open)

zerosnacks added this to the v1.0.0 milestone Jul 26, 2024

zerosnacks removed the A-cheatcodes Area: cheatcodes label Aug 2, 2024

zerosnacks added the A-testing Area: testing label Aug 2, 2024
