Add transaction ms per computation metrics #4615

Merged
merged 4 commits into master from janez/ms-per-computation-metrics on Nov 16, 2023

Conversation

janezpodhostnik
Contributor

This adds a new histogram for the ratio of time spent on a transaction vs computation used.

The metric is normalised by the original normalisation factor used when the computation weights were decided, which is 9999 computation per 200 milliseconds. This means the histogram should be a bell curve centred around 1 (in reality it isn't; it's closer to 3, but that is why we need to recalibrate the weights at some point).

This histogram will tell us how well our mathematical model for computation is performing.

We already have enough information in the logs to observe this, but it is very resource-intensive to draw graphs from log data.
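
For context, here is a minimal, self-contained sketch of how a histogram like this could be registered and fed using the Go Prometheus client. The constant mirrors the calibration described above; the metric name, help text, bucket boundaries, and function name are illustrative assumptions rather than the PR's actual code:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// estimatedComputationPerMillisecond mirrors the calibration described above:
// 9999 computation units per 200 milliseconds.
const estimatedComputationPerMillisecond = 9999.0 / 200.0

// Metric name, help text, and buckets are illustrative, not taken from this PR.
var transactionNormalizedTimePerComputation = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "transaction_ms_per_computation_normalized",
	Help:    "transaction wall-clock time per computation unit, normalized so ~1 matches the original calibration",
	Buckets: []float64{0.25, 0.5, 1, 2, 4, 8, 16},
})

// observeTransaction records one executed transaction. A value above 1 means the
// transaction took longer per unit of computation than the calibration predicts;
// a value below 1 means it ran faster than predicted.
func observeTransaction(dur time.Duration, computationUsed uint64) {
	if computationUsed == 0 {
		return // skip transactions with no computation to avoid dividing by zero
	}
	transactionNormalizedTimePerComputation.Observe(
		float64(dur.Milliseconds()) / float64(computationUsed) * estimatedComputationPerMillisecond)
}
```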


codecov-commenter commented Aug 10, 2023

Codecov Report

Merging #4615 (6955c43) into master (2528190) will increase coverage by 3.98%.
Report is 493 commits behind head on master.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4615      +/-   ##
==========================================
+ Coverage   56.25%   60.23%   +3.98%     
==========================================
  Files         653       51     -602     
  Lines       64699     6579   -58120     
==========================================
- Hits        36396     3963   -32433     
+ Misses      25362     2125   -23237     
+ Partials     2941      491    -2450     
Flag Coverage Δ
unittests 60.23% <ø> (+3.98%) ⬆️

Flags with carried forward coverage won't be shown.

see 600 files with indirect coverage changes

@janezpodhostnik janezpodhostnik force-pushed the janez/ms-per-computation-metrics branch from a2b2127 to 6955c43 Compare August 10, 2023 18:00
@@ -28,6 +28,10 @@ const DefaultTransactionExpiryBuffer = 30
// DefaultMaxTransactionGasLimit is the default maximum value for the transaction gas limit.
const DefaultMaxTransactionGasLimit = 9999

// EstimatedComputationPerMillisecond is the approximate number of computation units that can be performed in a millisecond.
// This was calibrated during the Variable Transaction Fees: Execution Effort FLIP https://github.com/onflow/flow/pull/753
const EstimatedComputationPerMillisecond = 9999.0 / 200.0
Member

Is this estimated as MaxTransactionGasLimit / AverageDurationForTransactionWithMaxGas?

If the accuracy is not that important, why not 10000 / 200?

Contributor Author

Is this estimated as MaxTransactionGasLimit / AverageDurationForTransactionWithMaxGas?

Yes. The AverageDurationForTransactionWithMaxGas is an old measurement (the one taken during the calibration of the execution weights).

If the accuracy is not that important, why not 10000 / 200 ?

It's not that important, but I don't know if we benefit from rounding here.
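
(For reference: 9999 / 200 = 49.995 computation units per millisecond, while 10000 / 200 = 50. The difference is about 0.01%, so either constant would give essentially the same normalization.)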

@@ -405,6 +406,14 @@ func NewExecutionCollector(tracer module.Tracer) *ExecutionCollector {
Buckets: []float64{50, 100, 500, 1000, 5000, 10000},
})

transactionNormalizedTimePerComputation := promauto.NewHistogram(prometheus.HistogramOpts{
Member

Can you add a comment about how to read this metric?

Say EstimatedComputationPerMillisecond is 50; does a transaction_computation_per_ms of more than 1 then mean the EN took less time to finish executing this tx?

Contributor Author

I added a comment

@janezpodhostnik janezpodhostnik force-pushed the janez/ms-per-computation-metrics branch from 6955c43 to 8bfccfc Compare August 16, 2023 19:17
if compUsed > 0 {
// normalize so the value should be around 1
ec.transactionNormalizedTimePerComputation.Observe(
(float64(dur.Milliseconds()) / float64(compUsed)) * flow.EstimatedComputationPerMillisecond)
Member

I still don't quite get this math. According to this comment: "Value below 1 means the transaction was executed faster than estimated (is using less resources then estimated)"

So if EstimatedComputationPerMillisecond is 50, then normally running for 10 ms should use 500 gas. For the metric to be less than 1, the actual compUsed would have to be higher than 500, but in that case, doesn't it use more resources instead of fewer?

Contributor Author

During the Variable Transaction Fees: Execution Effort FLIP, the execution effort weights were set so that a transaction with max computation (9999 c) would run for 200 ms, and that is the EstimatedComputationPerMillisecond (= 9999 c / 200 ms).

If a transaction runs for 10 ms and uses 500 computation, then NormalizedTimePerComputation = (10 ms / 500 c) * (9999 c / 200 ms) = 0.9999, which means it ran about as fast as we would expect it to run. A transaction using 500 c but running for 20 ms would have a NormalizedTimePerComputation of about 2, which means it took twice as long as expected. But if it ran in 5 ms it would have a value of 0.5, which would mean it ran twice as fast as expected.

For the metrics to be less than 1, the actual compUsed should be higher than 500, but in that case, isn't it uses more resource instead of less resource?

The amount of time spent on the transaction is the resource here. So if a transaction ran for 10 ms, that is the resource it used. Computation used is an estimate of time spent. We estimate transactions that run for 10 ms to be worth about 500 computation. If the transaction was instead worth 1000 computation, then we ran it faster than expected (it was estimated to run for 20 ms). If it was worth 250 computation, then it ran slower than expected (it was estimated to run for 5 ms).
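
To make the worked example concrete, here is a small sketch that reproduces the three cases above using the same formula as the diff (the helper name is illustrative, not the PR's code):

```go
package main

import "fmt"

// From the diff: 9999 computation units per 200 ms, i.e. ~49.995 c/ms.
const estimatedComputationPerMillisecond = 9999.0 / 200.0

// normalizedTimePerComputation is an illustrative helper mirroring the formula in the diff.
func normalizedTimePerComputation(durationMs, computationUsed float64) float64 {
	return durationMs / computationUsed * estimatedComputationPerMillisecond
}

func main() {
	fmt.Println(normalizedTimePerComputation(10, 500)) // 0.9999  -> about as fast as estimated
	fmt.Println(normalizedTimePerComputation(20, 500)) // 1.9998  -> roughly twice as slow as estimated
	fmt.Println(normalizedTimePerComputation(5, 500))  // 0.49995 -> roughly twice as fast as estimated
}
```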

Co-authored-by: Leo Zhang <zhangchiqing@gmail.com>
@@ -739,6 +749,11 @@ func (ec *ExecutionCollector) ExecutionTransactionExecuted(
ec.transactionExecutionTime.Observe(float64(dur.Milliseconds()))
ec.transactionConflictRetries.Observe(float64(numConflictRetries))
ec.transactionComputationUsed.Observe(float64(compUsed))
if compUsed > 0 {
// normalize so the value should be around 1
ec.transactionNormalizedTimePerComputation.Observe(
@sideninja (Contributor) commented Oct 2, 2023

I wonder: if the EstimatedComputationPerMillisecond is recalibrated, will the historic data be invalid compared to the new data? Will a new histogram be created at that point? What I'm trying to get across is that if that value is changed and we then track historic data, we might think we improved the computation when in fact we just changed the weights. Not sure if that's relevant though, since I don't have context.

Contributor Author

Yes, EstimatedComputationPerMillisecond does depend on how we do the calibration. When it changes, we have to keep that in mind when comparing old transactions to new ones.

When comparing transaction performance we should ideally compare actual transaction execution time rather than computation, because computation is just a model for execution time.

@sideninja (Contributor) left a comment

Nice

@janezpodhostnik janezpodhostnik added this pull request to the merge queue Nov 15, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Nov 15, 2023
@janezpodhostnik janezpodhostnik added this pull request to the merge queue Nov 16, 2023
Merged via the queue into master with commit fa477d9 Nov 16, 2023
83 checks passed
@janezpodhostnik janezpodhostnik deleted the janez/ms-per-computation-metrics branch November 16, 2023 13:35
Labels
Execution Cadence Execution Team Improvement
4 participants